Bug ID: 6662086 6u4+, 7b11+: CMS never clears referents when -XX:+ParallelRefProcEnabled

Details
Type: Bug
Submit Date: 2008-02-12
Status: Resolved
Updated Date: 2010-12-03
Project Name: JDK
Resolved Date: 2008-05-21
Component: hotspot
OS: solaris_10
Sub-Component: gc
CPU: x86,sparc
Priority: P3
Resolution: Fixed
Affected Versions: 6u4
Fixed Versions: hs13

Related Reports
Backport:
Backport:
Backport:
Backport:
Duplicate:
Relates:
Relates:

Sub Tasks

Description
It has been observed with 6.0 u4 that CMS is far less efficient than
with 6.0 u2. While we get a regular sawtooth curve in u2, we notice
with u4 that CMS, for some reason, seems unable to collect all
dead objects. Over time, the effect is more frequent CMS runs
that collect fewer and fewer objects. Used memory increases, as does
CPU consumption. Later on, however, CMS is suddenly able to
collect large chunks, and memory usage goes down. This scenario
then repeats from the beginning.
Changed synopsis to reflect evaluation of root cause:

6u4+, 7b11+: CMS never clears referents when -XX:+ParallelRefProcEnabled

                                    

Comments
EVALUATION

The regression was introduced in 6417901 (which was integrated in
6u4 and 7b11). We have verified that the fix in the Suggested Fix
section fixes the problem at both customers who reported the issue.
                                     
2008-05-06
SUGGESTED FIX

JPRT: [hotspotwest] job notification - success with job 2008-05-06-224147.ysr.hg_gc_fixes



JPRT Job ID:            2008-05-06-224147.ysr.hg_gc_fixes
JPRT System Used:       hotspotwest
JPRT Version Used:      1.0: (2008-04-29) Case of the Misguided Missile
  [c2c0735e7f00]
Job URL:
  http://prt-web.sfbay.sun.com/archive/2008/05/2008-05-06-224147.ysr.hg_gc_fixes
Job ARCHIVE:
  /net/prt-archiver.sfbay/data/jprt/archive/2008/05/2008-05-06-224147.ysr.hg_gc_fixes
User:                   ysr
Email:                  ###@###.###
Release:                jdk7
Job Source:             Mercurial:
  /net/spot/workspaces/ysr/hg_gc_fixes/{make,src,agent}
Parent:                 http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot
Push Parent:
  ssh://###@###.###/jdk7/hotspot-gc-gate/hotspot
CR List:                6662086
Changeset:
  http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/b5489bb705c9
File List:              {.}
Exclude List:           {build}
Command Line:           jprt submit -m jprt.txt -noforest
Job submitted at:       Tuesday May 6, 2008 15:41:49 PDT
Total time in queue:    2h 02m 35s
Job started at:         Tuesday May 6, 2008 15:43:48 PDT
Job integrated at:      Tuesday May 6, 2008 17:44:00 PDT
Job finished at:        Tuesday May 6, 2008 17:44:24 PDT
Job run time:           2h 35s
Job state:              success
Job flags:              SYNC INTEGRATE PRECIOUS
Bundles:                USE: jprt install 2008-05-06-224147.ysr.hg_gc_fixes

NOTE: Zip files containing exe or dll files on windows have had problems with
execute permissions. You may need to 'chmod a+x' the windows exe and
dll files.

User Comments:

6662086: 6u4+, 7b11+: CMS never clears referents when
  -XX:+ParallelRefProcEnabled
Summary: Construct the relevant CMSIsAliveClosure used by CMS during parallel
  reference processing with the correct span. It had incorrectly been
  constructed with an empty span, a regression introduced in 6417901.
Reviewed-by: jcoomes
                                     
2008-05-01
EVALUATION

Verified that only CMS was affected by the issue.
Reworked some of the code and added asserts to reduce the possibility of
similar inadvertent regressions in the future.
                                     
2008-05-01
EVALUATION

Based on data collected by ###@###.### using an instrumented
JVM that he built, and on visual inspection of the code, the problem
appears to be that the CMSIsAliveClosure passed into the
work method does not have a correctly initialized
_span. We'll also need to check whether
the same (or a similar) problem also exists in the other collectors.

In the case of CMS, this was causing the _is_alive closure to
declare that all referents were strongly reachable even when they were
not (when +ParallelRefProcEnabled).

The fix is to pass in the _span to the CMSIsAliveClosure at
construction time so it's correctly initialized. See suggested
fix section.
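
The effect described in this evaluation can be illustrated with a small
stand-alone model. This is only a sketch and not the HotSpot source: the
class IsAliveSketch, its Span and IsAliveClosure types, and the integer
"addresses" are all hypothetical. It also assumes that the closure treats
any address outside its span as live, which is one way an empty span
would lead to every referent being reported as strongly reachable.

```java
import java.util.BitSet;

public class IsAliveSketch {
    /** Hypothetical half-open address range [start, end); start == end is empty. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(int addr) {
            return addr >= start && addr < end;
        }
    }

    /** Simplified stand-in for an is-alive closure; not the real CMSIsAliveClosure. */
    static final class IsAliveClosure {
        private final Span span;
        private final BitSet markBits;

        IsAliveClosure(Span span, BitSet markBits) {
            this.span = span;
            this.markBits = markBits;
        }

        boolean isAlive(int addr) {
            // Assumption: addresses outside the span are not this collector's
            // concern and are reported as live; inside the span the mark bit decides.
            return !span.contains(addr) || markBits.get(addr);
        }
    }

    public static void main(String[] args) {
        BitSet marks = new BitSet();
        marks.set(10); // object at 10 was marked reachable; object at 20 was not

        IsAliveClosure withSpan = new IsAliveClosure(new Span(0, 100), marks);
        IsAliveClosure emptySpan = new IsAliveClosure(new Span(0, 0), marks);

        System.out.println(withSpan.isAlive(10));  // true: marked
        System.out.println(withSpan.isAlive(20));  // false: unmarked, referent can be cleared
        System.out.println(emptySpan.isAlive(20)); // true: nothing is inside an empty span
    }
}
```

In this model the closure built with an empty span reports the unmarked
object at 20 as alive, so a reference processor relying on it would never
clear the corresponding referent.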
                                     
2008-04-30
WORK AROUND

Do not use -XX:+ParallelRefProcEnabled
(i.e. revert to default which disables parallel reference processing).
However, this might adversely affect CMS remark pause times in
applications that make heavy use of Reference objects (including
for instance Finalizers) and run on large MP boxes.
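
To confirm whether a running JVM actually has parallel reference
processing enabled, the flag can be read at run time. The following is a
minimal sketch, assuming a HotSpot-based JVM that exposes
com.sun.management.HotSpotDiagnosticMXBean; the class name
RefProcFlagCheck and the helper flagValue are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class RefProcFlagCheck {
    /**
     * Returns the current value of a HotSpot VM option as a string,
     * or null if the option or the diagnostic bean is unavailable.
     */
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return null;
            }
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the option does not exist on this JVM
            // (or the bean is not a platform MXBean here).
            return null;
        }
    }

    public static void main(String[] args) {
        String value = flagValue("ParallelRefProcEnabled");
        System.out.println("ParallelRefProcEnabled = " + value);
    }
}
```

If the reported value is "true" on an affected build, the work-around
above has not taken effect for that process.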
                                     
2008-02-14
EVALUATION

Added "when -XX:+ParallelRefProcEnabled" to synopsis, based on further
tests at the customer site. The regression appears to have started with
6417901, where the parallel reference processing code was extensively
reworked to extend it to other collectors besides CMS. We are now
investigating the root cause of the problem.
                                     
2008-02-14


