Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
Child workspace: /net/prt-web.sfbay/prt-workspaces/20070525142420.ysr.mustang/workspace
Job ID: 20070525142420.ysr.mustang
Original workspace: karachi:/net/jano.sfbay/export/hotspot/users1/ysr/mustang
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070525142420.ysr.mustang/
Fixed 6483690: CMS: assert(cur_val < top,"All recorded addresses should be less")
The problem was that after a promotion failure there may be objects
in both survivor spaces. The parallelized survivor space remark
did not handle this situation properly. To recapitulate, at the end
of a normal scavenge, the survivor space known as "from"-space holds
survivors and the one known as "to"-space typically does not. The "from"
survivor space, which normally holds survivors, is chunked at PLAB
boundaries, and the chunking information saved to a well-known
"survivor space chunking table" which the CMS collector maintains
and which is updated during a scavenge as PLABs are acquired.
The CMS collector assumes for a remark pause that the values in
this table represent block boundaries for the "from" space.
This assumption is violated when, following a promotion failure,
the survivor space names are not swapped as they are in the normal case.
The bug is extremely rare because the assumed invariant is broken only for
the period between the promotion failure and the mark-compact collection
that immediately follows and restores the invariant. CMS is affected
only when a CMS parallel remark phase runs during that window, before
the baton is passed to the foreground collector that does the compaction.
In product builds, where the assert is not hit, we'd end up
trying to use the information in the chunking table to
scan the "wrong" survivor space and usually crash.
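The failure mode described above can be illustrated with a toy model. This is
only a sketch under stated assumptions: ToyYoungGen, Space, the addresses, and
the swap_on_failure flag are hypothetical names invented for illustration and
are not the HotSpot classes or the actual chunking table. The check is also
simplified to "every recorded address lies inside from-space".

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model only: names and layout are illustrative, not HotSpot's.
struct Space {
  std::size_t bottom;  // first address of the space
  std::size_t top;     // one past the last allocated address
};

struct ToyYoungGen {
  Space from{1000, 1000};           // empty before the first scavenge
  Space to{2000, 2000};
  std::vector<std::size_t> chunks;  // stand-in for the survivor chunking table

  void swap_spaces() { std::swap(from, to); }

  // Copy survivors into "to", recording one chunk boundary per PLAB.
  // swap_on_failure == false models the behaviour before the fix.
  void scavenge(std::size_t plabs, std::size_t plab_size,
                bool promotion_failed, bool swap_on_failure) {
    assert(plab_size > 0);
    chunks.clear();
    to.top = to.bottom;
    for (std::size_t i = 0; i < plabs; ++i) {
      chunks.push_back(to.top);     // start address of this PLAB
      to.top += plab_size;
    }
    if (!promotion_failed) {
      from.top = from.bottom;       // all survivors were copied out
      swap_spaces();                // normal case: names are flipped
    } else if (swap_on_failure) {
      swap_spaces();                // objects may remain in both spaces
    }
  }

  // What remark assumes: every recorded address lies inside "from".
  bool chunk_table_matches_from() const {
    for (std::size_t addr : chunks) {
      if (addr < from.bottom || addr >= from.top) return false;
    }
    return true;
  }
};
```

In this model, a normal scavenge leaves chunk_table_matches_from() true, while
a scavenge with promotion_failed set and swap_on_failure clear leaves the table
describing the space still named "to", so the check fails.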
There were several possible fixes for this bug, including (but not limited to)
maintaining the identity of the survivor space in the chunking table,
but the smallest appeared to be to have the scavenge always flip the
names of the spaces, so as to leave the CMS-assumed invariant intact.
Note that while the chunking array does not come into play when using
DefNew, the serial scavenger, we made an identical change
in DefNewGeneration::collect(), for the sake of uniformity.
We should really just suitably factor out the shared code here
rather than duplicating it as currently done.
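One possible shape for the factoring suggested above is a shared epilogue that
both scavengers call, so the decision to always flip the names lives in one
place. This is a minimal sketch, not the code in this putback: ToyGen,
finish_scavenge, copy_survivors, parallel_collect, and serial_collect are
hypothetical names, and the address arithmetic is a toy stand-in.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the HotSpot classes.
struct Space { std::size_t bottom, top; };

struct ToyGen {
  Space from{1000, 1000};
  Space to{2000, 2000};
  std::vector<std::size_t> chunks;  // recorded PLAB starts (parallel path only)
  bool needs_full_gc = false;
  void swap_spaces() { std::swap(from, to); }
};

// Shared epilogue: the single place that decides what happens to the names.
inline void finish_scavenge(ToyGen& g, bool promotion_failed) {
  if (promotion_failed) {
    g.needs_full_gc = true;         // both spaces may still hold objects
  } else {
    g.from.top = g.from.bottom;     // old "from" has been emptied
  }
  g.swap_spaces();                  // always flip, failure or not
}

// Copy survivors into "to"; optionally record a boundary per PLAB.
inline void copy_survivors(ToyGen& g, std::size_t plabs,
                           std::size_t plab_size, bool record_chunks) {
  assert(plab_size > 0);
  g.chunks.clear();
  g.to.top = g.to.bottom;
  for (std::size_t i = 0; i < plabs; ++i) {
    if (record_chunks) g.chunks.push_back(g.to.top);
    g.to.top += plab_size;
  }
}

inline void parallel_collect(ToyGen& g, std::size_t plabs,
                             std::size_t plab_size, bool promotion_failed) {
  copy_survivors(g, plabs, plab_size, /*record_chunks=*/true);
  finish_scavenge(g, promotion_failed);
}

inline void serial_collect(ToyGen& g, std::size_t plabs,
                           std::size_t plab_size, bool promotion_failed) {
  copy_survivors(g, plabs, plab_size, /*record_chunks=*/false);
  finish_scavenge(g, promotion_failed);
}

// The invariant remark relies on: recorded addresses lie inside "from".
inline bool chunks_lie_in_from(const ToyGen& g) {
  for (std::size_t addr : g.chunks) {
    if (addr < g.from.bottom || addr >= g.from.top) return false;
  }
  return true;
}
```

With the flip in the shared epilogue, chunks_lie_in_from() holds after
parallel_collect() even when promotion_failed is set, and the serial path gets
the same naming behaviour without a duplicated swap call.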
Fix Verified: y
Verification Testing: runThese -quick -testbase with CMS
Other Testing: PRT, refworkload, runThese -quick -testbase
Reviewed by: Andrey Petrusenko
Examined files: 3964
3961 no action (unchanged)
The following appears to be the smallest fix to deal with
------- parNewGeneration.cpp -------
*** /tmp/sccs.FLayvw Mon May 21 16:17:00 2007
--- parNewGeneration.cpp Mon May 21 16:13:31 2007
*** 785,790 ****
--- 785,791 ----
gclog_or_tty->print(" (promotion failed)");
// All the spaces are in play for mark-sweep.
+ swap_spaces(); // Make things simpler for CMS; see 6483690.
The change restores the CMS-assumed invariant that
the data in the survivor PLAB chunking array always corresponds to
the semi-space named "from". [That invariant was otherwise broken for the
brief window between a scavenge that resulted in a promotion failure
and the subsequent mark-compact, which restores the invariant.
Under some rare circumstances CMS can run during that window, before
the collection "baton" is passed to the foreground mark-compact
collection following the failed scavenge.]