Bug ID: 6483690 CMS: assert(cur_val < top,"All recorded addresses should be less")

Details
Type: Bug
Submit Date: 2006-10-18
Status: Resolved
Updated Date: 2010-05-09
Project Name: JDK
Resolved Date: 2007-06-20
Component: hotspot
OS: generic
Sub-Component: gc
CPU: generic
Priority: P3
Resolution: Fixed
Affected Versions: 7
Fixed Versions: hs10


Description
Assertion failure during nightly testing on linux-amd64 with test
gc/memory/Churn/Churn2

[2006-10-18T04:43:50.36] # To suppress the following error report, specify this argument
[2006-10-18T04:43:50.36] # after -XX: or in .hotspotrc:  SuppressErrorAt=/concurrentMarkSweepGeneration.cpp:5124
[2006-10-18T04:43:50.36] #
[2006-10-18T04:43:50.36] # An unexpected error has been detected by Java Runtime Environment:
[2006-10-18T04:44:53.94] #
[2006-10-18T04:44:53.94] #  Internal Error (/PrtBuildDir/workspace/src/share/vm/memory/concurrentMarkSweepGeneration.cpp, 5124), pid=1312, tid=5126
[2006-10-18T04:44:53.94] #
[2006-10-18T04:44:53.94] # Java VM: Java HotSpot(TM) 64-Bit Server VM (20061016062331.jmasa.gc_baseline_merge-debug mixed mode)
[2006-10-18T04:44:53.94] #
[2006-10-18T04:44:53.94] # Error: assert(cur_val < top,"All recorded addresses should be less")
[2006-10-18T04:44:53.94] # An error report file with more information is saved as hs_err_pid1312.log
[2006-10-18T04:44:53.94] #
[2006-10-18T04:44:53.94] # If you would like to submit a bug report, please visit:
[2006-10-18T04:44:53.94] #   http://java.sun.com/webapps/bugreport/crash.jsp
[2006-10-18T04:44:53.94] #
[2006-10-18T04:44:53.94] VM option '-PrintVMOptions'
[2006-10-18T04:44:53.94] VM option '+UseConcMarkSweepGC'
[2006-10-18T04:44:53.94] VM option '+CMSPermGenSweepingEnabled'
[2006-10-18T04:44:53.94] VM option '+CMSClassUnloadingEnabled'
[2006-10-18T04:44:53.94] VM option '+ExplicitGCInvokesConcurrent'
Adding -XX:-UseCMSCompactAtFullCollection -XX:+PromotionFailureALot
to the mix is a sure way to expose this problem very reliably.

                                    

Comments
SUGGESTED FIX

Event:            putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
                  (jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace:  /net/prt-web.sfbay/prt-workspaces/20070525142420.ysr.mustang/workspace
                  (prt-web:/net/prt-web.sfbay/prt-workspaces/20070525142420.ysr.mustang/workspace)
User:             ysr

Comment:

---------------------------------------------------------

Job ID:                 20070525142420.ysr.mustang
Original workspace:     karachi:/net/jano.sfbay/export/hotspot/users1/ysr/mustang
Submitter:              ysr
Archived data:          /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070525142420.ysr.mustang/
Webrev:                 http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070525142420.ysr.mustang/workspace/webrevs/webrev-2007.05.26/index.html


Fixed 6483690: CMS: assert(cur_val < top,"All recorded addresses should be less")

  Webrev: http://analemma.sfbay/net/jano/export/disk05/hotspot/users/ysr/mustang/webrev.6483690


The problem was that when there is a promotion failure there may be objects
in both survivor spaces. The parallelization of survivor space remark
was not dealing properly with this situation. To recapitulate, at the end
of a normal scavenge, the survivor space known as "from"-space holds
survivors and the one known as "to"-space typically does not. The "from"
survivor space, which normally holds survivors, is chunked at PLAB
boundaries, and the chunking information is saved to a well-known
"survivor space chunking table" which the CMS collector maintains
and which is updated during a scavenge as PLABs are acquired.
The CMS collector assumes for a remark pause that the values in
this table represent block boundaries for the "from" space.
This assumption is violated when, following a promotion failure,
the survivor space names are not swapped as they are in the normal case.
The bug is extremely rare because the assumed invariant is broken only for
the period between the promotion failure and the mark-compact collection
that immediately follows and restores said invariant. CMS is affected
only when a CMS parallel remark phase runs during that window, before
the baton is passed to the foreground collector that does the compaction.

In product builds, where the assert is not hit, we'd end up
trying to use the information in the chunking table to
scan the "wrong" survivor space and usually crash.
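
The assumption that the remark pause makes can be sketched as a small
self-contained check. This is a toy model and not HotSpot code: Space and
boundaries_lie_in are hypothetical names, and addresses are plain integers.
It only mirrors the idea behind the failing assert, namely that every
recorded chunk boundary is compared against the bounds of the space that is
currently named "from".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a survivor space (hypothetical, not the HotSpot class).
struct Space {
    std::uintptr_t bottom;  // first address in the space
    std::uintptr_t top;     // first unallocated address
};

// Returns true when every recorded chunk boundary lies inside `s`,
// that is, bottom <= cur_val < top for each entry of the table.
bool boundaries_lie_in(const std::vector<std::uintptr_t>& chunk_table,
                       const Space& s) {
    assert(s.bottom <= s.top);  // sanity check on the toy space itself
    for (std::uintptr_t cur_val : chunk_table) {
        if (cur_val < s.bottom || cur_val >= s.top) {
            return false;  // boundary was recorded for some other space
        }
    }
    return true;
}
```

If the table was filled while copying into one survivor space, the check
holds for that space and fails for the other one, which is the situation
a remark pause sees when the names were not swapped after a promotion
failure.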

There were several possible fixes for this bug, including (but not limited to)
maintaining the identity of the survivor space in the chunking table,
but the smallest appeared to be to have the scavenge always flip the
names of the spaces, so as to leave the CMS-assumed invariant intact.
Note that while the chunking array does not come into play when using
DefNew, the serial scavenger, we made an identical change
in DefNewGeneration::collect(), for the sake of uniformity.
We should really just suitably factor out the shared code here
rather than duplicating it as currently done.
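
One possible shape for that factoring, as a sketch only: the tail of a
scavenge lives in a single helper that both scavengers call. All names
below (ToySpace, ToyYoungGen, finish_scavenge, ToySerialScavenger,
ToyParallelScavenger) are hypothetical, the copying work is elided, and the
"will fail" flag is kept on the generation for brevity rather than on the
heap as in the diff for this bug.

```cpp
#include <cassert>

// Toy stand-in for a survivor space; only the field the sketch needs.
struct ToySpace {
    ToySpace* next_compaction_space = nullptr;
};

// Hypothetical common base holding the shared end-of-scavenge logic.
class ToyYoungGen {
public:
    ToySpace* from() { return &spaces_[from_index_]; }
    ToySpace* to() { return &spaces_[1 - from_index_]; }
    bool incremental_collection_will_fail() const { return will_fail_; }

protected:
    // Shared tail of a scavenge: the names are flipped on both paths,
    // so the "from" name always denotes the space that was just filled.
    void finish_scavenge(bool promotion_failed) {
        swap_spaces();
        assert(from() != to());
        if (promotion_failed) {
            // All the spaces are in play for the following mark-compact.
            from()->next_compaction_space = to();
            will_fail_ = true;
        }
    }

private:
    void swap_spaces() { from_index_ = 1 - from_index_; }

    ToySpace spaces_[2];
    int from_index_ = 0;
    bool will_fail_ = false;
};

// Both toy scavengers reuse the helper instead of duplicating it.
class ToySerialScavenger : public ToyYoungGen {
public:
    void collect(bool promotion_failed) { finish_scavenge(promotion_failed); }
};

class ToyParallelScavenger : public ToyYoungGen {
public:
    void collect(bool promotion_failed) { finish_scavenge(promotion_failed); }
};
```

With this arrangement a change such as the one made for this bug would be
applied in one place rather than in two collect() bodies.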

Fix Verified: y
Verification Testing: runThese -quick -testbase with CMS

Other Testing: PRT, refworkload, runThese -quick -testbase

Reviewed by: Andrey Petrusenko

Files:
update: src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
update: src/share/vm/gc_implementation/parNew/parNewGeneration.cpp
update: src/share/vm/memory/defNewGeneration.cpp

Examined files: 3964

Contents Summary:
       3   update
    3961   no action (unchanged)
                                     
2007-05-30
EVALUATION

Failure to clear survivor chunking array in case of promotion failure
leaves it with obsolete data which the remark phase tries to use.
                                     
2007-05-21
SUGGESTED FIX

The following appears to be the smallest fix to deal with
this issue:

------- parNewGeneration.cpp -------
*** /tmp/sccs.FLayvw    Mon May 21 16:17:00 2007
--- parNewGeneration.cpp        Mon May 21 16:13:31 2007
***************
*** 785,790 ****
--- 785,791 ----
        gclog_or_tty->print(" (promotion failed)");
      }
      // All the spaces are in play for mark-sweep.
+     swap_spaces();    // Make things simpler for CMS; see 6483690.
      from()->set_next_compaction_space(to());
      gch->set_incremental_collection_will_fail();
    }

Basically what it does is to restore the CMS-assumed invariant that
the data in the survivor PLAB chunking array always corresponds to
the semi-space named "from". [That invariant would be broken for the
brief window between a scavenge that resulted in a promotion failure
and the subsequent mark-compact which would have restored that invariant.
CMS can under some rare circumstances run during that window before
the collection "baton" is passed to the foreground mark-compact
collection following the failed scavenge.]
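
The effect of the one-line change can be illustrated with a toy model of a
scavenge. This is a sketch under stated assumptions, not the HotSpot
implementation: ToyYoungGen, make_gen and the swap_on_failure parameter are
hypothetical, addresses are plain integers, and each PLAB is a fixed-size
slice of the "to" space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy survivor space (hypothetical): [bottom, top) is the allocated part.
struct Space {
    std::uintptr_t bottom;
    std::uintptr_t top;
};

struct ToyYoungGen {
    Space survivor[2];
    int from_index;                           // space currently named "from"
    std::vector<std::uintptr_t> chunk_table;  // PLAB starts of last scavenge

    Space& from() { return survivor[from_index]; }
    Space& to() { return survivor[1 - from_index]; }
    void swap_spaces() { from_index = 1 - from_index; }

    // Copies survivors into "to", recording one boundary per PLAB.
    // swap_on_failure selects the behaviour with the fix (true) or
    // without it (false) when the scavenge hits a promotion failure.
    void scavenge(std::size_t plabs, std::size_t plab_bytes,
                  bool promotion_failed, bool swap_on_failure) {
        assert(plab_bytes > 0);
        chunk_table.clear();
        Space& dest = to();
        dest.top = dest.bottom;
        for (std::size_t i = 0; i < plabs; ++i) {
            chunk_table.push_back(dest.top);
            dest.top += plab_bytes;
        }
        if (!promotion_failed || swap_on_failure) {
            swap_spaces();
        }
    }

    // The invariant assumed at remark: the table describes "from".
    bool chunk_table_describes_from() {
        for (std::uintptr_t cur_val : chunk_table) {
            if (cur_val < from().bottom || cur_val >= from().top) {
                return false;
            }
        }
        return true;
    }
};

// Builds a generation whose "from" space holds data and whose "to" is empty.
ToyYoungGen make_gen() {
    return ToyYoungGen{{{0x1000, 0x1200}, {0x2000, 0x2000}}, 0, {}};
}
```

In this model the invariant holds after a normal scavenge, is broken after
a failed scavenge that does not swap the names, and holds again when the
failed scavenge swaps them, so a remark that runs inside the window sees a
table that matches "from".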
                                     
2007-05-21


