Bug Database
Bug ID: 6888316
Votes: 0
Synopsis: G1: has_aborted() || _cm->region_stack_empty() fails
Category: hotspot:garbage_collector
Reported Against:
Release Fixed: hs17(b04), 7(b75) (Bug ID: 2184797)
State: 10-Fix Delivered, bug
Priority: 3-Medium
Related Bugs: 6847956, 6893095
Submit Date: 05-OCT-2009
Description
During testing we've come across this assertion failure. Poonam hit it while looking at another bug (CR 6847956).

------------------------------------------------------------------------------

#  Internal Error (concurrentMark.cpp:3492), pid=14287, tid=73
#  Error: guarantee(has_aborted() || _cm->region_stack_empty(),"only way to exit the loop")

  [5] VMError::report_and_die(0xffffffff7e7562e8, 0x0, 0x1, 0xffffffff7e5b1e37, 0xffffffff7e760fd1,0xffffffff7e73df20), at 0xffffffff7e42fd64
  [6] report_fatal(0xffffffff7e4bf4b9, 0xda4, 0xffffffff7e4bf528, 0xffffffffffc1f758, 0x3e0884, 0x3e0800), at 0xffffffff7e009384
  [7] CMTask::drain_region_stack(0x104a2f8d0, 0x1, 0x0, 0x0, 0xffffffff7dfd8270, 0x1), at 0xffffffff7dfd86a4
  [8] CMTask::do_marking_step(0x1001f8cd0, 0x104a2f8d0, 0x2000, 0xffffffff7e4bfd08, 0xffffffff7e6ee000, 0xffffffff7e7798f0), at 0xffffffff7dfd8ba4
  [9] CMConcurrentMarkingTask::work(0xffffffff705ff570, 0x5, 0x106679000, 0x1001f8cd0, 0xffffffff7e73605c, 0xffffffff7dfda6e8), at 0xffffffff7dfdaa8c
  [10] GangWorker::loop(0x106679000, 0x6, 0xffffffff7e438980, 0x1022a2ff0, 0x1, 0x5), at 0xffffffff7e438a00
  [11] java_start(0x106679000, 0x67a24, 0x37cf, 0xffffffff7e536cb9, 0xffffffff7e6ee000, 0x106163ae0), at 0xffffffff7e30c928

From the disassembly, it looks like the guarantee was violated because the region stack was not empty.

(dbx) x 0xffffffff7dfd86a4-40/20i
0xffffffff7dfd867c: drain_region_stack+0x03ec:  ldub     [%i0 + 300], %l3    //i0=CMTask* , l3=has_aborted
0xffffffff7dfd8680: drain_region_stack+0x03f0:  ldx      [%i0 + 24], %o0      //ConcurrentMark*
0xffffffff7dfd8684: drain_region_stack+0x03f4:  cmp      %l3, 0                    // l3=0
0xffffffff7dfd8688: drain_region_stack+0x03f8:  bne,pn   %icc,drain_region_stack+0x428  ! 0xffffffff7dfd86b8
0xffffffff7dfd868c: drain_region_stack+0x03fc:  nop
0xffffffff7dfd8690: drain_region_stack+0x0400:  ld       [%o0 + 484], %i1 
0xffffffff7dfd8694: drain_region_stack+0x0404:  cmp      %i1, 0             // i1=1
0xffffffff7dfd8698: drain_region_stack+0x0408:  be,pn    %icc,drain_region_stack+0x428  ! 0xffffffff7dfd86b8
0xffffffff7dfd869c: drain_region_stack+0x040c:  mov      3492, %o1
0xffffffff7dfd86a0: drain_region_stack+0x0410:  add      %l0, -82, %o2
0xffffffff7dfd86a4: drain_region_stack+0x0414:  call     report_fatal   ! 0xffffffff7e009360
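
As a reading aid, the two compare-and-branch pairs in the dump can be sketched in C++ roughly as follows. This is only an illustration: guarantee_passes and check_observed_values are hypothetical names, and treating the word loaded from [%o0 + 484] as "zero means the region stack is empty" is an assumption taken from the annotations above, not from the HotSpot sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the branch structure in the dump.
// has_aborted is the byte loaded into %l3; region_stack_word is the
// 32-bit word loaded into %i1 (assumed: 0 means the stack is empty).
bool guarantee_passes(std::uint8_t has_aborted, std::uint32_t region_stack_word) {
    if (has_aborted != 0) {
        return true;   // bne taken: jumps past the call to report_fatal
    }
    if (region_stack_word == 0) {
        return true;   // be taken: jumps past the call to report_fatal
    }
    return false;      // falls through to: call report_fatal
}

// The values annotated in the dump (l3 = 0, i1 = 1) take neither branch.
void check_observed_values() {
    assert(!guarantee_passes(0, 1));
}
```

With the annotated values, neither branch is taken and control reaches report_fatal, which matches the reported failure.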

Core and logs in /usr/de119005/gctest/drain_stack_failure on v4v-t5220c-sca11.sfbay.

------------------------------------------------------------------------------

I don't think the bug that caused 6847956 could also be causing this, so I opened a separate CR.
Posted Date : 2009-10-05 16:42:43.0
Work Around
N/A
Evaluation
From John Cuthbertson:

(01:16:50 PM) John Cuthbertson: I think one thread has to be scanning (the last) region when it fails and another thread has to be attempting to pop from the region stack before the other region scan fails.
(01:17:16 PM) John Cuthbertson: I think that's the only condition that could cause the guarantee to trip.
Posted Date : 2009-10-05 17:19:26.0

I'm convinced that, when there's more than one marking thread, the guarantee is bogus.

Basically, the guarantee asserts that we never have a marking thread that has not aborted while the region stack is non-empty. However, the first condition tests the local abort flag (i.e., whether the thread itself is aborting its marking step), not the global abort flag (which causes all the marking threads to abort). Given this, here's a plausible scenario that can cause the guarantee to fire:

(here "region subset" stands for what we push on the region stack, to differentiate from actual heap regions)

thread A is scanning region subset RS
thread B notices that the region stack is not empty, tries to pop an entry
thread C notices that the region stack is not empty, tries to pop an entry
thread B succeeds in popping the last entry from the region stack and starts scanning it
thread A decides to abort the region subset iteration (say, it times out) and pushes the remainder on the region stack
thread C, whose pop found the stack empty, exits the loop and hits the guarantee: it has not decided to abort, yet the region stack is no longer empty (as A just pushed a region subset on it).
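
The interleaving above can be replayed deterministically in a few lines of C++. This is a minimal single-threaded sketch, not HotSpot code: RegionStack, Task, guarantee_holds and replay_scenario are hypothetical stand-ins, and region subsets are represented as plain integers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the shared region stack.
struct RegionStack {
    std::vector<int> entries;   // each int names one region subset
    bool empty() const { return entries.empty(); }
    void push(int region_subset) { entries.push_back(region_subset); }
    // Returns true and fills `out` on success, false if the stack is empty.
    bool pop(int& out) {
        if (entries.empty()) return false;
        out = entries.back();
        entries.pop_back();
        return true;
    }
};

// Hypothetical stand-in for a marking task; the flag is local to the task.
struct Task {
    bool has_aborted = false;
};

// The condition the guarantee checks when a task leaves the drain loop.
bool guarantee_holds(const Task& task, const RegionStack& stack) {
    return task.has_aborted || stack.empty();
}

// Replays the steps in order; returns whether the guarantee holds for C.
bool replay_scenario() {
    RegionStack stack;
    Task a, c;                          // B keeps scanning and never checks here
    const int rs_remainder = 1;         // what A pushes back when it aborts
    const int last_entry = 2;           // the only entry left on the stack
    stack.push(last_entry);             // A is already scanning RS at this point

    const bool b_saw_entry = !stack.empty();   // B notices a non-empty stack
    const bool c_saw_entry = !stack.empty();   // C notices a non-empty stack
    assert(b_saw_entry && c_saw_entry);

    int b_work = 0, c_work = 0;
    const bool b_popped = stack.pop(b_work);   // B takes the last entry
    const bool c_popped = stack.pop(c_work);   // C finds nothing and leaves the loop
    assert(b_popped && !c_popped);

    a.has_aborted = true;               // A aborts its region subset iteration
    stack.push(rs_remainder);           // and pushes the remainder

    assert(guarantee_holds(a, stack));  // A's own (local) flag is set
    return guarantee_holds(c, stack);   // C: not aborted, stack not empty
}
```

In this replay, replay_scenario() returns false: C's local flag is clear and the stack holds the entry A just pushed, so the guarantee would fire even though no thread did anything wrong.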

I can't really think of another guarantee that would be useful and would also make sense. I think we should just remove it.
Posted Date : 2009-10-05 21:53:53.0

http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/4c3458a31e17
Posted Date : 2009-10-07 16:26:07.0