Description
During testing we've come across this assertion failure. Poonam hit it while looking at another bug (CR 6847956).
------------------------------------------------------------------------------
# Internal Error (concurrentMark.cpp:3492), pid=14287, tid=73
# Error: guarantee(has_aborted() || _cm->region_stack_empty(),"only way to exit the loop")
[5] VMError::report_and_die(0xffffffff7e7562e8, 0x0, 0x1, 0xffffffff7e5b1e37, 0xffffffff7e760fd1,0xffffffff7e73df20), at 0xffffffff7e42fd64
[6] report_fatal(0xffffffff7e4bf4b9, 0xda4, 0xffffffff7e4bf528, 0xffffffffffc1f758, 0x3e0884, 0x3e0800), at 0xffffffff7e009384
[7] CMTask::drain_region_stack(0x104a2f8d0, 0x1, 0x0, 0x0, 0xffffffff7dfd8270, 0x1), at 0xffffffff7dfd86a4
[8] CMTask::do_marking_step(0x1001f8cd0, 0x104a2f8d0, 0x2000, 0xffffffff7e4bfd08, 0xffffffff7e6ee000, 0xffffffff7e7798f0), at 0xffffffff7dfd8ba4
[9] CMConcurrentMarkingTask::work(0xffffffff705ff570, 0x5, 0x106679000, 0x1001f8cd0, 0xffffffff7e73605c, 0xffffffff7dfda6e8), at 0xffffffff7dfdaa8c
[10] GangWorker::loop(0x106679000, 0x6, 0xffffffff7e438980, 0x1022a2ff0, 0x1, 0x5), at 0xffffffff7e438a00
[11] java_start(0x106679000, 0x67a24, 0x37cf, 0xffffffff7e536cb9, 0xffffffff7e6ee000, 0x106163ae0), at 0xffffffff7e30c928
From the disassembly, it looks like the guarantee was violated because the region stack was not empty.
(dbx) x 0xffffffff7dfd86a4-40/20i
0xffffffff7dfd867c: drain_region_stack+0x03ec: ldub [%i0 + 300], %l3 //i0=CMTask* , l3=has_aborted
0xffffffff7dfd8680: drain_region_stack+0x03f0: ldx [%i0 + 24], %o0 //ConcurrentMark*
0xffffffff7dfd8684: drain_region_stack+0x03f4: cmp %l3, 0 // l3=0
0xffffffff7dfd8688: drain_region_stack+0x03f8: bne,pn %icc,drain_region_stack+0x428 ! 0xffffffff7dfd86b8
0xffffffff7dfd868c: drain_region_stack+0x03fc: nop
0xffffffff7dfd8690: drain_region_stack+0x0400: ld [%o0 + 484], %i1
0xffffffff7dfd8694: drain_region_stack+0x0404: cmp %i1, 0 // i1=1
0xffffffff7dfd8698: drain_region_stack+0x0408: be,pn %icc,drain_region_stack+0x428 ! 0xffffffff7dfd86b8
0xffffffff7dfd869c: drain_region_stack+0x040c: mov 3492, %o1
0xffffffff7dfd86a0: drain_region_stack+0x0410: add %l0, -82, %o2
0xffffffff7dfd86a4: drain_region_stack+0x0414: call report_fatal ! 0xffffffff7e009360
Core and logs in /usr/de119005/gctest/drain_stack_failure on v4v-t5220c-sca11.sfbay.
------------------------------------------------------------------------------
I don't think the bug that caused 6847956 could also be causing this, so I opened a separate CR.
Posted Date : 2009-10-05 16:42:43.0
Evaluation
From John Cuthbertson:
(01:16:50 PM) John Cuthbertson: I think one thread has to be scanning (the last) region when it fails, and another thread has to be attempting to pop from the region stack before the other thread's region scan fails.
(01:17:16 PM) John Cuthbertson: I think that's the only condition that could cause the guarantee to trip.
Posted Date : 2009-10-05 17:19:26.0
I'm convinced that, when there's more than one marking thread, the guarantee is bogus.
Basically, the guarantee checks that a marking thread never leaves the loop without having aborted while the region stack is still non-empty. However, the first condition tests the local abort flag (i.e., whether the thread itself is aborting the marking step), not the global abort flag (which causes all the marking threads to abort). Given this, here's a plausible scenario that can cause the guarantee to fire:
(Here "region subset" stands for what we push on the region stack, to distinguish it from actual heap regions.)
thread A is scanning region subset RS
thread B notices that the region stack is not empty and tries to pop an entry
thread C notices that the region stack is not empty and tries to pop an entry
thread B succeeds in popping the last entry from the region stack and starts scanning it (so C's pop comes back empty)
thread A decides to abort the region subset iteration (say, it times out) and pushes the remainder onto the region stack
thread C reaches the guarantee and finds that it has not decided to abort, yet the region stack is not empty (A has just pushed an entry onto it)
I can't really think of another guarantee that would be useful and would also make sense. I think we should just remove it.
Posted Date : 2009-10-05 21:53:53.0
http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/4c3458a31e17
Posted Date : 2009-10-07 16:26:07.0