SUGGESTED FIX
JPRT: [sfbay] job notification - success with job 2008-02-21-190720.ysr.hg-gc
JPRT Job ID: 2008-02-21-190720.ysr.hg-gc
JPRT System Used: sfbay
JPRT Version Used: Feb 15 2008 - Case of the Bartered Bikini
[50c84a85177a]
Job URL:
http://javaweb.sfbay/jdk/jprt/archive/2008/02/2008-02-21-190720.ysr.hg-gc
Job ARCHIVE:
/net/prt-archiver.sfbay/data/jprt/archive/2008/02/2008-02-21-190720.ysr.hg-gc
User: ysr
Email: ###@###.###
Release: jdk7
Job Source: Mercurial: /net/neeraja/export/ysr/hg-gc/{.}
Parent: /net/jano2.sfbay/export2/hotspot/hg/hotspot-gc
Push Parent: /net/jano2.sfbay/export2/hotspot/hg/hotspot-gc
File List: {.}
Command Line: jprt submit -m jprt.txt -cr 6642634 -p
/net/jano2.sfbay/export2/hotspot/hg/hotspot-gc
Job submitted at: Thursday February 21, 2008 11:07:22 PST
Total time in queue: 1h 58m 59s
Job started at: Thursday February 21, 2008 11:08:58 PST
Job integrated at: Thursday February 21, 2008 13:06:03 PST
Job finished at: Thursday February 21, 2008 13:06:21 PST
Job run time: 1h 57m 23s
Job state: success
Job flags: SYNC INTEGRATE PRECIOUS
Bundles: USE: jprt install 2008-02-21-190720.ysr.hg-gc
HINT: Use 'jprt rerun -comment <arg> -retest 2008-02-21-190720.ysr.hg-gc' to
rerun the tests for this job (you can also add tests with 'jprt
rerun').
NOTE: Zip files containing exe or dll files on windows have had problems with
execute permissions. You may need to 'chmod a+x' the windows exe and
dll files.
User Comments:
6642634: Test nsk/regression/b6186200 crashed with SIGSEGV
Summary: Use correct allocation path in expand_and_allocate() so object's
mark and p-bits are set as appropriate.
Reviewed-by: jmasa, pbk
Fixed 6642634: Test nsk/regression/b6186200 crashed with SIGSEGV
This is a rather old bug, and it is not clear why it started showing up
recently in testing, except that some timing change may have made
it easier to reproduce. With the right stress options
(see below) the crash can be reproduced with older JVMs as well.
When direct allocation occurs in the old generation, collected by the
CMS collector, concurrent with a CMS cycle, objects must be
allocated live (and P-bits used to mark the size of those objects
to allow precleaning or sweeping phases to determine the sizes
of objects allocated but not yet initialized). This requires the use
of specialized allocation paths, which were used in the normal case
but not when the allocation failed and the generation had to be
expanded to accommodate the allocation. In this case, the correct
allocation path was not used, and consequently the object was not
allocated live. Depending on when the allocation occurred, this
could cause a crash either in a sweeping phase (because the
size of an uninitialized block could not be determined) or in a later
marking phase (because a reachable block had been reclaimed
prematurely).
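The failure mode described above can be illustrated with a small toy model.
This is only a sketch and not HotSpot code: ToyOldGen, ToyBlock,
collector_aware_allocate and the other names are hypothetical, and two
booleans stand in for the mark bit and the P-bits. It shows an
expand-then-allocate path that bypasses the collector-aware allocation
path (the bug), next to one that routes through it (the shape of the fix).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model only: every name here is hypothetical, not a HotSpot type.
struct ToyBlock {
    std::size_t words;
    bool marked_live;    // stands in for the mark bit
    bool size_recorded;  // stands in for the P-bits that record block size
};

class ToyOldGen {
public:
    explicit ToyOldGen(std::size_t capacity_words) : capacity_(capacity_words) {}

    void set_cycle_in_progress(bool in_progress) { cycle_in_progress_ = in_progress; }

    // Raw path: carves out space but does no collector bookkeeping.
    ToyBlock* raw_allocate(std::size_t words) {
        if (used_ + words > capacity_) return nullptr;
        used_ += words;
        blocks_.push_back(ToyBlock{words, false, false});
        return &blocks_.back();  // deque keeps references stable on push_back
    }

    // Collector-aware path: during a concurrent cycle the block is
    // allocated live and its size is recorded.
    ToyBlock* collector_aware_allocate(std::size_t words) {
        ToyBlock* b = raw_allocate(words);
        if (b != nullptr && cycle_in_progress_) {
            b->marked_live = true;
            b->size_recorded = true;
        }
        return b;
    }

    // Buggy shape: after expanding, falls back to the raw path.
    ToyBlock* expand_and_allocate_buggy(std::size_t words) {
        expand(words);
        return raw_allocate(words);
    }

    // Fixed shape: after expanding, reuses the collector-aware path.
    ToyBlock* expand_and_allocate_fixed(std::size_t words) {
        expand(words);
        return collector_aware_allocate(words);
    }

private:
    void expand(std::size_t words) { capacity_ += words; }

    std::size_t capacity_;
    std::size_t used_ = 0;
    bool cycle_in_progress_ = false;
    std::deque<ToyBlock> blocks_;
};

// A block allocated during a cycle is safe only if it is both marked live
// and has its size recorded.
inline bool safe_for_concurrent_cycle(const ToyBlock& b) {
    return b.marked_live && b.size_recorded;
}
```

In this model, a block returned by expand_and_allocate_buggy() while a cycle
is in progress is neither marked nor sized, which corresponds to the two
crash modes described above (unparsable block during sweeping, or premature
reclamation of a reachable block).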
A temporary workaround, as documented in the bug report, is
to fix the size of the old generation.
Fix Verified: yes
Verification Test: nsk/regression/b6186200
with the set of stress options documented in the
bug report for greater reproducibility.
Without the fix the test fails within 2-5 iterations of the test
(about 2-5 minutes on the test machine). With the fix the
test ran successfully for more than 48 hours.
It is expected that this fix will also address another couple of
very hard to reproduce heisenbugs that we have seen occasionally
in nightly testing.
Epilogue: the allocation paths can be further cleaned up,
since they seem to have evolved organically over a period of
time and collected a bunch of cruft. That will be done in
a separate CR; in the meantime, this more local fix is being put back.
|
SUGGESTED FIX
changeset: 6:df2fc160f817
tag: tip
user: ysr
date: Thu Feb 21 11:03:54 2008 -0800
summary: 6642634: Test nsk/regression/b6186200 crashed with SIGSEGV
diff --git a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
--- a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
+++ b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
@@ -3121,12 +3121,7 @@ ConcurrentMarkSweepGeneration::expand_an
   if (GCExpandToAllocateDelayMillis > 0) {
     os::sleep(Thread::current(), GCExpandToAllocateDelayMillis, false);
   }
-  size_t adj_word_sz = CompactibleFreeListSpace::adjustObjectSize(word_size);
-  if (parallel) {
-    return cmsSpace()->par_allocate(adj_word_sz);
-  } else {
-    return cmsSpace()->allocate(adj_word_sz);
-  }
+  return have_lock_and_allocate(word_size, tlab);
 }
 // YSR: All of this generation expansion/shrinking stuff is an exact copy of
|
EVALUATION
This appears to have been a bug in the product since at least 2002, as far
as I can tell: we are not careful to deal with direct allocation
following an expansion when a CMS cycle is in progress. It is not yet
clear why the bug is difficult to reproduce once we use ParNew.
See the workaround section for a workaround. The "Suggested Fix"
section will be updated with a fix over the next few days.
A bit more archeology is in progress, but it appears as though
all current versions of the JDK going all the way back to 1.4.2
would be vulnerable to this problem.
Watch this space for further updates soon.
|