SUGGESTED FIX
Event: putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
(jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20070612155632.ysr.mustang/workspace
(prt-web:/net/prt-web.sfbay/prt-workspaces/20070612155632.ysr.mustang/workspace)
User: ysr
Comment:
---------------------------------------------------------
Job ID: 20070612155632.ysr.mustang
Original workspace: neeraja:/net/jano.sfbay/export/hotspot/users1/ysr/mustang
Submitter: ysr
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070612155632.ysr.mustang/
Webrev: http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070612155632.ysr.mustang/workspace/webrevs/webrev-2007.06.12/index.html
Fixed 6275407: Assertion failure "wrong array object size"
Webrev: http://analemma.sfbay/net/jano/export/disk05/hotspot/users/ysr/mustang/webrev
As Detlefs had opined a while ago, the assert turned out to be too
strong when running with ParNew (and also G1) -- for two reasons:
(1) PLAB's are converted to filler arrays when they are retired and
concurrent block offset table walks by another thread can observe
the change in array size across this retirement. (It is believed
at this time that G1 is not open to this race.)
(2) Old copies of object arrays in Eden or survivor space are used
to indirect a worker thread that stole an object array oop to find the
set of references not yet scanned, which is encoded in the length field.
(G1 uses the same technique and is therefore be subject to the same
race.) A worker thread racing to copy the same object, but just a tad late,
may observe the object across such mutation of its length field.
Our solution was to sharpen the assert to cover these two cases. The assert
will require an appropriate small modification when merged with G1 later this
month.
Fix Verified: y (see (2) below)
Verification Testing:
(1) runThese -quick -testbase with just ParNew and with CMS+ParNew
(2) Tony was able to reproduce the problem fairly quickly with G1
and the fix was sufficient for that case (Thanks Tony for the testing!)
Other Testing: PRT w/heap verification, refworkload w/fastdebug
Reviewed by: John, Jon, Tony, discussed with Peter (see below)
NOTE: Peter suggested an alternate strategy for dealing with this assert;
that is, however, a tad more extensive and may well be attempted in the
future. Peter's suggestion is based on the observation that in one
case the array is stable after a single change (in that case the
size helper's recomputed value will agree with that from the virtual call).
In the second case, the length encoding could use a negative value
to recognize the "old copy" and ignore the assert, which is potentially
open to multiple modification transients.
Files:
update: src/share/vm/oops/oop.inline.hpp
Examined files: 3990
Contents Summary:
1 update
3989 no action (unchanged)
|
|
|
SUGGESTED FIX
Here's the appropriate relaxation of a too-strong assert:-
------- oop.inline.hpp -------
*** /tmp/sccs.vSai_i Fri Jun 8 17:28:40 2007
--- oop.inline.hpp Fri Jun 8 17:28:30 2007
***************
*** 186,197 ****
s = (int)((size_t)round_to(size_in_bytes, MinObjAlignmentInBytes) /
HeapWordSize);
! // Temporarily weaken this assert until CR 6275407 can
! // be dealt with more cleanly; this is too strong for when
! // ParNew runs with promotion labs.
assert((s == klass->oop_size(this)) ||
! (UseParNewGC && !UseConcMarkSweepGC &&
! Universe::heap()->is_gc_active()), "wrong array object size");
} else {
// Must be zero, so bite the bullet and take the virtual call.
s = klass->oop_size(this);
--- 186,214 ----
s = (int)((size_t)round_to(size_in_bytes, MinObjAlignmentInBytes) /
HeapWordSize);
! // UseParNewGC can change the length field of an "old copy" of an object
! // array in the young gen so it indicates the stealable portion of
! // an already copied array. This will cause the first disjunct below
! // to fail if the sizes are computed across such a concurrent change.
! // UseParNewGC also runs with promotion labs (which look like int
! // filler arrays) which are subject to changing their declared size
! // when finally retiring a PLAB; this also can cause the first disjunct
! // to fail for another worker thread that concurrently walking the block
! // offset table. Both these are benign, so we ignore the assertion
! // failure: we might be able to relax it for these two cases in the
! // form of:
! // is_objArray() && is_forwarded()
! // || is_typeArray()
! // With the first line above covering the first scenario above and
! // the second one the second scenario.
! // If and when UseParallelGC uses the same obj array oop stealing/chunking
! // technique, or G1 is integrated (which does use this array chunking
! // technique) we will need to suitably further relax the assertion.
assert((s == klass->oop_size(this)) ||
! ((UseParNewGC && Universe::heap()->is_gc_active()) &&
! (is_typeArray() ||
! (is_objArray() && is_forwarded()))),
! "wrong array object size");
} else {
// Must be zero, so bite the bullet and take the virtual call.
s = klass->oop_size(this);
|
|
|
EVALUATION
Since a temporary workaround has been putback we are downgrading
this and targeting a cleaner fix to Dolphin.
|
|
|
SUGGESTED FIX
A temporary band-aid has been putback; see below:
Event: putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
(jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20050818145809.ysr.dragon/workspace
(prt-web:/net/prt-web.sfbay/prt-workspaces/20050818145809.ysr.dragon/workspace)
User: ysr
Comment:
---------------------------------------------------------
Original workspace: neeraja:/net/spot/scratch/ysr/dragon
Submitter: ysr
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2005/20050818145809.ysr.dragon/
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2005/20050818145809.ysr.dragon/workspace/webrevs/webrev-2005.08.18/index.html
Partial 6275407: Assertion failure "wrong array object size"
http://analemma.sfbay/net/spot/scratch/ysr/dragon/webrev/
We are temporarily weakening this assert so that it doesnot
fire when ParNew+Tenured is used, since it is believed that
this is too strong when promotion lab surgery by one GC worker
switches the size of a filler array that's being examined
during BOT navigation by another GC worker. A better fix and
a more thorough investigation is deferred, but this weakening
of the assertion is to meanwhile stem the loss of PRT
productivity on its account.
Reviewed by: D. Detlefs, K. Russell
Fix Verified: n
(see below; it's obvious of course that the assert will not
fire now with ParNew+Tenured during GC)
Verification testing:
Ran GCOld and GCBasher in a loop with
ParNew/fastdebug/compiler1/sparc/solaris with
old and new fastdebug binaries, neither of which
reproduced the original assert. Note that the changes
do not affect the product build in any manner whatsoever.
Other Testing:
spec, PRT
Files:
update: src/share/vm/oops/oop.inline.hpp
Examined files: 3670
Contents Summary:
1 update
3669 no action (unchanged)
|
|
|
WORK AROUND
This is a benign assertion when using ParNew or G1;
just use -XX:SuppressErrorAt:... to work around it.
See entry#2 of suggested fix field for an explanation.
|
|
|
|