SUGGESTED FIX
Job ID: 20060131135728.ysr.policy
Original workspace: neeraja:/net/spot/scratch/ysr/policy
Submitter: ysr
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20060131135728.ysr.policy/
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20060131135728.ysr.policy/workspace/webrevs/webrev-2006.01.31/index.html
Fixed 6367204: CMS: Poor garbage collection performance
Webrev: http://analemma.sfbay/net/spot/scratch/ysr/policy/webrev
The customer's application turned out to have some very
large arrays (some as large as 36 MB or larger) which
were subject to some mutation during the long marking phase
(due to the large heap) and the precleaning phase (due to the
mutation).
In 5.0u1 I had fixed 4756801, which was supposed to replace
the imprecise scanning of entire object arrays by the
precise scanning of the dirty portions thereof. When fixing
that bug, I had inadvertently introduced this error during
the course of some code clean-up. The performance issue
was not found then because we merely traded one performance
pathology for another.
The problem can affect any application that has large object
arrays that are subject to some mutation.
Instead of scanning precisely the dirty cards of large arrays,
we ended up (repeatedly) scanning the entire prefix of
that array ending in the dirty card! Clearly, we were
doing O(n.k) work for scanning k dirty card (ranges) in
a large object of size n, instead of the intended O(k);
in the worst case this could be O(n2).
The fix, which has been tested on-site at
General Dynamics, showed a 245 x improvement in the
remark pause time. (More tuning is needed, but this
fixes the spectacularly bad results seen earlier;
further residual performance issues will be handled
under a separate CR.)
Thanks to Syed Rayadurg who was on-site with the customer
for running experiments and helping with the diagnosis;
to Vaibhav Kulkarni for helping with obtaining and
analyzing performance data from customer; to Poonam
Bajaj for the prompt backport to Tiger that allowed
prompt on-site fix verification.
Reviewed by: John Coomes, Peter Kessler, Jon Masamitsu
Approved by: Mustang Core Team (Penni Henry)
Fix Verified: yes
Verification Testing:
. customer ran with 28 GB heap and got a 245 x reduction in CMS remark
pause times
Other testing: specjvm, PRT, refworkload, runThese quick (w/CMS)
. Ashwin has been running big apps on the 5.0u7 avatar
of this bug fix.
|
|
|
WORK AROUND
"Don't use very large object arrays that mutate or hold young references," said he,
after having carefully donned his flame suit.
|
|
|
SUGGESTED FIX
here are the context-diffs in a mustang workspace:
*** src/share/vm/memory/space.cpp- Mon Dec 19 10:20:53 2005
--- src/share/vm/memory/space.cpp Thu Jan 26 23:30:06 2006
*** 1,7 ****
#ifdef USE_PRAGMA_IDENT_SRC
! #pragma ident "@(#)space.cpp 1.199 05/12/14 14:12:18 JVM"
#endif
/*
* Copyright 2006 Sun Microsystems, Inc. All rights reserved.
* SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
*/
--- 1,7 ----
#ifdef USE_PRAGMA_IDENT_SRC
! #pragma ident "@(#)space.cpp 1.200 06/01/26 23:30:06 JVM"
#endif
/*
* Copyright 2006 Sun Microsystems, Inc. All rights reserved.
* SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
*/
*** 485,503 ****
// This assert will not work when we go from cms space to perm
// space, and use same closure. Easy fix deferred for later. XXX YSR
// assert(prev == NULL || contains(prev), "Should be within space");
bool last_was_obj_array = false;
! HeapWord* addr;
size_t size;
if (prev > mr.start()) {
addr = prev;
} else {
! addr = block_start(mr.start());
}
HeapWord* end_addr = mr.end();
! MemRegion mr2(addr, end_addr);
for (; addr < end_addr; addr += size) {
size = block_size(addr);
if (block_is_obj(addr)) {
last_was_obj_array = cl->do_object_bm(oop(addr), mr2);
} else {
--- 485,505 ----
// This assert will not work when we go from cms space to perm
// space, and use same closure. Easy fix deferred for later. XXX YSR
// assert(prev == NULL || contains(prev), "Should be within space");
bool last_was_obj_array = false;
! HeapWord *addr, *addr2;
size_t size;
if (prev > mr.start()) {
+ addr2 = prev;
addr = prev;
} else {
! addr2 = mr.start();
! addr = block_start(addr2);
}
HeapWord* end_addr = mr.end();
! MemRegion mr2(addr2, end_addr);
for (; addr < end_addr; addr += size) {
size = block_size(addr);
if (block_is_obj(addr)) {
last_was_obj_array = cl->do_object_bm(oop(addr), mr2);
} else {
*** 522,540 ****
// See comment above (in more general method above) in case you
// happen to use this method.
assert(prev == NULL || is_in_reserved(prev), "Should be within space");
bool last_was_obj_array = false;
! HeapWord* addr;
size_t size;
if (prev > mr.start()) {
addr = prev;
} else {
! addr = block_start(mr.start());
}
HeapWord* end_addr = mr.end();
! MemRegion mr2(addr, end_addr);
for (; addr < end_addr; addr += size) {
oop obj = oop(addr);
size = obj->size();
last_was_obj_array = cl->do_object_bm(obj, mr2);
}
--- 524,544 ----
// See comment above (in more general method above) in case you
// happen to use this method.
assert(prev == NULL || is_in_reserved(prev), "Should be within space");
bool last_was_obj_array = false;
! HeapWord *addr, *addr2;
size_t size;
if (prev > mr.start()) {
+ addr2 = prev;
addr = prev;
} else {
! addr2 = mr.start();
! addr = block_start(addr2);
}
HeapWord* end_addr = mr.end();
! MemRegion mr2(addr2, end_addr);
for (; addr < end_addr; addr += size) {
oop obj = oop(addr);
size = obj->size();
last_was_obj_array = cl->do_object_bm(obj, mr2);
}
------------------
here's the webrev (in a mustang workspace):
http://analemma.sfbay.sun.com/net/spot/scratch/ysr/policy/webrev/
|
|
|
EVALUATION
There was a coding error in space::object_iterate_m() which would
cause quadratic complexity when scanning large somewhat sparsely
dirty arrays (spanning many cards). In the case of this customer
there were some arrays that were many millions of cards large
(some were 36 MB large or larger). Correcting the coding error
allowed the dirty cards to be precisely scanned, and reduced
the rescan pauses to a more acceptable range (by more than
two orders of magnitude).
Any application with large object arrays (which mutate or hold
pointers to young gen) would be susceptible to
large remark pauses because of this bug.
The performance bug was inadvertently introduced when making the
fix in 4756801. Unfortunately, the performance tests performed
for that fix verification did not reveal the pathology because
the performance before the fix was bad and we seemed to have
simply traded one performance pathology (scanning entire
arrays) for another (repeatedly scanning prefixes of sparsely
dirty arrays).
Even with this performance fix there will be another performance
issue related to repeated BOT queries for sparsely dirty cards.
(This affects both CMS and scavenges.) That bug will be addressed
in a separate CR 6391214.
|
|
|
|