SUGGESTED FIX
For the assert failure:
Only check the assert if find_contiguous() returns a value that's not -1 (i.e., it found an entry that satisfies the allocation request):
--- a/src/share/vm/gc_implementation/g1/heapRegionSeq.cpp
+++ b/src/share/vm/gc_implementation/g1/heapRegionSeq.cpp
@@ -195,10 +195,10 @@
assert(0 <= res && res < _regions.length(),
err_msg("res: %d should be valid", res));
_alloc_search_start = res + (int) num;
+ assert(0 < _alloc_search_start && _alloc_search_start <= _regions.length(),
+ err_msg("_alloc_search_start: %d should be valid",
+ _alloc_search_start));
}
- assert(0 < _alloc_search_start && _alloc_search_start <= _regions.length(),
- err_msg("_alloc_search_start: %d should be valid",
- _alloc_search_start));
return res;
}
|
EVALUATION
For the guarantee failure:
First, I should point out that this happens _very_ infrequently. Also note that this failure is not directly caused by the new free list changes but it's uncovered by them (before it would never happen as a pause would always wait for the concurrent cleanup operation to complete; now it can carry on through the pause).
The stack trace is this:
current thread: t@12
[1] __read(0x0, 0xfffffd7ff35fe310, 0x10, 0x0, 0x1, 0x7fe0), at 0xfffffd7fff293eda
[2] read(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff279f21
=>[3] os::message_box(title = ???, message = ???) (optimized), at 0xfffffd7fe8442a59 (line ~4161) in "os_solaris.cpp"
[4] VMError::show_message_box(this = ???, buf = ???, buflen = ???) (optimized), at 0xfffffd7fe8a07d1e (line ~50) in "vmError_solaris.cpp"
[5] VMError::report_and_die(this = ???) (optimized), at 0xfffffd7fe8a0748d (line ~786) in "vmError.cpp"
[6] report_vm_error(file = ???, line = ???, error_msg = ???, detail_msg = ???) (optimized), at 0xfffffd7fe73226d7 (line ~216) in "debug.cpp"
[7] G1CollectorPolicy::predict_survivor_regions_evac_time(this = ???) (optimized), at 0xfffffd7fe750fddc (line ~1771) in "g1CollectorPolicy.cpp"
[8] G1CollectorPolicy::calculate_young_list_target_length(this = ???, rs_lengths = ???) (optimized), at 0xfffffd7fe750dd32 (line ~499) in "g1CollectorPolicy.cpp"
[9] G1CollectorPolicy::record_concurrent_mark_cleanup_completed(this = ???) (optimized), at 0xfffffd7fe7512664 (line ~470) in "g1CollectorPolicy.cpp"
[10] ConcurrentMarkThread::run(this = ???) (optimized), at 0xfffffd7fe729f6d1 (line ~254) in "concurrentMarkThread.cpp"
[11] java_start(thread_addr = ???) (optimized), at 0xfffffd7fe843349e (line ~1067) in "os_solaris.cpp"
[12] _thrp_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff28acf5
[13] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff28afb0
So, at the end of concurrent cleanup we call record_concurrent_mark_cleanup_completed() which updates some of the prediction fields. However, given that the concurrent mark thread does not join the Suspendible Thread Set before it does that, that operation could happen during a pause. This is the log fragment that hints that this is what might be happening:
[GC concurrent-cleanup-start]
[GC pause (young) 257M->257M(2032M), 0.0652511 secs]
[GC pause (partial)# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/g1CollectorPolicy.cpp:1771
==============================================================================
Unexpected Error
------------------------------------------------------------------------------
Internal Error at g1CollectorPolicy.cpp:1771, pid=28818, tid=12
guarantee(hr->is_young() && hr->age_in_surv_rate_group() != -1) failed: invariant
Do you want to debug the problem?
To debug, run 'dbx - 28818'; then switch to thread 12
Enter 'yes' to launch dbx automatically (PATH must include dbx)
Otherwise, press RETURN to abort...
==============================================================================
259M->254M(2032M), 0.1267242 secs]
|