EVALUATION
http://hg.openjdk.java.net/hsx/hsx17/baseline/rev/e90d89dab490
|
|
|
EVALUATION
http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/bb843ebc7c55
|
|
|
SUGGESTED FIX
See Evaluation section for changeset.
|
|
|
EVALUATION
JNI Critical Section entries were being "fenced" for the entire duration of the
concurrent collection, causing some application threads to be blocked until
the concurrent collection finished. This included the last thread exiting
the JNI critical section before a scavenge was induced.
While fixing this bug will allow somewhat smoother operation of JNI
critical section entries, there will still be cases when excessive
or long uses of JNI critical sections may still cause entering threads
to block for periods of time that are longer than the duration of a single
scavenge. The worst case such blockage will be proportional to the longest
JNI critical sections, and to the number of threads that may concurrently
be in such critical sections.
Thus, fixing this bug will still not match the performance improvement
obtained for NIO get/put from fixing 6570619.
See also Comments field, entry #10
|
|
|
SUGGESTED FIX
The change below to vmCMSOperation.cpp fixes the bug by
allowing the _gc_locker precipitated concurrent full gc not
to block the caller until it's completed. The change to
genCollectedHeap.cpp further below is still being debated since it decouples
the gc locker policy for CMS from ExplicitGCInvokesConcurrent
(since no need is seen for linking the two together in any way).
Whether the gc locker should cause a concurrent full collection
should be more nuanced than the current somewhat ad-hoc bang-bang policy.
diff -r 6ecb6e6de3d6 src/share/vm/gc_implementation/concurrentMarkSweep/vmCMSOperations.cpp
--- a/src/share/vm/gc_implementation/concurrentMarkSweep/vmCMSOperations.cpp Mon Apr 19 05:40:21 2010 -0700
+++ b/src/share/vm/gc_implementation/concurrentMarkSweep/vmCMSOperations.cpp Thu Apr 22 22:35:14 2010 -0700
@@ -190,7 +190,7 @@ void VM_GenCollectFullConcurrent::doit()
CMSCollector::disable_icms();
// In case CMS thread was in icms_wait(), wake it up.
CMSCollector::start_icms();
- // Nudge the CMS thread to start a concurrent collection
+ // Nudge the CMS thread to start a concurrent collection.
CMSCollector::request_full_gc(_full_gc_count_before);
} else {
FullGCCount_lock->notify_all(); // Inform the Java thread its work is done
@@ -231,7 +231,8 @@ void VM_GenCollectFullConcurrent::doit_e
// e.g. at the rate of 1 full gc per ms, this could
// overflow in about 1000 years.
GenCollectedHeap* gch = GenCollectedHeap::heap();
- if (gch->total_full_collections_completed() <= _full_gc_count_before) {
+ if (_gc_cause != GCCause::_gc_locker &&
+ gch->total_full_collections_completed() <= _full_gc_count_before) {
// Now, wait for witnessing concurrent gc cycle to complete,
// but do so in native mode, because we want to lock the
// FullGCEvent_lock, which may be needed by the VM thread
===============================
still under debate:
-------------------
diff -r 6ecb6e6de3d6 src/share/vm/gc_implementation/concurrentMarkSweep/vmCMSOperations.hpp
--- a/src/share/vm/gc_implementation/concurrentMarkSweep/vmCMSOperations.hpp Mon Apr 19 05:40:21 2010 -0700
+++ b/src/share/vm/gc_implementation/concurrentMarkSweep/vmCMSOperations.hpp Thu Apr 22 22:58:50 2010 -0700
@@ -126,8 +126,7 @@ class VM_GenCollectFullConcurrent: publi
GCCause::Cause gc_cause)
: VM_GC_Operation(gc_count_before, full_gc_count_before, true /* full */) {
_gc_cause = gc_cause;
- assert(FullGCCount_lock != NULL && UseConcMarkSweepGC &&
- ExplicitGCInvokesConcurrent, "Otherwise shouldn't be here");
+ assert(FullGCCount_lock != NULL, "Error");
assert(UseAsyncConcMarkSweepGC, "Else will hang caller");
}
~VM_GenCollectFullConcurrent() {}
diff -r 6ecb6e6de3d6 src/share/vm/memory/genCollectedHeap.cpp
--- a/src/share/vm/memory/genCollectedHeap.cpp Mon Apr 19 05:40:21 2010 -0700
+++ b/src/share/vm/memory/genCollectedHeap.cpp Thu Apr 22 22:35:14 2010 -0700
@@ -410,9 +410,9 @@ bool GenCollectedHeap::must_clear_all_so
}
bool GenCollectedHeap::should_do_concurrent_full_gc(GCCause::Cause cause) {
- return (cause == GCCause::_java_lang_system_gc ||
- cause == GCCause::_gc_locker) &&
- UseConcMarkSweepGC && ExplicitGCInvokesConcurrent;
+ return UseConcMarkSweepGC &&
+ (cause == GCCause::_gc_locker ||
+ (cause == GCCause::_java_lang_system_gc && ExplicitGCInvokesConcurrent));
}
void GenCollectedHeap::do_collection(bool full,
|
|
|
SUGGESTED FIX
<<deleted>>
|
|
|
SUGGESTED FIX
<<deleted>>
|
|
|
SUGGESTED FIX
<<deleted>>
|
|
|
EVALUATION
So far I have been able to reproduce the problem with 6u17 and 6u18, but not
with the latest jdk 7. Still investigating as to this difference and indeed the root
cause.
|
|
|
EVALUATION
One difference between 7 and 6uXX is the amount of copying done
per JNI critical section. In jdk 7, large copies are broken
into smaller critical sections. This can prevent starvation of
some get()'s or put()'s because of the gc locker fencing off
new entries into JNI CS while Eden is full and allocation
is proceeding slow path. The root cause of the blockage of
the customer application must lie in this.
I'll find the relevant JDK CR and link this up to it.
I'll also see if the JDK 6'ism of large monolithic copies cannot
be somehow mitigated in the JVM, although i doubt it.
What is still not clear is why the blockages seem to coincide
with CMS cycles. It must be that the CMS cycles are somehow
slowing down or preventing the exit of the threads doing
memcpy inside the JNI critical section. Investigation ongoing.
|
|
|
SUGGESTED FIX
<<deleted>>
|
|
|
WORK AROUND
-XX:-ExplicitGCInvokesConcurrent
|
|
|
EVALUATION
User observed the crucial fact that -XX:+ExplicitGCInvokesConcurrent
is essential to reproduce the blockage. With that crucail piece of information,
it was easy to see the root cause. The bug is in the gc locker malinteracting
with +ExplicitGCInvokesconcurrent, so that while a concurrent background
collection is in progress, the "ongoing gc" flag stays set, so
JNI critical sections stay blocked for that duration. This malinteraction
will have to be fixed so that even while the bkgrd collection is in
progress, JNI critical sections are allowed (which really was the
original intent, except for the incorrect implementation).
|
|
|
EVALUATION
###@###.### informs me that in JDK 7, NIO bulk copies
do not use JNI critical sections, which explains why the blocking
behaviour is not seen in JDK 7.
Cross out my observation about large copies in JDK 7 being broken
into smaller JNI critical sections (above, display 2). While that is
certainly a difference, those routines are not used in JDK 7 to
implement NIO bulk copying; rather the NIO bulk copies use
Unsafe copyMemory which entirely avoids the gc locker issues
above (which issues should nonetheless be fixed for any remianing
current and future clients of the JNI critical section interfaces;
for instance jdk's older than 7).
|
|
|