Triaging 7041440 brought to light the following:
The test program for 7041440 consisted of a number of Java threads that each perform a single System.gc() call. When run normally, these System.gc() calls result in back-to-back full GCs (requested by different threads). When one thread succeeds in starting a full GC, the threads that have not yet done a full GC are blocked, waiting to start their own.
In the test case for 7041440, the test program was run with -XX:+ExplicitGCInvokesConcurrent.
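A minimal sketch of a program of this shape is given below. It is a hypothetical reconstruction, not the actual test attached to 7041440: the class name, the helper runThreads, and the default thread count are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reconstruction of the kind of test described above;
// the real test program for 7041440 is not reproduced here.
public class ConcurrentSystemGc {

    // Starts threadCount threads that each call System.gc() exactly once,
    // waits for all of them, and returns how many calls returned.
    static int runThreads(int threadCount) {
        AtomicInteger finished = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            threads.add(new Thread(() -> {
                System.gc();               // the single explicit GC request
                finished.incrementAndGet();
            }, "gc-requester-" + i));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 8;
        System.out.println(runThreads(n) + " threads completed System.gc()");
    }
}
```

To look for the behavior described in this report, such a program would be run with G1 and the flag enabled, for example `java -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent ConcurrentSystemGc`, with GC logging turned on (-XX:+PrintGC on older releases, -Xlog:gc on JDK 9 and later).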
ExplicitGCInvokesConcurrent is supposed to convert the full GC to an evacuation pause that starts a concurrent marking cycle. The requesting thread then blocks until the concurrent marking is complete.
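The blocking step can be pictured as a wait on a completion counter. The sketch below is a toy Java model, not HotSpot code: the class and method names are invented for illustration, and the counter only stands in for the _full_collections_completed value discussed later in this report.

```java
// Toy model (not HotSpot code): a requesting thread remembers the counter
// value it saw and blocks until a marking cycle bumps the counter past it.
public class CompletionCounterSketch {
    private int completed = 0;   // stand-in for _full_collections_completed

    synchronized int read() {
        return completed;
    }

    // Called by the (modelled) marking thread when a cycle finishes.
    synchronized void increment() {
        completed++;
        notifyAll();
    }

    // Called by the (modelled) requesting thread; returns once the counter
    // has moved beyond the value observed before the request.
    synchronized void awaitBeyond(int observed) {
        try {
            while (completed <= observed) {
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // One requester, one marking cycle: true if the requester was released
    // by exactly one increment.
    static boolean demo() {
        CompletionCounterSketch counter = new CompletionCounterSketch();
        int observed = counter.read();
        Thread marker = new Thread(counter::increment, "marking-model");
        marker.start();
        counter.awaitBeyond(observed);
        return counter.read() == observed + 1;
    }

    public static void main(String[] args) {
        System.out.println("requester released: " + demo());
    }
}
```

The loop around wait() matters: the requester re-checks the counter after every wakeup, so it is released only by an actual increment and not by a spurious wakeup.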
With the test case for 7041440 we see the following (perhaps silly) behavior:
Each thread that requests a System.gc() creates an instance of the VM_G1IncCollectionPause VM operation and enqueues it on the VM operation queue (using VMThread::execute).
The VM thread then starts executing these enqueued VM operations...
The VM thread executes the VM_G1IncCollectionPause for thread A as an initial mark pause, i.e. an evacuation pause during which concurrent marking is started. This evacuation pause leaves only one survivor region in the collection set. Thread A then waits in VM_G1IncCollectionPause::doit_epilogue until the concurrent mark completes.
The VM thread then processes the VM operation enqueued by Thread B and executes VM_G1IncCollectionPause::doit. The VM thread first reads the # of _full_collections_completed, sees that a concurrent mark is already in progress, and so does not force an initial mark. It then executes an evacuation pause where the collection set is a single region (the survivor region from the pause requested by thread A). This evacuation pause completes and again leaves a single survivor region in the collection set. Thread B waits in VM_G1IncCollectionPause::doit_epilogue until the # of _full_collections_completed is incremented at the end of the marking cycle.
The VM thread then processes the VM operation enqueued by Thread C and executes VM_G1IncCollectionPause::doit....
And so on.
We see a bunch of evacuation pauses where the collection set is only one heap region as a result of the enqueued VM_G1IncCollectionPause instances. At some point the surviving data is promoted and the collection set for the evacuation pauses is empty.
Eventually the marking cycle completes and a new initial mark pause is performed - starting the process over again.
This behavior is obviously wrong: while marking is in progress, we probably should not be doing these pauses. We should either:
* Wait until the marking completes before reading _full_collections_completed and executing the pause.
When marking completes we would execute another initial mark pause (and concurrent mark).
* Alternatively, if marking is already active we should skip the pause completely.
In both cases the requesting Java thread will be waiting in VM_G1IncCollectionPause::doit_epilogue() until _full_collections_completed is incremented at the end of the marking cycle.
Skipping the pauses if marking is active (instead of waiting before the pause) is easier.
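The difference between the observed behavior and the skip-the-pause option can be illustrated with a toy model. This is a sketch in Java under simplifying assumptions, not the HotSpot change itself: the class, its fields, and its methods are invented for illustration, and it assumes every queued request is processed before the marking cycle completes.

```java
// Toy model (not HotSpot code) of the VM thread draining queued explicit-GC
// requests while a marking cycle may already be in progress.
public class ExplicitGcModel {
    private final boolean skipPauseWhileMarking; // the proposed behavior
    private boolean markingInProgress = false;
    private int fullCollectionsCompleted = 0;    // models the VM counter
    private int pausesExecuted = 0;

    ExplicitGcModel(boolean skipPauseWhileMarking) {
        this.skipPauseWhileMarking = skipPauseWhileMarking;
    }

    // Models the work done for one enqueued request.
    void doit() {
        if (!markingInProgress) {
            // Initial mark pause: an evacuation pause that starts marking.
            markingInProgress = true;
            pausesExecuted++;
        } else if (!skipPauseWhileMarking) {
            // Observed behavior: a redundant evacuation pause whose
            // collection set is at most one survivor region.
            pausesExecuted++;
        }
        // Otherwise marking is active and the pause is skipped; the
        // requester simply waits for the counter to be incremented.
    }

    // Models the end of the marking cycle, which releases the waiters.
    void markingCompleted() {
        markingInProgress = false;
        fullCollectionsCompleted++;
    }

    // Number of pauses executed for the given number of queued requests,
    // all processed before the marking cycle ends.
    static int pausesFor(int requests, boolean skipPauseWhileMarking) {
        ExplicitGcModel model = new ExplicitGcModel(skipPauseWhileMarking);
        for (int i = 0; i < requests; i++) {
            model.doit();
        }
        model.markingCompleted();
        return model.pausesExecuted;
    }

    public static void main(String[] args) {
        System.out.println("observed: " + pausesFor(5, false) + " pauses");
        System.out.println("skipping: " + pausesFor(5, true) + " pauses");
    }
}
```

In this model, five queued requests cost five pauses under the observed behavior and a single initial mark pause when the pause is skipped while marking is active; in both cases all five requesters are released by the same increment of the counter.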