I wrote up a test (AttachDetachTest) that launches several native threads which attach to / detach from the JVM while a Java thread is generating concurrent marking cycles (with System.gc() and +ExplicitGCInvokesConcurrent). This test also hits the assert, which proves that the thread attach operation was indeed prone to the same issue. The fix allows the test to run with no failures.
I've given the test to SQE for integration into the VM testbase (see CR 6989659).
This issue is related to the fix for 6935821: G1: threads created during marking do not activate their SATB queues. It turns out that fix did not cover a subtle case.
When a thread is attached to the JVM, a safepoint might happen between the JavaThread constructor being called and the thread being added to the Java thread list. If that happens, the safepoint might activate / deactivate the SATB buffers while the attaching thread is not yet on the Java thread list, leaving its active value inconsistent with that of all the other threads. Here's an example:
- new JavaThread is created with SATB active value being false (no marking cycle is taking place)
- a safepoint happens and the SATB active values are set to true for all threads as a marking cycle is starting
- the newly-created JavaThread is then added to the Java thread list with an inconsistent active value compared to all the other threads
- we will catch this when we try to set the SATB active values back to false when the marking cycle is complete
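The sequence above can be modeled with a short C++ sketch. This is a toy model, not HotSpot code; all type and member names here are made up for illustration:

```cpp
#include <vector>

// Toy model of the race: a global SATB "active" flag, a per-thread copy,
// and a thread list. The "safepoint" flips the flag on every thread that
// is on the list -- a thread that has been constructed but not yet added
// is missed and keeps its stale value.
struct ToyThread {
    bool satb_active;
    // Models the JavaThread constructor sampling the global value.
    explicit ToyThread(bool global_active) : satb_active(global_active) {}
};

struct ToyVM {
    bool global_active = false;
    std::vector<ToyThread*> thread_list;

    // Models the safepoint operation that starts/stops marking: it can
    // only update threads that are already on the list.
    void set_active_all_threads(bool v) {
        global_active = v;
        for (ToyThread* t : thread_list) {
            t->satb_active = v;
        }
    }
};
```

Running the three steps in order (construct with false, flip to true at a safepoint, then add to the list) leaves the new thread's flag at false while the global flag is true, which is exactly the inconsistency the assert catches.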
The DestroyJavaVM thread is an example of such a thread. Whether its SATB queue is activated or not does not matter (it would not do any updates), so, minus the assert failure, the issue is benign for that thread. However, the same problem could arise for native threads attached to the JVM (through, say, JNI), for which the issue would not be benign. (Thanks to Ramki for this observation.)
The first attempt at a fix was to introduce a flag to the PtrQueue class that forces specific instances of its subclasses to be excluded from activation / deactivation. The DestroyJavaVM thread doesn't really need its buffers, so that mechanism would have kept its queues permanently inactive, and the problem would disappear for that thread. This would solve this specific failure, but it would not help other threads that attach to the JVM and can hit the same issue.
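For reference, that first (rejected) idea can be sketched like this; the field and method names are hypothetical, not from any actual patch:

```cpp
// Model of the rejected approach: a per-queue flag that makes
// activation / deactivation skip the queue entirely. The DestroyJavaVM
// thread's queue would have had this flag set and stayed inactive forever.
struct PtrQueueModel {
    bool active = false;
    bool excluded_from_activation = false;  // hypothetical opt-out flag

    void set_active(bool b) {
        if (excluded_from_activation) return;  // this queue is never toggled
        active = b;
    }
};
```

The weakness is visible in the model itself: the opt-out only helps queues that never need activation, which is true for DestroyJavaVM but not for an arbitrary JNI-attached thread.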
The real fix is to not set the active field of the SATB buffer when the JavaThread object is created. Instead, we set it in the Threads::add() method just before the thread is added to the Java thread list. That method holds the Threads_lock which ensures that no safepoint is in progress, so we can safely read the active field of the SATB queue set. (Thanks to Igor for suggesting it.)
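A minimal sketch of the idea behind the fix, using std::mutex to stand in for the Threads_lock; this is a model of the locking argument, not the actual HotSpot patch, and the names are illustrative:

```cpp
#include <mutex>
#include <vector>

struct SATBQueue {
    bool active = false;  // no longer initialized from the global value
                          // in the thread's constructor
};

struct ThreadsModel {
    std::mutex threads_lock;        // stands in for the Threads_lock
    bool queue_set_active = false;  // the SATB queue set's global flag
    std::vector<SATBQueue*> list;

    // The safepoint-side flip takes the same lock in this model, which
    // mirrors the fact that holding the Threads_lock blocks safepoints.
    void set_active_all_threads(bool v) {
        std::lock_guard<std::mutex> g(threads_lock);
        queue_set_active = v;
        for (SATBQueue* q : list) q->active = v;
    }

    // The fix: read the global flag and publish the thread under the same
    // lock, so no flip can interleave between the two steps.
    void add(SATBQueue* q) {
        std::lock_guard<std::mutex> g(threads_lock);
        q->active = queue_set_active;
        list.push_back(q);
    }
};
```

Because the read of the global flag and the insertion into the list happen atomically with respect to the flip, a newly added thread can never end up with a value that disagrees with the rest.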
In G1, each Java thread has a local flag telling it whether its SATB buffer is active or not (it is set to true during concurrent marking and to false outside it). This failure tells us that at least one thread has an "active" value that is inconsistent with the global active value. Thankfully, the code that does the consistency check dumps more information in the GC log when such a failure is detected:
SATB queue active values for Java Threads
SATB queue set: active is FALSE
expected_active is FALSE
thread DestroyJavaVM, active is TRUE
thread Thread-1, active is FALSE
thread Thread-0, active is FALSE
thread Low Memory Detector, active is FALSE
thread CompilerThread0, active is FALSE
thread Signal Dispatcher, active is FALSE
thread Surrogate Locker Thread (CMS), active is FALSE
thread Finalizer, active is FALSE
thread Reference Handler, active is FALSE
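The check that produced this dump can be sketched as follows; this is a simplified model (the real check lives in SATBMarkQueueSet, and the exact output format differs):

```cpp
#include <cstdio>
#include <string>
#include <vector>

struct ThreadInfo {
    std::string name;
    bool satb_active;
};

// Returns the number of threads whose local "active" value disagrees with
// expected_active; on any mismatch, prints a per-thread dump similar to
// the one in the log above.
int verify_active_all_threads(const std::vector<ThreadInfo>& threads,
                              bool expected_active) {
    int mismatches = 0;
    for (const ThreadInfo& t : threads) {
        if (t.satb_active != expected_active) mismatches++;
    }
    if (mismatches > 0) {
        std::printf("SATB queue active values for Java Threads\n");
        std::printf("  expected_active is %s\n",
                    expected_active ? "TRUE" : "FALSE");
        for (const ThreadInfo& t : threads) {
            std::printf("  thread %s, active is %s\n",
                        t.name.c_str(), t.satb_active ? "TRUE" : "FALSE");
        }
    }
    return mismatches;
}
```

In the dump above this check would report exactly one mismatch: DestroyJavaVM's TRUE against an expected value of FALSE.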
So, it looks as if one thread does have a different active value from the rest. However, said thread is the DestroyJavaVM thread. It's unclear from the log whether the problem only happens for the DestroyJavaVM thread (in which case this failure would be a non-issue for a product build) or for any Java thread (in which case the issue might be more serious). For completeness, here's the stack trace:
Stack: [0xa717f000,0xa71ff000], sp=0xa71fe1a0, free space=1fca71ff000k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x1662b7f];; __1cHVMErrorGreport6MpnMoutputStream__v_+0x70b
V [libjvm.so+0x1663d06];; __1cHVMErrorOreport_and_die6M_v_+0x582
V [libjvm.so+0x7bb5f9];; __1cPreport_vm_error6Fpkci11_v_+0x5ad
V [libjvm.so+0x13ce6f8];; __1cQSATBMarkQdDueueSetWset_active_all_threads6Mbb_v_+0x204
V [libjvm.so+0x6fc473];; __1cOConcurrentMarkFabort6M_v_+0x193
V [libjvm.so+0x88e08b];; __1cPG1CollectedHeapNdo_collection6MbbI_v_+0x41f
V [libjvm.so+0x88eaf7];; __1cPG1CollectedHeapSdo_full_collection6Mb_v_+0x17
V [libjvm.so+0x16a3bc0];; __1cQVM_G1CollectFullEdoit6M_v_+0x9c
V [libjvm.so+0x16a16c9];; __1cMVM_OperationIevaluate6M_v_+0xe1
V [libjvm.so+0x169f3e1];; __1cIVMThreadSevaluate_operation6MpnMVM_Operation__v_+0x145
V [libjvm.so+0x169fb9d];; __1cIVMThreadEloop6M_v_+0x5c5
V [libjvm.so+0x169efc1];; __1cIVMThreadDrun6M_v_+0xc1
V [libjvm.so+0x1291ea4];; java_start+0x708
C [libc.so.1+0xbc203] _thrp_setup+0x9b;; _thrp_setup+0x9b
C [libc.so.1+0xbc490] _lwp_start+0x0;; _lwp_start+0x0
So, we are about to do a Full GC and we abort the concurrent marking phase that is in progress. In this case, the call to set_active_all_threads() looks like this (from ConcurrentMark::abort()):
satb_mq_set.set_active_all_threads(false, /* new active value */
                                   satb_mq_set.is_active() /* expected_active */);
So, we are deactivating the active flags just before the Full GC, but most likely the expected active value (read from satb_mq_set.is_active()) is also false, i.e., the buffersers are already inactive. We don't check for that case and go ahead and deactivate them anyway.
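A simplified model of what this call does: the set-level flag is compared against expected_active, each thread's flag is then checked and overwritten, and the call is made even when the new value equals the old one (as on this abort path). The model returns false instead of asserting, so it is easy to exercise; the real HotSpot code asserts on the same conditions:

```cpp
#include <cstddef>
#include <vector>

// Simplified model of the set_active_all_threads(b, expected_active)
// semantics described above (not the HotSpot source). The set's own flag
// and every per-thread flag must match expected_active; the new value b
// is forced onto all of them regardless.
struct QueueSetModel {
    bool all_active = false;
    std::vector<bool> per_thread;

    bool set_active_all_threads(bool b, bool expected_active) {
        bool consistent = (all_active == expected_active);
        all_active = b;
        for (std::size_t i = 0; i < per_thread.size(); i++) {
            if (per_thread[i] != expected_active) consistent = false;
            per_thread[i] = b;
        }
        return consistent;  // the real code asserts on this condition
    }
};
```

Note how the abort path passes the set's own current value as expected_active, so the set-level comparison trivially passes; only a thread with a stale flag, like DestroyJavaVM above, can make the check fail.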