(with more input from Thomas) Further info on the race. It looks as if it happens between the following two threads:
a) a concurrent refinement thread that's sampling the young RSet lengths and it's updating the inc CSet info with update_incremental_cset_info() (which will decrease / increase _inc_cset_recorded_rs_lengths)
b) a mutator thread that's retiring a mutator alloc region and it's adding it to the inc CSet with add_region_to_incremental_cset_lhs() -> add_region_to_incremental_cset_common() which will increase _inc_cset_recorded_rs_lengths.
The updates to _inc_cset_recorded_rs_lengths are not done atomically or in a mutually exclusive way. Thread b) is holding the Heap_lock at that point but thread a) does not take the Heap_lock while doing this operation.
It should also be noted that several other fields updated by add_region_to_incremental_cset_common() and update_incremental_cset_info() could also be corrupted by this race. We discovered (OK, Thomas did!) the corruption on _inc_cset_recorded_rs_lengths only because of the side-effects of the underflow.
Attempting to fix the race by ensuring that thread a) takes the Heap_lock before it calls update_incremental_cset_info() will likely result in a deadlock. Thread a) joins the STS while it's sampling the young RSet lengths (so it has to explicitly yield or leave the STS before a GC can happen). Consider the following scenario:
Thread a) joins the STS, does some work, and tries to take the Heap_lock.
Mutator thread c) is trying to do a GC, takes the Heap_lock (it's done by the VM op) and then waits for all threads in the STS to yield / leave.
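The interleaving above can be sketched as a timeline (purely illustrative) showing why neither thread can make progress:

```
refinement thread a)                 mutator thread c) (GC VM op)
--------------------                 ----------------------------
joins the STS
samples young RSet lengths
                                     takes the Heap_lock
                                     waits for all STS threads
                                       to yield / leave
tries to take the Heap_lock
  -> blocks (held by c)
a) never yields the STS              c) never sees the STS drain
        => deadlock: Heap_lock and the STS taken in opposite orders
```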
If thread a) took the Heap_lock before it joined the STS, it'd probably work. But it would then hold the Heap_lock for long periods of time, which would induce latencies on any mutator thread that needs the Heap_lock in order to retire the active region / allocate a new region.