Bug ID: 7118202 G1: eden size unnecessarily drops to the minimum

Details
Type:
Bug
Submit Date:
2011-12-05
Status:
Closed
Updated Date:
2012-03-22
Project Name:
JDK
Resolved Date:
2012-01-20
Component:
hotspot
OS:
generic
Sub-Component:
gc
CPU:
generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
hs23
Fixed Versions:
hs23

Description
We occasionally see G1 decrease the eden size to the minimum for no apparent reason and keep it there for a few GCs before things return to normal.

It looks as if the issue is an integer overflow (underflow?) during this calculation:

    size_t rs_length_diff = _max_rs_lengths - _recorded_rs_lengths;

It looks as if, when _max_rs_lengths is smaller than _recorded_rs_lengths, rs_length_diff (being an unsigned value) wraps around to a _very_ large value, and the prediction overpredicts by a wide margin.
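
The wraparound can be reproduced in isolation. The following is a minimal sketch, not HotSpot code: rs_length_diff_unchecked() is a hypothetical helper whose parameters only mirror the field names in the line quoted above.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the subtraction quoted in the description.
// The parameter names echo the fields in the report; the function itself
// is an illustration and does not exist in HotSpot.
std::size_t rs_length_diff_unchecked(std::size_t max_rs_lengths,
                                     std::size_t recorded_rs_lengths) {
  // Unsigned arithmetic is modular: when recorded_rs_lengths is larger
  // than max_rs_lengths the result cannot go negative, so it wraps
  // around to a value close to the maximum of std::size_t.
  return max_rs_lengths - recorded_rs_lengths;
}
```

With max_rs_lengths = 40 and recorded_rs_lengths = 100, the result is 60 below zero modulo the width of size_t, that is, std::numeric_limits<std::size_t>::max() - 59. A value of that magnitude fed into a prediction would plausibly produce the overprediction described here.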

Many thanks to Thomas Schatzl who, once again, tracked this down.

                                    

Comments
EVALUATION

http://hg.openjdk.java.net/lambda/lambda/hotspot/rev/d23d2b18183e
                                     
2012-03-22
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-emb/hotspot/rev/d23d2b18183e
                                     
2011-12-15
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/d23d2b18183e
                                     
2011-12-08
SUGGESTED FIX

We'll go with the above defensive fix but, now that we know what the race is, we're going to also fix it on a separate CR (7119027).
                                     
2011-12-07
EVALUATION

(with more input from Thomas) Further info on the race. It looks as if it happens between the following two threads:

a) a concurrent refinement thread that's sampling the young RSet lengths and updating the inc CSet info with update_incremental_cset_info() (which will decrease / increase _inc_cset_recorded_rs_lengths)

b) a mutator thread that's retiring a mutator alloc region and adding it to the inc CSet with add_region_to_incremental_cset_lhs() -> add_region_to_incremental_cset_common(), which will increase _inc_cset_recorded_rs_lengths.

The updates to _inc_cset_recorded_rs_lengths are not done atomically or in a mutually exclusive way. Thread b) is holding the Heap_lock at that point but thread a) does not take the Heap_lock while doing this operation.

It should also be noted that several other fields that are updated by add_region_to_incremental_cset_common() and update_incremental_cset_info() could also be corrupted because of this race. We discovered (OK, Thomas did!) the corruption on _inc_cset_recorded_rs_lengths because of the side-effects of the underflow.
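
To illustrate how unsynchronized updates can corrupt a shared counter, here is a single-threaded sketch that replays one fixed interleaving of two read-modify-write sequences. Everything in it is hypothetical: SharedCounter, replay_lost_update() and serial_update() are illustration names, and the assumption that the sampling thread subtracts a region's old RSet length and adds its new one is a guess at the shape of the update, not taken from the HotSpot source.

```cpp
#include <cstddef>

// Hypothetical model of one shared counter. The role is that of
// _inc_cset_recorded_rs_lengths, but this is not HotSpot code.
struct SharedCounter {
  std::size_t value;
};

// What the counter should hold if the two updates ran one after the other.
std::size_t serial_update(std::size_t initial,
                          std::size_t sampler_old_len,
                          std::size_t sampler_new_len,
                          std::size_t retired_region_len) {
  SharedCounter c{initial};
  c.value = c.value + retired_region_len;                 // thread b)
  c.value = c.value - sampler_old_len + sampler_new_len;  // thread a)
  return c.value;
}

// Replays one bad interleaving deterministically: both "threads" load the
// same old value before either of them stores its result.
std::size_t replay_lost_update(std::size_t initial,
                               std::size_t sampler_old_len,
                               std::size_t sampler_new_len,
                               std::size_t retired_region_len) {
  SharedCounter c{initial};
  std::size_t a_loaded = c.value;            // thread a) loads
  std::size_t b_loaded = c.value;            // thread b) loads
  c.value = b_loaded + retired_region_len;   // thread b) stores
  // thread a) stores a result computed from its stale load, which
  // silently discards the increment that thread b) just wrote.
  c.value = a_loaded - sampler_old_len + sampler_new_len;
  return c.value;
}
```

Starting from 100, with the sampler replacing a length of 30 by 20 and the mutator adding a region of length 50, the serial result is 140 but the replayed interleaving leaves 90: the mutator's increment is lost. Once a counter has drifted from the true total like this, a later comparison or subtraction against another value can see it on the wrong side.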

Additional note:

Attempting to fix the race by ensuring that thread a) takes the Heap_lock before it calls update_incremental_cset_info() will likely result in a deadlock. Thread a) joins the STS while it's sampling the young RSet lengths (so it has to explicitly yield or leave the STS before a GC can happen). Consider the following scenario:

Thread a) joins the STS, does some work, and tries to take the Heap_lock.
Mutator thread c) is trying to do a GC, takes the Heap_lock (it's done by the VM op) and then waits for all threads in the STS to yield / leave.
Deadlock.

If thread a) took the Heap_lock before it joined the STS, it'd probably work. But it'd hold the Heap_lock for long periods of time, which would induce latencies on any mutator thread that needs the Heap_lock in order to retire the active region / allocate a new region.
                                     
2011-12-06
SUGGESTED FIX

Given that we're planning to revamp the prediction code, the most prudent course of action is to be defensive and catch the case where rs_length_diff underflows (and set it to 0 when this happens).
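
A minimal sketch of such a defensive check, assuming the same two inputs as the subtraction in the description. rs_length_diff_clamped() is a hypothetical helper written for illustration; the actual change is in the changeset linked in the evaluation entries and may look different.

```cpp
#include <cstddef>

// Hypothetical helper illustrating the defensive approach: compare before
// subtracting so the unsigned difference can never wrap around.
std::size_t rs_length_diff_clamped(std::size_t max_rs_lengths,
                                   std::size_t recorded_rs_lengths) {
  if (max_rs_lengths > recorded_rs_lengths) {
    return max_rs_lengths - recorded_rs_lengths;
  }
  // The inputs are inconsistent (recorded >= max): treat the difference
  // as 0 rather than letting it become a huge unsigned value.
  return 0;
}
```

For example, rs_length_diff_clamped(100, 40) yields 60, while rs_length_diff_clamped(40, 100) yields 0 instead of a value near the maximum of size_t. The check hides the symptom only; the inconsistent inputs caused by the race are still there.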
                                     
2011-12-05


