Bug ID: 6819098 G1: reduce RSet scanning times

Details
Type: Enhancement
Status: Closed
Resolution: Fixed
Priority: P3
Project Name: JDK
Component: hotspot
Sub-Component: gc
OS: generic
CPU: generic
Affected Versions: hs15
Fixed Versions: hs16
Submit Date: 2009-03-18
Updated Date: 2011-03-08
Resolved Date: 2011-03-08

Description
We recently tried G1 on a 32GB heap with a 4GB and an 8GB young gen. Even though not much was copied during GC (so we can assume that the RSets of the collection set were relatively empty), the RSet scanning times were higher than we would have expected: around 5ms on average for the 4GB and 13ms on average for the 8GB young gen. We should check whether there is a bottleneck somewhere that we can remove to speed up the RSet scanning code.
Modified the previous naive work-stealing algorithm by introducing feedback-driven exponential skipping.

Testing: JBB2005 on a 16-core Intel Core 2 box with a 30GB heap (25GB young gen) and 13 GC threads. RSet scanning times improved by roughly 600%.
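
The idea of feedback-driven exponential skipping can be illustrated with a minimal Java sketch. This is not the HotSpot code (which is C++ and lives in the changeset linked in the comments); it only shows one plausible reading of the technique. The class name SkippingClaimer, the MAX_SKIP cap, and the final linear sweep are all assumptions made for illustration. Each worker walks the region list and tries to claim regions; every failed claim doubles its stride so it moves away from contended areas quickly, and every successful claim resets the stride to 1.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.function.IntConsumer;

// Illustrative sketch only; names and constants are hypothetical, not HotSpot's.
final class SkippingClaimer {
    // Hypothetical upper bound on the stride, so a worker never jumps too far.
    private static final int MAX_SKIP = 64;

    // One slot per region: 0 = unclaimed, 1 = claimed by some worker.
    private final AtomicIntegerArray claimed;

    SkippingClaimer(int regionCount) {
        this.claimed = new AtomicIntegerArray(regionCount);
    }

    // Cheap read first, then CAS, so already-claimed regions cost no atomic write.
    private boolean tryClaim(int index) {
        return claimed.get(index) == 0 && claimed.compareAndSet(index, 0, 1);
    }

    // Scans regions beginning at startIndex (assumed to be >= 0) and returns
    // how many regions this worker claimed and scanned.
    int scan(int startIndex, IntConsumer scanRegion) {
        int n = claimed.length();
        int processed = 0;
        int skip = 1;

        // Pass 1: the stride grows on failed claims (feedback) and resets on success.
        for (int offset = 0; offset < n; offset += skip) {
            int index = (startIndex + offset) % n;
            if (tryClaim(index)) {
                scanRegion.accept(index);
                processed++;
                skip = 1;
            } else {
                skip = Math.min(skip * 2, MAX_SKIP);
            }
        }

        // Pass 2 (an assumption of this sketch): a linear sweep picks up any
        // regions that were jumped over, so nothing is left unscanned.
        for (int index = 0; index < n; index++) {
            if (tryClaim(index)) {
                scanRegion.accept(index);
                processed++;
            }
        }
        return processed;
    }

    // Runs workerCount threads over regionCount regions and reports how many
    // times each region was scanned.
    static int[] runWorkers(int regionCount, int workerCount) {
        SkippingClaimer claimer = new SkippingClaimer(regionCount);
        AtomicIntegerArray hits = new AtomicIntegerArray(regionCount);
        Thread[] workers = new Thread[workerCount];
        for (int w = 0; w < workerCount; w++) {
            // Spread the starting points evenly so workers begin far apart.
            int start = (int) ((long) w * regionCount / workerCount);
            workers[w] = new Thread(() -> claimer.scan(start, hits::incrementAndGet));
            workers[w].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        int[] result = new int[regionCount];
        for (int i = 0; i < regionCount; i++) {
            result[i] = hits.get(i);
        }
        return result;
    }
}
```

Calling SkippingClaimer.runWorkers(1000, 4) should report every region as scanned exactly once, since the compare-and-set claim admits only one winner per region and the sweep covers anything skipped. The sketch says nothing about the actual speed-up, which depends on the real RSet data structures.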

                                    

Comments
EVALUATION

Approved for JDK 7 M3 build 59.
                                     
2009-05-05
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/b803b1b9e206
                                     
2009-04-28
EVALUATION

One extra data point: using larger regions (8MB instead of 1MB) decreases the RSet scanning time dramatically (given that larger regions mean fewer regions in the CSet). See also CR 6819085.
                                     
2009-03-25
EVALUATION

Initially I thought that the RSet scanning time might be dominated by the iteration over the collection set, which would be hard to speed up. However, I'm no longer so sure. First, the scanning time is about 2.5 times longer for a young gen twice the size. Second, because of another bug (see 6819077), thread 0 starts late into the GC and doesn't actually scan any RSets (but it does iterate over the CSet trying to find RSets to claim). Its times are 1.4ms for the 4GB and 2.4ms for the 8GB young gen, so the iteration itself seems reasonably short. The bottleneck therefore appears to be contention (somehow) between the GC threads during the RSet scanning operation.
                                     
2009-03-18


