Java Solaris Communities Sun Store Join SDN My Profile Why Join?
 
Bug Database
Bug Detail
Quick Lists
Top 25 Bugs
Top 25 RFE's
Recently Closed Bugs
Printable Page Printable Page


Bug Database
Bug ID: 6775807
Votes 0
Synopsis G1: Crash in G1 collector
Category hotspot:garbage_collector
Reported Against
Release Fixed
State 3-Accepted, bug
Priority: 3-Medium
Related Bugs 6622411
Submit Date 24-NOV-2008
Description
J2SE Version (please include all output from java -version flag):
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b40)
Java HotSpot(TM) Client VM (build 14.0-b07, mixed mode, sharing)


Does this problem occur on J2SE 1.3, 1.4.x or 1.5?  Yes / No (pick one)
No (bug specific to the new G1 collector)


Operating System Configuration Information (be specific):
Windows Vista SP1 [6.0.6001], Brazilian Portuguese, fully patched


Hardware Configuration Information (be specific):
 customer  Precision T3400 ( customer  Core2 Quad Q6600; NVidia Quadro FX1700)


Bug Description:
Testing the new "Garbage First" collector, I executed the same benchmark first with the CMS collector (default settings) and then with G1. The CMS test executed successfully, and G1 proceeded quite far into the test but crashed with an access violation, apparently in a Young-GC pause.


Steps to Reproduce (be specific):
The benchmark is a unit test of a large customer application, so it would be difficult if possible at all to deliver a reproducible testcase. I picked this app to stress-test G1 because it's a very memory-intensive app. Follows a decsiption of the app, if it helps. I'm also attaching the detailed GC logs for both runs, and also the crash log file produced by HotSpot.

The system calculates the Billing form Brazil's largest inter-bank ATM network, from TecBan. The monthy billing job requires processing ~8 million transaction records, and produces ~2,5 million result records; all this in a single transaction performed by a J2EE 1.4 app running on WebSphere 6.1. (The unit test was performed without any appserver and with Sun's JVMs, however.) This transaction is split in many groups (per each bank+product billed); the largest group reaches ~700,000 records, which are loaded into memory and processed. For these worst-case groups, the system requires ~700Mb of heap to run successfully, without OutOfMemoryErrors or degradation with excessive GC events. The result records are also voluminous but they are inserted in the database in batches of 1,000 and then discarded (made eligible for GC). So there are two main sources of GC activity:
1) Continuous Young-GC to clean up these batches of output records (and also all the garbage produced by the Oracle 9i JDBC driver, which is a quite messy driver that allocates a ton of trash).
2) After each bank+product group's processing is completely, all its input records are discarded, which makes virtually the entire content of the heap eligible for GC. The program doesn't use explicit System.gc() calls, so in practice this leaves a enormous number of tenured objects in the heap, and the next group may hit the heap limit as it loads and accumulates its own input records in the heap; which induces Full-GC events.

In my benchmark, I configured the heap to a static size: min=max=750M, so the test should run combortably with either GC. The primary purpose was investigating the performance of G1, but since it crashed, I'm now using it as a test for G1's stability.

It's worth notice that the benchmarked app is strictly single-threaded, but I executed it on a quad-core CPU and G1 itself is a heavily concurrent and parallel collector, so that may be some race bug. (I used -Xbatch to eliminate JIT concurrency from the diagnostic space.)
Posted Date : 2008-11-24 21:38:06.0
Work Around
N/A
Evaluation
N/A
Comments
  
  Include a link with my name & email   

Submitted On 25-NOV-2008
opinalid
Follows a summary of the HoSpot crash log, in benefit of other reviewers without access to the attached files:

#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x6dc920a3, pid=4720, tid=3140
#
# Java VM: Java HotSpot(TM) Server VM (14.0-b07 mixed mode windows-x86 )
# Problematic frame:
# V  [jvm.dll+0x1420a3]


Submitted On 03-MAR-2009
opinalid
This bug is apparently FIXED. I repeated the same application test in JDK 6u14ea-b02 and it's gone, G1 didn't crash anymore. (I didn't retest in later JDK 7 builds, I see that G1 is still in flux with recent improvements like survivor spaces.)

G1 is also performing very well for this particular app: CMS executed it in 46m34.643s, but G1 executed in 39m17.207s. I'm even suspecting of some benchmarking problem because a ~15% cut of total execution time is just too good to be true, and this app is database-bound (but it executed on a fully dedicated system incuding DBMS). I performed some quick analysis ofr hte verbose:gc logs, some findings:

- With both collectors, the JVM allocates and disposes ~62Gb of memory. It's a respectable memory churn for a 750Mb heap and a Q6600 CPU with 2x4M L2.

- G1 performs 496 collections (freeing 125Mb avg), CMS performns 930 collections (freeing 67Mb avg). This is consistent with my general observation that G1 tends to do less collections but free more space per collection.

- G1's max pause was 0,27s / average 0,0386s / total 19,17s; CMS was max 0,43s / average 0,0204s / total 18,99s. So G1 spends only slightly more time with GC but it keeps lower max pauses, as expected.

G1 didn't perform any full-GC, only a few "partial" collections where the pausing phases are not longer than typical young GCs. But in near the end of the test it performed a few young GCs with long pauses:

[GC pause (young) 611M->159M(750M), 0.0326780 secs]
[GC pause (young) 616M->168M(750M), 0.0534140 secs]
[GC pause (young) 616M->189M(750M), 0.1020850 secs]
[GC pause (young) 610M->217M(750M), 0.2755259 secs]
[GC pause (young) 601M->224M(750M), 0.0292465 secs]
[GC pause (young) 605M->251M(750M), 0.1409068 secs]
[GC pause (young) 604M->277M(750M), 0.2152154 secs]
[GC pause (young) 601M->300M(750M), 0.2468394 secs]
[GC pause (young) 599M->324M(750M), 0.2086483 secs]
[GC pause (young) 597M->332M(750M), 0.0560460 secs]

This happens when the app scans through a very big ResultSet, scanning each record and inserting derived records, so there is little memory retention (only the inserted records are batched in groups of 1000), the heap is basically dominated by garbage produced very quickly by the Oracle JDBC driver. G1 should probaby do more frequent collections here, notice that each young-GC is a "big" collection of up to ~400Mb in a single shot (and the live sets are still large).


Submitted On 22-JUN-2009
I still can reproduce this bug in version  
Java(TM) Platform, Standard Edition for Business (build 1.6.0_14-rev-b09)
Java HotSpot(TM) Client VM (build 14.0-b17, mixed mode, sharing)
At the time the application it's creating lot of new objects it is crashing.



PLEASE NOTE: JDK6 is formerly known as Project Mustang