Bug Database
Bug ID: 6433335
Votes 1
Synopsis ParNewGC times spiking, eventually taking up 20 out of every 30 seconds
Category hotspot:garbage_collector
Reported Against
Release Fixed hs10(b03), 5.0u10(b02) (Bug ID:2141969), 6u1(b01) (Bug ID:2141970), 1.4.2_14(b01) (Bug ID:2141971), 7(b03) (Bug ID:2176734)
State 10-Fix Delivered, bug
Priority: 3-Medium
Related Bugs 6413516, 6446077, 6459113, 6237967
Submit Date 02-JUN-2006
Description
The customer is using an ER release of 5.0u6.
The ER is 1.5.0_06-erdist-2006- customer -01. The bug addressed by that ER was 6367204.

The hardware is:
       customer 4x dual-core Xeon
       32GB RAM
       10x RAID 10 HDD
   The OS is Server 2003.
   The Java version and configuration:
       JRE v.1.5.0_06-erdist-20060201
       28GB heap
       GC options:
           -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
           -XX:ParallelGCThreads=7 -XX:NewSize=128M -XX:MaxNewSize=128M
           -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
           -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
           -XX:CMSMarkStackSize=8M -XX:CMSMarkStackSizeMax=32M -XX:+UseLargePages
           -XX:+DisableExplicitGC

   The application is a custom distributed database server based on
TCP/IP and Sleepycat DBJE.

   The symptoms:
       After running smoothly for ~1-4 days straight under a
constant but light load, the ParNew GCs jump from ~150 ms every 30
seconds to 5-20 seconds out of every 30 seconds.  The start of the
degenerate ParNew GCs seems to mostly (but not always) coincide with the
start of a new CMS mark phase.  The general pattern is to spend 20-90%
of the time in young GC, which eventually quiesces to acceptable
levels after ~4 hours of GC pain (frequently only to restart after the
next CMS sweep).
       The load was constant and unvaried from our side, so we
don't see any application-level cause for the degenerate GC performance.
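
The "seconds of GC out of every 30 seconds" figure quoted above can be tracked from inside a JVM. The following is a minimal sketch, not the customer's instrumentation: it samples the standard java.lang.management GC beans over a window and reports the fraction of wall-clock time attributed to collection. The class name GcOverheadSampler and the 30-second window are assumptions chosen to match the report, and depending on the collector the reported time may include concurrent work, so treat the number as approximate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverheadSampler {
    /** Sum of accumulated collection time (ms) across all collectors the JVM reports. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of a wall-clock window spent in GC, clamped to [0, 1]. */
    static double overhead(long gcMillis, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        double fraction = (double) gcMillis / windowMillis;
        return Math.max(0.0, Math.min(1.0, fraction));
    }

    public static void main(String[] args) throws InterruptedException {
        final long windowMillis = 30_000; // assumed window, matching the 30 s cadence in the report
        long before = totalGcMillis();
        Thread.sleep(windowMillis);
        long after = totalGcMillis();
        System.out.printf("GC overhead: %.1f%%%n",
                100.0 * overhead(after - before, windowMillis));
    }
}
```

By this measure the healthy case in the report (~150 ms per 30 seconds) is about 0.5% overhead, while 20 seconds out of every 30 is roughly 67%.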


They ran a test with 5.0u8 and the onset of the problem seemed to be delayed.
The time to failure went to 48 hours for the initial 5-second spike, with
another day or so to reach the ~20-second spikes.

They were running with large pages, so they also ran a test without them on their ER 5.0u6. The problem seemed to have gone away, but it returned many days later.
Turning off large pages therefore seems to extend the trouble-free running time, but eventually
they still see the problem.
Posted Date : 2006-06-02 14:36:37.0
Work Around
-XX:-UseParNewGC appears to provide a temporary workaround for
this customer for avoiding the long ParNew pauses. The downside
is that single-threaded scavenges are slightly longer than
the "good case" when using ParNew. Your mileage may vary, but you'll
certainly avoid the long pauses you might otherwise see either
because of very large objects or very large free blocks in the
old generation.
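
When applying a flag change like this, it can help to confirm which collectors the running JVM actually selected. The sketch below only lists the collector names exposed by the standard management beans; the class name ActiveCollectors is hypothetical. The names are implementation-specific: to my understanding, HotSpot releases of this era reported the parallel young collector as "ParNew" and the single-threaded scavenger as "Copy", but that is an assumption to check against your own JVM rather than a guarantee.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    /** Names of the collectors the running JVM exposes through its management beans. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Print one collector name per line so the output can be compared across flag settings.
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```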
Evaluation
Instrumentation so far appears to implicate the presence of
very large free blocks in the CMS generation (the same could
happen with very large objects as well). These adversely affect the
card-scanning times for scavenges, presumably on account of the
extremely frequent and slow block_start() calls.
Posted Date : 2006-06-26 15:43:04.0
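
To illustrate why many block-start lookups against one very large block could be costly, here is a deliberately simplified toy model. It is not HotSpot's block offset table or card-scanning code, which are not reproduced here; every name in it (BlockStartToyModel, stepsToBlockStart, layout) is hypothetical. The model assumes a naive lookup that walks backward one card at a time until it reaches a card in which a block begins, and counts the steps when every card performs its own lookup.

```java
public class BlockStartToyModel {
    /**
     * Counts the backward steps from `card` to the nearest card at or before it
     * in which a block begins. Assumes blockStartsInCard[0] is true.
     */
    static long stepsToBlockStart(boolean[] blockStartsInCard, int card) {
        long steps = 0;
        int c = card;
        while (!blockStartsInCard[c]) {
            c--;
            steps++;
        }
        return steps;
    }

    /** Total backward steps when every card performs its own lookup. */
    static long totalLookupSteps(boolean[] blockStartsInCard) {
        long total = 0;
        for (int card = 0; card < blockStartsInCard.length; card++) {
            total += stepsToBlockStart(blockStartsInCard, card);
        }
        return total;
    }

    /** Builds a layout of `cards` cards in which a new block starts every `blockCards` cards. */
    static boolean[] layout(int cards, int blockCards) {
        if (cards <= 0 || blockCards <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        boolean[] starts = new boolean[cards];
        for (int i = 0; i < cards; i += blockCards) {
            starts[i] = true;
        }
        return starts;
    }

    public static void main(String[] args) {
        int cards = 1 << 14; // arbitrary toy size
        System.out.println("many small blocks: " + totalLookupSteps(layout(cards, 1)));
        System.out.println("one huge block:    " + totalLookupSteps(layout(cards, cards)));
    }
}
```

In this toy, small blocks cost nothing per lookup, while a single block spanning all N cards costs N(N-1)/2 steps in total. The real mechanism differs in detail, but the sketch shows how lookup cost can grow with block size rather than with the amount of live data.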

See suggested fix section. This bug fix should be backported along with
the fix for 6459113.
Posted Date : 2006-09-13 00:14:26.0

PLEASE NOTE: JDK6 is formerly known as Project Mustang