Java Solaris Communities Sun Store Join SDN My Profile Why Join?
 
Bug Database
Bug Detail
Quick Lists
Top 25 Bugs
Top 25 RFE's
Recently Closed Bugs
Printable Page Printable Page


Bug Database
Bug ID: 6810474
Votes 0
Synopsis par compact - crash in summary_phase with very full heap
Category hotspot:garbage_collector
Reported Against b10
Release Fixed hs15(b05), 7(b54) (Bug ID:2175663)
State 10-Fix Delivered, bug
Priority: 3-Medium
Related Bugs 6765745
Submit Date 26-FEB-2009
Description
The following tests caused jvm crash with solaris-amd64 binaries during b47 promotion testing in "-XX:+UseParallelOldGC" configuration:

   gc/gctests/FinalizeTest01
   gc/gctests/FinalizeTest02

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xfffffd7ffedb736f, pid=13170, tid=5
#
# JRE version: 7.0-b47
# Java VM: Java HotSpot(TM) 64-Bit Server VM (15.0-b01 mixed mode solaris-amd64 )
# Problematic frame:
# V  [libjvm.so+0xbb736f]
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xbb736f];;  void PSParallelCompact::summary_phase(ParCompactionManager*,bool)+0x41f
V  [libjvm.so+0xbb914d];;  void PSParallelCompact::invoke_no_policy(bool)+0x70d
V  [libjvm.so+0xbc7cc0];;  void PSScavenge::invoke()+0x130
V  [libjvm.so+0xb93aba];;  HeapWord*ParallelScavengeHeap::failed_mem_allocate(unsigned long,bool)+0x9a
V  [libjvm.so+0x422d86];;  void VM_ParallelGCFailedAllocation::doit()+0x96
V  [libjvm.so+0x422817];;  void VM_Operation::evaluate()+0x77
V  [libjvm.so+0x5b4ef1];;  void VMThread::loop()+0x4c1
V  [libjvm.so+0x5b3e7a];;  void VMThread::run()+0x7a
V  [libjvm.so+0xb74883];;  java_start+0x4c3
C  [libc.so.1+0xd504b]  _thr_slot_offset+0x31b;;  _thr_setup+0x5b
C  [libc.so.1+0xd5280]  _thr_slot_offset+0x550;;  _lwp_start+0x0

VM_Operation (0xfffffd7ffdfce4e0): ParallelGCFailedAllocation, mode: safepoint, requested by thread 0x000000000041c000

Running the test with fastdebug binaries triggers assertion failure:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/BUILD_AREA/jdk7/hotspot/src/share/vm/gc_implementation/parallelScavenge/psParallelCompact.hpp:581), pid=25862, tid=5
#  Error: assert(addr<= _region_end,"bad addr")
#
# JRE version: 7.0-b47
# Java VM: Java HotSpot(TM) 64-Bit Server VM (15.0-b01-fastdebug mixed mode solaris-amd64 )
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x0000000000660800):  VMThread [stack: 0xfffffd7fd18ff000,0xfffffd7fd19ff000] [id=5]

Stack: [0xfffffd7fd18ff000,0xfffffd7fd19ff000],  sp=0xfffffd7fd19fd860,  free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1d647df];;  void VMError::report(outputStream*)+0x68f
V  [libjvm.so+0x1d6569d];;  void VMError::report_and_die()+0x4fd
V  [libjvm.so+0xa5879b];;  void report_assertion_failure(const char*,int,const char*)+0x5ab
V  [libjvm.so+0x1937d45];;  bool ParallelCompactData::summarize(SplitInfo&,HeapWord*,HeapWord*,HeapWord**,HeapWord*,HeapWord*,HeapWord**)+0x1
2b5
V  [libjvm.so+0x194a9c7];;  void PSParallelCompact::summary_phase(ParCompactionManager*,bool)+0x597
V  [libjvm.so+0x194bbbb];;  void PSParallelCompact::invoke_no_policy(bool)+0x68b
V  [libjvm.so+0x1977e2e];;  void PSScavenge::invoke()+0x21e
V  [libjvm.so+0x187cdd1];;  HeapWord*ParallelScavengeHeap::failed_mem_allocate(unsigned long,bool)+0x161
V  [libjvm.so+0x1d67539];;  void VM_ParallelGCFailedAllocation::doit()+0xd9
V  [libjvm.so+0x1d92f1b];;  void VM_Operation::evaluate()+0xfb
V  [libjvm.so+0x1d91033];;  void VMThread::evaluate_operation(VM_Operation*)+0x113
V  [libjvm.so+0x1d9190d];;  void VMThread::loop()+0x72d
V  [libjvm.so+0x1d90c7f];;  void VMThread::run()+0x9f
V  [libjvm.so+0x182762a];;  java_start+0x66a
C  [libc.so.1+0xd504b]  _thr_slot_offset+0x31b;;  _thr_setup+0x5b
C  [libc.so.1+0xd5280]  _thr_slot_offset+0x550;;  _lwp_start+0x0

VM_Operation (0xfffffd7ffc44e340): ParallelGCFailedAllocation, mode: safepoint, requested by thread 0x0000000000440000

To reproduce, do the following:

ssh vmsqe-v20z-01.russia
cd /net/vmsqe.russia/export/execution/results/JDK7/PROMOTION/VM/b47/ParallelOldGC/vm/solaris-amd64/server/mixed/solaris-amd64_server_mixed_vm.gc.testlist/ResultDir/FinalizeTest01
sh rerun.sh # may nedd to run this a couple of times

You can also do:
 bash /net/vmsqe.russia/export/bin/reproduce_bug.sh rerun.sh

The failure is reproducible with jdk7 b46 (hs14 b10), but not with jdk7 b42 (hs14 b09). Perhaps this failure mode became exposed by a number of parallel gc fixes that went into hs14 b10 (6786188, 6784849).
Posted Date : 2009-02-26 13:57:19.0
Work Around
N/A
Evaluation
With a core file and assembly listing, identified the failure point in summarize_split_space(), which was added as part of 6765745.  Failures are in the loop which clears the source_region field for regions that contain part an object which does not fit completely into the destination space:

    const RegionData* const sr = region(split_region); 
    const size_t beg_idx = 
      addr_to_region_idx(region_align_up(sr->destination() + 
                                         sr->partial_obj_size())); 
    const size_t end_idx = 
1---> addr_to_region_idx(region_align_up(destination + partial_obj_size)); 
 
    if (TraceParallelOldGCSummaryPhase) { 
        gclog_or_tty->print_cr("split:  clearing source_region field in [" 
                               SIZE_FORMAT ", " SIZE_FORMAT ")", 
                               beg_idx, end_idx); 
    } 
    for (size_t idx = beg_idx; idx < end_idx; ++idx) { 
2---> _region_data[idx].set_source_region(0); 
    } 

1 is the fastdebug failure (assert in region_align_up()), 2 is the product failure (SEGV).  The address 'destination + partial_object_size' used to compute the upper bound on the loop (end_idx) is outside the heap.
Posted Date : 2009-03-04 01:02:09.0

The code snippet in the above entry is inside this if block:

  if (destination + partial_obj_size > target_end) { 
    ...
    const size_t end_idx = 
      addr_to_region_idx(region_align_up(destination + partial_obj_size)); 
    ...
  }

Here target_end is the end of the target (i.e., destination) space and destination + partial_obj_size extend beyond it.  It's never safe to clear a region beyond the end of the target space; the failure occurs when we try to clear beyond the end of the very last space in the heap (one of the survivor spaces).

Need to use target_end to compute the last region to clear (end_idx) instead of destination + partial_obj_size.
Posted Date : 2009-03-04 01:04:59.0

Generally very difficult to reproduce.  However, running the test

   gc.gctests.Steal.steal002.steal002

with a fastdebug build (32- or 64-bit) on a single-cpu sparc (a rare thing) or using a single cpu processor set causes the test to fail reliably.  No failures are seen with a build containing the suggested fix.
Posted Date : 2009-04-03 07:16:48.0

http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/f18338cf04b0
Posted Date : 2009-04-03 10:10:37.0

http://hg.openjdk.java.net/jdk7/hotspot/hotspot/rev/f18338cf04b0
Posted Date : 2009-04-04 03:09:19.0
Comments
  
  Include a link with my name & email   


PLEASE NOTE: JDK6 is formerly known as Project Mustang