United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: 7103665 HeapWord*ParallelScavengeHeap::failed_mem_allocate(unsigned long,bool)+0x97
7103665 : HeapWord*ParallelScavengeHeap::failed_mem_allocate(unsigned long,bool)+0x97

Details
Type:
Bug
Submit Date:
2011-10-21
Status:
Closed
Updated Date:
2012-11-07
Project Name:
JDK
Resolved Date:
2012-04-10
Component:
hotspot
OS:
solaris_10
Sub-Component:
gc
CPU:
sparc
Priority:
P3
Resolution:
Fixed
Affected Versions:
7
Fixed Versions:
hs23

Related Reports
Backport:
Backport:
Backport:
Backport:

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)

vm_info: Java HotSpot(TM) 64-Bit Server VM (21.0-b17) for solaris-amd64 JRE (1.7.0-b147), built on Jun 27 2011 02:29:28 by "" with Sun Studio 12u1


FULL OS VERSION :
OS:                      Oracle Solaris 11 Express snv_146 X86
     Copyright (c) 2010, Oracle and/or its affiliates.  All rights reserved.
                            Assembled 12 August 2010

uname:SunOS 5.11 snv_146 i86pc  (T2 libthread)
rlimit: STACK 10240k, CORE infinity, NOFILE 65536, AS infinity
load average:1.55 0.76 0.68

CPU:total 24 (6 cores per cpu, 1 threads per core) family 16 model 8 stepping 0, cmov, cx8, fxsr, mmx, sse, sse2, sse3, popcnt, mmxext, 3dnowpref, lzcnt, sse4a

Memory: 4k page, physical 268434660k(205784976k free)


EXTRA RELEVANT SYSTEM CONFIGURATION :
Oracle Sun X4440 with Quad-HexCore (24Cores Total) 128GB DRAM.
1x 1Gb Ethernet Port "igb0" utilized for Network Traffic.
Solaris 11 Build 146 (Just slightly newer than Solaris 11 Express)
Java 7 Build 147

A DESCRIPTION OF THE PROBLEM :
Attempts to balance Eden vs. Old vs. Perm vs. StackSize vs. Runtime.exec() vs. DirectMemory
allocation sizes and avoid (a)Full GC DenialOfService, (b)Perm tagging Temporary Data,
(c)15-250 Active Thread Counts, (d)Avoid OutOfMemory due to 32Bit JVM Barriers, and
(e)Asynchronous I/O Workloads is resulting in GC oops under
"HeapWord*ParallelScavengeHeap::failed_mem_allocate(unsigned long,bool)+0x97".

Newly changed are GC Memory optimizations, please let me know if these are wrong and/or
if I could assist by reproducing with other settings.   :)

-d64 (Required to avoid "Runtime.exec()"  OOME failures and greater than -Xmx3550m support)
-Xms22000m (=Xmx ~10% DRAM - How large does Java7 GC scale stably?)
-Xmn21984m (=(Xmx - 16M) attempting to avoid Full GC DenialOfService : Newly added option)
-Xmx22000m (=Xms to avoid OutOfMemoryError with less than Mmx Heap allocated...different issue)
-Xss256k (to work with d64 option)
 -XX:MaxPermSize=16m (enough for JVM, Selectors, ScheduledThreadPool : Newly added option)
-XX:MaxDirectMemorySize=1536m (Required for AsynchronousFileChannel I/O Workloads : Newly added option)
-XX:+UseNUMA (10-30% Performance Boost due to Quad-HexCore physical layout : Newly added option)



THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Did not try

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Did not try

REGRESSION.  Last worked in version 7

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Reproduction (reproduced 8 times in last month):
1. Solaris 11 Build  146, Java7 Build 147, and ABT 9.2 Build 2011.10.20.17.17
    # pkgadd -d /net/net0.us.oracle.com/export/auto/proto/jbuckley_q3fy11_3/packages/i386/SUNWstc-abt.pkg all
    # ln -s /net/net0.us.oracle.com/export/auto/action /opt/SUNWstc-abt/lib/action
    # svcadm restart abt ( may be needed to wakeup smf :)
2. Start IO Workload (below specific to test rig reproducing all 12 cases):
    # /home/jbuckley/startIO
     ... essentially starts the following:
    a. Two Selector Threads managing ServerSocketChannel(s), SocketChannel(s), and
        DatagramChannel(s)  (JavaTIDs 8 & 9, Native ThreadID=29&30) - Minimal Memory Use.
    b. ServerSocketChannel(s) and Bond DatagramChannels for TCP/IP & UDP/IP  for
        ABT, ECHO, DISCARD, DAYTIME, SMTP, TIME, SNMP, SYSLOG Services.  These are completely
      managed via two Selector threads, above.  All incoming SMTP, SNMP, Syslog messages are
      appended to ACTIVE Actions, below.
    c. Two ScheduledThreadPoolExecutor Threads managing LowMemory check (calling System.gc()
       at 95% heap usage) and new Action Interpretation into Action Trees (below).
       (JavaTIDs 11&12, Native ThreadIDs=31&32)  - Minimal Memory Use.
    d. Several Thousand Actions (50-150 at any time) are cycled through ExecutorCompletionService
        providing the following cradle-to-grave life cycle states seen in Thread Names - Large variations
        in memory use as Actions are generated, archived, and recycled - Mostly (90%) Short Lived Data
       on the order of 5 minute life cycle -  Some (9%) Medium Lived Data on the order of 1-5 hours life
       cycle - Rare (1%) Longer Lived Data on the order of 1-5 days life cycle - NO (0%) Perm Data:
        INIT : Allocate Action Memory
        SUBMIT : Allocate Thread from CompletionService
        ACTIVE : Conduct Testing, potentially locally via  "RunTime.exec(...)" or remotely via Socket Transfers
        REMOTE : Optionally Transfer Testing to Remote Host
        CANCEL : Optionally  Abort Testing
        RETURN : Optionally Transfer Testing from Remote Host
        REUSE : ResubmitAction/EmailLogs/etc.
        ARCHIVE : Persist Memory based logs to disk
       REDUCE : Reduce Memory usage to miminal for parent Action ARCHIVE to assist with GC
       REPORT : Wait for Parent Archive Completion
       RECYCLE : Final Action Shreading to assist with GC
       IDLE : Reallocation of Thread to Pool and "if (ACTIONS.count() < 10) Runtime.getRuntime().gc();"
    d. /usr/jdk/jdk1.7.0/bin/jconsole & used to monitor JVM. - Minimal Memory Use.
    e. Actions interact with 2 Linux Clients running ABT, 1 Solaris 11 Client running ABT and a
       ZFSSA/7000 Cluster to do several configuration/load/stress tests.

I have moved the old hs_err_pid####.log files to here:
http://net0.us.oracle.com/shares/export/auto/reports/auto4440-05/

EXPECTED VERSUS ACTUAL BEHAVIOR :
Behavior is seen as the following:
1. Initial/Expected behavior is stable.
    a. Without Actions, JVM has minimal heap and perm memory easily fits in <<16MB.
    b. With Actions, JVM has varying between 68MB to 16GB without issues as workloads varies
        from more serial execution to more concurrent execution.

2. Near Death behavior becomes GC Denial-Of-Service.
    a. GC (Native Thread 21) takes increasing amount of cycles - 10, 40, 80, 100% active in prstat -maL
    b. Eventually, GC (Native TID 21) takes 100% activity and all other threads go 0% flatline.

3. Death behavior has some common themes.
    a. HeapWord*ParallelScavengeHeap::failed_mem_allocate(unsigned long,bool)+0x97
    b. "lgrp # space ####K, 100% used" or "PSOldGen total 16384K, used 16384K"


ERROR MESSAGES/STACK TRACES THAT OCCUR :
# /opt/SUNWstc-abt/bin/hs_err_pid22967.log
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xfffffd7ffd4044d4, pid=22967, tid=21
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode solaris-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x3b44d4]  void PSPromotionFailedClosure::do_object(oopDesc*)+0x8
#
# Core dump written. Default location: /opt/SUNWstc-abt/bin/core or core.22967
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x0000000001856000):  VMThread [stack: 0xfffffd7ff6636000,0xfffffd7ff6736000] [id=21]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000262400000

Registers:
RAX=0x0000000000000000, RBX=0xfffffd7ffe2323f0, RCX=0xfffffd7ffe22baa0, RDX=0x0000000000000018
RSP=0xfffffd7ff6735570, RBP=0xfffffd7ff6735580, RSI=0x0000000262400000, RDI=0xfffffd7ff6735670
R8 =0x0000000000411410, R9 =0xfffffd7ff67358a0, R10=0x0000000000000010, R11=0xfffffffffbc05970
R12=0xfffffd7ffe218a3c, R13=0x0000000262400000, R14=0xfffffd7ff6735670, R15=0x000000000041ed20
RIP=0xfffffd7ffd4044d4, RFLAGS=0x0000000000010282

  Top of Stack: (sp=0xfffffd7ff6735570)
0xfffffd7ff6735570:   fffffd7ff9f8e710 0000000000000005
0xfffffd7ff6735580:   fffffd7ff67355d0 fffffd7ffd5c8089
0xfffffd7ff6735590:   fffffd7ff67355b0 fffffd7ffe210694
0xfffffd7ff67355a0:   fffffd7ff6735670 00000000023d62b0
0xfffffd7ff67355b0:   0000000000000012 fffffd7ffe22b9c0
0xfffffd7ff67355c0:   000000000041eb90 fffffd7ff6735670
0xfffffd7ff67355d0:   fffffd7ff67355f0 fffffd7ffd5ed7f6
0xfffffd7ff67355e0:   000000000041e940 000000000041eb90
0xfffffd7ff67355f0:   fffffd7ff67356b0 fffffd7ffd5ebf7f
0xfffffd7ff6735600:   00000007ee25fbe0 00000000023d3b00
0xfffffd7ff6735610:   00000000023d6310 00000000023dbae8
0xfffffd7ff6735620:   fffffd7ff9ee69e8 fffffd7ff9ee6820
0xfffffd7ff6735630:   00000007f8602ec0 0000000000000001
0xfffffd7ff6735640:   fffffd7ff9ee69e8 fffffd7ffe22bcc0
0xfffffd7ff6735650:   0000000000000012 fffffd7ffe211150
0xfffffd7ff6735660:   000000000054bf50 0000000001754890
0xfffffd7ff6735670:   fffffd7ffe22baa0 fffffd7ffd4f0b00
0xfffffd7ff6735680:   000000000041e940 00000000023d62b0
0xfffffd7ff6735690:   0000000000000012 fffffd7ffe22b9c0
0xfffffd7ff67356a0:   000000000041e940 0000000000000000
0xfffffd7ff67356b0:   fffffd7ff6735960 fffffd7ffded17b9
0xfffffd7ff67356c0:   000000000046cb48 000000017e9a18d0
0xfffffd7ff67356d0:   0000000667e61d78 000000000041eb90
0xfffffd7ff67356e0:   000000000041ec30 0000000000421160
0xfffffd7ff67356f0:   000000000041ddf0 fffffd7ffe22a040
0xfffffd7ff6735700:   000000000054bf50 000000000046c760
0xfffffd7ff6735710:   000000000046c770 fffffd7ffe211150
0xfffffd7ff6735720:   000000017d9a18f0 0000000000000001
0xfffffd7ff6735730:   00000002a1ffffe0 00000000001ffffc
0xfffffd7ff6735740:   000000010000000a 000002ec4928353a
0xfffffd7ff6735750:   000002ec66f28949 0000000000000000
0xfffffd7ff6735760:   000000000041e9d0 0000000000000001

Instructions: (pc=0xfffffd7ffd4044d4)
0xfffffd7ffd4044b4:   20 d9 00 48 8b 00 48 03 98 90 00 00 00 c6 03 11
0xfffffd7ffd4044c4:   41 5c 5b c9 c3 00 00 00 55 48 8b ec 48 83 ec 10
0xfffffd7ffd4044d4:   8b 06 48 83 e0 03 48 83 f8 03 75 37 48 8b 05 01
0xfffffd7ffd4044e4:   fc d8 00 80 38 00 75 06 48 8b 46 08 eb 13 48 8b

Register to memory mapping:

RAX=0x0000000000000000 is an unknown value
RBX=0xfffffd7ffe2323f0: __1cIUniverseP_methodKlassObj_+0x78 in /usr/jdk/jdk1.7.0/jre/lib/amd64/server/libjvm.so at 0xfffffd7ffd050000
RCX=0xfffffd7ffe22baa0: __1cHnmethodG__vtbl_+0x5bd0 in /usr/jdk/jdk1.7.0/jre/lib/amd64/server/libjvm.so at 0xfffffd7ffd050000
RDX=0x0000000000000018 is an unknown value
RSP=0xfffffd7ff6735570 is an unknown value
RBP=0xfffffd7ff6735580 is an unknown value
RSI=0x0000000262400000 is an unknown value
RDI=0xfffffd7ff6735670 is an unknown value
R8 =0x0000000000411410 is an unknown value
R9 =0xfffffd7ff67358a0 is an unknown value
R10=0x0000000000000010 is an unknown value
R11=0xfffffffffbc05970 is an unknown value
R12=0xfffffd7ffe218a3c: __1cJCodeCacheF_heap_+0x13f3c in /usr/jdk/jdk1.7.0/jre/lib/amd64/server/libjvm.so at 0xfffffd7ffd050000
R13=0x0000000262400000 is an unknown value
R14=0xfffffd7ff6735670 is an unknown value
R15=0x000000000041ed20 is an unknown value


Stack: [0xfffffd7ff6636000,0xfffffd7ff6736000],  sp=0xfffffd7ff6735570,  free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x3b44d4]  void PSPromotionFailedClosure::do_object(oopDesc*)+0x8
V  [libjvm.so+0x578089]  void MutableSpace::object_iterate(ObjectClosure*)+0x4d
V  [libjvm.so+0x59d7f6]  void PSYoungGen::object_iterate(ObjectClosure*)+0x1e
V  [libjvm.so+0x59bf7f]  void PSScavenge::clean_up_failed_promotion()+0xab
V  [libjvm.so+0xe817b9]  bool PSScavenge::invoke_no_policy()+0x1369
V  [libjvm.so+0xe802f7]  void PSScavenge::invoke()+0x3b
V  [libjvm.so+0xe48433]  HeapWord*ParallelScavengeHeap::failed_mem_allocate(unsigned long,bool)+0x97
V  [libjvm.so+0x49bd33]  void VM_ParallelGCFailedAllocation::doit()+0x93
V  [libjvm.so+0x49b795]  void VM_Operation::evaluate()+0x79
V  [libjvm.so+0x621dc8]  void VMThread::run()+0x548
V  [libjvm.so+0xe22fc9]  java_start+0x4a9
C  [libc.so.1+0x112b94]  _thrp_setup+0xbc
C  [libc.so.1+0x112e50]  _lwp_start+0x0

VM_Operation (0xfffffd7ff2816e50): ParallelGCFailedAllocation, mode: safepoint, requested by thread 0x00000000018cb000


---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )
  0x000000000258c800 JavaThread "process reaper" daemon [_thread_in_native, id=1048, stack(0xfffffd7ff248b000,0xfffffd7ff24cb000)]
  0x00000000019bc000 JavaThread "process reaper" daemon [_thread_in_native, id=1047, stack(0xfffffd7ff2179000,0xfffffd7ff21b9000)]
  0x0000000002588800 JavaThread "Idle Thread" [_thread_blocked, id=1044, stack(0xfffffd7ff2138000,0xfffffd7ff2178000)]
  0x0000000007556000 JavaThread "STATE_ACTIVE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0941.DelayBeforePingAttempts Test Ready" [_thread_blocked, id=1043, stack(0xfffffd7ff2652000,0xfffffd7ff2692000)]
  0x0000000003807000 JavaThread "STATE_ACTIVE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0940.TimeRebootToPing Test Ready" [_thread_blocked, id=1042, stack(0xfffffd7ff223c000,0xfffffd7ff227c000)]
  0x000000000766c800 JavaThread "STATE_REDUCE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0932.Unmount7000Shares Test Archive" [_thread_blocked, id=1036, stack(0xfffffd7ff21ba000,0xfffffd7ff21fa000)]
  0x0000000001d50800 JavaThread "STATE_ACTIVE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0882.ManageSsh_ExecuteAkshScript OutputTransfer Seeking" [_thread_in_native, id=1035, stack(0xfffffd7ff21fb000,0xfffffd7ff223b000)]
  0x0000000003702800 JavaThread "STATE_ACTIVE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0882.ManageSsh_ExecuteAkshScript Test Command
            "ssh -t -o StrictHostKeyChecking=no root@auto7320-07"" [_thread_blocked, id=1033, stack(0xfffffd7ff23c8000,0xfffffd7ff2408000)]
  0x00000000071a8000 JavaThread "STATE_ACTIVE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0948.Destroy7000EcmaClusterFS Test Ready" [_thread_blocked, id=1032, stack(0xfffffd7ff227d000,0xfffffd7ff22bd000)]
  0x000000000754f000 JavaThread "STATE_REDUCE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0931.Unmount7000Shares Test Archive" [_thread_blocked, id=1031, stack(0xfffffd7ff22be000,0xfffffd7ff22fe000)]
  0x0000000003708000 JavaThread "STATE_ACTIVE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0956.Destroy7000EcmaClusterLun
Test Ready" [_thread_blocked, id=1029, stack(0xfffffd7ff2409000,0xfffffd7ff2449000)]
  0x00000000034cf800 JavaThread "STATE_ACTIVE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0955.Destroy7000EcmaClusterLun
Test Ready" [_thread_blocked, id=1028, stack(0xfffffd7ff244a000,0xfffffd7ff248a000)]
  0x00000000034d1000 JavaThread "STATE_ACTIVE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0950.CheckOptions Test Ready" [_thread_blocked, id=1026, stack(0xfffffd7ff24cc000,0xfffffd7ff250c000)]
  0x0000000001dee000 JavaThread "STATE_ACTIVE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0882.ManageSsh_ExecuteAkshScript OutputTransfer Seeking" [_thread_in_native, id=1025, stack(0xfffffd7ff258f000,0xfffffd7ff25cf000)]
  0x0000000007444800 JavaThread "STATE_REDUCE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0930.Unmount7000Shares Test Archive" [_thread_blocked, id=1024, stack(0xfffffd7ff25d0000,0xfffffd7ff2610000)]
  0x0000000001a55000 JavaThread "RMI TCP Connection(18)-10.80.205.160" daemon [_thread_blocked, id=1023, stack(0xfffffd7ff2611000,0xfffffd7ff2651000)]
  0x0000000001eee800 JavaThread "STATE_ACTIVE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/1109.CheckOptions Test Ready" [_thread_blocked, id=1021, stack(0xfffffd7ff2693000,0xfffffd7ff26d3000)]
  0x00000000036f1000 JavaThread "STATE_ACTIVE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/1101.CheckOptions Test Ready" [_thread_blocked, id=1020, stack(0xfffffd7ff26d4000,0xfffffd7ff2714000)]
  0x0000000002ef6000 JavaThread "STATE_REDUCE :Verify7000TestSuiteWrapper.111020.172333.auto4440-05.22967.10.9/0881.ManageSsh_VerifySshable Test Archive" [_thread_blocked, id=1019, stack(0xfffffd7ff2715000,0xfffffd7ff2755000)]
  0x00000000025c0800 JavaThread "Idle Thread" [_th


( This report has more than 16,000 characters and has been truncated. )

                                    

Comments
EVALUATION

http://hg.openjdk.java.net/hsx/hsx23/hotspot/rev/c1606f7a714c
                                     
2012-03-28
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/cc74fa5a91a9
                                     
2012-03-23



Hardware and Software, Engineered to Work Together