Bug Database
Bug ID: 4720694
Votes 4
Synopsis java apps crash on Solaris 9 Ultra-80 machine by using 1.4.1
Category hotspot:runtime_system
Reported Against 1.4.1 , hopper-rc
Release Fixed 1.4.2(mantis-b20) (Bug ID:2056800) , 1.5(tiger-b05) (Bug ID:2056801)
State 11-Closed, Will Not Fix, bug
Priority: 2-High
Related Bugs 4674904 , 4695690 , 4778176 , 4784641 , 4927770 , 6432045 , 6511772
Submit Date 25-JUL-2002
Description
J2SE Version (please include all output from java -version flag):
  java version "1.4.1-rc"
  Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-rc-b16)
  Java HotSpot(TM) Server VM (build 1.4.1-rc-b16, mixed mode)

Does this problem occur on J2SE 1.3 or 1.4?  Yes / No (pick one)
  Not sure. It is apparently a threading bug, so might be rare.

Operating System Configuration Information (be specific):  Solaris 9
Hardware Configuration Information (be specific):   4x450 Ultra-80
   Works fine on Ultra10 1-cpu machine.
Bug Description:  Crash with core file. Sometimes it just hangs at 1K classes/threads.

Steps to Reproduce (be specific):
=================================
1) unzip concurrent.zip (Get util.concurrent package from http://gee.cs.oswego.edu - also concurrent.zip attached here).
2) cd concurrent
3) mkdir classes
4) javac -d classes *.java
5) cp misc/* classes/.
6) cd classes
7) javac *.java
8) mkdir EDU/oswego/cs/dl/util/concurrent/misc
9) mv *.class EDU/oswego/cs/dl/util/concurrent/misc/.
10) mv *.java EDU/oswego/cs/dl/util/concurrent/misc/.
11) mv *.html EDU/oswego/cs/dl/util/concurrent/misc/.
12) java -server -Xmx128m EDU.oswego.cs.dl.util.concurrent.misc.SynchronizationTimer
13) The above command should launch an application window. The steps below reproduce the
    issue in the GUI (also shown in 'panel-operation.JPG'):
    NOTE: the PATH environment variable must include a valid java executable before launching the GUI.
14) In application GUI, click "no classes".
15) Click "waitfreeQueue"
16) Set "128k calls per thread" in combo box
17) Set "1M iterations per barrier" in combo box.
18) Click "start"
19) You should get the following HotSpot error message:

# HotSpot Virtual Machine Error, Internal Error
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Server VM (1.4.1-rc-b16 mixed mode)
#
# Error ID: 53484152454432554E54494D450E435050014F 01
#
# Problematic Thread: prio=4 tid=0x5ab228 nid=0x8b4 runnable 
#

And a core file is generated.
Posted Date : 2005-12-15 00:01:27.0
Work Around
N/A
Evaluation
  xxxxx@xxxxx   2002-07-25

Here are the test results for different machine configurations:

  * Ultra10 1-cpu, Solaris 8, 1024MB Memory
    - b16: OK
    - b17: OK

  * E3500, 6x400mhz, Solaris 9, 3GB Memory:
    - b17: crash, core file generated
    - b16: crash, core file generated
    - 1.4: java.lang.OutOfMemoryError exception and hang

  * SunBlade, 2x750mhz, Solaris 8, 2GB Memory:
    - b16: Hang on 256 classes/threads
    - 1.4: hang on 256 classes/threads

  * Ultra80 4x450, Solaris 9, 4GB Memory:
    - 1.4: hang on 256 classes/threads
    - b17: crash, core file generated

  * Ultra80 4x450, Solaris 8, 4GB Memory:
    - b17: OK


Tested on U80 4x450,  Solaris 9 2gb mem
Failed/Hung with JDK 1.4.1_01
Passed with JDK 1.4.2-b07

Would like to close this out; please re-test with the latest JDK 1.4.2-b07 or greater.
Awaiting your feedback...
  xxxxx@xxxxx   2002-11-15



Will be closing this bug out by 2002-11-22

  xxxxx@xxxxx   2002-11-20 


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  xxxxx@xxxxx   2002-12-09
I removed irrelevant comments regarding -Xcheck:jni.  The -Xcheck:jni
checking mistakenly rejects a null argument in IsSameObject used
by AWT.  
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



I've run into another issue,
scharnhorst 116 =>go.local
Wed Nov 27 12:58:54 EST 2002
[error occured during error reporting]
#
# HotSpot Virtual Machine Error, Internal Error
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Server VM (1.4.2-beta-b08 mixed mode)
#
# Error ID: 53484152454432554E54494D450E435050014F 01
#
# Problematic Thread: prio=4 tid=0x007361a0 nid=0x3fe runnable 
#
Internal Error
Fatal: exception happened outside interpreter, nmethods and vtable stubs (1)

Do you want to debug the problem? 

-----------------------

The above failure mode is the same as found in 4778176 (now closed as a duplicate of this bug) and 4674904, which is no longer reproducible.

It appears that when attempting to come to a safepoint, the function handle_illegal_instruction_exception is seeing a SEGV of its own because the ThreadCodeBuffer corresponding to the ThreadSafepointState is NULL. 

The ThreadCodeBuffer has been released by CompiledCodeSafepointHandler::setup because the call to reposition_and_resume_thread() failed.  Evidently, though, the thread appears to get restarted in the ThreadCodeBuffer...

This bug may be related to the fix for 4645393, since it was first reported after that putback.  Only a hunch as of yet, though.

  xxxxx@xxxxx   2003-01-17
---------------------

The smaller java program t4720694 (attached) demonstrates the bug.  It
is not an SQE quality test case, but the boiled down remnants of Doug
Lea's program.  It runs an infinite loop, but will eventually get an
assert that indicates the problem.

The assert is typically in safepoint.cpp around line 467, but due
to the race condition nature of the bug, I have seen about 6 different
assertions fail.

I believe that this is a Solaris only bug.  It may have been exposed
with the fix for 4645393.

For best results, use a fastdebug build. Run with +SafepointALot,
CompileOnly=.take and -Xcomp. The compiler flags are necessary to
ensure that C2 uses an implicit null check in the generated code for
take(). If one forces C2 to not use implicit checks with
-ImplicitNullChecks, the program runs quietly, and presumably,
correctly (forever).
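
As a rough illustration of the kind of code involved (this is not the attached
t4720694 test case), the sketch below shows a take()-style method that
dereferences a possibly-null field with no explicit null test in the source.
The class, field, and method names are hypothetical. With implicit null checks
enabled, a JIT compiler may emit no compare-and-branch for such a dereference
and instead rely on the hardware fault (the SEGV described in this report) to
raise NullPointerException.

```java
// Hypothetical sketch; not the attached t4720694 test case.
public class ImplicitNullCheckSketch {
    static final class Node {
        final Object value;
        final Node next;

        Node(Object value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node head; // null while the queue is empty

    void put(Object value) {
        head = new Node(value, head);
    }

    // No explicit "if (h == null)" here: the dereference h.value is the null check.
    Object take() {
        Node h = head;
        Object v = h.value; // throws NullPointerException when h is null
        head = h.next;
        return v;
    }

    // Calls take() on an empty queue repeatedly and counts the NullPointerExceptions.
    static int countNullDereferences(int iterations) {
        ImplicitNullCheckSketch queue = new ImplicitNullCheckSketch();
        int count = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                queue.take();
            } catch (NullPointerException expected) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNullDereferences(1000));
    }
}
```

Whether a given VM actually compiles take() with an implicit check depends on
the compiler and flags in use; the sketch only shows the source-level shape.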

  xxxxx@xxxxx   2003-03-21

---------------------

The race during safepointing goes something like this:

The VM thread starts the safepoint synchronize procedure.

The Java thread is in the midst of an implicit null pointer check.
That is, the C2 generated code has SEGV'ed. JVM_handle_solaris_signal
has selected the stub handler_for_null_exception_entry() and reset the
PC there. The thread stops.

The VM thread uses get_top_frame() to query the pc of Java thread.
The query does not report the stub address, but the address of the
instruction that SEGV'd.  The VM thread proceeds with moving the Java
thread to a compiled safepoint, eventually calling
reposition_and_resume_thread().

The Java thread is awakened by the callback and executes
SetThreadPC_Callback.  It can't validate the expected current pc, and
resumes the thread without altering the pc.

Now the race begins.

If the VM thread can destroy the ThreadCodeBuffer before the Java
thread gets very far, the program continues as expected.

However, if the Java thread proceeds in handling the implicit null
check, the shared runtime function compute_exception_return_address()
will direct the thread to continue processing in the ThreadCodeBuffer
right before it is destroyed. It is in this case that we fail.

In debug VMs this failure exhibits itself as an assertion. In
production VMs, the Java thread eventually takes a SIGILL executing in
the destroyed ThreadCodeBuffer.  The function
handle_illegal_instruction_exception() is called, but the
ThreadCodeBuffer is NULL, causing a second, fatal, signal.
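
The losing interleaving described above can be modelled in plain Java. This is
only a sketch under stated assumptions: the class, the int[] standing in for
the ThreadCodeBuffer, and the latches that force the ordering are all
hypothetical and do not correspond to HotSpot code. It shows a thread deciding
to continue in a buffer that is still live, and then using it after the other
thread has destroyed it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the interleaving; the names do not correspond to HotSpot code.
public class CodeBufferRaceSketch {
    // Stands in for the ThreadCodeBuffer pointer; null means "destroyed".
    static final AtomicReference<int[]> codeBuffer = new AtomicReference<>();

    // Forces the losing interleaving and reports whether the "Java thread"
    // was directed into a buffer that was gone by the time it used it.
    static boolean forceBadInterleaving() {
        codeBuffer.set(new int[] {1, 2, 3});
        CountDownLatch redirected = new CountDownLatch(1);
        CountDownLatch destroyed = new CountDownLatch(1);
        boolean[] usedDestroyedBuffer = new boolean[1];

        Thread javaThread = new Thread(() -> {
            // Step 1: the thread sees a live buffer and decides to continue there.
            boolean sawBuffer = codeBuffer.get() != null;
            redirected.countDown();
            awaitQuietly(destroyed);
            // Step 3: by the time it actually uses the buffer, it is gone.
            usedDestroyedBuffer[0] = sawBuffer && codeBuffer.get() == null;
        });
        javaThread.start();

        awaitQuietly(redirected);
        codeBuffer.set(null); // Step 2: the "VM thread" destroys the buffer.
        destroyed.countDown();

        try {
            javaThread.join(); // join() makes the worker's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return usedDestroyedBuffer[0];
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("used destroyed buffer: " + forceBadInterleaving());
    }
}
```

In the real bug the ordering is not forced, which is why the failure is
intermittent; the latches here only make the bad case repeatable.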


  xxxxx@xxxxx   2003-03-21

Upon further review, 4695690 is probably a duplicate of this bug.

  xxxxx@xxxxx   2003-03-24

---------------------------------

Chuck is right. thread->safepoint_state()->_code_buffer is updated when
the code buffer is first allocated. At that time, we don't know if we can
reposition the thread or not. If we fail to reposition thread, we have
to resume thread and destroy the code buffer.

Before 4645393, the thread was resumed after the VM deleted the code buffer. In order
to fix 4645393, we have to resume the thread at the same time we attempt to
reposition the thread. There is a chance that the thread is restarted before
the VM deletes the thread code buffer. Then the thread might see a non-null value of
the thread code buffer first, but when it actually needs to access it, the VM thread
has deleted the code buffer and reset _code_buffer to NULL, causing the
failures.

A possible fix is to update thread->safepoint_state()->_code_buffer only
when we know we can successfully reposition the thread. I verified the change
with t4720694b.java. Also it can fix 4695690 for 1.4.1_02.
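
The idea behind the proposed fix can be sketched as a change in publication
order. The following is a hypothetical, single-threaded model (not the actual
HotSpot change): the AtomicReference stands in for _code_buffer, and
seenInWindow stands in for what a resumed thread could observe between
allocation and the outcome of repositioning.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical single-threaded model of the two orderings; not HotSpot code.
public class PublishAfterSuccessSketch {
    // Old ordering: publish on allocation, retract if repositioning fails.
    // Returns true if a resumed thread could have seen a buffer that was later destroyed.
    static boolean publishOnAllocate(boolean repositionSucceeds) {
        AtomicReference<int[]> published = new AtomicReference<>();
        int[] buffer = new int[16];
        published.set(buffer);                          // visible before the outcome is known
        boolean seenInWindow = published.get() != null;
        if (!repositionSucceeds) {
            published.set(null);                        // destroyed after a failed reposition
        }
        return seenInWindow && published.get() == null;
    }

    // Proposed ordering: publish only once repositioning is known to have succeeded.
    static boolean publishOnSuccess(boolean repositionSucceeds) {
        AtomicReference<int[]> published = new AtomicReference<>();
        int[] buffer = new int[16];
        boolean seenInWindow = published.get() != null; // nothing is visible yet
        if (repositionSucceeds) {
            published.set(buffer);                      // visible only when it will stay alive
        }
        return seenInWindow && published.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("publish on allocate, failed reposition: " + publishOnAllocate(false));
        System.out.println("publish on success,  failed reposition: " + publishOnSuccess(false));
    }
}
```

With the second ordering, a failed reposition never exposes a buffer to the
resumed thread, so there is nothing for it to observe and later lose.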

Move this bug to runtime category, and assign it to myself.

  xxxxx@xxxxx   2003-03-26
Posted Date : 2005-12-15 00:01:27.0

The fix resolves the reported issue (meaning, the crash no longer occurs with the fix).  However, there are still hangs in 1.3.1.  The following JVM versions were used to test the fix:

$>java -version -version -server -Xmx128m EDU.oswego.cs.dl.util.concurrent.misc.SynchronizationTimer &
   java version "1.5.0_07"
   Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-b02)
   Java HotSpot(TM) Client VM (build 1.5.0_07-b02, mixed mode)

$>java -version -server -Xmx128m EDU.oswego.cs.dl.util.concurrent.misc.SynchronizationTimer &
    java version "1.3.1_18-internal"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1_18-internal-wsmgr_14_mar_2006_15_00)
    Java HotSpot(TM) Client VM (build 1.3.1_18, mixed mode)

Here are the test results:
    * costume.sfbay (SunBlade 1000, 1 x 900 mhz, Solaris 9, 1 GB Memory):       
                + 1.5.0_07: finishes correctly.
                + 1.3.1_18: hangs at 256 threads.
    * tryout.sfbay (SunBlade 1000, 2 x 750 mhz, Solaris 8, 2 GB Memory):     
                + 1.5.0_07: finishes correctly.
                + 1.3.1_18: hangs at 256 threads.
    * somerset.sfbay (SunBlade 2500, 2 x 1600 mhz, Solaris 9, 2 GB Memory):
                + 1.5.0_07: finishes correctly.
                + 1.3.1_18: hangs at 256 threads.
    * producer.sfbay (E4500, 14 x 400 mhz, Solaris 8, 8GB Memory):      
                + 1.5.0_07: finishes correctly.
                + 1.3.1_18: hangs at 256 threads.
    * scoot.sfbay.sun.com (E4500, 14 x 400 mhz, Solaris 10, 4GB Memory):
                + 1.5.0_07: hangs indefinitely when it gets to last column (i.e. 1000 threads - please see attached image).
                + 1.3.1_18: hangs at 256 threads.
    * jcteu80x2.sfbay (Ultra80, 4 x 450 mhz, 2 GB Memory):
                + 1.5.0_07: hangs indefinitely (GUI grayed-out).
                + 1.3.1_18: sometimes hangs indefinitely (doesn't seem to be dependent on number of threads).

As I mentioned, no crash occurred on any of the above machines - prstat output is available here:
    * prstat info for Tiger: 
          o /net/nightsvr/export3/jpse/regress.library/4720694B/tests/test_results/1.5.0u7
    * prstat info for 1.3.1:
          o /net/nightsvr/export3/jpse/regress.library/4720694B/tests/test_results/1.3.1
Posted Date : 2006-04-11 21:33:20.0

To clarify closing the bug ...
The intent is to re-run the test case against current builds and open
explicit C/R(bugs) against any failures, as this bug is misleading:
there are delivered fixes against 1.4.2 and 5.0 for the crash,
yet the explicit bug is against 1.4.1, and hangs have been reported
in some testing after the above fixes. Closing with regard to the
1.3.1 aspect only.
Posted Date : 2006-12-01 15:08:03.0
Comments
  

Submitted On 23-APR-2004
dan.nazario
What is the fix for this bug?
It's not clear to me.  Thx


Submitted On 21-FEB-2005
raizerius
Seems like this bug is still unresolved in 1.4.2_02 b03. We got the bug in a production system running on Solaris 9. Below is the bug report:

 CORE3283: stderr: #
 CORE3283: stderr: # HotSpot Virtual Machine Error, Internal Error
 CORE3283: stderr: # Please report this error at
 CORE3283: stderr: # http://java.sun.com/cgi-bin/bugreport.cgi
 CORE3283: stderr: #
 CORE3283: stderr: # Java VM: Java HotSpot(TM) Server VM (1.4.2_02-b03 mixed mode)
 CORE3283: stderr: #
 CORE3283: stderr: # Error ID: 53484152454432554E54494D450E435050014F 01
 CORE3283: stderr: #
 CORE3283: stderr: # Problematic Thread: prio=5 tid=0x085a1f08 nid=0x151 runnable 
 CORE3283: stderr: #
 CORE3283: stderr: Heap at VM Abort:
 CORE3283: stderr: Heap
 CORE3283: stderr:  def new generation   total 169600K, used 122172K [0xd4000000, 0xdeaa0000, 0xdeaa0000)
 CORE3283: stderr:   eden space 164480K,  74% used [0xd4000000, 0xdb74f270, 0xde0a0000)
 CORE3283: stderr:   from space 5120K,   0% used [0xde5a0000, 0xde5a0000, 0xdeaa0000)
 CORE3283: stderr:   to   space 5120K,   0% used [0xde0a0000, 0xde0a0000, 0xde5a0000)
 CORE3283: stderr:  tenured generation   total 349568K, used 227972K [0xdeaa0000, 0xf4000000, 0xf4000000)
 CORE3283: stderr:    the space 349568K,  65% used [0xdeaa0000, 0xec9410d8, 0xec941200, 0xf4000000)
 CORE3283: stderr:  compacting perm gen  total 31744K, used 31588K [0xf4000000, 0xf5f00000, 0xf8000000)
 CORE3283: stderr:    the space 31744K,  99% used [0xf4000000, 0xf5ed9188, 0xf5ed9200, 0xf5f00000)


Submitted On 21-JUN-2006
I have the same error.

AP Server 
@OS:RedHat Enterprise AS 3 Update 4
@Java:1.4.2.08
@AP Server(Oracle AS):Oracle10gAS 10.1.2.0.0
  EJDBC:10.1.0.4


------------------------------------------------------
#
# HotSpot Virtual Machine Error, Internal Error
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Server VM (1.4.2_08-b03 mixed mode)
#
# Error ID: 53484152454432554E54494D450E435050014F
#
# Problematic Thread: prio=1 tid=0x080c00b0 nid=0x7330 runnable 
#

Heap at VM Abort:
Heap
 def new generation   total 65536K, used 7404K [0x86c70000, 0x8b380000, 0x8b380000)
  eden space 58304K,   0% used [0x86c70000, 0x86c9b378, 0x8a560000)
  from space 7232K, 100% used [0x8ac70000, 0x8b380000, 0x8b380000)
  to   space 7232K,   0% used [0x8a560000, 0x8a560000, 0x8ac70000)
 tenured generation   total 582592K, used 448282K [0x8b380000, 0xaec70000, 0xaec70000)
   the space 582592K,  76% used [0x8b380000, 0xa69469e0, 0xa6946a00, 0xaec70000)
 compacting perm gen  total 38144K, used 37936K [0xaec70000, 0xb11b0000, 0xb2c70000)
   the space 38144K,  99% used [0xaec70000, 0xb117c1c8, 0xb117c200, 0xb11b0000)


