Bug Database
Bug ID: 4386948
Votes 135
Synopsis libthread panic: fault in libthread critical section (PID: 6700 LWP 1)
Category hotspot:other
Reported Against 1.1 , 1.3 , 1.3.1 , kest-sol-fcs
Release Fixed
State 11-Closed, duplicate of 4499510, bug
Priority: 4-Low
Related Bugs 4499510
Submit Date 08-NOV-2000
Description




java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0)
Java HotSpot(TM) Client VM (build 1.3.0, mixed mode)


In a scheduling system for multiple parallel system jobs the following problem
occurs erratically:

signal fault in critical section
signal number: 11, signal code: 1, fault address: 0xfee0be20, pc: 0xff36c6e8, sp: 0xf3f80b60
libthread panic: fault in libthread critical section (PID: 6700 LWP 1)
stacktrace:
        ff36c6cc
        ff36c518
        fe5db94c
        fe604534
        fe562170
        ff044ec8
        ff047ec4
        6fce8
        6ceb0
        fe7a4994
        fe549d3c
        fe5499e8
        fe551aec
        fe5522ac
        ff043c98
        ff03f56c
        6fce8
        6ceb0
        6ceb0
        6ceb0
        fe7a4994
        fe549d3c
        fe5493d4
        fe549444
        fe57a2a8
        fe651a10
        fe5ed2e8
        ff37bd04
        fe5ed2c8

This is a non-reproducible event, i.e. it occurs about once during a runtime of
several days in which several thousand jobs are started successfully. All
required and recommended patches are installed on the machine. No exceptions
are thrown and the VM continues to run.

The Java code causing this event contains a simple call to Runtime.exec, called
in a separate Thread:

    private class JobThread extends Thread {
        private Job job;
        private String exec;
        private File dir;

        JobThread(Job job, String exec, File dir) {
            this.job = job;
            this.exec = exec;
            this.dir = dir;
        }

        public void run() {
            this.job.startTime = new Date();
            try {
                Process process = Runtime.getRuntime().exec(exec, null, dir);
                job.setProcess(process);
                process.waitFor();
                job.collectProcess();
            }
            catch(IOException ioe) {
                System.err.println("IO Exception on exec");
                ioe.printStackTrace();
            }
            catch(InterruptedException ine) {
                System.err.println("Interrupted Exception on exec");
                ine.printStackTrace();
            }
        }
    }

The panic event occurs on calling the start() method of the above Thread.
(Review ID: 111640) 
======================================================================




java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0)
Java HotSpot(TM) Client VM (build 1.3.0, mixed mode)


This is the error log of what's been happening with Brazil that's been running on the machine pn21.eng. We've been flooding it with lots of HTTP requests from other machines connected to it via fiber.
  
 
However, a more interesting bug recently showed up. I started testing Brazil running CGI applications. This one is a simple one, a Perl script that just dumps the environment variables. You can see the page at:
  
  http://pn21.eng/cgi/cgi_test.cgi
  
  And the source is included.
  
  Below the Java exceptions you can see there is a libthread panic. This did not show up until I added the CGI test today.
  
  Thought everyone should know.
  
Server started on 80
Setting server to run as user: nobody
 Created new table: 1/6
 Created new table: 2/6
 Created new table: 3/6
 Created new table: 4/6
 Created new table: 5/6
signal fault in critical section
signal number: 11, signal code: 1, fault address: 0x7be01f50, pc: 0xff3638ec, sp: 0x79e00978
libthread panic: fault in libthread critical section : dumping core (PID: 11472 LWP 1)
stacktrace:
	ff3638d8
	ff3639b4
	ff367adc
	fe7d7e5c
	fb0a9e4c
	fb0aa49c
	fe7a4994
	fe549d3c
	fe5499e8
	fe551aec
	fe5522ac
	fe7d3c98
	fe7cf56c
	11f8b0
	fb0a3de8
	11ca78
	fb09e684
	fb037874
	fb04207c
	fb037874
	fb070940
	fb0606fc
	fb07e758
	fe7a4994
	fe549d3c
	fe5493d4
	fe549444
	fe57a2a8
	fe651a10
	fe5ed2e8
	ff36bb34
	fe5ed2c8
(Review ID: 118521)
======================================================================




java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0)
Java HotSpot(TM) Client VM (build 1.3.0, mixed mode)


I'm developing a Java program whose purpose in life is to execute a bunch of programs, some of which can be run in parallel and some of which must be run sequentially.  Most of the programs to be executed are Perl scripts.  The Java program is pretty simple: it creates a thread for each group of tasks that can be run concurrently, and within the thread, it execs the program to be run with /bin/sh -c "program args...".  It then reads the stdout and stderr of the exec'ed program while it is alive, and when the program terminates, it goes on to the next task.

The exec'ed processes crash sometimes. An example of the output produced is included below.  A core file is produced, but the core file is not from /bin/sh or perl.  Rather, it is a java core file.  My guess is that Runtime.exec is implemented by calling fork(), and then calling exec().  When fork() is called, all the threads in the parent process are recreated in the child process, and sometimes one of those threads gets a few time slices before exec() is called. The state of the new process is not such that the other threads can run appropriately, and once in a while they cause java to crash.
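
For readers who want to reproduce the setup described above, the following is a minimal sketch of one task: it runs a command through /bin/sh -c and reads stdout and stderr while the child is alive. The class name ShellTask and its fields are hypothetical and do not come from the reporter's program; the sketch also uses lambdas and try-with-resources, so it assumes a much later JDK than the 1.3.0 build in this report.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: runs one shell command and collects its output. */
class ShellTask {
    final List<String> stdout = Collections.synchronizedList(new ArrayList<String>());
    final List<String> stderr = Collections.synchronizedList(new ArrayList<String>());

    /** Reads a stream line by line on its own thread so the child never blocks on a full pipe. */
    private static Thread drain(InputStream in, List<String> sink) {
        Thread t = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = r.readLine()) != null) {
                    sink.add(line);
                }
            } catch (IOException e) {
                sink.add("read error: " + e.getMessage());
            }
        });
        t.start();
        return t;
    }

    /** Runs /bin/sh -c "command", waits for it to finish, and returns its exit code. */
    int run(String command) throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(new String[] {"/bin/sh", "-c", command});
        p.getOutputStream().close();            // the child sees EOF on its stdin
        Thread out = drain(p.getInputStream(), stdout);
        Thread err = drain(p.getErrorStream(), stderr);
        int exit = p.waitFor();
        out.join();                             // make sure all output has been collected
        err.join();
        return exit;
    }

    public static void main(String[] args) throws Exception {
        ShellTask task = new ShellTask();
        int exit = task.run("echo out; echo err 1>&2; exit 3");
        System.out.println("exit=" + exit + " stdout=" + task.stdout + " stderr=" + task.stderr);
    }
}
```

Draining both streams on separate threads keeps a chatty child from stalling on a full pipe; it does not address the libthread panic itself.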

(Review ID: 128353)
======================================================================
Work Around




None.
======================================================================

  xxxxx@xxxxx   2002-01-02

This is a bug in the Solaris libthread library. The workaround is to
use the new "t2" libthread library.

If you are running on Solaris 9, this is the default libthread and you don't
need to do anything to use it.

If you are running on Solaris 8, this is available in /usr/lib/lwp
(you should check for latest patches to this at the time you need it).
To use this with JDK 1.4 or 1.3.1_02 (or later) you need to do the following:

1) Set LD_LIBRARY_PATH to /usr/lib/lwp:$LD_LIBRARY_PATH
2) On the java command line, add the following option:
  -XX:+OverrideDefaultLibthread

As super-user you can run /usr/bin/pldd "pid" to check that the java process
"pid" is using the correct libthread.
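
If the JVM is started from another Java program rather than from a shell, the two steps above could be sketched as follows. This is only an illustration under stated assumptions: the class LwpLauncher and its methods are hypothetical, the "java" on the PATH is assumed to be one of the affected JDKs (1.4 or 1.3.1_02 and later) on Solaris 8, and ProcessBuilder requires a launcher JVM of Java 5 or later, which postdates this report. The sketch only builds the command and environment; it does not start the process.

```java
import java.util.Map;

/** Hypothetical launcher that applies the two workaround steps to a child JVM. */
class LwpLauncher {
    /** Directory holding the alternate libthread on Solaris 8, as named in the workaround. */
    static final String LWP_DIR = "/usr/lib/lwp";

    /** Puts dir at the head of a colon-separated library path. */
    static String prepend(String dir, String existing) {
        return (existing == null || existing.isEmpty()) ? dir : dir + ":" + existing;
    }

    /** Builds, but does not start, a child JVM configured per steps 1) and 2). */
    static ProcessBuilder build(String mainClass) {
        // Step 2: add the option from the workaround to the java command line.
        ProcessBuilder pb = new ProcessBuilder("java", "-XX:+OverrideDefaultLibthread", mainClass);
        // Step 1: /usr/lib/lwp goes first in the child's LD_LIBRARY_PATH.
        Map<String, String> env = pb.environment();
        env.put("LD_LIBRARY_PATH", prepend(LWP_DIR, env.get("LD_LIBRARY_PATH")));
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = build("MyServer");   // "MyServer" is a placeholder class name
        System.out.println(pb.command());
        System.out.println("LD_LIBRARY_PATH=" + pb.environment().get("LD_LIBRARY_PATH"));
    }
}
```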
Evaluation
This is a Solaris libthread bug. Closing this as a duplicate of 4499510,
although there are not enough details in this bug to do an exact match.
Perhaps the PERL script mentioned here will help duplicate 4499510.
Comments
  

Submitted On 31-JAN-2001
timleecasey

I seem to have a similar problem, with similar symptoms.
I have seen this only once.

Here is my text from my internal bug base:

Thu Apr 20 14:32:56 PDT 2000:  /export/home/rcenter27/vault/bin/vault stop
Thu Apr 20 14:32:57 PDT 2000:  /export/home/rcenter27/vault/bin/vault start
signal fault in critical section
signal number: 10, signal code: 3, fault address: 0xee175de8, pc: 0xef77007c, sp: 0xee395b38
libthread panic: fault in libthread critical section (PID: 23058 LWP 64)
stacktrace:
        ef770048
        ef76fe0c
        ef768934
        ef636a10
        ef768a78
        0


Submitted On 28-FEB-2001
sreilly
I recently had almost the same problem, and it is caused by spawning too many threads.  The solution is to use a thread pool and reuse threads to avoid abusing the system.  I had a thread pool before, but it was malfunctioning.  Fixing it resolved the problem completely.
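
A minimal sketch of the thread-pool approach described in this comment: a fixed number of worker threads is reused for every job instead of starting a new Thread per job. The class name PooledJobs and the method runAll are hypothetical, and java.util.concurrent only arrived in Java 5, so this assumes a later JDK than the ones discussed in this report (at the time, a hand-written pool would have been needed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical job runner that reuses a bounded set of threads. */
class PooledJobs {
    /** Runs every command on a fixed pool of workers; returns exit codes in submission order. */
    static List<Integer> runAll(List<String> commands, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String command : commands) {
                Callable<Integer> job = () -> {
                    // inheritIO() sends the child's output to the parent's streams,
                    // so no pipe needs to be drained here.
                    Process p = new ProcessBuilder("/bin/sh", "-c", command)
                            .inheritIO()
                            .start();
                    return p.waitFor();
                };
                pending.add(pool.submit(job));
            }
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> f : pending) {
                exitCodes.add(f.get());         // blocks until that job has finished
            }
            return exitCodes;
        } finally {
            pool.shutdown();                    // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> jobs = new ArrayList<>();
        jobs.add("exit 0");
        jobs.add("exit 2");
        System.out.println(runAll(jobs, 2));
    }
}
```

With this shape the number of live threads is bounded by the workers argument no matter how many jobs are queued.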


Submitted On 05-APR-2001
cvanover
We see a similar phenomenon on both SPARC Solaris 7 and
Intel Solaris 8 (the screen dump below is from the latter).
It is a standalone Java application using JNI, and we get
regular crashes with core dumps; the libthread panic itself,
however, is not accompanied by a core:

myprogram -playback my_select200

java version "1.3.1-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1-beta-b15)
Java HotSpot(TM) Client VM (build 1.3.1beta-b15, mixed mode)

Loading libraries for SunOS

signal fault in critical section
signal number: 11, signal code: 1, fault address: 0x20c, pc: 0xdfb68e66, sp: 0xdf605b5c
libthread panic: fault in libthread critical section (PID: 4922 LWP 4)
stacktrace:
        ffffffff
        dfb6f970
        dfb6f71c
        df96cabb
        df96ca3a
        df96c9d0
        df96f1a1
        df9889f1
        df96be9c
        df96bcfa
        df92c2f3
        df92c12e
        df92c069
        dfb677fc


Submitted On 24-JUL-2001
MakarandKashikar
We have the same problem. Our Java process is a server-class
process; sometimes it reports this libthread panic but
continues, and sometimes it reports "Segmentation Fault - core
dumped" and crashes. I was looking for possible Solaris patches
for this, but could not find any.
Are we really stuck with this until JDK 1.4 is released in
Q4 of 2001?


Submitted On 15-AUG-2001
JThoennes
WORKAROUND:

We experienced this problem with lots of native and Java
threads and are currently researching the case with Sun support.
We can consistently reproduce the problem and have already
generated some nice traces and core dumps.
Meanwhile we had some scheduling problems with JDK 1.2.2:
some runnable threads did not get bound to an LWP. As a
workaround we tried putting /usr/lib/lwp at the head of
LD_LIBRARY_PATH:
	export LD_LIBRARY_PATH=/usr/lib/lwp:$LD_LIBRARY_PATH

In this way, an alternative libthread.so gets loaded, which
does no sophisticated mapping of many threads to a couple of
LWPs (N:M model) but simply uses 1:1 (every thread gets an
LWP). This introduces some overhead, since lots of LWPs are
created.
BUT:
	This solved both the scheduling problems and
	the libthread panic bug.

Could anybody give me some feedback whether this workaround
also works for the other libthread panics?


Submitted On 18-FEB-2002
AWADA
Cannot access bug report 4499510


Submitted On 11-DEC-2002
monosun
I want to see 4499510 bug report.
4499510 link is broken.


Submitted On 15-DEC-2002
krishnamoorthyk
I want to see the status of 4499510.
Why is it blocked?


