Bug ID: 4386948
Votes: 135
Synopsis: libthread panic: fault in libthread critical section (PID: 6700 LWP 1)
Category: hotspot:other
Reported Against: 1.1, 1.3, 1.3.1, kest-sol-fcs
Release Fixed:
State: 11-Closed, duplicate of 4499510, bug
Priority: 4-Low
Related Bugs: 4499510
Submit Date: 08-NOV-2000

Description
java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0)
Java HotSpot(TM) Client VM (build 1.3.0, mixed mode)
In a scheduling system that runs multiple parallel system jobs, the following problem occurs intermittently:
signal fault in critical section
signal number: 11, signal code: 1, fault address: 0xfee0be20, pc: 0xff36c6e8, sp: 0xf3f80b60
libthread panic: fault in libthread critical section (PID: 6700 LWP 1)
stacktrace:
ff36c6cc
ff36c518
fe5db94c
fe604534
fe562170
ff044ec8
ff047ec4
6fce8
6ceb0
fe7a4994
fe549d3c
fe5499e8
fe551aec
fe5522ac
ff043c98
ff03f56c
6fce8
6ceb0
6ceb0
6ceb0
fe7a4994
fe549d3c
fe5493d4
fe549444
fe57a2a8
fe651a10
fe5ed2e8
ff37bd04
fe5ed2c8
This event cannot be reproduced on demand: it occurs once during a run of several days, in which several thousand jobs start successfully. All required and recommended patches are installed on the machine. No exceptions are thrown, and the VM continues to run.
The Java code that triggers this event makes a simple call to Runtime.exec from a separate thread:
// Inner class of the reporter's scheduler. Needs java.io.File,
// java.io.IOException and java.util.Date; Job is the reporter's own class.
private class JobThread extends Thread {
    private Job job;
    private String exec;
    private File dir;

    JobThread(Job job, String exec, File dir) {
        this.job = job;
        this.exec = exec;
        this.dir = dir;
    }

    public void run() {
        this.job.startTime = new Date();
        try {
            // Spawn the job as a child process and wait for it to finish.
            Process process = Runtime.getRuntime().exec(exec, null, dir);
            job.setProcess(process);
            process.waitFor();
            job.collectProcess();
        }
        catch (IOException ioe) {
            System.err.println("IO Exception on exec");
            ioe.printStackTrace();
        }
        catch (InterruptedException ine) {
            System.err.println("Interrupted Exception on exec");
            ine.printStackTrace();
        }
    }
}
The panic occurs when the start() method of the thread above is called.
(Review ID: 111640)
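Because the panic appears only once per several thousand jobs, a stress harness that mimics JobThread (many threads, each spawning a child and waiting for it) may help provoke it more often. The sketch below is illustrative, not the reporter's code: the class ExecStress, the thread and job counts, and the use of the JVM's own `java -version` as a trivial child are all assumptions. It uses ProcessBuilder, to which Runtime.exec delegates in Java 5 and later, and drains each child's output so no child blocks on a full pipe.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecStress {

    // Starts `threads` threads; each runs `cmd` `jobsPerThread` times and
    // waits for the child, roughly like JobThread.run() in the report.
    // Returns how many children exited with status 0.
    public static int run(String[] cmd, int threads, int jobsPerThread) {
        AtomicInteger ok = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int j = 0; j < jobsPerThread; j++) {
                    try {
                        // Merging stderr into stdout lets one loop drain both.
                        Process p = new ProcessBuilder(cmd)
                                .redirectErrorStream(true)
                                .start();
                        drain(p.getInputStream());
                        if (p.waitFor() == 0) {
                            ok.incrementAndGet();
                        }
                    } catch (IOException e) {
                        System.err.println("IO Exception on exec: " + e);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            workers.add(w);
            w.start(); // the reporter saw the panic at this call
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ok.get();
    }

    // Reads and discards everything the child writes, then closes the stream.
    private static void drain(InputStream in) throws IOException {
        try (InputStream s = in) {
            byte[] buf = new byte[4096];
            while (s.read(buf) != -1) {
                // discard
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical trivial child: this JVM's own launcher with -version.
        String javaBin = System.getProperty("java.home") + "/bin/java";
        int ok = run(new String[] {javaBin, "-version"}, 4, 2);
        System.out.println(ok + " of 8 jobs exited with status 0");
    }
}
```

Raising the thread and job counts is how such a harness would be used to hunt for an intermittent fault; a clean run proves little, since the event is rare.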
======================================================================
java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0)
Java HotSpot(TM) Client VM (build 1.3.0, mixed mode)
This is the error log from the Brazil server running on the machine pn21.eng. We have been flooding it with HTTP requests from other machines connected to it over fiber.
However, a more interesting bug recently showed up. I started testing Brazil with CGI applications; this one is a simple Perl script that just dumps the environment variables. You can see the page at:
http://pn21.eng/cgi/cgi_test.cgi
The source is included.
Below the Java exceptions there is a libthread panic, which did not show up until I added the CGI test today.
I thought everyone should know.
Server started on 80
Setting server to run as user: nobody
Created new table: 1/6
Created new table: 2/6
Created new table: 3/6
Created new table: 4/6
Created new table: 5/6
signal fault in critical section
signal number: 11, signal code: 1, fault address: 0x7be01f50, pc: 0xff3638ec, sp: 0x79e00978
libthread panic: fault in libthread critical section : dumping core (PID: 11472 LWP 1)
stacktrace:
ff3638d8
ff3639b4
ff367adc
fe7d7e5c
fb0a9e4c
fb0aa49c
fe7a4994
fe549d3c
fe5499e8
fe551aec
fe5522ac
fe7d3c98
fe7cf56c
11f8b0
fb0a3de8
11ca78
fb09e684
fb037874
fb04207c
fb037874
fb070940
fb0606fc
fb07e758
fe7a4994
fe549d3c
fe5493d4
fe549444
fe57a2a8
fe651a10
fe5ed2e8
ff36bb34
fe5ed2c8
(Review ID: 118521)
======================================================================
java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0)
Java HotSpot(TM) Client VM (build 1.3.0, mixed mode)
I am developing a Java program whose job is to execute a set of programs, some of which can run in parallel and some of which must run sequentially. Most of the programs are Perl scripts. The Java program is simple: it creates a thread for each group of tasks that can run concurrently, and within that thread it execs each program with /bin/sh -c "program args...". It reads the stdout and stderr of the exec'ed program while it is alive, and when the program terminates, it moves on to the next task.
The exec'ed processes sometimes crash. An example of the output is included below. A core file is produced, but it is not from /bin/sh or perl; it is a Java core file. My guess is that Runtime.exec is implemented by calling fork() and then exec(). When fork() is called, all the threads in the parent process are recreated in the child process, and sometimes one of those threads gets a few time slices before exec() is called. The new process is not in a state in which those threads can run correctly, and once in a while they cause Java to crash.
(Review ID: 128353)
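The workflow described above can be sketched in present-day Java to make the thread and stream layout concrete. This is not the reporter's program: the class TaskGroups and its methods are hypothetical, and it assumes a POSIX /bin/sh. Each group gets its own thread, tasks within a group run one after another, and each child's stderr is read on a helper thread while stdout is read on the group thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TaskGroups {

    // Runs each group on its own thread; tasks inside a group run one after
    // another. Returns every output line, tagged "out:" or "err:".
    public static List<String> runGroups(List<List<String>> groups) {
        List<String> lines = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (List<String> group : groups) {
            Thread t = new Thread(() -> {
                for (String task : group) {
                    runTask(task, lines);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return lines;
    }

    // Executes one task as /bin/sh -c "<task>" and reads stdout and stderr
    // while the child is alive, each on its own thread.
    public static void runTask(String task, List<String> sink) {
        try {
            Process p = new ProcessBuilder("/bin/sh", "-c", task).start();
            Thread errReader = new Thread(() -> copyLines(p.getErrorStream(), "err:", sink));
            errReader.start();
            copyLines(p.getInputStream(), "out:", sink);
            errReader.join();
            p.waitFor();
        } catch (IOException e) {
            sink.add("exec failed: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Copies each line of `in` into `sink`, prefixed with `tag`.
    private static void copyLines(InputStream in, String tag, List<String> sink) {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                sink.add(tag + line);
            }
        } catch (IOException e) {
            sink.add(tag + "read failed: " + e.getMessage());
        }
    }
}
```

Reading the two streams concurrently matters: if the child fills its stderr pipe while the parent is blocked reading stdout, both sides stall. The sketch does nothing about the fork-time crash itself.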
======================================================================

Work Around
None.
======================================================================
xxxxx@xxxxx 2002-01-02
This is a bug in the Solaris libthread library. The workaround is to use the new "t2" libthread library.
If you are running Solaris 9, this is the default libthread and you don't need to do anything to use it.
If you are running Solaris 8, it is available in /usr/lib/lwp (check for the latest patches to it at the time you need it).
To use it with JDK 1.4 or 1.3.1_02 (or later), do the following:
1) Set LD_LIBRARY_PATH to /usr/lib/lwp:$LD_LIBRARY_PATH (sh: LD_LIBRARY_PATH=/usr/lib/lwp:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; csh: setenv LD_LIBRARY_PATH /usr/lib/lwp:$LD_LIBRARY_PATH).
2) Add the following option to the java command line:
   -XX:+OverrideDefaultLibthread
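For a launcher that starts the JVM from Java, the two steps above can be expressed with ProcessBuilder. This is a hedged sketch: the class T2Launcher and its methods are hypothetical, it only builds the command and environment without starting anything, and -XX:+OverrideDefaultLibthread is taken verbatim from this workaround (it applies to the JDK releases named above). When LD_LIBRARY_PATH is unset, the sketch avoids producing a trailing colon, because runtime linkers commonly interpret an empty entry as the current directory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class T2Launcher {

    public static final String LWP_DIR = "/usr/lib/lwp";

    // Step 1: put /usr/lib/lwp at the front of LD_LIBRARY_PATH.
    // An unset or empty value yields just "/usr/lib/lwp", with no trailing
    // colon (an empty entry is commonly treated as the current directory).
    public static String prependLwp(String existing) {
        if (existing == null || existing.isEmpty()) {
            return LWP_DIR;
        }
        return LWP_DIR + ":" + existing;
    }

    // Step 2: add -XX:+OverrideDefaultLibthread to the java command line.
    // Builds, but does not start, a ProcessBuilder for the child JVM.
    public static ProcessBuilder build(String javaCmd, List<String> jvmArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaCmd);
        cmd.add("-XX:+OverrideDefaultLibthread");
        cmd.addAll(jvmArgs);
        ProcessBuilder pb = new ProcessBuilder(cmd);
        // environment() is a mutable copy of this process's environment.
        Map<String, String> env = pb.environment();
        env.put("LD_LIBRARY_PATH", prependLwp(env.get("LD_LIBRARY_PATH")));
        return pb;
    }
}
```

Checking the running process with /usr/bin/pldd remains the way to confirm which libthread was actually loaded.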
As superuser, you can run /usr/bin/pldd "pid" to check that the java process "pid" is using the correct libthread.

Evaluation
This is a Solaris libthread bug. Closing this as a duplicate of 4499510, although there are not enough details in this bug to make an exact match. Perhaps the Perl script mentioned here will help reproduce 4499510.

Comments
Submitted On 31-JAN-2001
timleecasey
I seem to have a similar problem, with similar symptoms.
I have seen it only once.
Here is the text from my internal bug database:
Thu Apr 20 14:32:56 PDT 2000: /export/home/rcenter27/vault/bin/vault stop
Thu Apr 20 14:32:57 PDT 2000: /export/home/rcenter27/vault/bin/vault start
signal fault in critical section
signal number: 10, signal code: 3, fault address: 0xee175de8, pc: 0xef77007c, sp: 0xee395b38
libthread panic: fault in libthread critical section (PID: 23058 LWP 64)
stacktrace:
ef770048
ef76fe0c
ef768934
ef636a10
ef768a78
0
Submitted On 28-FEB-2001
sreilly
I recently had almost the same problem, and it was caused by spawning too many threads. The solution is to use a thread pool and reuse threads rather than overloading the system. I already had a thread pool, but it was malfunctioning; fixing it resolved the problem completely.
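The fix described in this comment, a bounded pool that reuses threads instead of creating a new Thread per job, can be sketched with java.util.concurrent. That package arrived in Java 5, after the JDK 1.3 releases discussed here, so this is a present-day illustration rather than the commenter's code; the class JobPool and the pool size are hypothetical, and the jobs are plain Callables so the example runs without spawning processes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class JobPool {

    // Runs every job on a fixed-size pool, so at most `poolSize` worker
    // threads exist no matter how many jobs are submitted. Results come
    // back in the same order as the jobs.
    public static <T> List<T> runAll(List<? extends Callable<T>> jobs, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<T>> futures = pool.invokeAll(jobs); // waits for all jobs
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                results.add(f.get()); // rethrows a job's exception, if any
            }
            return results;
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }
}
```

In the scheduler from the original report, each Callable would wrap the Runtime.exec, waitFor and collectProcess calls from JobThread.run(). The pool size caps the number of live worker threads no matter how many jobs are queued.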
Submitted On 05-APR-2001
cvanover
We see a similar phenomenon on both SPARC Solaris 7 and Intel Solaris 8; a screen dump from the latter follows.
Ours is a standalone Java application that uses JNI, and we get regular crashes with core dumps;
however, the libthread panic is not accompanied by a core file:
myprogram -playback my_select200
java version "1.3.1-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1-beta-b15)
Java HotSpot(TM) Client VM (build 1.3.1beta-b15, mixed mode)
Loading libraries for SunOS
signal fault in critical section
signal number: 11, signal code: 1, fault address: 0x20c, pc: 0xdfb68e66, sp: 0xdf605b5c
libthread panic: fault in libthread critical section (PID: 4922 LWP 4)
stacktrace:
ffffffff
dfb6f970
dfb6f71c
df96cabb
df96ca3a
df96c9d0
df96f1a1
df9889f1
df96be9c
df96bcfa
df92c2f3
df92c12e
df92c069
dfb677fc
Submitted On 24-JUL-2001
MakarandKashikar
We have the same problem. Our Java process is a server-class process; sometimes it reports this libthread panic but continues, and sometimes it fails with "Segmentation Fault - core dumped" and crashes. I looked for Solaris patches for this but could not find any. Are we really stuck with this until JDK 1.4 is released in Q4 2001?
Submitted On 15-AUG-2001
JThoennes
WORKAROUND:
We experienced these problems with many native and Java threads and are currently researching this case with Sun support. We can reproduce the problems consistently and have already generated some useful traces and core dumps.
Meanwhile, we had some scheduling problems with JDK 1.2.2: some runnable threads did not get bound to an LWP. As a workaround, we put /usr/lib/lwp at the head of LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=/usr/lib/lwp:$LD_LIBRARY_PATH
This way, an alternative libthread.so gets loaded, which does not do the sophisticated mapping of many threads onto a few LWPs (the N:M model) but simply uses 1:1 (every thread gets its own LWP). This introduces some overhead, since many LWPs are created.
BUT:
This solved both the scheduling problems and the libthread panic bug.
Could anybody tell me whether this workaround also works for the other libthread panics?
Submitted On 18-FEB-2002
AWADA
Cannot access bug report 4499510.
Submitted On 11-DEC-2002
monosun
I want to see bug report 4499510, but the 4499510 link is broken.
Submitted On 15-DEC-2002
krishnamoorthyk
I want to see the status of 4499510.
Why is it blocked?