Bug Database
Bug ID: 4466587
Votes 3
Synopsis JVM causes segmentation fault on Mandrake 8.0, SuSE 7.2
Category java:runtime
Reported Against 1.3.1 , merlin-beta
Release Fixed 1.4(merlin-beta2)
State 10-Fix Delivered, bug
Priority: 4-Low
Related Bugs 4441425 , 4484289 , 4483183 , 4503122
Submit Date 06-JUN-2001
Description




java version "1.4.0-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta-b65)
Java HotSpot(TM) Client VM (build 1.4.0-beta-b65, mixed mode)

This is a resubmission of bug #124860.

All the following tests were done on two linux boxes:

1) Red Hat Linux 7.0 (Guinness) using kernel 2.2.16-22 running on a P-III 733MHz.
 GNOME 1.2 is the desktop environment.  This was installed about four months ago
and has been very stable since.

2) Linux Mandrake 8.0 (Traktopel) using kernel 2.4.3-20mdk #1 running on an
Athlon-C 1333MHz. GNOME 1.4 is the desktop environment.  This install is
relatively fresh (the only non-standard packages loaded are J2SDK1.4.0, pine, and
 xxxxx@xxxxx ).

I have not tried to replicate the bug on any other distribution or system.  The
bug does not appear with JDK 1.2.2 or the IBM JDK 1.3.0; I do not know about
other JDKs since these were the only two on hand at the time.

The following code snippet incorrectly produces a segmentation fault instead of
a StackOverflowError:

public class SegFaultTest
{

    public static void main(String[] args)
    {
        new SegFaultTest();
    }

    public SegFaultTest()
    {
        new SegFaultTest();
    }

}

It was compiled and run with:

javac SegFaultTest.java
java SegFaultTest

And produced:
Segmentation fault
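
For comparison, on a JVM that is not affected, runaway recursion is expected
to surface as a catchable StackOverflowError rather than killing the process.
The following is a minimal sketch of that expected behaviour. The class name
StackDepthProbe and its probe() method are illustrative and are not part of
the original report; the depth reached will vary with the VM and stack size.

```java
public class StackDepthProbe
{

    // Number of frames entered before the stack ran out.
    private static int depth;

    // Recurse without a base case, counting frames on the way down.
    private static void recurse()
    {
        depth++;
        recurse();
    }

    // Returns how deep the recursion got before StackOverflowError was thrown.
    public static int probe()
    {
        depth = 0;
        try
        {
            recurse();
        }
        catch (StackOverflowError expected)
        {
            // Expected outcome: the VM reports the overflow as a Java error.
        }
        return depth;
    }

    public static void main(String[] args)
    {
        System.out.println("Recursion depth reached: " + probe());
    }

}
```

If the process dies with "Segmentation fault" instead of printing a depth, the
VM is presumably hitting the crash described in this report.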

As noted previously, this also occurs under other conditions, such as a
NullPointerException.  It seems to occur when there are deeply recursive calls
during construction.  I have managed to construct another code snippet that
demonstrates this situation:

public class SegFaultTestNpe
{

    public static void main(String[] args)
    {
        new SegFaultTestNpe(Integer.parseInt(args[0]));
    }

    public SegFaultTestNpe(int depth)
    {
        if (depth == 0)
        {
            ((String)null).length();
        }
        else
        {
            new SegFaultTestNpe(depth - 1);
        }
    }

}

It was compiled and run with:

javac SegFaultTestNpe.java
java SegFaultTestNpe <depth>

With the value for <depth> between 0 and 423 inclusive, it generates the
expected NullPointerException and stack trace.  With a value of 424 however, a
segmentation fault is generated.

Also, this is not restricted to occurring in constructors as previously thought.
 It also occurs with exceptions thrown during normal recursive code, such as:

public class SegFaultTestNpeNc
{

    public static void main(String[] args)
    {
        SegFaultTestNpeNc sf = new SegFaultTestNpeNc();
        sf.recurse(Integer.parseInt(args[0]));
    }

    public SegFaultTestNpeNc()
    {
    }

    public void recurse(int depth)
    {
        if (depth == 0)
        {
            ((String)null).length();
        }
        else
        {
            recurse(depth - 1);
        }
    }

}

I can probably construct further examples based around the same idea.  When you
actually write correct recursive algorithms, they provide the correct result.
The problem only occurs when there is an exception thrown deep in a recursive
call.  The most worrying thing about this bug is that there are situations in
which an exception is thrown, and should be handled, deep in a recursion.  Simply
replace the line "((String)null).length();" in the above snippet with "throw new
IOException();" (making the corresponding changes to the method declarations
as well) and the same error will occur.
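
Spelled out, the checked-exception variant described above might look like the
following sketch. The class name SegFaultTestIoe and the exception message are
illustrative choices; only the "throw new IOException()" substitution and the
added throws clauses come from the description.

```java
import java.io.IOException;

public class SegFaultTestIoe
{

    public static void main(String[] args) throws IOException
    {
        SegFaultTestIoe sf = new SegFaultTestIoe();
        sf.recurse(Integer.parseInt(args[0]));
    }

    // Recurse "depth" levels, then throw a checked exception at the bottom.
    public void recurse(int depth) throws IOException
    {
        if (depth == 0)
        {
            throw new IOException("thrown at the bottom of the recursion");
        }
        else
        {
            recurse(depth - 1);
        }
    }

}
```

It would be compiled and run in the same way as the earlier examples, with
"java SegFaultTestIoe <depth>".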

Even if this bug is peculiar to the two systems listed, it is probably a very
good idea to track down what exactly is causing it, since it evidently affects
more than one linux install on one type of machine using a specific kernel
version.  The only idea that I can offer is that because the stack trace itself
is so long, it may be longer than some internal, undocumented limit and this is
causing an overrun in native code somewhere.
(Review ID: 125179) 
======================================================================
Work Around
Reduce the default stack size. At bash shell, do "ulimit -s 2048";
use "limit stacksize 2048" for tcsh.

 xxxxx@xxxxx  2001-06-26
Evaluation
The problem is not reproducible on Redhat 6.1 or Redhat 7.1. I haven't
tried it on Redhat 6.2, but it should be OK. The testcase only crashes
on Redhat 7.0. The stack layout on RH 7.0 is slightly different from
that on RH 6.x or 7.1. That might be the cause.

 xxxxx@xxxxx  2001-06-12

The thread_self() implementation in glibc 2.2.x cannot handle thread
stacks larger than 6M correctly if glibc is not compiled with the flag
"--enable-kernel=2.4.0", as is the case for Redhat 7.0, SuSE 7.2
and Debian Linux.

Normally pthread will enforce a 2M maximum stack size when it creates
a new thread, but the initial thread is created by the Linux kernel and
its size is determined by "ulimit -s". The pthread library has no control
over the initial thread stack size. If it's larger than 6M (most
platforms use 8M or "unlimited" as the default size for the initial thread),
glibc/pthread will crash once the current stack size exceeds 6M,
or when a signal that requires an alternate signal stack (e.g. SIGSEGV)
is sent to the thread.

I'm not sure if the problem will be fixed in glibc, since the
problematic code is probably obsolete. The fix in the VM is to limit
the maximum stack size for the initial thread to 2M.

 xxxxx@xxxxx  2001-06-26
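
Since the evaluation above attributes the crash to the oversized stack of the
initial thread, one way to picture the distinction is to run the deep recursion
on a thread created by the program with an explicit stack size request. This is
only an illustrative sketch and is not the VM fix described above. The class
BoundedStackRecursion and its methods are hypothetical names; the four-argument
Thread constructor exists in J2SE 1.4 and later, and its stack size argument is
a hint that the VM may ignore.

```java
public class BoundedStackRecursion
{

    // Hypothetical workload: returns the number of levels recursed.
    static int recurse(int depth)
    {
        return depth == 0 ? 0 : 1 + recurse(depth - 1);
    }

    // Runs recurse(depth) on a worker thread that requests the given stack size.
    public static int runOnWorker(final int depth, long stackBytes)
        throws InterruptedException
    {
        final int[] result = new int[1];
        Runnable task = new Runnable()
        {
            public void run()
            {
                result[0] = recurse(depth);
            }
        };
        Thread worker = new Thread(null, task, "bounded-stack-worker", stackBytes);
        worker.start();
        worker.join();
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException
    {
        // Request a 2M stack, matching the limit mentioned in the evaluation.
        System.out.println(runOnWorker(1000, 2L * 1024 * 1024));
    }

}
```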

The reason that this problem does not reproduce on Redhat 6.x is that
pthread in glibc-2.1.x calls setrlimit() during initialization and effectively
limits the initial thread stack size to under 2M. glibc-2.2.x does not call
setrlimit(), probably because it can handle a large stack in "floating stack"
mode (i686 version). But for "fixed stack" mode (i386 version of glibc-2.2.x),
this is a bug. It is especially harmful to the VM, because the VM puts the
alternate signal stack at the lower end of the thread stack. If the initial
thread stack size is larger than 6M, the pthread library will think the
alternate signal stack of the initial thread belongs to a different thread.
This causes the wrong thread pointer to be retrieved from thread local storage
and a crash in the pthread library.

Fixed in Merlin beta-refresh by limiting max stack size for initial
thread to 2M. Changed bug synopsis to reflect the nature of the crash.

 xxxxx@xxxxx  2001-06-27

It is highly recommended to limit the max thread stack size to under 2M.
There is probably legacy code in the pthread library or user code
that still assumes the old 2M fixed thread stack.

 xxxxx@xxxxx  2001-07-11

Verified on Mandrake 8.0 that this is fixed in 1.4 beta3 and that 1.3.1 did crash on Mandrake 8.0. Need to update release notes.


 xxxxx@xxxxx  2001-09-27
Comments
  

Submitted On 19-JUL-2001
hsiddiqu
This bug report, the java-1.3.1 installation notes for Linux

http://java.sun.com/j2se/1.3/install-linux-sdk.html

and the above discussion suggest that the problem is due to
glibc-2.2.x libraries.

However, I'm using RH Linux 7.0 (Guinness) and libc-2.1.92
and get a seg fault after compiling/running the SegFaultTest
class mentioned earlier. Again, the fault disappears if I
use the suggested workaround of ulimit -s 2048.


Submitted On 21-JUL-2001
huanghui1
But isn't "libc-2.1.92" a beta release of glibc-2.2?
The last glibc-2.1 release is 2.1.3.


Submitted On 18-AUG-2001
bojans
The latest glibc from Rawhide (glibc-2.2.4-5.i386.rpm,
glibc-common-2.2.4-5.i386.rpm etc.) seems to be fixing the
problem quite fine on RedHat 7.0, custom compiled kernel
2.4.9. Most other programs seem to be working with this
glibc as well (so far :-)

WARNING: Upgrading glibc on your system may break any number
of other programs. USE AT YOUR OWN RISK!!!


Submitted On 27-AUG-2001
aranganath
I'm running RedHat 7.1 Server install.  I've installed the 
compat-libstdc++ libraries for the server install, and I am 
still getting this segfault.  The ulimit workaround solves 
it, but I still find it curious that this happens under 
7.1.  Any ideas what the problem might be?


Submitted On 21-SEP-2001
huanghui1
Try "echo $LD_ASSUME_KERNEL"
If it says "2.2.x", then you need the "ulimit" hack.

Basically, RH-7.1 comes with two versions of libpthread,
i386 and i686. The i686 version (in /lib/i686) makes
use of 2.4 kernel features and has floating stack support.
The i386 version has the bug we mentioned in evaluation.

By default, Redhat 7.1 uses i686 version of libpthread, so
we say the bug does not affect Redhat 7.1. However, if you
set LD_ASSUME_KERNEL to 2.2.5 (I understand that's what
Redhat recommends in README), then the i386 version will
be loaded instead. This version has the bug. You need to
either limit stacksize or unset LD_ASSUME_KERNEL.
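
The check suggested above can also be expressed in Java, roughly as in the
sketch below. The class AssumeKernelCheck and the method needsStackLimit() are
hypothetical names, and the sketch assumes a Java 5 or later runtime, where
System.getenv(String) is available; it only mirrors the "2.2.x" rule of thumb
from this comment and does not inspect which libpthread was actually loaded.

```java
public class AssumeKernelCheck
{

    // True when an LD_ASSUME_KERNEL value selects a 2.2.x kernel ABI,
    // which per the comment above means the i386 libpthread is loaded.
    public static boolean needsStackLimit(String assumeKernel)
    {
        if (assumeKernel == null)
        {
            return false;
        }
        String value = assumeKernel.trim();
        return value.equals("2.2") || value.startsWith("2.2.");
    }

    public static void main(String[] args)
    {
        String value = System.getenv("LD_ASSUME_KERNEL");
        if (needsStackLimit(value))
        {
            System.out.println("LD_ASSUME_KERNEL=" + value
                + ": limit the stack size or unset the variable.");
        }
        else
        {
            System.out.println("LD_ASSUME_KERNEL does not select a 2.2.x kernel.");
        }
    }

}
```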


Submitted On 05-OCT-2001
ZhichaoH
I have tried jdk1.3.1-01 on RedHat7.1 with 2.4.3 kernel. 
The JVM crashed when I tried to run xalan samples.  Using
java -classic or ulimit -s 2048 helped solve the problem.
2001-10-4


Submitted On 07-OCT-2001
ebourgeois
I tried the SegFaultTest above on RedHat7.0 and glibc 2.2-12
and I still receive the segmentation fault. However, if I
use the ulimit hack, I am thrown a NullPointerException. I
don't really feel like upgrading to RedHat7.1 for this. Is
anyone else in the same boat?


Submitted On 11-OCT-2001
22111978
This problem also occurs when building MIDP 1.03 on a 
RedHat 7.0 box (Guinness) when using JDK 1.3.1_01.
The JavaCodeCompact tool crashes with a seg. fault when 
generating ROMjavaUnix.c.
Using -classic causes the JVM to crash immediately and 
dump core. Using the "ulimit" hack works fine.


Submitted On 17-OCT-2001
schamp
just to let Debian users know: this doesn't appear to be a
problem  on Debian/'woody' systems. 

with Debian/woody, default libstdc, and kernel 2.4.9,
using Sun's JRE 1.4.0-beta2 :

SegFaultTest.java : produced the expected stack-overflow
errors

SegFaultTestNpe and SegFaultTestNpeNc : each produced an
ArrayIndexOutOfBoundsException

no segfaults.


Submitted On 06-NOV-2001
frankz99
For me, java -classic works but even ulimit doesn't fix it 
for me.  I'm on kernel 2.2.19 with glibc 2.1.3.  What else 
should I look for?


Submitted On 18-NOV-2001
ypokkine
I have Mandrake 8.1; it worked fine in the beginning, but
after I tried to compile diver.java it crashed.
ulimit -s 2048 was not in use at the time.


Submitted On 07-JAN-2002
b0nb0n0v
I'm using Slackware 8.0 and I also have a runtime problem
similar to this:
./javac: error while loading shared libraries: cannot open
shared object file: cannot load shared object file: No such
file or directory


Submitted On 27-JAN-2002
cgh_sun
I'm using Debian Woody with some packages from unstable 
and glibc 2.2.4. JDK 1.4 Release Candidate immediately 
segfaults with any invocation of any command line tool. 
JDK 1.3.1 works with the classic VM ("java -classic"). 
ulimit -s 2048 does NOT solve any problems. IBM's JDK 
works fine, but is incompatible with Sun's reference 
implementation of the J2EE server. I'm uncertain about 
breaking stuff by downgrading my version of glibc -- any 
suggestions?




Submitted On 05-NOV-2003
wfvoogd
schamp: 
Are you sure you used the -server option with mem settings?
On my woody it did occur (build 1.4.0_01-b03). ulimit
-s 2048 did help, but not all the way; I got one out of my
tomcat yesterday :( Still looking for further options. Reading
this, it might be a good idea to look for places in the code
that catch NullPointerExceptions and replace them
with a pre-check?
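
As a rough illustration of the pre-check idea raised in this comment, the
sketch below contrasts relying on a caught NullPointerException with testing
for null up front, so that no exception is raised at all. The class name
NullPreCheck, both method names, and the -1 sentinel are hypothetical; whether
this avoids the crash in any given application is not established by this
report.

```java
public class NullPreCheck
{

    // Relies on the exception: a NullPointerException is raised and caught.
    static int lengthByCatch(String s)
    {
        try
        {
            return s.length();
        }
        catch (NullPointerException e)
        {
            return -1;
        }
    }

    // Pre-check: the null case is handled without raising an exception.
    static int lengthByPreCheck(String s)
    {
        return s == null ? -1 : s.length();
    }

    public static void main(String[] args)
    {
        System.out.println(lengthByCatch(null) == lengthByPreCheck(null));
    }

}
```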


