EVALUATION
The process is being terminated by signal 61, which is equivalent to
real-time signal number 28. NIO uses this signal to interrupt threads that
are performing operations while another thread is trying to close the
socket (src/solaris/native/java/net/linux_close.c). There may be a
conflict between NIO's use of this signal and NPTL's use of it: the strace
output shows more than one sigaction call for this signal number.
The reason this bug is showing up in the sharing workspace is that the failure
occurs with debug or fastdebug builds; a fastdebug build was provided to SQA for
testing. The failure with java_g began occurring in b08. It appears to have been
a change in the libraries, not the VM, that caused the test to begin failing.
The test passes with a debug build of the sharing VM in a b07 JDK.
###@###.### 2003-10-08
This bug has nothing in particular to do with NPTL. It's consistently
reproducible using java_g on a RH 7.2 system starting with Tiger b08.
The two invocations of sigaction which show up in the strace output are not a
surprise; one is in the java.net code (as mentioned above) and one is in
java.nio (sun/nio/ch/NativeThread). Only one of these is really needed, but
both are there out of paranoia and having both does no harm.
Note that you'll see slightly different RT signal numbers depending upon which
version of Linux you use. On RH 7.2 and RH AS 2.1 SIGRTMIN is 32, while on RH
9 it's 34.
I'm stumped as to why this only shows up with debug builds. The relevant
library code did not change in Tiger b08, so I suspect that a change in HotSpot
must have caused this. Reassigning to hotspot/runtime_system for further
evaluation.
-- ###@###.### 2003/11/16
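Because SIGRTMIN varies between systems as noted above, a raw signal number
taken from strace output is easier to compare across machines when expressed
relative to SIGRTMIN. A small sketch, assuming a Linux/glibc system where
SIGRTMIN and SIGRTMAX are available; rt_offset is a hypothetical helper and
not part of the JDK sources.

```c
#include <signal.h>

/*
 * Hypothetical helper: returns the offset of signo from SIGRTMIN,
 * or -1 if signo is not in the real-time range on this system.
 * SIGRTMIN and SIGRTMAX are evaluated at run time, so the result
 * depends on the machine and threading library in use.
 */
static int rt_offset(int signo) {
    if (signo < SIGRTMIN || signo > SIGRTMAX)
        return -1;
    return signo - SIGRTMIN;
}
```

For example, rt_offset(61) reports how far signal 61 lies above SIGRTMIN on
the machine where it runs.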
--------------------------------------
b08 is the first promotion built with gcc-3.2; older JDK builds used egcs-2.91.
There is also a gcc-3.2 build of b07 to help diagnose gcc issues.
Yes, with the same b07 source code, this test passes without problems when the
JDK is built with the old gcc (/java/re/jdk/1.5/promoted/b07/binaries/linux-i586),
but it always fails when built with the new gcc
(/java/re/jdk/1.5/promoted/b07/binaries/linux-i586.as21). So it does look like
a gcc problem.
A known gcc issue since b08 is 4885046 (dso_handle is shared among dynamic
libraries). I will try that fix and see whether it also solves the problem here.
###@###.### 2003-11-16
--------------------------------------
Actually, the bug is in sun/nio/ch/NativeThread.c, which does not initialize
sa.sa_flags:
    sigset_t ss;
    struct sigaction sa, osa;
    sa.sa_handler = nullHandler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(INTERRUPT_SIGNAL, &sa, &osa) < 0)
        JNU_ThrowIOExceptionWithLastError(env, "sigaction");
The sigaction() call will pick up whatever value happens to be on the stack as
sa_flags. If that value has SA_ONESHOT (or SA_RESETHAND) set, the signal
handler is reset to SIG_DFL when an INTERRUPT_SIGNAL is received, and a
second interrupt signal then kills the Java process.
###@###.### 2003-11-17
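A minimal sketch of the failure mode described above; it is not the actual fix
applied to NativeThread.c. It assumes a POSIX system. DEMO_SIGNAL,
null_handler, install_handler and handler_survives_one_delivery are
hypothetical names used for illustration only, and SIGUSR1 stands in for
INTERRUPT_SIGNAL, whose value is not shown in this report. The struct is
zeroed and sa_flags is set explicitly, so the flags never come from leftover
stack contents.

```c
/* Request POSIX.1-2008 declarations (sigaction, SA_RESETHAND). */
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>

/* Assumption: SIGUSR1 is only a stand-in for INTERRUPT_SIGNAL. */
#define DEMO_SIGNAL SIGUSR1

/* Counts how many times the handler has run. */
static volatile sig_atomic_t deliveries = 0;

/* Does nothing useful; it exists so the signal interrupts rather than kills. */
static void null_handler(int sig) {
    (void)sig;
    deliveries++;
}

/* Installs null_handler with every sigaction field explicitly initialized. */
static int install_handler(int flags) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));   /* no field is left holding stack garbage */
    sa.sa_handler = null_handler;
    sa.sa_flags = flags;          /* 0 keeps the handler installed */
    sigemptyset(&sa.sa_mask);
    return sigaction(DEMO_SIGNAL, &sa, NULL);
}

/*
 * Installs the handler with the given flags, delivers one signal, then
 * queries the current disposition.
 * Returns 1 if null_handler is still installed, 0 if the disposition
 * changed (for example, reset to SIG_DFL), and -1 on error.
 */
static int handler_survives_one_delivery(int flags) {
    struct sigaction cur;
    if (install_handler(flags) < 0)
        return -1;
    if (raise(DEMO_SIGNAL) != 0)
        return -1;
    if (sigaction(DEMO_SIGNAL, NULL, &cur) < 0)
        return -1;
    return cur.sa_handler == null_handler;
}
```

With flags of 0 the handler should still be installed after one delivery (the
function returns 1). With SA_RESETHAND it should have reverted to SIG_DFL
(the function returns 0), which is the state in which a second signal would
terminate the process.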