|
Quick Lists
|
|
Bug ID:
|
6671051
|
|
Votes
|
0
|
|
Synopsis
|
(process) Runtime.exec() hangs if signalled during fork/exec
|
|
Category
|
java:classes_lang
|
|
Reported Against
|
|
|
Release Fixed
|
7(b25),
6-open(b07) (Bug ID:2159936)
, 6u7(b01) (Bug ID:2160173)
|
|
State
|
10-Fix Delivered,
bug
|
|
Priority:
|
2-High
|
|
Related Bugs
|
6474073
,
6667180
,
6678286
,
4811767
,
4052517
|
|
Submit Date
|
04-MAR-2008
|
|
Description
|
We are seeing threads hung in java.lang.Runtime.exec() on Solaris 10. The child process is created, but the Runtime.exec() call never returns.
In this example the calling thread is 103(0x67) and the child pid is 14159(0x374f) - I didn't capture the output that shows that.
Here is the backtrace (from jstack).
"CacheManagerTaskExecutor-15" prio=10 tid=0x0840d400 nid=0x67 sleeping[0xb48a7000..0xb48a7b70]
java.lang.Thread.State: RUNNABLE
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
at java.lang.Runtime.exec(Runtime.java:593)
at java.lang.Runtime.exec(Runtime.java:431)
at java.lang.Runtime.exec(Runtime.java:328)
at com.sun.hss.domain.util.misc.ExpectCommandExecutor.execute(ExpectCommandExecutor.java:48)
- locked <0xbc1e72d0> (a com.sun.hss.domain.util.misc.ExpectCommandExecutor)
at com.sun.hss.domain.util.ipmi.IPMICommand.exec(IPMICommand.java:168)
at com.sun.hss.domain.util.ipmi.IPMICommand.exec(IPMICommand.java:145)
at com.sun.hss.domain.driver.generic.GenericServerMXBeanHelper.execIPMICommand(GenericServerMXBeanHelper.java:406)
at com.sun.hss.domain.util.ipmi.AbstractIPMICommandHelper.getFruMap(AbstractIPMICommandHelper.java:279)
at com.sun.hss.domain.driver.v20z.V20zServerMXBeanHelper.getSerialNumber(V20zServerMXBeanHelper.java:211)
at com.sun.hss.domain.driver.generic.AbstractServerMXBean.getSerialNumber(AbstractServerMXBean.java:87)
at sun.reflect.GeneratedMethodAccessor164.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:167)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:96)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:33)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
at javax.management.StandardMBean.getAttribute(StandardMBean.java:358)
at com.sun.hss.domain.AbstractResourceMXBean.getAttribute(AbstractResourceMXBean.java:209)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
at com.sun.jdmk.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:178)
at com.sun.hss.domain.ForwardingMBeanServerInterceptor.getAttribute(ForwardingMBeanServerInterceptor.java:178)
at com.sun.hss.domain.ConcurrentMBeanServerInterceptor.getAttribute(ConcurrentMBeanServerInterceptor.java:837)
at com.sun.hss.domain.ForwardingMBeanServerInterceptor.getAttribute(ForwardingMBeanServerInterceptor.java:178)
at com.sun.hss.domain.ForwardingMBeanServerInterceptor.getAttribute(ForwardingMBeanServerInterceptor.java:178)
at com.sun.cacao.agent.DispatchInterceptor.getAttribute(DispatchInterceptor.java:506)
at com.sun.cacao.agent.auth.impl.AccessControlInterceptor.getAttribute(AccessControlInterceptor.java:310)
at com.sun.jdmk.JdmkMBeanServerImpl.getAttribute(JdmkMBeanServerImpl.java:640)
at com.sun.cacao.cached.impl.server.ServerCachedMBeanProxy.rereadAttribute(ServerCachedMBeanProxy.java:326)
at com.sun.cacao.cached.impl.server.ServerCachedMBeanProxy.refreshCachedAttributes(ServerCachedMBeanProxy.java:295)
at com.sun.cacao.cached.impl.server.RefreshAttributeTask.execute(RefreshAttributeTask.java:60)
at com.sun.cacao.cached.impl.server.TaskPool$TaskExecutor.run(TaskPool.java:334)
pstack shows the following for the thread (note that the 374f parameter corresponds to the child in question):
----------------- lwp# 103 / thread# 103 --------------------
fef30ee7 waitid (0, 374f, b48a6fc0, 3)
fef25b52 waitpid (374f, 0, 0) + 75
fe488242 Java_java_lang_UNIXProcess_forkAndExec (840d4ec, b48a7170, b48a716c, b48a7168, 7, 0) + 6ca
fb40a598 ???????? (0, fb408219, bc1e7260, bc1e7250, bc1e7240, 0)
fb402de9 ???????? (0, 0, 0, 0, 7, bc1e71f0)
fb7f62b0 ???????? ()
This appears to be stuck in the following code in Java_java_lang_UNIXProcess_forkAndExec():
close(fail[1]); fail[1] = -1; /* See: WhyCantJohnnyExec */
if (read(fail[0], &errnum, sizeof(errnum)) != 0) {
waitpid(resultPid, NULL, 0);
throwIOException(env, errnum, "Exec failed");
goto Catch;
Although this is highly reproducible in our lab stress tests, it was hard to produce when running truss. However, in this case I was able to capture it with truss -l -t read,waitid. Here is the truss output from the time of the failure (note the lwp id is 103, consistent with the above):
/103: siginfo: SIGJVM1 pid=13520 uid=0 code=-1
/103: 0.0024 read(309, 0xB48A70B4, 4) Err#4 EINTR
/103: waitid(P_PID, 14159, 0xB48A6FC0, WEXITED|WTRAPPED) (sleeping...)
It looks like there is a bug in the code above. It assumes that a non-zero return from the read system call represents a successful read of errnum, while in this case, the return value is actually -1. This causes the calling thread to wait for the child process to complete while, in our example, the child process has successfully exec'ed and is actually waiting to communicate with the hung thread upon its return.
Posted Date : 2008-03-04 18:39:07.0
|
|
Work Around
|
N/A
|
|
Evaluation
|
The read in question from a private pipe assumes either success or EOF,
but an EINTR is not impossible, and so should be handled by retrying the operation.
Posted Date : 2008-03-04 22:16:41.0
Here is my (untested) attempt to get the logic completely right.
Does programming always have to be so hard?
diff --git a/src/solaris/native/java/lang/UNIXProcess_md.c b/src/solaris/native/java/lang/UNIXProcess_md.c
--- a/src/solaris/native/java/lang/UNIXProcess_md.c
+++ b/src/solaris/native/java/lang/UNIXProcess_md.c
@@ -479,6 +479,37 @@ closeSafely(int fd)
close(fd);
}
+/*
+ * Reads nbyte bytes from file descriptor fd into buf,
+ * The read operation is retried in case of EINTR or partial reads.
+ *
+ * Returns number of bytes read (normally nbyte, but may be less in
+ * case of EOF). In case of read errors, returns -1 and sets errno.
+ */
+static ssize_t
+readFully(int fd, void *buf, size_t nbyte)
+{
+ ssize_t remaining = nbyte;
+ for (;;) {
+ ssize_t n = read(fd, buf, remaining);
+ if (n == 0) {
+ return nbyte - remaining;
+ } else if (n > 0) {
+ remaining -= n;
+ if (remaining <= 0)
+ return nbyte;
+ /* We were interrupted in the middle of reading the bytes.
+ * Unlikely, but possible. */
+ buf = (void *) (((char *)buf) + n);
+ } else if (errno == EINTR) {
+ /* Strange signals like SIGJVM1 are possible at any time.
+ * See http://www.dreamsongs.com/WorseIsBetter.html */
+ } else {
+ return -1;
+ }
+ }
+}
+
#ifndef __solaris__
#undef fork1
#define fork1() fork()
@@ -606,9 +637,15 @@ Java_java_lang_UNIXProcess_forkAndExec(J
/* parent process */
close(fail[1]); fail[1] = -1; /* See: WhyCantJohnnyExec */
- if (read(fail[0], &errnum, sizeof(errnum)) != 0) {
+
+ switch (readFully(fail[0], &errnum, sizeof(errnum))) {
+ case 0: break; /* Exec succeeded */
+ case sizeof(errnum):
waitpid(resultPid, NULL, 0);
throwIOException(env, errnum, "Exec failed");
+ goto Catch;
+ default:
+ throwIOException(env, errno, "Read failed");
goto Catch;
}
Posted Date : 2008-03-05 07:52:00.0
I am embarrassed to admit that I fell into the canonical "Worse" trap described by
the famous paper:
http://www.dreamsongs.com/WorseIsBetter.html
I am a "Right Thing" kind of guy myself,
even more so after thinking about this bug.
Posted Date : 2008-03-05 23:22:56.0
Submitter has verified the fix.
Posted Date : 2008-03-06 02:53:17.0
It appears that this bug is a regression introduced by the fix for
4052517: (process) Runtime.exec won't execute programs belonging to other groups on Unix
4811767: (process) Runtime.exec should throw IOException when workdir does not exist (Unix)
5033302: (process) Can't execute Solaris NFS programs with uid>64k on Linux-amd64
in mustang-b81
Posted Date : 2008-03-14 18:47:15.0
|
|
Comments
|
Submitted On 25-MAR-2008
BharatD
I am seeing this issue for this version of java
---------
java version "1.5.0_10"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_10-b03)
Java HotSpot(TM) Server VM (build 1.5.0_10-b03, mixed mode)
-------
Has it been fixed for java 1.5 as well?
PLEASE NOTE: JDK6 is formerly known as Project Mustang
|
|
|
 |