|
Evaluation
|
This is not a bug in the Java Plug-In. It is probably a bug in the Iris applet as I don't think we cooperatively terminate our ThreadPoolExecutors. The reason for the IllegalMonitorStateException is probably that AppContext.dispose() is throwing ThreadDeath at the ThreadPoolExecutors' worker threads.
On the other hand, I think at least one ThreadPoolExecutor might be used internally in some of the libraries we use, and those APIs don't have an option to shut down those internal ThreadPoolExecutors cooperatively, so an argument might be made that ThreadPoolExecutor should deal more gracefully with this termination scenario.
I'm reassigning this bug to the Core Libraries team to see whether there is anything more we should be doing in this area. The submitter should file an Issue against the Iris project at http://iris.dev.java.net/ .
Posted Date : 2007-11-08 02:31:38.0
We are surprised by the submitted stacktrace.
It should "never happen".
Someone familiar with the application's source code
needs to create a reduced test case using only core jdk apis,
so that the underlying cause can be determined.
It would aldo help to know which jdk versions can be used to
reproduce this bug.
Posted Date : 2007-11-08 07:08:16.0
The application uses only core JDK APIs. It does not use any Sun-private APIs.
You should be able to add logging to ThreadPoolExecutor and replace or update rt.jar in an installed JRE to track down the problem. For this reason I'm unmarking this bug as incomplete. Please investigate it further with the test case available in the form of the app.
The Iris app requires JDK 6, so that's the earliest version where we've seen this IllegalMonitorStateException. We did also see it on JDK 7 builds a couple of months ago, but have since stopped active testing on JDK 7.
Posted Date : 2007-11-08 18:46:20.0
I can reproduce this with Suse-Linux-mozilla-1.7.8-jdk-6u4.
(Thanks to Ken for his patience tutoring this newbie.)
It's curious that this bug and bug 6572867 with exactly the same symptom
are applets, with both generating the same "impossible" stacktrace.
There are enough uses of java.util.concurrent that serious bugs like this
should have been reported elsewhere... still a deep mystery.
The presence of a security manager while running an applet shouldn't matter,
should it?
Posted Date : 2007-11-10 00:02:00.0
I followed the path of investigation proposed by Ken,
and it indeed proved fruitful.
Here's what's going on:
AWT tries to shutdown an applet thread group,
eventually calling Thread.stop and causing ThreadDeath to be thrown
in each blocked applet thread, with this stack trace:
java.lang.ThreadDeath
at java.lang.Thread.stop(Thread.java:770)
at java.lang.ThreadGroup.stopOrSuspend(ThreadGroup.java:687)
at java.lang.ThreadGroup.stop(ThreadGroup.java:601)
at sun.awt.AppContext.dispose(AppContext.java:456)
at sun.applet.AppletClassLoader.release(AppletClassLoader.java:756)
at sun.plugin.security.PluginClassLoader.release(PluginClassLoader.java:439)
at sun.applet.AppletPanel.release(AppletPanel.java:204)
at sun.applet.AppletPanel.sendEvent(AppletPanel.java:302)
at sun.plugin.AppletViewer.onPrivateClose(AppletViewer.java:1071)
at sun.plugin.AppletViewer$3.run(AppletViewer.java:1020)
at java.lang.Thread.run(Thread.java:674)
The thread receiving ThreadDeath is blocked in LinkedBlockingQueue.take().
More specifically, looking at the body of LinkedBlockingQueue.take():
it acquired LinkedBlockingQueue.takeLock,
then waits on a condition for takeLock, which causes the lock to be released,
then a ThreadDeath is magically thrown, which naturally leads to
the lock not being reacquired, but the finally clause tries to unlock the
already unlocked lock, leading to IllegalMonitorStateException.
import java.util.concurrent.*;
As in this reduced test:
public class Bug {
public static void main(String[] args) throws Throwable {
final BlockingQueue q = new LinkedBlockingQueue();
Thread thread = new Thread() {public void run() {
try { q.take(); }
catch (Throwable t) {
Thread.interrupted();
t.printStackTrace();
}}};
thread.start();
Thread.sleep(1000);
thread.stop();
thread.join();
}
}
Should code in java.util.concurrent try to protect against ThreadDeath?
Probably not. ThreadDeath can occur ANYWHERE. Explained further here:
http://java.sun.com/javase/6/docs/technotes/guides/concurrency/threadPrimitiveDeprecation.html
So this seems like a problem in the applet code in classes_awt.
I don't have a good solution. It seems like a very hard problem.
But setting an UncaughtExceptionHandler for the
threads in question is worth investigating, although this might
help to sweep problems deeper under the rug.
Redispatching to classes_awt.
Posted Date : 2007-11-10 07:59:34.0
The AppContext shutdown sequence first calls Thread.interrupt() on all threads running in that AppContext. Only if threads fail to respond to the interrupt in a timely fashion does the AppContext disposal code use Thread.stop(), which is the only "heavy hammer" available in the current Java platform for stopping threads (and even it is not guaranteed reliable).
One outstanding question is why the ThreadPoolExecutor does not respond to calls to interrupt(). This might be an application issue (where the Iris app does not properly shut down certain of its background ThreadPoolExecutors), but at least one occurrence of this exception, related to the termination of the per-AppContext Swing Timer thread, has no workaround at the application level.
As a related aside, the specification of Lock.lock() is problematic. Lock.lockInterruptibly() is probably the only variant that should ever have been specified.
More investigation is needed at the java.util.concurrent and ThreadPoolExecutor level to see what is needed in the face of AppContext termination. This should not simply be recategorized as an AWT bug.
Posted Date : 2007-11-12 20:09:48.0
A thread pool must be shutdown to allow its threads to properly terminate.
Probably the application should be calling shutdownNow when the application
main thread is interrupted.
Any class that creates an internal thread pool must provide its own
shutdown method that calls shutdown or shutdownNow on the internal pool,
and documents that this higher level shutdown method is required to be called
at application exit time.
Tasks running in a thread pool must respond to interrupt in order for them
to be cancellable.
LinkedBlockingQueue.take() will throw an InterruptedException if interrupted.
If a thread running in a thread pool receives an asynchronous interrupt from
an unrelated thread, it is not obvious if the interrupt is meant as a signal
for the pool itself to shutdown or for a task running in that pool to terminate.
It is possible to rewrite the condition await methods
so that in the case of an asynchronous exception thrown while parked,
the lock is reacquired. But...
All of the lock and thread pool code in java.util.concurrent
is extremely tricky, and we are wary of changing it in any way now.
Posted Date : 2007-11-12 22:25:01.0
Given the above, it sounds like one or more of the following should be done if they aren't already:
- AWT / Swing should cooperatively shut down the ThreadPoolExecutor associated with Swing timers during AppContext destruction. (This might only apply to JDK 7.)
- The Iris application should be changed to cooperatively shut down its ThreadPoolExecutors during Applet.stop() / Applet.destroy().
- The ThreadPoolExecutor code should be changed to detect asynchronous exceptions (in particular ThreadDeath) and consider that a signal to shut down the ThreadPoolExecutor in lieu of the application stopping the ThreadPoolExecutor explicitly. It sounds like the reason for the current behavior is that the ThreadPoolExecutor swallows the Thread.interrupt() calls made by the AppContext.dispose() code and spawns new worker threads, which cause AppContext.dispose() to think it isn't making progress and to start throwing asynchronous exceptions via Thread.stop().
Posted Date : 2007-11-12 22:45:06.0
I'm unclear how appContext.dispose knows the set of threads to try and stop but I'll assume they are in the same ThreadGroup. If that is the case and the TPE threads are also in that group then the interruption of the TPE worker threads will just cause them to be replaced by new worker threads. This is as designed and desired. The TPE must be explicitly shutdown and if the most prompt shutdown is required then shutdownNow() must be used.
I do not believe it is feasible to write ThreadPoolExecutor to be aware of asynchronous exceptions and handle them. The async exception could hit at absolutely any point in the code. It might be possible to do slightly better than today - if we assume there are particular code sections that are more worthy of attempted protection than others - but this seems an excessive effort for a minimal gain. Nor should TPE make assumptions about what encountering a ThreadDeath exception means. Note that practicaly no code is written to be async exception safe so it is very easy for a ThreadDeath exception to be converted to a different exception due to execution of finally blocks where invariants are found to be violated - ref the case in point.
Thread.stop is a "hammer" and when you use a hammer you should expect breakage - caveat emptor.
As for the specification of Lock.lock() it mirrors the normal/expected non-interruptibility of monitor acquisition. Only having interruptible locking would be problematic for application code that needs to guarantee certain actions occur (not in the face of async exceptions, but in the face of sync interrupt requests) and would cause API pollution due to the proliferation of "throws InterruptedException"; or else would result in interrupts continually being swallowed by code that doesn't know how to handle them.
Posted Date : 2007-11-12 23:20:23.0
The common point in all of the stack traces has been in LinkedBlockingQueue.take() -- so I think better robustness in the implementation of this one API is probably warranted and is hardly excessive effort.
Posted Date : 2007-11-12 23:44:51.0
You could hide the IllegalMonitorStateException but the ReentrantLock and its Condition object could be in an indeterminate/invalid state and subsequent use by other threads could lead to a range of possible exceptions.
Is hiding the IMSE what is desired?
Posted Date : 2007-11-13 00:12:37.0
No. I would suggest that code be added to translate the ThreadDeath into an InterruptedException (which should be a very small code addition and which doesn't have to handle multiple ThreadDeaths) and to simultaneously signal termination of the TPE which owns the thread.
Posted Date : 2007-11-13 00:36:10.0
Note that the problem manifests beyond LinkedBlockingQueue.take().
Sister bug 6572867 has essentially the same problem,
but with LinkedBlockingQueue.take() replaced by DelayQueue.take().
If we are to do anything here, it would be at a lower layer,
either in ReentrantLock or in AbstractQueuedSynchronizer.
Posted Date : 2007-11-13 00:40:10.0
Please give an example of where in the code you would expect the ThreadDeath to be converted to InterruptedException as I am unclear what happens to the lock.unlock() inside take() in your envisaged scenario.
The key issue here is whether or not the lock was reacquired as part of await(). If we don't know the answer to that then we can't make an informed decision as to whether or not to unlock the lock. Hence my suggestion that we could hide the IMSE, but that we'd still be leaving the lock/condition in an indeterminate and possibly corrupt state which might lead to further exceptions in other threads.
Also any code that presumes that ThreadDeath implies "shutdown the TPE" would have to be part of the TPE. The blocking queue code is not part of the TPE and the queue has no knowledge that it is being used in a TPE or by a TPE worker thread. So again I'm unclear exactly what code you want to catch the ThreadDeath.
Posted Date : 2007-11-13 00:53:26.0
I don't have a complete solution in mind, but here is a suggested modification to LinkedBlockingQueue to at least detect the ThreadDeath and not cause the IMSE to be thrown. Higher-level code could then catch ThreadDeath and cope with it.
ReentrantLock lock = ...;
boolean unexpectedExceptionOccurred = false;
lock.lockInterruptibly();
try {
try {
unexpectedExceptionOccurred = true;
lock.await();
unexpectedExceptionOccurred = false;
} catch (InterruptedException e) {
// handle InterruptedException
unexpectedExceptionOccurred = false;
}
} finally {
if (unexpectedExceptionOccurred) {
// Signal via thread-local storage or similar
// to higher-level pieces of code that
// catastrophic failure occurred?
// Explicitly throw InterruptedException?
// (Would probably require catching ThreadDeath --
// not desirable)
// Consider public API changes for this case
// in future releases?
} else {
lock.unlock();
}
}
Posted Date : 2007-11-13 01:30:43.0
} finally {
if (unexpectedExceptionOccurred) {
...
} else {
lock.unlock();
}
}
But this finally clause can not tell whether the unexpected exception prevented reacquisition of the lock or not and might not unlock a lock the current thread owns. At best it should do this:
} finally {
try {
lock.unlock();
}
catch(IllegalMonitorStateException imse) {
if (!unexpectedException)
throw imse;
else {
// ???
}
}
}
That leaves the ??? as to how to report to a higher-level something really bad has happened. But I don't see it as feasible for this code, in a collection class, to
(a) try to accommdate async exceptions (especially as the lock/condition can be left in a corrupt state - as could the queue itself)
(b) assume responsibility for trying to inform callers - other than by letting the original exception propagate
Posted Date : 2007-11-13 01:45:07.0
The suggested finally clause should work in 99% of all situations and would have better behavior than currently.
Another possible solution would involve catching all remaining Throwables aside from InterruptedException, storing off the Throwable, and using sun.misc.Unsafe.throwException() in the finally block to rethrow the Throwable. However since this would introduce Sun-specific code I would personally opt for the previous suggestion despite its limitations.
Posted Date : 2007-11-13 01:55:55.0
In summary I see three different actions needed, as outlined by Ken:
1. AWT/Swing has to ensure it cleanly shuts down any thread-pools or timers used within the framework when disposing of a particular AppContext.
2. The Applets concerned need to ensure they shutdown any thread-pool they use internally as part of Applet.stop() or Applet.destroy()
3. Investigate whether the code in LinkedBlockingQueue needs to be, and can be, more robust in the face of asynchronous exceptions. I think at most we would use the try-unlock-finally to mask the IllegalMonitorStateException. But rethrowing the ThreadDeath will only change the stacktrace information not the actual behaviour. I think it unreasonable to expect this group of utility classes to handle async exceptions.
Aside: if all the threads are part of a ThreadGroup defined by the AppContext, then the uncaughtException method could note that the context was being disposed and so ignore all exceptions during this teardown phase.
Posted Date : 2007-12-05 00:19:41.0
I agree with David's analysis, except for where one could
apply a bandaid in the core library code.
I don't think anything can be done in LinkedBlockingQueue,
but I do think one could experiment in AbstractQueuedSynchronizer.
Of course, asynchronous exceptions can occur anywhere, but
it is most likely that they occur during a call to LockSupport.park().
We could modify the code by imagining that instead of
park() always simply returning, it might occasionally throw
a spurious ThreadDeath. A method like condition await
is currently written in the "optimistic" style
int savedState = fullyRelease(node);
while (!isOnSyncQueue(node)) {
LockSupport.park(this);
}
acquireQueued(node, savedState)
whereas we could write in a more paranoid style
int savedState = fullyRelease(node);
try {
while (!isOnSyncQueue(node)) {
LockSupport.park(this);
}
} finally {
acquireQueued(node, savedState);
}
but this is difficult code with many subtle details to get right.
Possible loss of performance is also definitely a consideration.
Anyways, this would almost guarantee that a lock is reacquired
after returning from await().
I don't know exactly what the impact of such a change would be
on thread pool shutdown. Perhaps thread pool code would also
have to be taught about possible ThreadDeath.
Posted Date : 2007-12-06 07:33:56.0
I think that trying to be more robust about re-acquiring the lock is going in the wrong direction. I think instead the code should "fail fast" and propagate the ThreadDeath out. I agree with the statement to make the ThreadPool code detect ThreadDeath and force a termination of the ThreadPool owning the current thread.
Posted Date : 2007-12-06 18:59:39.0
The code Martin sketched out may be an oversimplication. And it would only help if the ThreadDeath was thrown while actually parked; if that happened just after then we'd have the same problem. But philosophically, having Condition.await() guarantee to not return unless it holds the lock, seems the right solution. In practice it is difficult to implement.
Further, given the current situation issues a Thread.interrupt followed by a Thread.stop, there's no guarantee that the thread will be in the Condition code - that all depends on the number of threads involved, the load on the system and the number of CPU's (given there is a pause between the interrupt and the stop).
I am totally opposed to intercepting ThreadDeath and presuming that it means "shutdown the executor". stop() could trigger numerous different failure modes that convert the ThreadDeath to a different exception. And applications might still use Thread.stop for their own purposes (deprecated or not).
The idea solution requires more sophisticated lifecycle management API's for objects whose lifecycle is constrained by an "enclosing" object i.e. an AppContext. When the AppContext is destroyed then the objects it contains should be "destroyed" too. We can't rely on Applets to take care of this themselves, in the general case.
Posted Date : 2007-12-06 23:05:52.0
I've accepted this bug as a "classes_util_concurrent" bug and
changed synopsis to match the possible plan of action this
discussion has converged upon. I'm still doubtful about the
wisdom of modifying the very sensitive code in AbstractQueuedSynchronizer,
so anyone else who feels strongly about changing this is
invited to take ownership.
Posted Date : 2008-01-26 05:15:48.0
|