Description
I am including a bug description from xxxxx.
There is no test case, but they do have a workaround. They have
also agreed to try this in 1.1.
My motivation for filing a bug is mostly to make sure we have
a record of it and if somebody else stumbles upon it they will
be able to get the workaround from the included report.
If a test case arrives or the bug seems to have disappeared in 1.1
I will update the report.
Also, if anybody has any additional comments on the included report
they are welcome.
------------------------------------------------------------------
Tom, I believe I'm seeing a Java VM bug involving threads
on Windows 95. I'm hoping you will be able to get someone
at Javasoft interested in hunting down the problem.
Unfortunately, I've been unsuccessful in producing a test case,
so the problem would probably have to be investigated on site.
Here's everything I know about the problem:
TEST ENVIRONMENT:
There are two VMs running on one PC (Pentium 166, 32 MB,
Windows 95). One is a client, the other is a server. They
communicate through routing software on a separate machine.
I'm testing an ORB, components of which run on the client
and the server. The ORB is supported by Novera's messaging
system, running on the client, server and router.
TEST PROGRAM:
The client program runs several threads. Each thread interacts
with the server through the ORB. Novera's messaging is implemented
so that all client/server interactions go through one socket, even if
there are multiple threads.
In each thread, there is a loop which runs 10 times. In each iteration
of the loop, a 50,000 byte array is requested from the server. The
client validates the array for size and content. (Novera messaging
adds two other threads, one receiving events from the router, one
feeding messages to the application threads.)
The server simply creates arrays when requested and returns them.
SYMPTOMS:
The following symptoms have been observed on Windows 95. I have
done a number of runs on Solaris and never saw the same problem.
I run the client with 20 threads. 20-30% of the time, the program
runs successfully. The rest of the time there is an
IllegalMonitorStateException. The exception terminates one thread
and the others continue successfully. Once I saw two exceptions in
a single run.
The stack trace from the exception is as follows:
java.lang.IllegalMonitorStateException: current thread not owner
    at epic.runtime.MessageInputStream.getNextMessage(MessageInputStream.java:197)
    at epic.runtime.MessageInputStream.read(MessageInputStream.java:309)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:107)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:141)
    at java.io.DataInputStream.readFully(DataInputStream.java:90)
    at epic.orb.BasicIOCI.recvByteArray(BasicIOCI.java:405)
    at Server_Proxy.get_array(Server_Proxy.java:38)
    at Driver.run_sync(Driver.java:35)
    at Driver.run(Driver.java:21)
    at java.lang.Thread.run(Thread.java:294)
MessageInputStream encapsulates a Vector of message objects. getNextMessage
is synchronized and waits in a loop for messages to arrive. Line 197 is as follows:
    if (current_message.get_status () == current_message.MSG_REMOTE_EXCEPTION)
The method invoked is message.get_status, which is not synchronized. This method
simply returns a field and performs no wait, notify, or notifyAll.
On one occasion, the stack trace pointed to a line slightly later in the
getNextMessage method.
On another occasion, the stack trace pointed to
java.io.DataInputStream.readFully:90.
In other words, the problem appears to be "noticed" shortly after the thread
leaves the wait loop. Folks who know more about the VM than I do assure
me that this sort of delay is impossible, so maybe the stack trace is suspect.
However, I've seen the above stack trace several dozen times, and the two
cases noted above are the only deviations from it that I have observed.
The Vector of messages is populated by MessageInputStream.addMessage, a synchronized
method. It adds a message to the Vector, calls notify, and exits. Because each
MessageInputStream has its own Vector, synchronized operations on Vectors are
not necessary. I prepared a version of Vector without synchronized methods and
saw exactly the same behavior.
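The synchronization described above can be sketched as follows. This is a
minimal illustration of the pattern, not Novera's source: the class name
MessageQueueSketch, the use of ArrayList in place of Vector, and byte[] as
the message type are all assumptions made for the example, and it is written
in current Java syntax rather than that of the 1.0 VM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MessageInputStream; all names are illustrative.
class MessageQueueSketch {
    // Unsynchronized list: every access happens inside a synchronized method.
    private final List<byte[]> messages = new ArrayList<>();

    // Consumer side: blocks until a message is available.
    // wait() is only ever called while this object's monitor is held.
    public synchronized byte[] getNextMessage() throws InterruptedException {
        while (messages.isEmpty()) {
            wait(); // releases the monitor while waiting, reacquires before returning
        }
        return messages.remove(0);
    }

    // Producer side: adds a message, wakes one waiting thread, and exits.
    public synchronized void addMessage(byte[] message) {
        messages.add(message);
        notify();
    }

    public static void main(String[] args) throws InterruptedException {
        MessageQueueSketch queue = new MessageQueueSketch();
        // One producer thread delivers a 50,000 byte array, as in the test program.
        Thread producer = new Thread(() -> queue.addMessage(new byte[50000]));
        producer.start();
        byte[] received = queue.getNextMessage();
        producer.join();
        System.out.println("received " + received.length + " bytes");
    }
}
```

Because both wait and notify are reached only through synchronized methods on
the same object, a correct VM should never raise IllegalMonitorStateException
from code of this shape.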
I observed these problems with 10, 20, 40 and 60 threads. At 60 threads, hangs
or other crashes (VM, null pointer exception, out of memory) were frequent.
I was unable to reproduce the problem with 5 threads.
ATTEMPT TO CREATE A TEST CASE:
I tried to produce a test case but was unable to get IllegalMonitorStateException
to occur. I wrote Producer and Consumer classes. Each Consumer has a Vector of
Objects, the state of which is maintained by Consumer.get_next_message and add_message,
as in MessageInputStream. I tried adding sleep()s of various durations at various
points, to try to simulate the delays due to networking in Novera's product.
WHY I BELIEVE THERE'S A VM PROBLEM:
The synchronization used by MessageInputStream is very simple - it's right out
of the textbook. From what I've read of IllegalMonitorStateException, one of
two things is happening:
1) A thread is dying and leaving monitors corrupt, somehow: Can't be. I've
added traces to report on the customer of each thread. The only threads dying
are those throwing IllegalMonitorStateException. I don't think the thread is
dying (causing the exception) -- I'm able to catch the exception and then
do stuff, e.g. call System.out.println.
2) wait or notify not done in the dynamic scope of a synchronized method or block:
Can't be happening. MessageInputStream.getNextMessage and addMessage are synchronized,
and they contain the only wait and notify invocations in MessageInputStream.
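For reference, the situation in case 2 is the documented way to get this
exception: calling wait or notify on an object whose monitor the current
thread does not hold. The sketch below shows that behavior in isolation. The
class and method names are hypothetical and are not taken from the code in
this report.

```java
// Hypothetical demo class; names are illustrative only.
class MonitorOwnershipDemo {
    // Calls notify() WITHOUT holding the monitor of lock.
    // Returns true if IllegalMonitorStateException was thrown.
    static boolean notifyWithoutLockThrows(Object lock) {
        try {
            lock.notify(); // not inside synchronized (lock)
            return false;
        } catch (IllegalMonitorStateException e) {
            return true; // the thread survives and can carry on after the catch
        }
    }

    // Calls notify() while holding the monitor of lock.
    // Returns true if IllegalMonitorStateException was thrown.
    static boolean notifyWithLockThrows(Object lock) {
        synchronized (lock) {
            try {
                lock.notify();
                return false;
            } catch (IllegalMonitorStateException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) {
        Object lock = new Object();
        System.out.println("without monitor: " + notifyWithoutLockThrows(lock)); // prints true
        System.out.println("with monitor:    " + notifyWithLockThrows(lock));    // prints false
    }
}
```

The exception is an ordinary unchecked exception, so catching it and
continuing, as described in case 1, is expected behavior.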
The stack trace is suspect, as noted above. So at the very least, there is probably
something wrong with the way stacks are reported following an
IllegalMonitorStateException.