EVALUATION
The originator has clarified that the initial version was S10u6, so although 6600939 seemed a likely cause, that appears not to be the case.
|
|
|
PUBLIC COMMENTS
With regard to the "incorrect synchronization", if there is only a single writer then no inconsistent value can be seen by the reader.
|
|
|
EVALUATION
The originator reports that the problem disappeared after updating to Solaris 10 update 7. The previously used Solaris version was S10u5. It is possible that this was caused by 6600939, which was fixed in S10u6.
|
|
|
PUBLIC COMMENTS
The test program seems to be making some assumptions. The basic premise seems to be that within a REPORT_PERIOD (20 seconds) every EventRouter thread (there are 8) will be able to "tick", which requires either that an event is enqueued or that the timeout elapses. The timeout is 100ms. So under good conditions you expect all event routers to have ticked within 800ms, and you'd expect at worst around 24-25 ticks per reporting period. But that seems to overlook the effects of GC activity and even compilation activity, and the fact that scheduling need not be at all fair. Information on GC pauses would be useful to rule out GC interference.
Note also that the test is incorrectly synchronized. A number of long values are read and updated without a lock (or volatile) to protect them. Because a non-volatile long may be accessed as two separate 32-bit halves, this can lead to inconsistent values being read once the value exceeds the capacity of a 32-bit unsigned value.
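As an illustration of the synchronization point above, here is a minimal sketch of a tick counter that is safe to update and read from several threads. The class and method names (TickCounters, tick, read, run) are hypothetical and are not taken from the submitter's test program; the sketch only assumes that the counters are plain long fields that could be replaced by java.util.concurrent.atomic.AtomicLong.

```java
import java.util.concurrent.atomic.AtomicLong;

public class TickCounters {
    // Hypothetical counter; AtomicLong gives atomic 64-bit reads and updates
    // without an explicit lock, including on a 32-bit VM.
    private final AtomicLong ticks = new AtomicLong();

    public void tick() {
        ticks.incrementAndGet();
    }

    public long read() {
        return ticks.get();
    }

    // Starts `threads` writer threads that each tick `perThread` times,
    // waits for them to finish, and returns the final count.
    public static long run(final int threads, final int perThread) {
        final TickCounters counters = new TickCounters();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < perThread; j++) {
                        counters.tick();
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining writers", e);
        }
        return counters.read();
    }

    public static void main(String[] args) {
        // 8 writers, mirroring the 8 EventRouter threads mentioned above.
        System.out.println(run(8, 100000));
    }
}
```

If a counter only ever has one writer, declaring the field volatile long would also be enough to guarantee that readers see whole 64-bit values.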
|
|
|
PUBLIC COMMENTS
Can we get a pstack, or "jstack -m", dump of the JVM process when the hang occurs, please?
Also, the version info shows the 32-bit VM, but the start script posted on the forum shows the 64-bit VM being invoked. Please clarify whether the problem exists only on 64-bit.
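One way to settle the 32-bit versus 64-bit question is to have the test print the VM details at startup. The sketch below is only a suggestion: the class name VmBitness is hypothetical, and the "sun.arch.data.model" property is specific to Sun/HotSpot VMs, so it may be absent elsewhere (the code falls back to "unknown" in that case).

```java
public class VmBitness {
    // Builds a one-line description of the running VM.
    // "sun.arch.data.model" is HotSpot-specific and reports "32" or "64";
    // on VMs that do not define it we report "unknown" instead.
    public static String describe() {
        String model = System.getProperty("sun.arch.data.model", "unknown");
        return System.getProperty("java.vm.name")
                + " / " + System.getProperty("java.version")
                + " / " + System.getProperty("os.arch")
                + " / data model: " + model;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this line from the same launch script that shows the failure would remove any doubt about which VM was actually running.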
|
|
|
PUBLIC COMMENTS
I've been unable to reproduce this locally so far. I only have easy access to two machines, one of which runs fine and the other doesn't have enough memory to run the test in the given configuration.
If this failure is this platform-specific, it may be an OS and/or hardware issue rather than a j.u.c or JVM issue.
|
|
|
PUBLIC COMMENTS
Can the submitter modify the program to use Object.wait() rather than Thread.sleep? I'd like to make sure the problem comes from the awaits() not returning rather than the sleeps returning early and mis-reporting failures.
Thanks,
David Holmes
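A rough sketch of the requested change, for reference: a timed pause built on Object.wait(long) that also reports how long it really blocked, so an early return would be visible. The class and method names (TimedPause, pauseMillis) are hypothetical, and the sketch assumes nothing ever calls notify on the private lock object.

```java
public class TimedPause {
    // Private monitor; nothing else notifies it, so any early wakeup is spurious.
    private static final Object LOCK = new Object();

    // Blocks for at least `millis` milliseconds using Object.wait(long)
    // rather than Thread.sleep, and returns the elapsed time in nanoseconds.
    // If the thread is interrupted, the interrupt flag is restored and the
    // method returns early with the time elapsed so far.
    public static long pauseMillis(long millis) {
        long start = System.nanoTime();
        long deadline = start + millis * 1000000L;
        synchronized (LOCK) {
            try {
                long remaining;
                // Re-check the deadline after every wakeup to guard against
                // spurious wakeups and early returns.
                while ((remaining = deadline - System.nanoTime()) > 0) {
                    // wait(0) means "wait forever", so never pass less than 1 ms.
                    LOCK.wait(Math.max(1L, remaining / 1000000L));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long elapsed = pauseMillis(100);
        System.out.println("requested 100 ms, blocked for " + (elapsed / 1000000L) + " ms");
    }
}
```

Comparing the returned elapsed time with the requested period in the test's reporting thread would show whether the reporting delay itself is returning early.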
|
|
|