I have not been able to reproduce it yet. VolanoMark is a total resource hog
and when I run it, it fails due to various things like running out of port
Looking at the stack trace, it appears to be failing in ntdll.dll rather
than winsock, so it might be as a result of a problem with the malloc()
in the socketRead0 routine.
This code has not changed in a long time, so I doubt there is a bug
in it. I will continue trying to reproduce it.
Ok. A WinDbg stack trace shows it is definitely crashing
in malloc. Also, interestingly, it is not the server which
crashes, rather the client which is run repeatedly as a
standalone application, and which runs successfully several
hundred times before the crash happens. I am pretty confident
that this is not a JDK bug, but is an OS bug. malloc should
never crash regardless of what parameters it is given.
Stack trace shown below:
I have run several experiments attempting to isolate the cause of this
I have reproduced this failure on three different builds of Windows Server 2003:
1068, 1073 and 1184.
This bug reproduces in interpreted mode -Xint.
This bug also reproduces on B48, in addition to B49 where it was originally reported.
I've attempted to use the Microsoft Debugging Utilities to perform a
heap verification on each malloc / free call, but since this slows down
the test, it does not fail. I've also run the test on the Microsoft
Checked build and it did not fail.
When it does crash, I get the same stack dump in the failing thread
as Michael reports. One interesting note is that the Volano client
appears to be in the process of terminating its connection threads
because there are only 52 threads left at the time of the crash and
normally there are over 400. I've scanned the VM sources involved in
thread termination and I don't see any race conditions involved in
freeing memory when a GC could occur, etc.
This may be a coincidence, but I've done a successful overnight run
on two different systems using the switch -XX:+UseDefaultStackSize.
This causes the memory for the thread stacks to be reserved and not
committed. I don't understand why this would fix the problem, but it
might be a clue for determining what is going on.
I don't agree with the statement that the OS should never get a segv
from within malloc. If a program corrupts the C Heap by writing
garbage to heap header data structures, this would cause the crash
that we are seeing.
I have been able to reproduce the exact crash reported here
with a small C++ test case. This is a bug in Windows.
Compile the small test case with "cl /MD mallocbasher.cpp" and
run the test case on a multi-CPU Windows Server 2003 AMD64 box
and it will crash in a few minutes.
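The source of mallocbasher.cpp is not included in this report. As a rough
illustration only, a multi-threaded malloc/free stress test of this general
kind might look like the sketch below. Everything in it (the function names
hammer_heap and run_stress, the size range, the use of portable std::thread)
is an assumption made for illustration and is not taken from the original
test case.

```cpp
// Hypothetical sketch of a malloc/free stress test.
// This is NOT the original mallocbasher.cpp; names and parameters are assumptions.
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <thread>
#include <vector>

// Each worker repeatedly allocates a block of pseudo-random size,
// writes to it, and frees it again.
static void hammer_heap(unsigned seed, int iterations, std::atomic<long>& completed) {
    unsigned state = seed;
    for (int i = 0; i < iterations; ++i) {
        state = state * 1664525u + 1013904223u;        // simple linear congruential step
        std::size_t size = 16 + (state >> 16) % 4096;  // 16 .. 4111 bytes
        void* p = std::malloc(size);
        if (p == nullptr) continue;                    // skip failed allocations
        std::memset(p, 0xAB, size);                    // touch the whole block
        std::free(p);
        completed.fetch_add(1, std::memory_order_relaxed);
    }
}

// Runs thread_count workers concurrently and returns the total number of
// successful malloc/free pairs.
long run_stress(int thread_count, int iterations) {
    assert(thread_count > 0 && iterations >= 0);
    std::atomic<long> completed{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back(hammer_heap, 1234u + static_cast<unsigned>(t),
                             iterations, std::ref(completed));
    }
    for (auto& w : workers) w.join();
    return completed.load();
}
```

Calling run_stress from main with several threads per CPU and a large
iteration count would keep the C heap under constant concurrent load, which
is the kind of pressure the failing Volano client appears to generate.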
I submitted a bug on Microsoft's beta web site at beta.microsoft.com.
The bug number is 154243817.
It turns out that Microsoft expects these Access Violations. They
are using structured exception handling (try/except blocks) in
the malloc library and need to get control on AV's caused by their
code. Since we are using Vectored Exception Handling rather than SEH,
we see this AV first and report it as a fatal error.
The full and correct fix for this problem is to stop using Vectored
Exceptions and support SEH in compiled code. This will require
significant effort/testing/risk. We must register all dynamically
generated code in the VM using RtlInstallFunctionCallback or
Since it is late in the 1.5 release, I will put a short term fix
into 1.5 which will pass AV's on from our exception handling code
if the AV is generated from NTDLL.DLL.
I will open up a new bug to keep track of this issue.
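The decision made by the short-term fix can be modeled roughly as follows.
This is a portable sketch of the logic only, not the actual VM change: the
names HandlerAction, ModuleRange and classify_fault are hypothetical, and a
real handler would obtain the exception code and faulting address from the
exception record and the NTDLL.DLL address range from the OS.

```cpp
// Hypothetical model of "pass AVs raised inside NTDLL.DLL on to the next handler".
// All type and function names here are assumptions made for illustration.
#include <cstddef>
#include <cstdint>

// What the VM's exception handler decides to do with a fault.
enum class HandlerAction {
    PassToNextHandler,  // decline the exception so the OS's own SEH blocks can run
    HandleInVm          // continue with the VM's normal fault processing
};

// Address range occupied by a loaded module (assumed to be known to the caller).
struct ModuleRange {
    std::uintptr_t base;
    std::size_t size;
    bool contains(std::uintptr_t pc) const {
        return pc >= base && pc - base < size;
    }
};

// Numeric value of the Windows EXCEPTION_ACCESS_VIOLATION status code.
constexpr std::uint32_t kAccessViolation = 0xC0000005u;

// An access violation whose faulting instruction lies inside NTDLL.DLL is
// passed on; every other fault stays with the VM.
HandlerAction classify_fault(std::uint32_t code, std::uintptr_t fault_pc,
                             const ModuleRange& ntdll) {
    if (code == kAccessViolation && ntdll.contains(fault_pc)) {
        return HandlerAction::PassToNextHandler;
    }
    return HandlerAction::HandleInVm;
}
```

In a vectored handler, PassToNextHandler would presumably correspond to
declining the exception so that the try/except blocks inside the malloc
library get control, which is the behavior Microsoft expects.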