EVALUATION
The networking team will not support a write timeout because of its complexity, so the fix for JDK 7 is the same as for JDK 6.
|
|
|
EVALUATION
--Christopher.Hegarty--
Solution:
The general consensus was that we need to change the lock around writing records to something like a java.util.concurrent.locks.ReentrantLock. That way we could have something like:
Thread A calls close and tries to acquire the write lock. If it cannot acquire the lock within SoLinger, then closeSocket.
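A minimal sketch of this idea follows. It is a hypothetical model, not the actual JSSE change: the class name LingerCloseSketch, its fields, and the demo() helper are invented for illustration, and no real socket or TLS record is involved. The only library behaviour relied on is ReentrantLock.tryLock(long, TimeUnit), which gives up after the timeout instead of waiting forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model only: names and fields are assumptions, not JSSE internals.
final class LingerCloseSketch {
    private final ReentrantLock writeLock = new ReentrantLock();
    private boolean closeNotifySent;
    private boolean socketClosed;

    // Thread A: wait up to the linger time for the write lock, then close regardless.
    void close(long lingerMillis) throws InterruptedException {
        if (writeLock.tryLock(lingerMillis, TimeUnit.MILLISECONDS)) {
            try {
                // Placeholder for sending close_notify. A real write could
                // still block here if the send buffer is full.
                closeNotifySent = true;
            } finally {
                writeLock.unlock();
            }
        }
        socketClosed = true; // placeholder for closeSocket()
    }

    // Runs close() against an idle or a stuck writer.
    // Returns {closeNotifySent, socketClosed}.
    static boolean[] demo(boolean writerBlocked, long lingerMillis) {
        LingerCloseSketch s = new LingerCloseSketch();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            if (writerBlocked) {
                Thread writer = new Thread(() -> {
                    s.writeLock.lock();
                    try {
                        lockHeld.countDown();
                        release.await(); // stands in for a write() that never returns
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        s.writeLock.unlock();
                    }
                });
                writer.setDaemon(true);
                writer.start();
                lockHeld.await(); // make sure the writer owns the lock first
            }
            s.close(lingerMillis);
            return new boolean[] { s.closeNotifySent, s.socketClosed };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the simulated writer finish
        }
    }
}
```

Calling LingerCloseSketch.demo(true, 50) models a writer that holds the lock: close() returns after roughly the linger time, the socket is marked closed, and no close_notify is sent. Calling LingerCloseSketch.demo(false, 50) models the uncontended case, where the alert is sent before the socket is closed.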
--Xuelei.Fan--
I ran into a corner case on this issue.
The following scenario seems fine:
if ( get the write lock in SO_LINGER time) {
// send the SSL/TLS required data, and then [*]
// close the socket.
} else {
// close the socket immediately
}
But the above scenario only works when the output stream is already blocked or there is enough buffer left to hold the data SSL/TLS requires. There is a risk that the SSL/TLS data will fill the output stream and block indefinitely, so it does not solve the issue.
One possible workaround is for the customer to set SO_LINGER to zero, so that the socket closes immediately without any further action. But that means the application is in danger of losing data in normal situations; I don't think that is acceptable.
--Christopher.Hegarty--
After discussing the issue, regarding the workaround that requires the send buffer to be increased:
setSendBufferSize(getSendBufferSize() +
Record.maxAlertRecordSize);
As I mentioned when suggesting that you try this workaround, it is not guaranteed to always work. For example, the buffer may already be as large as possible. We (the Networking team) feel that unless the customer is specifically encountering this issue, adding this workaround is not a good idea. It will only make a corner case more obscure and harder to diagnose.
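To illustrate why the workaround cannot be relied on, here is a small sketch under stated assumptions. The class SendBufferSketch, its helper methods, and the constant ASSUMED_MAX_ALERT_RECORD_SIZE (with its value of 64) are hypothetical; Record.maxAlertRecordSize is internal to the SSL implementation and its real value is not given here. The sketch uses only Socket.getSendBufferSize() and Socket.setSendBufferSize(int), and re-reads the size afterwards because the requested value is treated as a hint by the platform.

```java
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper, not part of the JDK or of the proposed fix.
final class SendBufferSketch {
    // Assumed placeholder for Record.maxAlertRecordSize; the real value is internal.
    static final int ASSUMED_MAX_ALERT_RECORD_SIZE = 64;

    // Size to request, clamped so the addition cannot overflow an int.
    static int requestedSize(int current, int headroom) {
        return (int) Math.min((long) current + headroom, Integer.MAX_VALUE);
    }

    // Asks for extra room and reports whether the platform actually granted it.
    static boolean tryGrow(Socket socket, int headroom) throws SocketException {
        int before = socket.getSendBufferSize();
        socket.setSendBufferSize(requestedSize(before, headroom));
        // The request is only a hint, so check the effective size afterwards.
        return (long) socket.getSendBufferSize() >= (long) before + headroom;
    }
}
```

A caller would invoke SendBufferSketch.tryGrow(socket, SendBufferSketch.ASSUMED_MAX_ALERT_RECORD_SIZE) and treat a false result as the case described above, where the buffer could not be enlarged.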
My assumption is that this is an escalated bug and will need to be backported. The fix without increasing the send buffer should be sufficient for 99.9% of cases.
For Java SE 7 we are looking at possibly adding a write timeout to Socket and if this happens then you could use this to avoid the above problem.
--Xuelei.Fan--
We will use two different fixes for tiger/mustang and jdk7. For tiger/mustang, the reentrant lock is used to wait for the so_linger timeout, while for jdk7 we will try to address the issue with a write timeout, which is a much more stable and reliable solution.
|
|
|
EVALUATION
The Socket.shutdownOutput() spec says that "For a TCP socket, any previously written data will be sent followed by TCP's normal connection termination sequence." That means that implementing shutdownOutput() does not help solve the issue, because the previously written data will continue to be blocked and close() still has to wait for the socket to unblock.
|
|
|
EVALUATION
When the server is abruptly removed from the network, the client may fail to get any alert, so the socket stays alive (I think a Solaris system may get the report in a short time, while Linux DOES NOT get any information about the break even after 30 minutes, as in the description). Before the client learns of the break, it will continue to send messages to the server if there is any data; once the socket output buffer is full, send()/write() blocks. For SSL, when calling close(), it is required to send a close_notify alert before closing the write side of the connection. Because the socket is blocked, the close_notify has to wait in line, and close() will not return until the socket is unblocked.
The ideal solution would be for the client to get an alert shortly after the connection breaks, as Solaris does (if the bug only happens on Linux).
Alternatively, we will think about the workaround of SoLinger or shutdownOutput.
|
|
|
EVALUATION
Managed to get the same stack trace from a simple test, attached. The thing is that the SSL spec states that "...Each party is required to send a close_notify alert before closing the write side of the connection.". So when closing an SSLSocket asynchronously, the already blocked writing thread holds a write lock, which the closing thread also wants to acquire when trying to send the close_notify message. Hence the deadlock.
|
|
|
|