SUGGESTED FIX
--- PlainSocketImpl.c- 2008-05-08 22:54:05.296670972 +0400
+++ PlainSocketImpl.c 2008-05-08 22:54:05.192796472 +0400
@@ -345,15 +345,29 @@
* See 6343810.
*/
while (1) {
- fd_set wr, ex;
+#ifndef USE_SELECT
+ {
+ struct pollfd pfd;
+ pfd.fd = fd;
+ pfd.events = POLLOUT;
+
+ errno = 0;
+ connect_rv = NET_Poll(&pfd, 1, -1);
+ }
+#else
+ {
+ fd_set wr, ex;
- FD_ZERO(&wr);
- FD_SET(fd, &wr);
- FD_ZERO(&ex);
- FD_SET(fd, &ex);
+ FD_ZERO(&wr);
+ FD_SET(fd, &wr);
+ FD_ZERO(&ex);
+ FD_SET(fd, &ex);
+
+ errno = 0;
+ connect_rv = NET_Select(fd+1, 0, &wr, &ex, 0);
+ }
+#endif
- errno = 0;
- connect_rv = NET_Select(fd+1, 0, &wr, &ex, 0);
if (connect_rv == JVM_IO_ERR) {
if (errno == EINTR) {
continue;
EVALUATION
Yes, this is a clear and well-known limitation of the select() system call: an fd_set is a fixed-size bitmap, so descriptors at or above FD_SETSIZE (1024 bits on 32-bit builds) cannot be represented in it. select() should be replaced with poll() here to avoid the 1024-descriptor limit; this is preferable to redefining FD_SETSIZE.
It looks like this issue is a direct result of the library changes for CR 6343810, so any fix for this CR should be backported to the update releases in which 6343810 has also been fixed.
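A minimal sketch of the poll()-based wait that the first suggested fix switches to, written against plain POSIX poll(2) rather than the JDK's NET_Poll wrapper; the helper name wait_writable is hypothetical. Because struct pollfd carries the descriptor value itself, there is no bitmap to overflow, whatever the fd number:

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: wait until fd is writable (e.g. a non-blocking
 * connect() has completed). poll() takes the fd value directly, so fds
 * >= FD_SETSIZE are fine. timeout_ms == -1 waits indefinitely.
 * Returns 1 if writable or an error/hangup is pending, 0 on timeout,
 * -1 on error. EINTR is retried with the same timeout (simplification). */
static int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rv;

    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    do {
        rv = poll(&pfd, 1, timeout_ms);
    } while (rv < 0 && errno == EINTR);
    return rv;
}
```

In the connect loop, a return of 1 would still need a getsockopt(SO_ERROR) check to learn whether the connection actually succeeded.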
EVALUATION
Quoting Steve Goldman on this.
-------- Original Message --------
Subject: Re: 6670408: testcase panics 1.5.0_12&_14 JVM when java.net.PlainSocketImpl trying to throw an exception
Date: Tue, 06 May 2008 15:19:04 -0400
From: steve goldman <###@###.###>
OK, I found the bug; Dave Dice surmised the problem on Friday. The problem is in this code in PlainSocketImpl.c:
while (1) {
    fd_set wr, ex;
    FD_ZERO(&wr);
    FD_SET(fd, &wr);
    FD_ZERO(&ex);
    FD_SET(fd, &ex);
The fd goes well past the end of the wr/ex bit vectors. On 32 bits the size limit is 1024 bits. If I truss the program, I see it get socket descriptors well past 1024; it finally trips my memory protection check at around 3000. If I hadn't messed up my protection code, I would have found this on Friday.
I looked at the java/io/FileDescriptor, and the fd is in fact too large for the statically allocated bitmap.
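To make the overflow concrete: FD_SET() indexes a bitmap of FD_SETSIZE bits fixed at compile time, so a larger descriptor writes past the end of the stack-allocated fd_set. The sketch below (helper names are hypothetical) adds the bounds check the original loop lacks. It only turns memory corruption into an error, which is why the evaluation prefers switching to poll().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>

/* True if fd can be stored in an fd_set without writing past its end. */
static bool fd_fits_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical guarded version of the select() wait in the quoted loop:
 * refuse out-of-range descriptors instead of corrupting the stack. */
static int select_writable_checked(int fd, struct timeval *timeout)
{
    fd_set wr, ex;

    if (!fd_fits_fd_set(fd)) {
        errno = EINVAL;
        return -1;
    }
    FD_ZERO(&wr);
    FD_SET(fd, &wr);
    FD_ZERO(&ex);
    FD_SET(fd, &ex);
    /* NULL timeout blocks indefinitely, like NET_Select(..., 0) above. */
    return select(fd + 1, NULL, &wr, &ex, timeout);
}
```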
EVALUATION
The fix above needs to go into Hotspot as a separate bug, but it isn't relevant to this problem. This problem is about Hotspot panicking when the network code tries to throw an exception.
EVALUATION
Instead, we may need to verify that the first system call actually ran and was actually interrupted before a second connect() is attempted on Solaris.
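One way to make that check explicit, as a sketch: clear errno before the attempt, and report separately the case where the call never ran because the thread was already interrupted. The already_interrupted flag is an assumed stand-in for HotSpot's os::is_interrupted(), and the enum and function names are hypothetical. Only CALL_EINTR means a connect() was really issued and interrupted.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Outcome of a single interruptible connect() attempt (hypothetical). */
enum call_outcome {
    CALL_OK,            /* connect() ran and succeeded */
    CALL_FAILED,        /* connect() ran and failed with errno != EINTR */
    CALL_EINTR,         /* connect() ran and was interrupted by a signal */
    CALL_NOT_ATTEMPTED  /* thread was already interrupted; nothing ran */
};

/* already_interrupted stands in for os::is_interrupted() (assumption). */
static enum call_outcome connect_once(bool already_interrupted, int fd,
                                      const struct sockaddr *addr,
                                      socklen_t len)
{
    errno = 0;  /* a stale EINTR must not look like a fresh interrupt */
    if (already_interrupted)
        return CALL_NOT_ATTEMPTED;  /* short-circuited: errno stays 0 */
    if (connect(fd, addr, len) < 0)
        return errno == EINTR ? CALL_EINTR : CALL_FAILED;
    return CALL_OK;
}
```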
SUGGESTED FIX
Here's an alternative suggested fix (see comments and eval) using the 1.5.0_12 source:
--- ../old/os_solaris.inline.hpp Tue Mar 11 17:40:38 2008
+++ os_solaris.inline.hpp Tue Mar 11 18:07:55 2008
@@ -89,10 +89,11 @@
_setup; \
_before; \
OSThread* _osthread = _thread->osthread(); \
if (_thread->has_last_Java_frame()) { \
/* this is java interruptible io stuff */ \
+ errno = 0; \
if ((os::is_interrupted(_thread, _clear)) \
|| ((_cmd) < 0 && errno == EINTR \
&& os::is_interrupted(_thread, _clear))) { \
_result = OS_INTRPT; \
} \
--- ../old/hpi_solaris.hpp Tue Mar 11 17:40:37 2008
+++ hpi_solaris.hpp Tue Mar 11 18:07:38 2008
@@ -75,11 +75,11 @@
prevtime = ((julong)t.tv_sec * 1000) + t.tv_usec / 1000;
for(;;) {
INTERRUPTIBLE_NORESTART(::poll(&pfd, 1, timeout), res, os::Solaris::clear_interrupted);
- if(res == OS_ERR && errno == EINTR) {
+ if(res < 0 && errno == EINTR) {
gettimeofday(&t, &aNull);
newtime = ((julong)t.tv_sec * 1000) + t.tv_usec /1000;
timeout -= newtime - prevtime;
if(timeout <= 0)
return OS_OK;
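The hpi_solaris.hpp loop above preserves the overall poll() timeout across interruptions: on EINTR it subtracts the elapsed time and retries. A self-contained sketch of the same idea, with the hypothetical names poll_with_deadline and now_ms, and CLOCK_MONOTONIC substituted for gettimeofday() so that wall-clock adjustments cannot stretch or shrink the wait:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Milliseconds from a monotonic clock (immune to wall-clock changes). */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* poll() with a total timeout budget that survives EINTR.
 * Returns poll()'s result, or 0 once the budget is used up. */
static int poll_with_deadline(struct pollfd *pfds, nfds_t nfds, int timeout_ms)
{
    long long prev = now_ms();

    for (;;) {
        int rv = poll(pfds, nfds, timeout_ms);

        if (rv >= 0 || errno != EINTR)
            return rv;          /* ready, timed out, or a real error */
        if (timeout_ms < 0)
            continue;           /* infinite wait: simply retry */

        long long now = now_ms();
        timeout_ms -= (int)(now - prev);
        prev = now;
        if (timeout_ms <= 0)
            return 0;           /* budget spent while being interrupted */
    }
}
```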
EVALUATION
Need to check the UseVMInterruptibleIO value before testing _result against OS_INTRPT.
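The condition this evaluation asks for, as a standalone predicate: a sketch mirroring the check in the hpi_solaris.hpp patch further down. SKETCH_OS_ERR and SKETCH_OS_INTRPT are placeholders for HotSpot's OS_ERR and OS_INTRPT, and the function name is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholders for HotSpot's OS_ERR / OS_INTRPT (assumed values). */
enum { SKETCH_OS_ERR = -1, SKETCH_OS_INTRPT = -2 };

/* Should an interrupted connect() be restarted?
 * With UseVMInterruptibleIO on, OS_INTRPT reports a Java-level thread
 * interrupt, so (per the patch) only OS_ERR with EINTR restarts.
 * With it off, either code with EINTR restarts. */
static bool should_restart_connect(bool use_vm_interruptible_io,
                                   int result, int err)
{
    if (err != EINTR)
        return false;
    if (use_vm_interruptible_io)
        return result == SKETCH_OS_ERR;
    return result == SKETCH_OS_ERR || result == SKETCH_OS_INTRPT;
}
```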
SUGGESTED FIX
The first two of the following changes were red herrings; the third, to the socket implementation code, is the real fix for this CR. See the notes for the history.
--- src/os/solaris/vm/hpi_solaris.hpp- 2007-05-15 22:29:42.012602000 +0400
+++ src/os/solaris/vm/hpi_solaris.hpp 2008-03-05 16:52:06.950605000 +0300
@@ -104,7 +104,10 @@
os::Solaris::clear_interrupted);
// Depending on when thread interruption is reset, _result could be
// one of two values when errno == EINTR
- if (((_result == OS_INTRPT) || (_result == OS_ERR)) && (errno == EINTR)) {
+ if ((UseVMInterruptibleIO == true &&
+ _result == OS_ERR && errno == EINTR) ||
+ (UseVMInterruptibleIO == false &&
+ ((_result == OS_INTRPT || _result == OS_ERR) && errno == EINTR))) {
/* restarting a connect() changes its errno semantics */
INTERRUPTIBLE(::connect(fd, him, len), _result,
os::Solaris::clear_interrupted);
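The comment in this patch notes that restarting a connect() changes its errno semantics. Under POSIX, a connect() interrupted by a caught signal is not aborted: the connection continues asynchronously, and a second connect() reports EALREADY or EISCONN rather than the real outcome. A sketch of the alternative to restarting, assuming a plain POSIX socket (connect_finish_on_eintr is a hypothetical name): wait for writability, then read the final status with getsockopt(SO_ERROR).

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Connect without ever reissuing connect(): if the first call is
 * interrupted (or is non-blocking and in progress), wait for the
 * socket to become writable and fetch the result from SO_ERROR.
 * Returns 0 on success, -1 with errno set on failure. */
static int connect_finish_on_eintr(int fd, const struct sockaddr *addr,
                                   socklen_t len)
{
    struct pollfd pfd;
    int so_error = 0;
    socklen_t optlen = sizeof so_error;

    if (connect(fd, addr, len) == 0)
        return 0;
    if (errno != EINTR && errno != EINPROGRESS)
        return -1;                  /* genuine failure, report it */

    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    while (poll(&pfd, 1, -1) < 0) { /* retry the wait, not connect() */
        if (errno != EINTR)
            return -1;
    }
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &optlen) < 0)
        return -1;
    if (so_error != 0) {
        errno = so_error;           /* final outcome of the connection */
        return -1;
    }
    return 0;
}
```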