Bug Database
Bug ID: 6516308
Votes 19
Synopsis Multithreaded application hangs accessing LDAP
Category jndi:ldap
Reported Against
Release Fixed 1.4.2_23-rev(b03)
State 10-Fix Delivered, bug
Priority: 4-Low
Related Bugs 6544469
Submit Date 23-JAN-2007
Description
FULL PRODUCT VERSION :
java version "1.5.0_10"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_10-b03)
Java HotSpot(TM) Client VM (build 1.5.0_10-b03, mixed mode, sharing)
I have reproduced the issue with other Java versions and on other platforms as well.

ADDITIONAL OS VERSION INFORMATION :
SunOS solaris-1 5.9 Generic_112233-10 sun4u sparc SUNW,UltraAX-i2
 customer  Windows [Version 5.2.3790]


A DESCRIPTION OF THE PROBLEM :
The application initially creates an InitialLdapContext to connect to an LDAP server. While the program runs, multiple threads execute in parallel.

Each thread issues LDAP read operations using the procedure outlined in this pseudo-code:

void run() {
  LdapContext ctx = initialContext.newInstance()
  NamingEnumeration<SearchResult> e = ctx.search(...)
  ctx.close()
  ... <enumerate SearchResults, close all contexts returned by getObject> ...
  e.close()
}
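
For readers who want something closer to compilable code, the pseudo-code above might translate roughly as follows. This is only a sketch, not the code from the attached ldap.zip: the class name PerThreadSearch, the method searchOnce, and the base and filter parameters are hypothetical. Note one deliberate difference: the sketch closes the per-thread context after the enumeration has been consumed, whereas the pseudo-code closes it before.

```java
import java.util.ArrayList;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import javax.naming.ldap.LdapContext;

class PerThreadSearch {
    // Hypothetical helper: runs one search on a context instance private to the caller.
    static List<String> searchOnce(LdapContext initialContext, String base, String filter)
            throws NamingException {
        List<String> names = new ArrayList<String>();
        // newInstance(null) yields a new context with no request controls.
        LdapContext ctx = initialContext.newInstance(null);
        try {
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            NamingEnumeration<SearchResult> e = ctx.search(base, filter, controls);
            try {
                while (e.hasMore()) {
                    SearchResult result = e.next();
                    names.add(result.getName());
                    // Close any context bound to the result, as the pseudo-code describes.
                    Object obj = result.getObject();
                    if (obj instanceof Context) {
                        ((Context) obj).close();
                    }
                }
            } finally {
                e.close();
            }
        } finally {
            ctx.close();
        }
        return names;
    }
}
```

Each worker thread would call searchOnce from its run() method, passing the shared InitialLdapContext.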



STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
For a simple test application running from the command line, we need at least 1000 concurrent threads to reproduce the issue. Within an application server the issue appears more frequently and with fewer concurrent threads.

Find attached ldap.zip with an isolated test case. Edit ldap.properties and the NAMES array in Main.java for your environment. I'm running this test application against an Active Directory domain controller. NAMES holds a short list of computer names from Active Directory.

I'm seeing output like the following

$ java Main
<94><94><96>

$ java Main
SSSSSSSS<1970><5179><5653><5653><5654><5654><5654><5654>

etc. 

The application completes a varying number of LDAP queries, then slows down considerably, and finally all threads hang.

The stack traces from jdb show that a single thread is stuck on "java.net.SocketOutputStream.socketWrite0 (native method)" while the other threads are waiting at "com.sun.jndi.ldap.Connection.readReply (Connection.java:408)"


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The application should not hang.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
With jdb I can see that a single thread is in the "running" state; all other threads are in the "waiting for monitor" state.

The stack trace for the thread in the running state is as follows:

TestThread51[1] where
[1] java.net.SocketOutputStream.socketWrite0 (native method)
[2] java.net.SocketOutputStream.socketWrite (SocketOutputStream.java:92)
[3] java.net.SocketOutputStream.write (SocketOutputStream.java:136)
[4] java.io.BufferedOutputStream.flushBuffer (BufferedOutputStream.java:65)
[5] java.io.BufferedOutputStream.flush (BufferedOutputStream.java:123)
[6] com.sun.jndi.ldap.Connection.writeRequest (Connection.java:390)
[7] com.sun.jndi.ldap.Connection.writeRequest (Connection.java:364)
[8] com.sun.jndi.ldap.LdapClient.search (LdapClient.java:528)
[9] com.sun.jndi.ldap.LdapCtx.doSearch (LdapCtx.java:1944)
[10] com.sun.jndi.ldap.LdapCtx.searchAux (LdapCtx.java:1806)
[11] com.sun.jndi.ldap.LdapCtx.c_search (LdapCtx.java:1731)
[12] com.sun.jndi.ldap.LdapCtx.c_search (LdapCtx.java:1748)
[13] com.sun.jndi.toolkit.ctx.ComponentDirContext.p_search (ComponentDirContext.java:394)
[14] com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search (PartialCompositeDirContext.java:376)
[15] com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search (PartialCompositeDirContext.java:358)
[16] Main.search (Main.java:171)
[17] Main.inner (Main.java:125)
[18] Main.run (Main.java:107)
[19] java.lang.Thread.run (Thread.java:595)

The stack trace for a thread in the waiting for monitor state:

TestThread999[1] where
[1] com.sun.jndi.ldap.Connection.readReply (Connection.java:408)
[2] com.sun.jndi.ldap.LdapClient.getSearchReply (LdapClient.java:611)
[3] com.sun.jndi.ldap.LdapClient.search (LdapClient.java:534)
[4] com.sun.jndi.ldap.LdapCtx.doSearch (LdapCtx.java:1944)
[5] com.sun.jndi.ldap.LdapCtx.searchAux (LdapCtx.java:1806)
[6] com.sun.jndi.ldap.LdapCtx.c_search (LdapCtx.java:1731)
[7] com.sun.jndi.ldap.LdapCtx.c_search (LdapCtx.java:1748)
[8] com.sun.jndi.toolkit.ctx.ComponentDirContext.p_search (ComponentDirContext.java:394)
[9] com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search (PartialCompositeDirContext.java:376)
[10] com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search (PartialCompositeDirContext.java:358)
[11] Main.search (Main.java:171)
[12] Main.inner (Main.java:125)
[13] Main.run (Main.java:107)
[14] java.lang.Thread.run (Thread.java:595)


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
Attached separately
---------- END SOURCE ----------
Posted Date : 2007-01-23 11:22:43.0
Work Around
N/A
Evaluation
The Context object is not synchronized. If multiple threads are accessing the same
Context, the application has to take care of synchronization explicitly.

The Context javadoc (http://java.sun.com/javase/6/docs/api/javax/naming/Context.html)
contains a note about concurrent access, pasted below:

Concurrent Access
A Context instance is not guaranteed to be synchronized against concurrent access by multiple threads. Threads that need to access a single Context instance concurrently should synchronize amongst themselves and provide the necessary locking. Multiple threads each manipulating a different Context instance need not synchronize. Note that the lookup method, when passed an empty name, will return a new Context instance representing the same naming context.

For purposes of concurrency control, a Context operation that returns a NamingEnumeration is not considered to have completed while the enumeration is still in use, or while any referrals generated by that operation are still being followed.
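
One way an application could follow the javadoc note when threads do share a single Context is sketched below. This is an illustration under assumptions, not code from the JDK or from the submitter's test case: the class name SerializedSearch and its lock field are hypothetical. The lock is held until the NamingEnumeration has been fully consumed and closed, because the operation is not considered complete before then.

```java
import java.util.ArrayList;
import java.util.List;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

class SerializedSearch {
    private final DirContext shared;           // the one Context used by many threads
    private final Object lock = new Object();  // guards every use of 'shared'

    SerializedSearch(DirContext shared) {
        this.shared = shared;
    }

    List<String> search(String base, String filter) throws NamingException {
        synchronized (lock) {
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            NamingEnumeration<SearchResult> results = shared.search(base, filter, controls);
            try {
                // Drain the enumeration while still holding the lock.
                List<String> names = new ArrayList<String>();
                while (results.hasMore()) {
                    names.add(results.next().getName());
                }
                return names;
            } finally {
                results.close();
            }
        }
    }
}
```

Threads that each work on their own Context instance do not need this wrapper, as the javadoc states.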
Posted Date : 2008-04-02 18:14:19.0

Upon further investigation, it turned out that the customer was actually asking for a 1.4.2 implementation of the LDAP readTimeout property. When we sent a verification binary with this property implemented, the customer indicated they were no longer interested in pursuing the issue.
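
For reference, on releases where the LDAP provider supports it, the read timeout is set through the environment passed to the initial context. The sketch below only builds such an environment; the class name LdapTimeoutEnv, the method build, and the example URL in the usage note are hypothetical, while the two property names are the ones documented for Sun's LDAP provider. Whether a read timeout helps with the hang reported here is not established by this report.

```java
import java.util.Hashtable;
import javax.naming.Context;

class LdapTimeoutEnv {
    // Hypothetical helper: returns a JNDI environment with connect and read timeouts.
    // Both timeouts are given in milliseconds and must be passed as strings.
    static Hashtable<String, String> build(String providerUrl, int connectMs, int readMs) {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        env.put("com.sun.jndi.ldap.connect.timeout", String.valueOf(connectMs));
        env.put("com.sun.jndi.ldap.read.timeout", String.valueOf(readMs));
        return env;
    }
}
```

The result would be passed to the context constructor, for example new InitialLdapContext(LdapTimeoutEnv.build("ldap://ldap.example.com:389", 5000, 10000), null).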
Posted Date : 2009-06-24 00:26:24.0
Comments
  

Submitted On 08-AUG-2007
julianozg
Hey there! Good morning.

I'd like to know whether this problem can also occur in Java version 1.4.2. Please send me an email (juliano.z.gomes@hsbc.com.br); I'm having this problem in my production environment.

Thanks a lot!

Regards,
Juliano


Submitted On 28-MAR-2008
johnlon
I am now seeing this bug in two production systems. Please can we have an update?

One of these systems has just been upgraded from Java 1.4.2, and only since moving to Java 5 have we experienced this issue on either system.


Submitted On 28-MAR-2008
johnlon
Just taken a look at the source code.

The degree of synchronisation in this class means that if the write operation should block for any reason, then all readers will also block.

The flush() operation will block if the write buffer is full and the LDAP server delays reading our message. As soon as the LDAP server blocks our write, our code cannot process any responses from the LDAP server.
Therefore, if the reason the LDAP server is blocking our write is that it is trying to send us a message, this becomes a multi-process deadlock.

Thoughts?

It seems to me that the first synchronisation block in readReply() is redundant, i.e. this one:

	synchronized (this) {
	    if (sock == null) {
		throw new ServiceUnavailableException(host + ":" + port +
		    "; socket closed");
	    }
	}

In what way would this code be less thread safe if this block were removed? Two of the variables in this block are immutable, and 'sock' is a reference that is merely set to null when closed. It seems there is no need for synchronisation.
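
The stall described in this comment can be imitated in isolation, without any LDAP server. The sketch below is a simulation under assumptions and does not use the JDK's Connection class: SharedMonitorStall, writeRequest, and socketOpen are hypothetical stand-ins, and a latch plays the role of a socket write that blocks until the peer reads. It shows only that a reader needing the same monitor cannot proceed while the writer holds that monitor in a blocking call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class SharedMonitorStall {
    private final CountDownLatch writerInside = new CountDownLatch(1);
    private final CountDownLatch peerDrains = new CountDownLatch(1);

    // Stand-in for a synchronized write: keeps the monitor while the "flush" blocks.
    synchronized void writeRequest() throws InterruptedException {
        writerInside.countDown();
        peerDrains.await();            // simulated full send buffer; monitor is NOT released
    }

    // Stand-in for the synchronized socket check quoted above: needs the same monitor.
    synchronized boolean socketOpen() {
        return true;
    }

    // Returns true if the reader could not get past socketOpen() within waitMs
    // while the writer was blocked inside writeRequest().
    static boolean readerStalls(long waitMs) throws InterruptedException {
        final SharedMonitorStall conn = new SharedMonitorStall();
        final CountDownLatch readerDone = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            try {
                conn.writeRequest();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread reader = new Thread(() -> {
            conn.socketOpen();
            readerDone.countDown();
        });

        writer.start();
        conn.writerInside.await();     // the writer now owns the monitor
        reader.start();
        boolean finished = readerDone.await(waitMs, TimeUnit.MILLISECONDS);

        conn.peerDrains.countDown();   // the "server" finally reads; the writer returns
        writer.join();
        reader.join();
        return !finished;
    }
}
```

Calling SharedMonitorStall.readerStalls(200) should report true: the reader stays blocked for the whole wait and completes only after the simulated write is released.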


Submitted On 17-NOV-2008
johnlon
The 'Evaluation' is interesting; however, it is NOT any kind of fix.

In my environment we do NOT explicitly share the context, so I cannot see the relevance of this 'Evaluation'.

Has Sun evaluated whether their implementation shares the context accidentally?

Also, can Sun please comment on my earlier observations on the implementation?


