Bug Database
Bug ID: 5033614
Votes 5
Synopsis ClassLoaders do not get released by GC, causing OutOfMemory in Perm Space
Category hotspot:compiler2
Reported Against 1.4.2_04
Release Fixed mustang(b10), 1.4.2_07(b02) (Bug ID:2120931) , 5.0u1(b05) (Bug ID:2121057)
State 10-Fix Delivered, bug
Priority: 2-High
Related Bugs 4957990
Submit Date 19-APR-2004
Description
The customer has reported a problem with ClassLoaders not being
collected properly, leading to OOM exceptions because
perm space is flooded with class definitions.

This affects the customer's NetWeaver 04 product, which is at the heart
of the customer's software strategy.

The customer uses complex class loading schemes with their own custom
ClassLoader implementations. This allows hot redeployment of
applications into their J2EE application server, and this is exactly
where the problem occurs: unloading and reloading an application
establishes a new class loader instance, and the old one should be
GC'able. This does not happen under all circumstances, leading to the problem
described.
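
The redeployment pattern described above can be sketched roughly as follows. This is a minimal illustration only: HotDeployer, redeploy and current are hypothetical names, not NetWeaver classes, and a stock URLClassLoader stands in for the customer's custom loaders. It is written for a current JDK rather than 1.4.2.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical container model; none of these names come from NetWeaver.
class HotDeployer {
    // One loader per deployed application.
    private final Map<String, ClassLoader> loaders = new HashMap<String, ClassLoader>();

    // Installs a fresh loader for the application and returns a weak handle
    // to the loader it replaces (null on the first deployment).
    synchronized WeakReference<ClassLoader> redeploy(String app, URL[] classpath) {
        ClassLoader fresh = new URLClassLoader(classpath, HotDeployer.class.getClassLoader());
        ClassLoader old = loaders.put(app, fresh);
        return old == null ? null : new WeakReference<ClassLoader>(old);
    }

    synchronized ClassLoader current(String app) {
        return loaders.get(app);
    }

    public static void main(String[] args) {
        HotDeployer deployer = new HotDeployer();
        deployer.redeploy("app", new URL[0]);
        WeakReference<ClassLoader> old = deployer.redeploy("app", new URL[0]);
        System.gc(); // a hint only; collection is not guaranteed
        System.out.println("old loader collected: " + (old.get() == null));
    }
}
```

Once the container drops its reference, the old loader is expected to become unreachable; the bug reported here is that something inside the VM still referenced it.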

The customer has come up with a reproducible test scenario for which we have collected
dumps on Windows (see below). A stand-alone test case is not available since
this only occurs with the full NetWeaver stack in place. A test landscape can be
provided, however, should the need arise. Hopefully, the dumps already contain
useful information. Analysis of the VM at the point of the OOM error using both OptimizeIt
and the customer's own JVMDI/JVMPI tool Sherlok indicated that the class loader is GC'able,
yet it is still not collected.

Two dump files have been collected and are available
from my server at:

/net/tachyon.germany.sun.com/data/Tmp/CL_Problem_SAP

(or via ftp as guest/guest)

A reproducible test scenario has been identified where three applications
are unloaded and redeployed repeatedly. This is the sequence of events used to
create the dumps for this scenario:

1. fire up NetWeaver system
2. unload applications first time (there are three apps in question here)
3. get dump_01
4. start applications again (causes the CLs to be recreated)
5. unload applications second time
6. get dump_02
7. start applications again --> OOM

So technically dump_02 should be the interesting one.

To use SA on these: 

In the Heap Profile, locate the
com. customer .engine.services.deploy.server.ApplicationLoader
instances (size 3360, count 35). This is the suspect.
In the instance list for this class, scroll all the way down
to

1. 0x16966378  customer .com/pcui_gp~xssfpm
2. 0x16882ea8  customer .com/ess~ben
3. 0x1696ae18  customer .com/pcui_gp~xssutils

(the  customer .com entries are the names that you can see in the "name"
field of the inspect window).

These are the CLs for the three apps (FPM, Utils, Benefits), and at least
1. and 2. should be GC'able at this stage. When doing a liveness analysis,
I run into problems with SA which, according to Ken Russell, should be fairly
straightforward to fix in SA (when you know what you're doing :-)

  xxxxx@xxxxx   2004-04-19
Work Around
None found so far. SAP tried different GC algorithms in 1.4.2, but to no avail.
Increasing perm space only delays the problem; it does
not solve it reliably.

Most of the behavior is similar to 4957990.  If it is the same as 4957990,
then the workaround is to use a larger perm gen.  Verification of this
is still ongoing.

An alternative to a larger heap is to use C1 (the client compiler).
Evaluation
This looks like the same problem as 4957990.

  xxxxx@xxxxx   2004-08-25

Using the flag -XX:-StackTraceInThrowable prevents class loaders from
being kept alive longer than necessary.  Still investigating
why this is the case.

-------------------------

The C2 runtime uses a field in a thread object, "_exception_oop", to pass an oop between setup_exception_blob() and handle_exception_C().  The oop, which is always a Throwable, usually has a backtrace.  In the customer's case, the backtrace referenced a methodOop that was loaded by a classLoader that was soon to be dead. The _exception_oop field, never overwritten, was the only root
with a path to the classLoader, keeping it from being unloaded.
  xxxxx@xxxxx   10/7/04 23:20 GMT
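
The retention path described above can be modelled in plain Java. This is a loose analogy only, not HotSpot code: FakeThread, FakeLoader, TracedException and the clearAfterUse switch are all hypothetical, and clearing the slot is an assumption about what a remedy could look like, not a description of the actual fix.

```java
// Hypothetical model of a hand-off slot that is written but never reset.
class StaleSlotDemo {
    // Stand-in for a class loader; a plain object keeps the model small.
    static final class FakeLoader {}

    // Stand-in for a Throwable whose backtrace reaches a loader.
    static final class TracedException extends RuntimeException {
        final FakeLoader loader;

        TracedException(FakeLoader loader) {
            super("application failure");
            this.loader = loader;
        }
    }

    // Stand-in for the VM thread object and its _exception_oop field.
    static final class FakeThread {
        private Throwable exceptionSlot;

        // Passes the exception through the slot, as the two runtime routines do.
        void dispatch(Throwable t, boolean clearAfterUse) {
            exceptionSlot = t;               // producer side stores the exception
            Throwable seen = exceptionSlot;  // consumer side reads it back
            if (clearAfterUse) {
                exceptionSlot = null;        // assumed remedy: drop the stale reference
            }
        }

        // True while the slot still holds a path to the given loader.
        boolean pins(FakeLoader loader) {
            return exceptionSlot instanceof TracedException
                    && ((TracedException) exceptionSlot).loader == loader;
        }
    }

    public static void main(String[] args) {
        FakeLoader loader = new FakeLoader();
        FakeThread thread = new FakeThread();
        thread.dispatch(new TracedException(loader), false);
        System.out.println("pinned without clearing: " + thread.pins(loader));
        thread.dispatch(new TracedException(loader), true);
        System.out.println("pinned with clearing: " + thread.pins(loader));
    }
}
```

As long as the slot is left populated, the loader stays reachable from the thread even though the application itself no longer references it.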
Comments
  

Submitted On 22-MAR-2005
Guy.Molinar
I was getting the OOM error under 1.4.2_05. Setting the perm region size to a higher value mitigated the problem. However, I upgraded to 1.4.2_07 and removed the perm region parameters:


#-XX\:PermSize=128M

#-XX\:MaxPermSize=128M

The problem returns.
 



Submitted On 24-MAR-2005
AttilaSzegedi
I have exactly the same problem - we also have a custom application server with hot-reloading code, and are experiencing stuck class loaders (they're stock java.net.URLClassLoader instances, BTW). After attacking the problem with a profiler for a while, I hit a brick wall - under the 1.4.2_07 JVM with the JVMPI interface the class loaders are displayed as live objects, but there's no path to roots - they "float" in the live object space on their own. Under the 1.5.0_01 JVM using the JVMTI interface, each of the stuck class loaders is marked as being a GC root of type "[Unknown]" (under the YourKit profiler). I even tried to analyze the JVM SCSL source code to find out under what circumstances an object would be reported as an "Unknown" root, but didn't get anywhere...


Submitted On 24-MAR-2005
AttilaSzegedi
As a temporary workaround, I'm monitoring the stuck class loaders using phantom references -- if they start piling up, we restart the JVM. Not exactly a stable solution, but the best we can do for now. As it's decoupled from the rest of the system by a JMS queue, an occasional short downtime is fortunately transparent to the rest of the system. I might add that despite many hours of effort, I couldn't reproduce the problem in a smaller testcase than the full-blown system -- another similarity with SAP's problem.
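
A phantom-reference monitor of the kind mentioned in this comment might look roughly like the sketch below. It is not the commenter's code: LoaderMonitor, track and stuckCount are hypothetical names, and the restart policy that would act on the count is left out.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not taken from the commenter's system.
class LoaderMonitor {
    private final ReferenceQueue<ClassLoader> queue = new ReferenceQueue<ClassLoader>();

    // The PhantomReference objects themselves must stay strongly reachable,
    // otherwise they could be collected before they are ever enqueued.
    private final Set<Reference<? extends ClassLoader>> pending =
            new HashSet<Reference<? extends ClassLoader>>();

    // Call when an application is undeployed and its loader should become garbage.
    synchronized void track(ClassLoader retired) {
        pending.add(new PhantomReference<ClassLoader>(retired, queue));
    }

    // Number of retired loaders that the collector has not reclaimed so far.
    synchronized int stuckCount() {
        Reference<? extends ClassLoader> ref;
        while ((ref = queue.poll()) != null) {
            pending.remove(ref); // this loader was reclaimed
        }
        return pending.size();
    }

    public static void main(String[] args) {
        LoaderMonitor monitor = new LoaderMonitor();
        ClassLoader loader = new URLClassLoader(new URL[0]);
        monitor.track(loader);
        System.gc(); // a hint only; collection is not guaranteed
        System.out.println("loaders not yet reclaimed: " + monitor.stuckCount());
        System.out.println("still holding " + loader); // keeps the loader reachable
    }
}
```

A count that keeps growing across redeployments would be the signal to restart the JVM, as the comment describes.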


Submitted On 14-APR-2005
MLPOKNBHUYGV
The problem still exists with the JVM run with the -server flag on version 1.5.0_02, so I am sure it still exists in 1.4.2_08.

Experienced on:
OS = Linux
Tomcat 4.1.27


Submitted On 15-APR-2005
rasbold
You may be experiencing the related bug 4957990, which has similar symptoms, but a different root cause. That bug is not yet fixed.


Submitted On 25-MAY-2005
maxkir
Is it possible to get this fix in 1.5.x? 


Submitted On 13-JUN-2005
rasbold
This bug was fixed in 5.0 Update 1.  As mentioned before, if you are seeing similar symptoms, you may be experiencing 4957990.


Submitted On 31-MAY-2007
Hi,

I am pretty sure this problem also occurs with 1.4.2_09 on HP-UX and with SAP 6.40 SP18.

Any news, fixes or further help available?

Best regards,

Michael



PLEASE NOTE: JDK6 was formerly known as Project Mustang