Bug Database
Bug ID: 4701980
Votes: 0
Synopsis: HPROF: -Xrunhprof option crashes and restarts S1AS app server
Category: hotspot:jvmpi
Reported Against: 1.4, kestrel, 1.3.1_07
Release Fixed: 1.4.0_04, 1.4.1_02 (Bug ID: 2055603), 1.4.2 (mantis) (Bug ID: 2055604)
State: 10-Fix Delivered, bug
Priority: 2-High
Related Bugs: 4701995, 4727676, 4804070, 4325563, 4835665
Submit Date: 13-JUN-2002
Description
Using -Xrunhprof:cpu=samples on S1AS7.0 does not give the desired profiling output. It also causes the server to crash. The server works fine without -Xrunhprof.

Update:

Please follow http://siva.sfbay:8080/sunrise/profiler.html for the latest on this issue.
Related Bug IDs: 4674906, 4701995
Work Around
N/A
Evaluation
  xxxxx@xxxxx   2002-08-02

There are four problems being tracked by this bug:

1) Dan's Client VM SIGSEGV running vignette when OutOfMemoryError occurs
2) unable to get complete java.hprof.txt output
3) fastdebug VM fails an assertion on app server start up
4) 1.4.0-U1 Server VM SIGSEGV running vignette when OutOfMemoryError occurs

Problem #2 has been resolved:

- delete the -Xrs option
- use 'kill -3' or 'kill -QUIT' on the appservd process to force
  java.hprof.txt to be flushed

Problem #3 will be deferred for now. This issue doesn't impact MDE, but
should be resolved with the iPlanet dev team.

Problem #1 will be deferred in favor of problem #4. Crashes with my
bits are less important than crashes with 1.4.0-U1 bits.

  xxxxx@xxxxx   2002-08-07

The -Xrunhprof:cpu=samples option uses SuspendThread(), GetCallTrace()
and ResumeThread() to gather sample data. JVM/PI requires that GC be
disabled before SuspendThread() is called and GC cannot be enabled
before all threads have been resumed. This means that GC can be
disabled for a long time when there are lots of threads. Combined
with a low memory situation, this can result in OutOfMemoryErrors.
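
A rough model of one sampling pass may make the scaling problem easier
to see. This is only a sketch in Java, not the hprof source (which is C
written against JVM/PI). The class name, the method names and the
grouping of the calls into three loops are all assumptions made for
illustration. It only records the order of operations and counts how
many per-thread calls fall inside the GC-disabled window.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one cpu=samples pass; none of these names are hprof's.
class SamplePassModel {
    // Events recorded in the order the sampler performs them.
    final List<String> events = new ArrayList<>();

    // One pass: GC stays disabled from before the first suspend
    // until after the last resume, as the JVM/PI rule requires.
    void samplePass(List<String> threads) {
        events.add("DisableGC");
        for (String t : threads) {
            events.add("SuspendThread " + t);
        }
        for (String t : threads) {
            events.add("GetCallTrace " + t);
        }
        for (String t : threads) {
            events.add("ResumeThread " + t);
        }
        events.add("EnableGC");
    }

    // Number of per-thread operations performed while GC is disabled.
    int operationsWithGcDisabled() {
        int start = events.indexOf("DisableGC");
        int end = events.indexOf("EnableGC");
        return end - start - 1;
    }
}
```

In this model the count is three operations per thread, so the window
in which no collection can run grows linearly with the thread count.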

  xxxxx@xxxxx   2002-08-07 (update 1)

The hprof sampler thread disables GC, grabs the hprof_dump_lock
and finally grabs the data_access_lock. GC is disabled for the thread
suspend operations (per JVM/PI spec). The hprof_dump_lock is grabbed
to prevent hprof data from being dumped while actively sampling.
The data_access_lock is grabbed to prevent thread list changes and
to safely save the sampling data (with possible table updates).

GC is disabled before grabbing the hprof_dump_lock to prevent a
deadlock between the sampler thread (trying to disable GC) and
the VM thread (trying to post the JVM_SHUT_DOWN event).
See Karen's fix for 4325941.

GC is disabled before grabbing the data_access_lock to prevent a
deadlock between the sampler thread (trying to disable GC) and the
VM thread (trying to post a GC_START event).

The hprof_dump_lock has to be grabbed before the data_access_lock
because the routines that dump the hprof data grab and release the
data_access_lock as needed.
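
The ordering described so far can be sketched as follows. This is a
Java model using ReentrantLock, not the actual hprof code, which uses
JVM/PI raw monitors. The class, field and method names are hypothetical,
and the point at which GC is re-enabled is an assumption, since the
notes above do not say where the enable call sits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical Java model of the lock order described above.
class OriginalSamplerOrder {
    final ReentrantLock hprofDumpLock = new ReentrantLock();
    final ReentrantLock dataAccessLock = new ReentrantLock();
    final List<String> trace = new ArrayList<>();

    // Sampler thread: disable GC, then hprof_dump_lock, then data_access_lock.
    void samplePass(Runnable suspendSampleResume) {
        trace.add("disable GC");
        hprofDumpLock.lock();
        trace.add("lock hprof_dump_lock");
        try {
            dataAccessLock.lock();
            trace.add("lock data_access_lock");
            try {
                suspendSampleResume.run();
            } finally {
                dataAccessLock.unlock();
                trace.add("unlock data_access_lock");
            }
        } finally {
            hprofDumpLock.unlock();
            trace.add("unlock hprof_dump_lock");
        }
        // Assumption: the report does not state where GC is re-enabled.
        trace.add("enable GC");
    }

    // Dump path: holds hprof_dump_lock for the whole dump and takes
    // data_access_lock only around each piece of shared data.
    void dump(int pieces) {
        hprofDumpLock.lock();
        try {
            for (int i = 0; i < pieces; i++) {
                dataAccessLock.lock();
                try {
                    trace.add("dump piece " + i);
                } finally {
                    dataAccessLock.unlock();
                }
            }
        } finally {
            hprofDumpLock.unlock();
        }
    }
}
```

Both paths take hprof_dump_lock before data_access_lock, so the two
locks cannot deadlock against each other in this model. The cost is
that the sampler holds both for the whole pass.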

The hprof_dump_lock is hot just because it is held for a long time
(relative to other locks). The data_access_lock is hot because it
is used to control access to so many things.

The hprof_dump_lock can be made less hot by only holding the lock
long enough to set various flags. Local variables can be used to
remember caller sensitive state.
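
One way to read this suggestion is sketched below in Java. It is not
the actual fix. The flag names, the method names and the choice to
refuse (rather than wait) when the other side is active are all
assumptions made for illustration. The lock is held only while a flag
is read or written, and the caller keeps the result in a local variable.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: hprof_dump_lock guards two flags and nothing else.
class DumpFlagSketch {
    private final ReentrantLock hprofDumpLock = new ReentrantLock();
    private boolean samplingActive = false; // hypothetical flag
    private boolean dumpActive = false;     // hypothetical flag

    // Sampler side: returns true if sampling may proceed.
    boolean beginSampling() {
        hprofDumpLock.lock();
        try {
            if (dumpActive) {
                return false;
            }
            samplingActive = true;
            return true;
        } finally {
            hprofDumpLock.unlock();
        }
    }

    // The caller passes back the value it remembered in a local variable.
    void endSampling(boolean started) {
        if (!started) {
            return;
        }
        hprofDumpLock.lock();
        try {
            samplingActive = false;
        } finally {
            hprofDumpLock.unlock();
        }
    }

    // Dump side: returns true if the dump may proceed.
    boolean beginDump() {
        hprofDumpLock.lock();
        try {
            if (samplingActive) {
                return false;
            }
            dumpActive = true;
            return true;
        } finally {
            hprofDumpLock.unlock();
        }
    }

    void endDump(boolean started) {
        if (!started) {
            return;
        }
        hprofDumpLock.lock();
        try {
            dumpActive = false;
        } finally {
            hprofDumpLock.unlock();
        }
    }

    // True while some thread holds the lock; used to show the lock is
    // not held across the sampling or dump work itself.
    boolean dumpLockHeld() {
        return hprofDumpLock.isLocked();
    }
}
```

A sampler would call beginSampling(), keep the result in a local, do
its suspend/trace/resume work with the lock released, and then call
endSampling() with that local.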

The data_access_lock can be made less hot by splitting off its
protection of the thread lists into a new lock, thread_list_lock.
The data_access_lock will still need to be held to safely save
the sampling data, but that is after the threads are resumed.

By changing the sampler thread to hold the hprof_dump_lock for
less time, there is no longer a race with the VM thread trying to
post the JVM_SHUT_DOWN event. By splitting off thread list control
into the thread_list_lock, the data_access_lock is also held for
less time. This means there is no longer a race with the VM thread
trying to post a GC_START event.

The GC disable call can be moved right before the SuspendThread()
calls and the GC enable call can be moved right after the
ResumeThread() calls. This will greatly reduce the amount of time
that GC is disabled, but it probably won't be enough.
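
Putting the proposed changes together, a revised pass might look like
the Java sketch below. This is a model for discussion, not the code
that was delivered. The class and method names are hypothetical, and
taking thread_list_lock before the GC disable call is an assumption,
since the relative order of those two steps is not stated above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the revised sampler ordering.
class RevisedSamplerOrder {
    final ReentrantLock threadListLock = new ReentrantLock();
    final ReentrantLock dataAccessLock = new ReentrantLock();
    final List<String> trace = new ArrayList<>();

    void samplePass(List<String> threads) {
        // thread_list_lock only keeps the thread list stable.
        threadListLock.lock();
        trace.add("lock thread_list_lock");
        try {
            // GC is disabled right before the first SuspendThread() ...
            trace.add("disable GC");
            for (String t : threads) {
                trace.add("SuspendThread " + t);
            }
            for (String t : threads) {
                trace.add("GetCallTrace " + t);
            }
            for (String t : threads) {
                trace.add("ResumeThread " + t);
            }
            // ... and enabled right after the last ResumeThread().
            trace.add("enable GC");
        } finally {
            threadListLock.unlock();
            trace.add("unlock thread_list_lock");
        }

        // data_access_lock is taken only after the threads are running
        // again, to save the sampling data.
        dataAccessLock.lock();
        trace.add("lock data_access_lock");
        try {
            trace.add("save samples");
        } finally {
            dataAccessLock.unlock();
            trace.add("unlock data_access_lock");
        }
    }
}
```

In this model GC is already enabled again before data_access_lock is
taken, so saving the samples no longer extends the GC-disabled window.
The per-thread suspend, trace and resume calls still sit inside it.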

Disabling GC doesn't scale as the thread count grows. Also, the
JVM/DI SuspendThread() API does not require GC to be disabled. I
wonder if this JVM/PI spec "requirement" is due to the original
lock order implementation described above.