Bug Database
Bug ID: 4654490
Votes: 0
Synopsis: Volano Mark hang on Linux 7.2 SMP
Category: hotspot:runtime_system
Reported Against: hopper
Release Fixed:
State: 11-Closed, duplicate of 4650839, bug
Priority: 3-Medium
Related Bugs: 4650839
Submit Date: 19-MAR-2002
Description
Please see also 4650839.

Sometimes the VM cannot acquire Threads_lock, Heap_lock or
SystemDictionary_lock, which causes vmark to hang on the client side.

To reproduce:

> java COM.volano.Main
> repeat 1000 java COM.volano.Mark -count 1

It hangs quickly (usually in fewer than 500 COM.volano.Mark runs) on
Redhat 7.2 SMP with product builds. I wasn't able to reproduce the hang
using Redhat 6.2 SMP or debug builds.

CPU usage is 0% when the VM hangs. It appears that the VM cannot acquire one
of the important system locks (Threads_lock, Heap_lock or
SystemDictionary_lock) with its pthread_mutex_lock() call, even though the
_owner field of the lock is 0x0.

Looking into the pthread frames that handle the underlying pthread mutex, the
mutex status is non-zero, implying that it is indeed locked by some thread.
LinuxThreads records the real owner of a mutex only if the mutex is
initialized with type PTHREAD_MUTEX_ERRORCHECK_NP. I managed to reproduce the
hang with a PTHREAD_MUTEX_ERRORCHECK_NP mutex, and the _owner field of the
pthread mutex is again 0x0. See the following stack trace:

#0  0x40075aa5 in __sigsuspend (set=0x4c5b10c0)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x40037079 in __pthread_wait_for_restart_signal (self=0x4c5b1be0)
    at pthread.c:967
#2  0x40038d39 in __pthread_alt_lock (lock=0x805069c, self=0x4c5b1be0)
    at restart.h:34
#3  0x40035c6e in __pthread_mutex_lock (mutex=0x805068c) at mutex.c:116
#4  0x4040be1f in os::Linux::safe_mutex_lock (_mutex=0x805068c)
    at /home/huanghui/main/build/linux/../../src/os_cpu/linux_i486/vm/os_linux_i486.cpp:518
#5  0x40470589 in os::Linux::Event::lock (this=0x8050688)
    at /home/huanghui/main/build/linux/../../src/os/linux/vm/os_linux.hpp:137
#6  0x404703ed in Mutex::wait_for_lock_implementation (this=0x8050660)
    at /home/huanghui/main/build/linux/../../src/os/linux/vm/mutex_linux.inline.hpp:25
#7  0x403fbc1b in Mutex::wait_for_lock_blocking_implementation (
    this=0x8050660, thread=0x807bf30)
    at /home/huanghui/main/build/linux/../../src/os/linux/vm/mutex_linux.cpp:89
#8  0x403fae61 in Mutex::lock (this=0x8050660)
    at /home/huanghui/main/build/linux/../../src/share/vm/runtime/mutex.cpp:42
#9  0x4042aab4 in SystemDictionary::find ()
    at /home/huanghui/main/build/linux/../../src/share/vm/runtime/safepoint.hpp:230
#10 0x4042ac1f in SystemDictionary::find_instance_or_array_klass ()
    at /home/huanghui/main/build/linux/../../src/share/vm/runtime/safepoint.hpp:230
#11 0x402f7dfb in ciEnv::get_klass_by_name_impl ()
   from /home/huanghui/jdk1.4.1/jre/lib/i386/client/libjvm.so
#12 0x402f8251 in ciEnv::get_klass_by_index_impl ()
   from /home/huanghui/jdk1.4.1/jre/lib/i386/client/libjvm.so
#13 0x402f82ef in ciEnv::get_klass_by_index ()
   from /home/huanghui/jdk1.4.1/jre/lib/i386/client/libjvm.so
  ... ... ... ...
(gdb) frame 4
#4  0x4040be1f in os::Linux::safe_mutex_lock (_mutex=0x805068c)
    at /home/huanghui/main/build/linux/../../src/os_cpu/linux_i486/vm/os_linux_i486.cpp:518
518           int status = pthread_mutex_lock(_mutex);
Current language:  auto; currently c++
(gdb) p *_mutex
$7 = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 2,
  __m_lock = {__status = 1473963040, __spinlock = 0}}
    >>>>   __m_kind == PTHREAD_MUTEX_ERRORCHECK_NP,  __m_owner = 0x0 <<<<
(gdb) frame 8
#8  0x403fae61 in Mutex::lock (this=0x8050660)
    at /home/huanghui/main/build/linux/../../src/share/vm/runtime/mutex.cpp:42
42              wait_for_lock_blocking_implementation((JavaThread*)thread);
(gdb) p *this
$8 = {<CHeapObj> = {<No data fields>}, _lock_count = 0, _lock_event = 0x8050688,
  _supress_signal = 0, _owner = 0x0,
  _name = 0x4058a636 "SystemDictionary_lock", static INVALID_THREAD = 0x0}

Note from "$7" that __m_kind = 2, which is PTHREAD_MUTEX_ERRORCHECK_NP.
__m_lock.__status is not 0 or 1, but __m_owner == 0x0.

"$8" shows that the (HotSpot) _owner field of SystemDictionary_lock is 0x0.
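
For reference, the following is a minimal sketch of how a mutex can be
initialized as error-checking, the setup used above to get owner tracking.
It is not the HotSpot code: it assumes a current POSIX pthread library and
uses the portable PTHREAD_MUTEX_ERRORCHECK type rather than the LinuxThreads
_NP spelling, and the helper names init_errorcheck_mutex and relock_result
are hypothetical. Because such a mutex tracks its owner, a second lock by the
same thread is reported as EDEADLK instead of blocking.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose PTHREAD_MUTEX_ERRORCHECK in strict -std modes */
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialize *m as an error-checking mutex.
 * Returns 0 on success, otherwise the error number from pthreads. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical helper: lock an error-checking mutex twice from the same
 * thread and return the result of the second lock attempt.
 * Returns -1 if the setup itself fails. */
int relock_result(void)
{
    pthread_mutex_t m;
    int second;

    if (init_errorcheck_mutex(&m) != 0)
        return -1;
    if (pthread_mutex_lock(&m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }

    /* The mutex knows this thread already owns it, so the call fails
     * with EDEADLK instead of blocking forever. */
    second = pthread_mutex_lock(&m);

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

Calling relock_result() from a program linked with -pthread should return
EDEADLK. A mutex in this mode that shows a locked status with no recorded
owner, as in "$7", is therefore not something ordinary lock/unlock usage is
expected to produce.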
Work Around
N/A
Evaluation
This hang is caused by a race in the Linux SMP kernel in which duplicate PIDs
are assigned to different threads. It is fixed in kernel 2.4.18.

  xxxxx@xxxxx   2002-03-26