Bug Database
Bug ID: 6477975
Votes 0
Synopsis new proc service function in Solaris 11 crashes dbx
Category dbx:proc_control
Reported Against snv_62, patch_xx
Release Fixed mars_dev, studio11 (Bug ID: 2143453), studio10 (patch_01) (Bug ID: 2143460)
State 10-Fix Delivered, bug
Priority: 2-High
Related Bugs 6409350, 6445248, 6516145, 6587294, 6660109, 6660120, 5044038
Submit Date 03-OCT-2006
Description
See emails in Comments.
Posted Date : 2006-10-03 23:43:14.0

A new proc service function in Solaris 11 is causing dbx to
crash because two implementations of proc service are present
in the same process. See bugid 5044038 and the evaluation for more details.
Posted Date : 2006-10-04 23:13:21.0
Work Around
Thanks to Jürgen Keil on the tools-discuss alias:
Workaround:

LD_PRELOAD a shared library into dbx that provides a dummy implementation
of ps_pbrandname(), so that we don't use libproc.so.1's implementation.

% cat my_ps_pbrandname.c
#include <proc_service.h>

ps_err_e
ps_pbrandname(struct ps_prochandle *ph, char *buf, size_t buf_len)
{
        return PS_ERR;
}
Evaluation
Here is the stack trace I get:
=>[1] Pgetauxval(0xa19008, 0x7e3, 0x400, 0xa1b91e00, 0x82c, 0xa97), at 0xfffffd7ffed3d5ee
  [2] Pbrandname(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed2ecb1
  [3] ps_pbrandname(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed3bff9
  [4] _rd_reset32(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed02697
  [5] rd_reset(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed01a70
  [6] rd_new(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed01b65
  [7] RtldAgent::open(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x76d212
  [8] 0x61d5c5(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x61d5c5
  [9] ProcMgr::start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x61e3d7
  [10] 0x67aeeb(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x67aeeb
  [11] 0x67b231(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x67b231
  [12] targ_ppi_init(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x67a82d
  [13] 0x54a0e2(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x54a0e2
  [14] main_debug(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x54c327
  [15] 0x5e4083(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x5e4083
  [16] 0x704c1f(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x704c1f
  [17] pdksh_execute(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x70448e
  [18] pdksh_shell(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x6f2df2
  [19] main_cmd_loop(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x54d3cc
  [20] main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x54dd4c


Or with different options:
=>[1] libproc.so.1:Pgetauxval()
  [2] libproc.so.1:Pbrandname()
  [3] libproc.so.1:ps_pbrandname()
  [4] librtld_db.so.1:_rd_reset32()
  [5] librtld_db.so.1:rd_reset()
  [6] librtld_db.so.1:rd_new()
  [7] dbx:RtldAgent::open(Pcs*,const ps_prochandle*)
  [8] 0x61d5c5()
  [9] dbx:ProcMgr::start(Target*,bool,bool,bool)
  [10] 0x67aeeb()
  [11] 0x67b231()
  [12] dbx:targ_ppi_init(Target*,PPIOpts*)
  [13] 0x54a0e2()
  [14] dbx:main_debug(Interp*,char*,char*,char*,unsigned)
  [15] 0x5e4083()
  [16] 0x704c1f()
  [17] dbx:pdksh_execute(Interp*,op*,int)
  [18] dbx:pdksh_shell(Interp*,Source*)
  [19] dbx:main_cmd_loop(Interp*)
  [20] dbx:main()


librtld_db.so.1 is not supposed to call into libproc at all.
This happens because we have two implementations of "proc service"
in the same process at the same time. I filed this problem as bugid 5044038,
but the OS group closed it as "will not fix".

I'll take a look and try to figure out what the next step is.
Posted Date : 2006-10-04 21:04:21.0

Here's what happened.
1. Solaris added a new proc_service function, ps_pbrandname().
2. Solaris updated librtld_db to call this new proc service function.

As described in 5044038 we have a set of library dependencies like this:

dbx -> libcpc -> libpctx -> libproc
dbx -> librtld_db

librtld_db calls back into dbx by directly calling globally defined functions.
Both dbx and libproc provide a proc service implementation, meaning each
defines the same set of global ps_* functions. This used to be harmless
because every one of those symbols was looked up in dbx first, and dbx
defined all of them.

But now librtld_db calls ps_pbrandname(), which dbx does not define,
so the search resolves the symbol to the definition in libproc.
Since the handle type libproc expects is completely different from
the handle type dbx uses, libproc misinterprets dbx's handle and
proceeds to segfault.

The best way to fix this bug (IMO) is the fix described in 5044038.

The workaround suggested by Mike S. ("do what mdb does") doesn't seem
to apply here because we need an older dbx to work with a newer
librtld_db and a newer libproc. There doesn't seem to be any way to make
dbx immune to the same bug the next time the OS group adds a function
to proc service.

Another option would be to add a proc service function that returns the
version of the API implemented by each service provider.
That way librtld_db could be taught not to call ps_pbrandname if
the version returned by, e.g., ps_pversion didn't support it.

The only workaround I can see on the dbx side is to:
1. define ps_pbrandname in dbx and have it return an error.
2. patch this function into every previous version of dbx
   that we want to keep working on Solaris 11.

Here is an example of the function that I think should be added to dbx.

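#include <proc_service.h>

/* Fail unconditionally so librtld_db never reaches libproc's version. */
extern "C" ps_err_e
ps_pbrandname(struct ps_prochandle *, char *, size_t)
{
    return (PS_ERR);
}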

I'll reassign this to Leonard.
Posted Date : 2006-10-04 23:13:21.0

The workaround suggested by Chris has been putback into the dbx nightly build. I will ask the kernel guys to fix 5044038.
Posted Date : 2006-10-06 00:52:47.0

The fix is in build33.0.
Posted Date : 2006-10-18 22:40:36.0