Evaluation
Here is the stack trace I get:
=>[1] Pgetauxval(0xa19008, 0x7e3, 0x400, 0xa1b91e00, 0x82c, 0xa97), at 0xfffffd7ffed3d5ee
[2] Pbrandname(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed2ecb1
[3] ps_pbrandname(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed3bff9
[4] _rd_reset32(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed02697
[5] rd_reset(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed01a70
[6] rd_new(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed01b65
[7] RtldAgent::open(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x76d212
[8] 0x61d5c5(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x61d5c5
[9] ProcMgr::start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x61e3d7
[10] 0x67aeeb(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x67aeeb
[11] 0x67b231(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x67b231
[12] targ_ppi_init(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x67a82d
[13] 0x54a0e2(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x54a0e2
[14] main_debug(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x54c327
[15] 0x5e4083(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x5e4083
[16] 0x704c1f(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x704c1f
[17] pdksh_execute(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x70448e
[18] pdksh_shell(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x6f2df2
[19] main_cmd_loop(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x54d3cc
[20] main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x54dd4c
Or with different options:
=>[1] libproc.so.1:Pgetauxval()
[2] libproc.so.1:Pbrandname()
[3] libproc.so.1:ps_pbrandname()
[4] librtld_db.so.1:_rd_reset32()
[5] librtld_db.so.1:rd_reset()
[6] librtld_db.so.1:rd_new()
[7] dbx:RtldAgent::open(Pcs*,const ps_prochandle*)
[8] 0x61d5c5()
[9] dbx:ProcMgr::start(Target*,bool,bool,bool)
[10] 0x67aeeb()
[11] 0x67b231()
[12] dbx:targ_ppi_init(Target*,PPIOpts*)
[13] 0x54a0e2()
[14] dbx:main_debug(Interp*,char*,char*,char*,unsigned)
[15] 0x5e4083()
[16] 0x704c1f()
[17] dbx:pdksh_execute(Interp*,op*,int)
[18] dbx:pdksh_shell(Interp*,Source*)
[19] dbx:main_cmd_loop(Interp*)
[20] dbx:main()
The libproc library is not supposed to be called by librtld_db.so.
This is a problem because we end up with two implementations of "proc service"
in the same process at the same time. I filed this as bugid 5044038,
but it was closed as "will not fix" by the OS group.
I'll take a look and try to figure out what the next step is.
Posted Date : 2006-10-04 21:04:21.0
Here's what happened.
1. Solaris added a new proc_service function called ps_pbrandname.
2. Solaris updated librtld_db to call this new proc service function.
As described in 5044038 we have a set of library dependencies like this:
dbx -> libcpc -> libpcxt -> libproc
dbx -> librtld_db
rtld_db calls back into dbx by directly calling globally defined functions.
Both dbx and libproc define a proc service implementation, meaning they each
define the same set of global functions. This was previously okay because all
symbols would be looked up in dbx first.
But now rtld_db calls ps_pbrandname, which is not defined in dbx,
so the symbol search resolves it to the definition
in libproc. Since the handle data type expected by libproc is
completely different from the handle type expected by dbx, libproc
proceeds to segfault.
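The lookup behaviour described above can be illustrated with a toy model. This is only a sketch of "first definition in search order wins", not the real runtime linker: the LoadedObject type, the resolve function, and the per-object symbol lists are made up for illustration, and the search order is assumed from the dependency lists above.

```cpp
#include <string>
#include <vector>

// Toy model of one loaded object: its name and the global symbols it defines.
struct LoadedObject {
    std::string name;
    std::vector<std::string> symbols;
};

// Return the name of the first object in search order that defines `symbol`,
// or an empty string if no object defines it.
std::string resolve(const std::vector<LoadedObject>& search_order,
                    const std::string& symbol) {
    for (const auto& obj : search_order) {
        for (const auto& s : obj.symbols) {
            if (s == symbol) return obj.name;
        }
    }
    return "";
}

// Assumed search order: the executable first, then its dependencies.
// The symbol lists are illustrative, not a dump of the real libraries.
std::vector<LoadedObject> example_process() {
    return {
        {"dbx", {"ps_pread", "ps_pwrite"}},
        {"librtld_db", {"rd_new", "rd_reset"}},
        {"libproc", {"ps_pread", "ps_pwrite", "ps_pbrandname"}},
    };
}
```

In this model, ps_pread resolves to dbx because dbx is searched first, while ps_pbrandname falls through to libproc because dbx does not define it.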
The best way to fix this bug (IMO) is the fix described in 5044038.
The workaround suggested by Mike S. ("do what mdb does") doesn't seem
to apply here because we need an older dbx to work with a newer
librtld_db + new libproc. There doesn't seem to be any way to make
dbx immune to the same bug the next time the OS group adds
something to proc service.
Another option would be to add a proc service function that returns the
version of the API implemented by a given service provider.
That way librtld_db could be taught not to call ps_pbrandname if
the version returned by (for example) ps_pversion didn't support it.
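A minimal sketch of that version-check idea follows. Everything in it is hypothetical: no ps_pversion exists in proc service today, and the Provider table, the version constants, and the placeholder brand string are invented for illustration. The ps_err_e enum is a simplified stand-in for the real one.

```cpp
#include <cstring>
#include <stddef.h>

// Simplified stand-in for the real ps_err_e (assumption).
enum ps_err_e { PS_OK, PS_ERR };

// Hypothetical API version numbers a provider could report.
const int PS_API_V1 = 1;  // before ps_pbrandname existed
const int PS_API_V2 = 2;  // adds ps_pbrandname

// Toy model of a proc service provider (dbx, libproc, ...).
struct Provider {
    int version;                                   // what ps_pversion would return
    ps_err_e (*brandname)(char* buf, size_t len);  // null for old providers
};

// A newer provider's implementation; the brand string is a placeholder.
ps_err_e new_brandname(char* buf, size_t len) {
    if (len == 0) return PS_ERR;
    std::strncpy(buf, "example", len - 1);
    buf[len - 1] = '\0';
    return PS_OK;
}

// Consumer side (the role librtld_db would play): only call the newer
// entry point when the provider says it implements it.
bool query_brand(const Provider& p, char* buf, size_t len) {
    if (p.version < PS_API_V2 || p.brandname == nullptr) return false;
    return p.brandname(buf, len) == PS_OK;
}
```

With a version-1 provider, query_brand returns false without touching the missing entry point; with a version-2 provider it fills the buffer.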
The only workaround I can see on the dbx side is to:
1. Define ps_pbrandname in dbx and have it return an error.
2. Patch this function into every previous version of dbx
that we don't want to crash on Solaris 11.
Here is an example of the function that I think should be added to dbx.
ps_err_e
ps_pbrandname(struct ps_prochandle *, char *, size_t)
{
    /* Defined in dbx so the symbol no longer resolves to libproc; always fails. */
    return (PS_ERR);
}
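Since dbx is C++ (the stack trace shows RtldAgent::open), the stub presumably needs C linkage so that its symbol is the unmangled ps_pbrandname that librtld_db looks up. If the real proc_service.h already declares it inside an extern "C" block, the definition picks up that linkage automatically. A self-contained sketch, using simplified stand-ins for the header's types:

```cpp
#include <stddef.h>

// Simplified stand-ins for declarations that would normally come from
// proc_service.h (assumption; the real ps_err_e has more values).
typedef enum { PS_OK, PS_ERR } ps_err_e;
struct ps_prochandle;

// C linkage keeps the symbol name unmangled, so a C caller can bind to it.
extern "C" ps_err_e
ps_pbrandname(struct ps_prochandle *, char *, size_t)
{
    // Always report failure; the handle and buffer are never touched.
    return PS_ERR;
}
```

Because the stub ignores its arguments, it is safe to call regardless of which handle type the caller passes.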
I'll reassign this to Leonard.
Posted Date : 2006-10-04 23:13:21.0
The workaround suggested by Chris has been put back into the dbx nightly build. I will ask the kernel group to fix 5044038.
Posted Date : 2006-10-06 00:52:47.0
The fix is in build33.0.
Posted Date : 2006-10-18 22:40:36.0