Environment
SUSE Linux Enterprise Server 10 Service Pack 3
SUSE Linux Enterprise Server 10 Service Pack 4
SUSE Linux Enterprise Server 11
Novell eDirectory 8.8 for Linux
Situation
ndsd crashes during startup, leaving a core file in the dib directory (/var/opt/novell/eDirectory/data/dib by default).
Further analysis of the core shows that ndsd crashes inside glibc, in the dl_open code path. The stack trace consistently looks similar to the following:
#0 0x00002b301b5acd3e in __wait_lookup_done () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libpthread.so.0
#1 0x00002b301a7ae107 in add_to_global () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#2 0x00002b301a7ae500 in dl_open_worker () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#3 0x00002b301a7aa3e6 in _dl_catch_error () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#4 0x00002b301a7adcbb in _dl_open () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#5 0x00002b301af4b1fa in dlopen_doit () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libdl.so.2
#6 0x00002b301a7aa3e6 in _dl_catch_error () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#7 0x00002b301af4b58d in _dlerror_run () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libdl.so.2
#8 0x00002b301af4b171 in dlopen@@GLIBC_2.2.5 () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libdl.so.2
#9 0x00002b301ad02a89 in SAL_ModLoad () from /data/edir/cores/scott/09-01-2011/wsymbol/2/opt/novell/eDirectory/lib64/libsal.so.1
#10 0x00002aaaacdc5657 in InitializeExtension (be=0x21517ef0, status=<value optimized out>) at /usr/src/packages/BUILD/novell-NDSbase-8.8.6.3/nldap-8.8.6.3/src/extensions.cpp:851
#11 ConfigureNLDAPExtensions (be=0x21517ef0, status=<value optimized out>) at /usr/src/packages/BUILD/novell-NDSbase-8.8.6.3/nldap-8.8.6.3/src/extensions.cpp:420
#12 0x00002aaaacddf58c in ConfigureNewBackend (nonAuthoritativePartitions=0x0) at /usr/src/packages/BUILD/novell-NDSbase-8.8.6.3/nldap-8.8.6.3/nds/ndsconfig.cpp:522
#13 0x00002aaaacdb0f7f in DynamicReconfigTask () at /usr/src/packages/BUILD/novell-NDSbase-8.8.6.3/nldap-8.8.6.3/src/config.cpp:127
#14 0x00002aaaacdb9e60 in BackgroundThread () at /usr/src/packages/BUILD/novell-NDSbase-8.8.6.3/nldap-8.8.6.3/src/daemon.cpp:1613
#15 0x0000000000414561 in PoolWorker(void*) ()
#16 0x00002b301b5ad2a3 in start_thread () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libpthread.so.0
#17 0x00002b301b8d642d in clone () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libc.so.6
#18 0x0000000000000000 in ?? ()
What changed:
In this case the server had been patched to the latest packages in the update channel (as of October 2011).
The crash could be reproduced on roughly 80% of ndsd restarts.
The "problem" versions were found to be:
glibc-2.4-31.77.84.1.x86_64.rpm (June 2011 Build)
glibc-2.4-31.77.86.1.x86_64.rpm (August 2011 Build)
Resolution
The fix is to update glibc to the following versions (or later):
SLES10 SP3 glibc-2.4-31.77.88.4
SLES10 SP4 glibc-2.4-31.95.1
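To compare the installed glibc against the fixed versions listed above, something like the following sketch may help. The helper names ver_ge and check_glibc are hypothetical, and the comparison is a simplification: it splits the version-release string on '.' and '-' and compares the fields as integers, which is not rpm's full version-comparison algorithm. It also assumes a single installed package named glibc.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version-release $1 is >= $2.
# Simplified comparison (NOT rpm's own algorithm): fields are split on
# '.' and '-' and compared as integers, so it assumes purely numeric
# fields such as 2.4-31.77.88.4.
ver_ge() (
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%[.-]*}; y=${b%%[.-]*}          # first field of each string
        [ "${x:-0}" -gt "${y:-0}" ] && exit 0
        [ "${x:-0}" -lt "${y:-0}" ] && exit 1
        # Drop the field just compared, or empty the string if it was the last.
        case $a in *[.-]*) a=${a#*[.-]} ;; *) a= ;; esac
        case $b in *[.-]*) b=${b#*[.-]} ;; *) b= ;; esac
    done
    exit 0
)

# Hypothetical helper: query the installed glibc and compare it with the
# minimum version passed as $1.
check_glibc() {
    min=$1
    cur=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' glibc) || return 2
    if ver_ge "$cur" "$min"; then
        echo "glibc $cur is at or above $min"
    else
        echo "glibc $cur is older than $min; update recommended"
        return 1
    fi
}

# Usage on the server (pick the minimum for your service pack):
#   check_glibc 2.4-31.77.88.4    # SLES10 SP3
#   check_glibc 2.4-31.95.1       # SLES10 SP4
```

Running rpm -q glibc by hand and comparing against the list above gives the same answer; the script only saves the manual comparison.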
Additional Information
To check whether your core's stack trace matches the one reported in this TID, load the core in gdb:
gdb /opt/novell/eDirectory/sbin/ndsd <corefile>
For example:
gdb /opt/novell/eDirectory/sbin/ndsd /var/opt/novell/eDirectory/data/dib/core.1234
Once all the symbols load, type "bt" and look at the frames (#0, #1, and so on).
If the first few frames look similar to the following, it is likely you are running into this issue with glibc:
#0 0x00002b301b5acd3e in __wait_lookup_done () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libpthread.so.0
#1 0x00002b301a7ae107 in add_to_global () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#2 0x00002b301a7ae500 in dl_open_worker () from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#3 0x00002b301a7aa3e6 in _dl_catch_error ()
Another thing to look for is a frame that looks like this:
#13 0x00002aaaacdb0f7f in DynamicReconfigTask () at ...
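The manual check above could also be scripted, roughly as follows. This is a sketch under assumptions: the helper name has_frames is hypothetical, the script expects the core file path as its first argument, and it relies on gdb accepting -batch together with -ex, which may not hold for very old gdb builds. It only looks for the symbol names shown in this TID; a match is an indication, not proof.

```shell
#!/bin/sh
# Hypothetical helper: succeed if every symbol given as an argument
# appears somewhere in the backtrace text held in $BT.
has_frames() {
    for sym in "$@"; do
        printf '%s\n' "$BT" | grep -qF -- "$sym" || return 1
    done
    return 0
}

# When a core file is passed as the first argument, dump its backtrace
# non-interactively and look for the frames described in this TID.
if [ -n "${1:-}" ]; then
    BT=$(gdb -batch -ex bt /opt/novell/eDirectory/sbin/ndsd "$1" 2>/dev/null)
    if has_frames __wait_lookup_done add_to_global dl_open_worker; then
        echo "Top frames match the glibc dl_open crash described in this TID."
        has_frames DynamicReconfigTask && echo "DynamicReconfigTask frame is also present."
    else
        echo "Backtrace does not match this TID."
    fi
fi
```

Saved as, say, check_core.sh (a made-up name), it would be run as: sh check_core.sh /var/opt/novell/eDirectory/data/dib/core.1234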