ndsd cores on startup in glibc at dl_open_worker when loading ldap extensions

  • 7009559
  • 12-Oct-2011
  • 12-Mar-2015

Environment


SUSE Linux Enterprise Server 10 Service Pack 3
SUSE Linux Enterprise Server 10 Service Pack 4
SUSE Linux Enterprise Server 11
Novell eDirectory 8.8 for Linux

Situation

ndsd crashes during startup, leaving a core file in the dib directory (/var/opt/novell/eDirectory/data/dib by default).

Analysis of the core shows that ndsd crashed inside glibc, in a dl_open function.  The stack trace consistently looks similar to the following:

#0  0x00002b301b5acd3e in __wait_lookup_done ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libpthread.so.0
#1  0x00002b301a7ae107 in add_to_global ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#2  0x00002b301a7ae500 in dl_open_worker ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#3  0x00002b301a7aa3e6 in _dl_catch_error ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#4  0x00002b301a7adcbb in _dl_open ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#5  0x00002b301af4b1fa in dlopen_doit ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libdl.so.2
#6  0x00002b301a7aa3e6 in _dl_catch_error ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#7  0x00002b301af4b58d in _dlerror_run ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libdl.so.2
#8  0x00002b301af4b171 in dlopen@@GLIBC_2.2.5 ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libdl.so.2
#9  0x00002b301ad02a89 in SAL_ModLoad ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/opt/novell/eDirectory/lib64/libsal.so.1
#10 0x00002aaaacdc5657 in InitializeExtension (be=0x21517ef0,   status=<value optimized out>)   at /usr/src/packages/BUILD/novell-NDSbase-8.8.6.3/nldap-8.8.6.3/src/extensions.cpp:851
#11 ConfigureNLDAPExtensions (be=0x21517ef0, status=<value optimized out>)   at /usr/src/packages/BUILD/novell-NDSbase-8.8.6.3/nldap-8.8.6.3/src/extensions.cpp:420
#12 0x00002aaaacddf58c in ConfigureNewBackend (nonAuthoritativePartitions=0x0)   at /usr/src/packages/BUILD/novell-NDSbase-8.8.6.3/nldap-8.8.6.3/nds/ndsconfig.cpp:522
#13 0x00002aaaacdb0f7f in DynamicReconfigTask ()   at /usr/src/packages/BUILD/novell-NDSbase-8.8.6.3/nldap-8.8.6.3/src/config.cpp:127
#14 0x00002aaaacdb9e60 in BackgroundThread ()   at /usr/src/packages/BUILD/novell-NDSbase-8.8.6.3/nldap-8.8.6.3/src/daemon.cpp:1613
#15 0x0000000000414561 in PoolWorker(void*) ()
#16 0x00002b301b5ad2a3 in start_thread ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libpthread.so.0
#17 0x00002b301b8d642d in clone ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libc.so.6
#18 0x0000000000000000 in ?? ()

What changed:

In this case the server had been patched to the latest packages in the channel (as of October 2011).

The issue could be reproduced almost every time ndsd was restarted (80% of the time).
The affected glibc versions were found to be:

glibc-2.4-31.77.84.1.x86_64.rpm (June 2011 Build)
glibc-2.4-31.77.86.1.x86_64.rpm (August 2011 Build)

Resolution

Update glibc to one of the following versions (or later):

SLES10 SP3 glibc-2.4-31.77.88.4
SLES10 SP4 glibc-2.4-31.95.1
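
To check whether an installed glibc build is at or above the fixed build, the version strings can be compared numerically. The following is a minimal sketch, not part of the original TID: the names FIXED, version_key and is_fixed are hypothetical, and the comparison assumes every segment of the version-release string is numeric (true for the builds listed here, but simpler than the full RPM comparison rules). The installed string can be obtained with, for example, rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' glibc. Compare only against the fixed build for the service pack actually installed.

```python
import re

# Fixed glibc builds named in this TID, keyed by service pack.
FIXED = {
    "SLES10 SP3": "2.4-31.77.88.4",
    "SLES10 SP4": "2.4-31.95.1",
}


def version_key(version_release):
    """Split a 'version-release' string into a tuple of integers.

    Assumption: every segment is numeric. This holds for the builds
    listed in this TID but is not the full RPM comparison algorithm.
    """
    return tuple(int(part) for part in re.split(r"[.-]", version_release))


def is_fixed(installed, fixed):
    """Return True if the installed build is at or above the fixed build."""
    return version_key(installed) >= version_key(fixed)


if __name__ == "__main__":
    # The two affected SP3 builds from this TID, plus the fixed SP3 build.
    for installed in ("2.4-31.77.84.1", "2.4-31.77.86.1", "2.4-31.77.88.4"):
        status = "fixed" if is_fixed(installed, FIXED["SLES10 SP3"]) else "affected"
        print(installed, status)
```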

Additional Information

To see if the core stacktrace looks similar to the one reported in this TID, you can do the following:

gdb /opt/novell/eDirectory/sbin/ndsd <corefile>

For example:

gdb /opt/novell/eDirectory/sbin/ndsd /var/opt/novell/eDirectory/data/dib/core.1234

Once all the symbols load, type "bt" and look at the frames (#0, #1, and so on are frames).
If the first few frames look similar to the following, you are likely running into this glibc issue:

#0  0x00002b301b5acd3e in __wait_lookup_done ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/libpthread.so.0
#1  0x00002b301a7ae107 in add_to_global ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#2  0x00002b301a7ae500 in dl_open_worker ()  from /data/edir/cores/scott/09-01-2011/wsymbol/2/lib64/ld-linux-x86-64.so.2
#3  0x00002b301a7aa3e6 in _dl_catch_error ()

Another thing to look for is a frame that looks like this:

#13 0x00002aaaacdb0f7f in DynamicReconfigTask ()   at ...
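
If the "bt" output has been saved as text, the two checks above can be automated. The following is a rough sketch rather than a supported tool: TOP_FRAMES, parse_frames and matches_tid are hypothetical names, and the check requires frames #0 through #2 to match exactly, which is stricter than "looks similar" and may need loosening for other builds.

```python
import re

# Function names from the top of the stack trace shown in this TID.
TOP_FRAMES = ("__wait_lookup_done", "add_to_global", "dl_open_worker")

# Matches gdb frame lines with or without an address, for example:
#   "#2  0x00002b301a7ae500 in dl_open_worker ()  from ..."
#   "#11 ConfigureNLDAPExtensions (be=0x21517ef0, ...)   at"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def parse_frames(backtrace):
    """Return {frame_number: function_name} for each frame line found."""
    frames = {}
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames[int(match.group(1))] = match.group(2)
    return frames


def matches_tid(backtrace):
    """Return (top_frames_match, has_dynamic_reconfig_task).

    top_frames_match: frames #0..#2 are exactly the glibc frames above.
    has_dynamic_reconfig_task: DynamicReconfigTask appears in any frame.
    """
    frames = parse_frames(backtrace)
    top_ok = all(frames.get(i) == name for i, name in enumerate(TOP_FRAMES))
    reconfig = "DynamicReconfigTask" in frames.values()
    return top_ok, reconfig
```

Calling matches_tid() on the saved backtrace text returns a pair of booleans; both being True corresponds to the signature described in this TID.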