NDSD crashing in INCP::ServiceStreamGroupConnections(StreamGroupStruct*) () with NULL pointer.

  • 7015250
  • 20-Jun-2014
  • 23-Jun-2014

Environment

Novell Open Enterprise Server 11 (OES 11) Linux Support Pack 1
January 2014 Scheduled Maintenance update

Situation

During regular day-to-day operation, the server was observed to crash irregularly in NDSD.
The crashes could not be tied to a specific series of actions or events, and appeared to occur at random times during the day.

A number of these crashes appeared in /var/log/messages as follows:
ndsd[10607]: segfault at 58 ip 00007f312b5bab81 sp 00007f3119998bc0 error 4 in libncpengine.so.0.0.0[7f312b54d000+109000]
ndsd[29717]: segfault at 58 ip 00007f5eb3d03b81 sp 00007f5e9cc31bc0 error 4 in libncpengine.so.0.0.0[7f5eb3c96000+109000]
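
The fields in these kernel messages can be decoded to compare crashes across restarts. The following is a minimal sketch, not a Novell tool: the helper name decode_segfault and the returned keys are hypothetical, and the error-code bit meanings assume the standard x86 page-fault error code (bit 0 = protection violation, bit 1 = write, bit 2 = user mode).

```python
import re

# Matches the kernel's "segfault at ..." message format seen in /var/log/messages.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Hypothetical helper: extract the fault details from one log line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    return {
        "fault_address": int(m.group("addr"), 16),
        "library": m.group("lib"),
        # Instruction pointer relative to the library's load address;
        # this stays stable across restarts even though the mapping moves.
        "offset_in_library": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "page_not_present": not (err & 1),
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
    }

line = ("ndsd[10607]: segfault at 58 ip 00007f312b5bab81 sp 00007f3119998bc0 "
        "error 4 in libncpengine.so.0.0.0[7f312b54d000+109000]")
info = decode_segfault(line)
print(hex(info["fault_address"]), hex(info["offset_in_library"]), info["access"])
```

Applied to the two messages above, both resolve to offset 0x6db81 inside libncpengine.so.0.0.0, and "error 4" decodes to a user-mode read of an unmapped page. A read at the very low address 0x58 is consistent with accessing a structure member through a NULL pointer.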

Analysis of the core showed that the crash occurred in the function "INCP::ServiceStreamGroupConnections(StreamGroupStruct*) ()".

Multiple core files were analyzed, and the crashes turned out to occur at a few different code offsets within the same function.

Back traces from two of the cores:
#bt
#0  0x00007f5eb3d03b81 in INCP::ServiceStreamGroupConnections(StreamGroupStruct*) ()
   from /opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#1  0x00007f5eb3d042ba in NCPPollerThread(StreamGroupStruct*) ()
   from /opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#2  0x000000000041737c in ?? ()
#3  0x00007f5eb6e6b7f6 in sigcancel_handler () from /lib64/libpthread.so.0
#4  0x0000000000000000 in ?? ()
#

#bt
#0  0x00007ff7f9a52e71 in INCP::ServiceStreamGroupConnections(StreamGroupStruct*) ()
   from /opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#1  0x00007ff7f9a535aa in NCPPollerThread(StreamGroupStruct*) ()
   from /opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#2  0x000000000041737c in ?? ()
#3  0x00007ff7fcab87f6 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ff7fc07af8d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()
#



Resolution

Locking around the global variable has been improved so that the variable can no longer be accidentally cleared while multiple threads are using it.

Cause

A global variable named "ss->receiveBuffer", which is used per connection, unexpectedly became NULL, and this caused NDSD to crash.

At some locations in the code this variable was properly protected by a lock, but a few other locations were found where it was not. One of those unprotected locations was on the code path that was hit when the system crashed.
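
The locking pattern described here can be illustrated with a small sketch. This is not the libncpengine code, which is not shown in this document; the class StreamState, its methods and the buffer size are hypothetical stand-ins. The point is only that every code path touching the shared buffer, including the one that clears it, takes the same lock and checks for a cleared buffer before using it.

```python
import threading

class StreamState:
    """Hypothetical stand-in for the per-connection state holding receiveBuffer."""

    def __init__(self):
        self._lock = threading.Lock()
        self.receive_buffer = bytearray(16)

    def clear_buffer(self):
        # The clearing path takes the same lock as the readers, so it
        # cannot run in the middle of another thread's use of the buffer.
        with self._lock:
            self.receive_buffer = None

    def service(self):
        # Read and use the buffer while holding the lock, and handle the
        # cleared case explicitly instead of dereferencing it blindly.
        with self._lock:
            buf = self.receive_buffer
            if buf is None:
                return 0
            return len(buf)

ss = StreamState()
workers = [threading.Thread(target=ss.service) for _ in range(4)]
clearer = threading.Thread(target=ss.clear_buffer)
for t in workers + [clearer]:
    t.start()
for t in workers + [clearer]:
    t.join()
```

If only some of the access paths took the lock, a thread on an unprotected path could still observe the buffer being cleared underneath it, which matches the failure described above.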