OES2 SP2 and SP3 - NDSD crashes in NCP when updating the Volume trustee file

  • 7007927
  • 17-Feb-2011
  • 08-Nov-2012

Environment

Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 2
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 3
Novell eDirectory 8.8 for Linux

Situation

On Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 2 and Support Pack 3 servers, the NDSD process crashes while specific NSS volumes are being accessed or trustee assignments are being modified.

More precisely, the NDSD process crashes every time the Volume trustee file for the affected volume is updated. Because multiple tools and operations can trigger such an update, the crash can disrupt services several times per hour or per day.


Resolution

The root cause of this problem has been identified, and a fix has been released to the public.

This problem is resolved by the following maintenance updates:
  • 'April 2011 Scheduled Maintenance for OES2 SP2'
  • 'April 2011 Scheduled Maintenance for OES2 SP3'

Additional Information

This problem has been reported on both OES2 SP2 and OES2 SP3 servers running the following NCP versions:

For OES2 SP2, running the following NCP version:

# /sbin/ncpcon version
... Executing " version"
NCP Server Components
[libncpengine]    2011-01-20_14:38:26_amdbuild14-.2759
[ncp2nss]    2011-01-06_16:52:56_amdbuild7-.1709
[libnrm2ncp]    2011-01-06_16:59:33_amdbuild5-.1709
[ncpcon]    2011-01-06_16:52:51_amdbuild7-.1709
[ncpshell]    2011-01-06_16:53:08_amdbuild7-.1709
[ncptop]    2011-01-06_16:53:04_amdbuild7-.1709

OES Version
    Novell Open Enterprise Server 2.0.2 (x86_64)
    VERSION = 2.0.2
    PATCHLEVEL = 2
    BUILD = FCS

Platform
    x86_64


For OES2 SP3, running the following NCP version:

# /sbin/ncpcon version
... Executing " version"
NCP Server Components
[libncpengine]    2010-12-21_02:53:35_amdbuild11-.2637
[ncp2nss]    2010-11-29_16:14:54_amdbuild11-.1658
[libnrm2ncp]    2010-11-29_16:59:34_amdbuild16-.1658
[ncpcon]    2010-11-29_16:14:48_amdbuild11-.1658
[ncpshell]    2010-11-29_16:15:05_amdbuild11-.1658
[ncptop]    2010-11-29_16:15:01_amdbuild11-.1658

OES Version
    Novell Open Enterprise Server 2.0.3 (x86_64)
    VERSION = 2.0.3
    PATCHLEVEL = 3
    BUILD

Platform
    x86_64
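
To compare a server against the builds listed above, the libncpengine build string can be pulled out of the ncpcon output. The following is a minimal sketch, assuming the output layout shown above; the function name libncpengine_build is hypothetical and not part of any Novell tool.

```shell
# Hypothetical helper: read "ncpcon version" output on stdin and print
# the build string reported for libncpengine (the second field of the
# "[libncpengine]" line).
libncpengine_build() {
    awk '$1 == "[libncpengine]" { print $2 }'
}

# Usage on a live server (run as root):
#   /sbin/ncpcon version | libncpengine_build
```

A server reporting one of the two libncpengine builds listed above is running a version on which this problem was reported.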


Checking /var/log/messages shows the following entries for this event:

- On OES2 SP2 the problem is logged as follows:
Mar  4 12:06:06 SERVER kernel: ndsd[12959]: segfault at 0000000000000121 rip 00002aaaaab0174d rsp 000000004d27cf60 error 4

- On OES2 SP3 the problem is logged as follows:
Jan 19 13:34:39 SERVER kernel: ndsd[31666]: segfault at 0000000000000121 rip 00002aaaaab03fde rsp 000000004d773f60 error 4
Jan 19 13:46:27 SERVER kernel: ndsd[19943]: segfault at 0000000000000121 rip 00002aaaaab03fde rsp 000000004d87bf60 error 4
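
To see how often the crash occurs, the matching kernel lines can be filtered out of the log. This is a sketch only, assuming the message format shown above; the function name ndsd_segfaults is hypothetical.

```shell
# Hypothetical helper: print every kernel segfault line logged for ndsd.
# Takes the log file as its first argument and defaults to
# /var/log/messages.
ndsd_segfaults() {
    grep -E 'kernel: ndsd\[[0-9]+\]: segfault at ' "${1:-/var/log/messages}"
}

# Usage:
#   ndsd_segfaults                      # list the crashes
#   ndsd_segfaults | grep -c .          # count them
```

The pattern stops at "segfault at", so it matches regardless of the fault address or instruction pointer reported on a given server.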


When a core collected from such a crash is opened in gdb, the bt (or backtrace) command shows the following:

#0  0x00002aaaaacab93d in InternalUpdateVolumeTrusteeFile(int) () from /opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#1  0x00002aaaaacacaaa in DircacheTrusteeUpdateEvent(int) () from /opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#2  0x00000000004142d1 in PoolWorker(void*) ()
#3  0x00002b7af01b6193 in start_thread () from /lib64/libpthread.so.0
#4  0x00002b7af04def0d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()
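
A saved backtrace can be checked for this signature non-interactively. The sketch below assumes the backtrace has been written to a text file; the function name is_trustee_file_crash, the core file path, and the ndsd binary path in the usage comment are assumptions and should be adjusted to the server in question.

```shell
# Hypothetical helper: succeed if the backtrace text in the given file
# contains the crashing function named in frame #0 above.
is_trustee_file_crash() {
    grep -F -q 'InternalUpdateVolumeTrusteeFile' "$1"
}

# Usage (both paths below are assumptions, not taken from this document):
#   gdb -batch -ex bt /opt/novell/eDirectory/sbin/ndsd /path/to/core > bt.txt
#   is_trustee_file_crash bt.txt && echo "matches this problem"
```

A match only indicates that the crash passed through the same function; it does not by itself confirm the same root cause.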