Environment
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 2
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 3
Novell eDirectory 8.8 for Linux
Situation
On Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 2 and Support Pack 3 servers, the NDSD process crashes while specific NSS volumes are being accessed or trustee assignments are being modified.
More precisely, the NDSD process crashes every time the volume trustee file for the affected volume is updated. Multiple tools and operations can trigger this update, so the crash can disrupt services several times per hour or per day.
Resolution
The root cause of this problem has been identified, and a fix has been released to the public.
This problem is resolved with the following maintenance updates:
- 'April 2011 Scheduled Maintenance for OES2 SP2'
- 'April 2011 Scheduled Maintenance for OES2 SP3'
Additional Information
This problem has been reported on both OES2 SP2 and OES2 SP3 servers running the NCP versions listed here.
For OES2 SP2, running the following NCP version:
# /sbin/ncpcon version
... Executing " version"
NCP Server Components
[libncpengine] 2011-01-20_14:38:26_amdbuild14-.2759
[ncp2nss] 2011-01-06_16:52:56_amdbuild7-.1709
[libnrm2ncp] 2011-01-06_16:59:33_amdbuild5-.1709
[ncpcon] 2011-01-06_16:52:51_amdbuild7-.1709
[ncpshell] 2011-01-06_16:53:08_amdbuild7-.1709
[ncptop] 2011-01-06_16:53:04_amdbuild7-.1709
OES Version
Novell Open Enterprise Server 2.0.2 (x86_64)
VERSION = 2.0.2
PATCHLEVEL = 2
BUILD = FCS
Platform
x86_64
For OES2 SP3, running the following NCP version:
# /sbin/ncpcon version
... Executing " version"
NCP Server Components
[libncpengine] 2010-12-21_02:53:35_amdbuild11-.2637
[ncp2nss] 2010-11-29_16:14:54_amdbuild11-.1658
[libnrm2ncp] 2010-11-29_16:59:34_amdbuild16-.1658
[ncpcon] 2010-11-29_16:14:48_amdbuild11-.1658
[ncpshell] 2010-11-29_16:15:05_amdbuild11-.1658
[ncptop] 2010-11-29_16:15:01_amdbuild11-.1658
OES Version
Novell Open Enterprise Server 2.0.3 (x86_64)
VERSION = 2.0.3
PATCHLEVEL = 3
BUILD
Platform
x86_64
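To compare a server against the builds listed above, the component lines of the `ncpcon version` output can be parsed mechanically. The following is a minimal sketch, not a Novell tool: the function names and the REPORTED_BUILDS set are hypothetical, and the set contains only the two libncpengine build stamps quoted in this document. A match means the server runs one of the builds on which the crash was reported. A non-match does not prove that the server is fixed.

```python
import re

# libncpengine build stamps quoted in this document (assumption: these are
# the only builds checked; other affected builds may exist).
REPORTED_BUILDS = {
    "2011-01-20_14:38:26_amdbuild14-.2759",  # OES2 SP2
    "2010-12-21_02:53:35_amdbuild11-.2637",  # OES2 SP3
}

# Matches lines such as "[libncpengine] 2010-12-21_02:53:35_amdbuild11-.2637".
COMPONENT_RE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s+(?P<build>\S+)\s*$")


def parse_components(text):
    """Return {component name: build stamp} from `ncpcon version` output."""
    components = {}
    for line in text.splitlines():
        match = COMPONENT_RE.match(line)
        if match:
            components[match.group("name")] = match.group("build")
    return components


def matches_reported_build(text):
    """True if libncpengine is one of the builds listed in this document."""
    return parse_components(text).get("libncpengine") in REPORTED_BUILDS


SAMPLE = """\
NCP Server Components
[libncpengine] 2010-12-21_02:53:35_amdbuild11-.2637
[ncp2nss] 2010-11-29_16:14:54_amdbuild11-.1658
OES Version
VERSION = 2.0.3
"""

if __name__ == "__main__":
    print(parse_components(SAMPLE))
    print(matches_reported_build(SAMPLE))
```

In practice the text would come from saving the output of /sbin/ncpcon version to a file and passing its contents to parse_components.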
Checking /var/log/messages shows the following entries for this event:
- On OES2 SP2, the problem appears as follows:
Mar 4 12:06:06 SERVER kernel: ndsd[12959]: segfault at 0000000000000121 rip 00002aaaaab0174d rsp 000000004d27cf60 error 4
- On OES2 SP3, the problem appears as follows:
Jan 19 13:34:39 SERVER kernel: ndsd[31666]: segfault at 0000000000000121 rip 00002aaaaab03fde rsp 000000004d773f60 error 4
Jan 19 13:46:27 SERVER kernel: ndsd[19943]: segfault at 0000000000000121 rip 00002aaaaab03fde rsp 000000004d87bf60 error 4
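The log entries above share one shape, so a short script can count how often ndsd has crashed on a server. This is a sketch under assumptions: the regular expression is derived only from the sample lines in this document, the function name is hypothetical, and whitespace after `rip` is treated as optional. All samples show the fault address 0000000000000121, so that value is a useful filter, although other faults in ndsd would match the pattern too.

```python
import re

# Pattern derived from the sample kernel lines in this document.
SEGFAULT_RE = re.compile(
    r"kernel: ndsd\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip\s*(?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<error>\d+)"
)


def find_ndsd_segfaults(lines):
    """Return one dict per ndsd segfault line found in `lines`."""
    hits = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            hits.append({
                "pid": int(match.group("pid")),
                "addr": int(match.group("addr"), 16),
                "rip": int(match.group("rip"), 16),
                "error": int(match.group("error")),
            })
    return hits


SAMPLE_LINES = [
    "Mar 4 12:06:06 SERVER kernel: ndsd[12959]: segfault at 0000000000000121 "
    "rip 00002aaaaab0174d rsp 000000004d27cf60 error 4",
    "Jan 19 13:34:39 SERVER kernel: ndsd[31666]: segfault at 0000000000000121 "
    "rip00002aaaaab03fde rsp 000000004d773f60 error 4",
    "Jan 19 13:35:00 SERVER sshd[100]: unrelated message",
]

if __name__ == "__main__":
    for hit in find_ndsd_segfaults(SAMPLE_LINES):
        print(hit["pid"], hex(hit["addr"]), hex(hit["rip"]))
```

To scan a real log, pass an open file object for /var/log/messages to find_ndsd_segfaults in place of SAMPLE_LINES.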
When opening a core that was collected from such a crash, the bt (or backtrace) command shows the following:
#0 0x00002aaaaacab93d in InternalUpdateVolumeTrusteeFile(int) () from /opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#1 0x00002aaaaacacaaa in DircacheTrusteeUpdateEvent(int) () from /opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#2 0x00000000004142d1 in PoolWorker(void*) ()
#3 0x00002b7af01b6193 in start_thread () from /lib64/libpthread.so.0
#4 0x00002b7af04def0d in clone () from /lib64/libc.so.6
#5 0x0000000000000000 in ?? ()
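A backtrace saved to a text file can be compared against the one above by checking its two innermost frames. The sketch below is illustrative only: the helper names and the SIGNATURE list are hypothetical, and the frame pattern assumes the output layout shown in this document. Only function names are compared, because addresses differ between builds.

```python
import re

# Matches frames such as "#0 0x00002aaaaacab93d in SomeFunction(int) () ...".
FRAME_RE = re.compile(r"^#(?P<num>\d+)\s+0x[0-9a-f]+ in (?P<func>[^\s(]+)")

# Innermost frames of the backtrace quoted in this document.
SIGNATURE = ["InternalUpdateVolumeTrusteeFile", "DircacheTrusteeUpdateEvent"]


def frame_functions(backtrace):
    """Return the function names of all frames, innermost first."""
    functions = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            functions.append(match.group("func"))
    return functions


def matches_signature(backtrace):
    """True if the two innermost frames match the crash described here."""
    return frame_functions(backtrace)[:len(SIGNATURE)] == SIGNATURE


SAMPLE_BT = """\
#0 0x00002aaaaacab93d in InternalUpdateVolumeTrusteeFile(int) () from /opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#1 0x00002aaaaacacaaa in DircacheTrusteeUpdateEvent(int) () from /opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#2 0x00000000004142d1 in PoolWorker(void*) ()
"""

if __name__ == "__main__":
    print(frame_functions(SAMPLE_BT))
    print(matches_signature(SAMPLE_BT))
```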