OES Linux server hangs when copying data to an NSS volume using SCU

  • 3813435
  • 12-Jun-2007
  • 27-Apr-2012

Environment

Novell Open Enterprise Server (Linux based)

Situation

While data is being copied to an OES Linux NSS volume with the Server Consolidation Utility (SCU) v4.11, the OES Linux server periodically hangs. At the time of each hang, messages like the following are written to /var/log/messages:

Apr 27 16:35:49 NovellX1 kernel: <1victim
Apr 27 16:35:49 NovellX1 kernel: Fovicvictim
Apr 27 16:35:49 NovellX1 kernel: Fvictim
Apr 27 16:35:49 NovellX1 kernel: Foundvictim
Apr 27 16:35:49 NovellX1 kernel: Apr 27 16:35:49 NovellX1 kernel: Foundvicvicvictimvictim
Apr 27 16:35:49 NovellX1 kernel: Apr 27 16:35:49 NovellX1 kernel: <1victim
Apr 27 16:35:49 NovellX1 kernel: vvicvictim
Apr 27 16:35:49 NovellX1 kernel: Fovictim
Apr 27 16:35:49 NovellX1 kernel: Found a buffevicvictim
Apr 27 16:35:49 NovellX1 kernel: vvicvictimvictim
Apr 27 16:35:49 NovellX1 kernel: Found a bufvictim
Apr 27 16:35:49 NovellX1 kernel: Fouvictivictim
Apr 27 16:35:49 NovellX1 kernel: Found victim
Apr 27 16:35:49 NovellX1 kernel: victim
Apr 27 16:35:49 NovellX1 kernel: victim
Apr 27 16:35:49 NovellX1 kernel: Fouvictim
Apr 27 16:35:49 NovellX1 kernel: <1vicvictim
Apr 27 16:35:49 NovellX1 kernel: <1victivictim
Apr 27 16:35:49 NovellX1 kernel: Found victim
Apr 27 16:35:49 NovellX1 kernel: Fvvicvicvictimvictvictim
Apr 27 16:35:49 NovellX1 kernel: victim
Apr 27 16:35:49 NovellX1 kernel: Fvicvictim
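
To check whether a server has logged this pattern, the log can be scanned for kernel lines that end in "victim". The following is a minimal sketch, not a Novell-supplied tool. The function name `count_victim_lines` and the matching rule (a syslog line from `kernel:` whose message ends in `victim`) are assumptions derived only from the sample output above.

```python
import re
from collections import Counter

# Assumption: syslog lines have the form
#   "Apr 27 16:35:49 <host> kernel: <message>"
# and the interleaved messages from this issue all end in "victim".
KERNEL_LINE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s*(.*)$"
)

def count_victim_lines(lines):
    """Count kernel log lines ending in 'victim', grouped by timestamp."""
    hits = Counter()
    for line in lines:
        match = KERNEL_LINE.match(line.rstrip("\n"))
        if match and match.group(2).rstrip().endswith("victim"):
            hits[match.group(1)] += 1
    return hits
```

On a live server the function could be given an open handle to /var/log/messages. A large count within a single second, as in the sample above, would be consistent with this issue but does not confirm it on its own.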


If LKCD (Linux Kernel Crash Dump) is enabled and a core dump is retrieved, the following signature oops message can be seen:

<1>Unable to handle kernel NULL pointer dereference at virtual address 000000a8
<1> printing eip:
<4>f90ec1bb
<1>*pde = 00000000
<1>Oops: 0000 [#1]
<4>SMP
<4>CPU: 0
<4>EIP: 0060:[] Tainted: PF U
<4>EFLAGS: 00010212 (2.6.5-7.283-smp SLES9_SP3_BRANCH-20061129165553)
<4>EIP is at LruDeqLnx+0x8d/0x14a [nss]
<4>eax: 00000000 ebx: dbe03e74 ecx: d36ba674 edx: 0000036f
<4>esi: f3e2a474 edi: 00000001 ebp: 00000001 esp: eb0f7c30
<4>ds: 007b es: 007b ss: 0068
<4>Process ncp2nss (pid: 7713, threadinfo=eb0f6000 task=eb0f5990)
<4>Stack: eb2ef984 c0190b85 00000000 f90ec91b f0000000 f3eb69c0 00000000 eb0f7d10
<4> f3eb69c0 00000000 00000001 eb2ef984 00000000 f3e2a410 f0000000 10000000
<4> d3529e28 f3e2a474 10000000 f94c471e f0000000 00000000 eb0f7e20 ebbeb3d0
<4>Call Trace:
<4> [] igrab+0x35/0x40
<4> [] cacheAlloc+0x116/0x5ad [nss]
<4> [] VIRT_CallKernelSpace+0xfd/0x113 [nsscomn]
<4> [] LB_defaultSignal+0x0/0xbe [nss]
<4> [] VIRT_Write+0x8e8/0x9fc [nsscomn]
<4> [] VIRT_BlockSignalHandler+0x0/0x24 [nsscomn]
<4> [] cacheAllocBufferForUserData+0x28/0x2d [nss]
<4> [] VIRT_BlockSignalHandler+0x0/0x24 [nsscomn]
<4> [] VIRT_ChunkyRead+0x342/0x53a [nsscomn]
<4> [] VIRT_BlockSignalHandler+0x0/0x24 [nsscomn]
<4> [] MSG_Call+0x9c/0xce [nss]
<4> [] zMSG_Call+0x2c/0x45 [nss]
<4> [] LB_StackAllocate+0x8/0xc [nsslibrary]
<4> [] zRead+0xcd/0x22c [nsscomn]
<4> [] sigprocmask+0x6c/0x100
<4> [] mpkEnter+0x2ca/0x2da [linuxmpk]
<4> [] lsa_file_read+0x6f/0xaa [nsslsa]
<4> [] vfs_read+0xc6/0x160
<4> [] sys_pread64+0xdb/0x180
<4> [] sysenter_past_esp+0x52/0x79
<4>
<4>Code: 8b 80 a8 00 00 00 e8 1a 35 06 c7 89 c2 89 d8 85 d2 0f 84 a1
<4><1>done waiting: 1 cpus not responding
<4>Uhhuh. NMI received for unknown reason 31 on CPU 1.
<4>Dumping to block device (8,1) on CPU 0 ...
<4>Dazed and confused, but trying to continue
<4>Do you have a strange power saving mode enabled?
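
When comparing a retrieved oops against this signature, the two distinguishing details are the NULL pointer dereference and the faulting location, `LruDeqLnx` in the `nss` module. The sketch below shows one way to test captured oops text for both. The helper names `faulting_symbol` and `matches_this_issue` are hypothetical, and the regular expression assumes the "EIP is at" line format shown above.

```python
import re

# Assumption: the oops contains a line of the form
#   "EIP is at <symbol>+0x<offset>/0x<size> [<module>]"
# where the bracketed module name is absent for core kernel symbols.
EIP_LINE = re.compile(
    r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

def faulting_symbol(oops_text):
    """Return (symbol, module) from the 'EIP is at' line, or None if absent."""
    match = EIP_LINE.search(oops_text)
    if match is None:
        return None
    return match.group(1), match.group(2)

def matches_this_issue(oops_text):
    """True if the oops is a NULL dereference in LruDeqLnx [nss]."""
    return (
        "NULL pointer dereference" in oops_text
        and faulting_symbol(oops_text) == ("LruDeqLnx", "nss")
    )
```

A match only indicates that the oops resembles the one documented here. The call trace should still be compared before concluding that the same issue is involved.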

Resolution

This issue is currently being investigated by Novell Development.

Until a code-level resolution is available, disabling "SnapShot - File Level" for the OES Linux NSS volume will work around this issue. This can be done by editing the properties of the NSS volume in `nssmu`.