Environment
Novell Open Enterprise Server 11 (OES 11) Linux Support Pack 1
Novell Open Enterprise Server 11 (OES 11) Linux Support Pack 2
Situation
Shortly after a nightly reboot, a cluster node crashed with a kernel oops in cacheAllocBufferForUserData.
The backtrace from the kernel core dump showed the following:
crash> bt
PID: 14230 TASK: ffff883f2009a600 CPU: 5 COMMAND: "nss 62"
#0 [ffff883f2009d430] machine_kexec at ffffffff8102bf7e
#1 [ffff883f2009d480] crash_kexec at ffffffff810abe8a
#2 [ffff883f2009d550] oops_end at ffffffff81462638
#3 [ffff883f2009d570] __bad_area_nosemaphore at ffffffff81038745
#4 [ffff883f2009d630] do_page_fault at ffffffff81464b8e
#5 [ffff883f2009d730] page_fault at ffffffff81461665
[exception RIP: cacheAlloc+1207]
RIP: ffffffffa07668c0 RSP: ffff883f2009d7e0 RFLAGS: 00010206
RAX: ffffc9007bf6ded8 RBX: ffffc9007bf6ded8 RCX: ffffffffa07a8758
RDX: ffffc9007bf6de58 RSI: ffffffffa07a87c8 RDI: ffffffffa0674ee8
RBP: ffffc9007bf6ded8 R8: 0000000000000000 R9: 0000000000000000
R10: ffff883f7f19f410 R11: ffff883f6abcc960 R12: 0000000000000000
R13: ffff881880efc4b0 R14: ffffc9007bf6dfa8 R15: ffffc9007bf6dfa8
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff883f2009d988] cacheAllocBufferForUserData at ffffffffa0766c79 [nss]
#7 [ffff883f2009d998] ZFSVOL_VOL_getFileBlk at ffffffffa09fcfdd [nsszlss]
#8 [ffff883f2009da38] ROOT_BST_GetFileBlk at ffffffffa08662fc [nsscomn]
#9 [ffff883f2009da48] COMN_GetFileBlkOrHole at ffffffffa080446b [nsscomn]
#10 [ffff883f2009daa8] CM_fetchFileBlk at ffffffffa089584b [nsscomn]
#11 [ffff883f2009dae8] ALGOMGR_fetchStreamBuf at ffffffffa088dfb1 [nsscomn]
#12 [ffff883f2009db78] NSSCCDGetWriteCacheBlock at ffffffffa089b19b [nsscomn]
#13 [ffff883f2009dbb8] CCDGetNewTempBlock at ffffffffa0884f9c [nsscomn]
#14 [ffff883f2009dbc8] CCDAnalyzeFile at ffffffffa088775a [nsscomn]
#15 [ffff883f2009dc68] CCDCompressFile at ffffffffa0888189 [nsscomn]
#16 [ffff883f2009dd08] NWALGO_compressStream at ffffffffa089ac93 [nsscomn]
#17 [ffff883f2009dea8] ALGOMGR_invokeCompAlgo at ffffffffa088e1ff [nsscomn]
#18 [ffff883f2009dee8] CM_activityWorkToDoRun at ffffffffa088d822 [nsscomn]
#19 [ffff883f2009def8] do_work at ffffffffa076a1aa [nss]
#20 [ffff883f2009df28] startThread at ffffffffa063e25f [nsslibrary]
#21 [ffff883f2009df48] kernel_thread_helper at ffffffff81469fe4
crash>
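To compare a core from another node against this signature, it can help to reduce the crash "bt" output to its frame number, module, and function name. The following is a minimal sketch, not a Novell tool: the helper names parse_frames and modules_in_order are hypothetical, the regular expression assumes the frame layout shown above, and frames without a bracketed module name are labelled "kernel" purely for illustration.

```python
import re

# Matches crash "bt" frame lines of the form shown in this document, e.g.
#   #6 [ffff883f2009d988] cacheAllocBufferForUserData at ffffffffa0766c79 [nss]
# The trailing "[module]" part is optional (core kernel frames omit it).
FRAME_RE = re.compile(
    r"^#(?P<idx>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+(?P<func>\S+)"
    r"\s+at\s+(?P<addr>[0-9a-f]+)(?:\s+\[(?P<module>[^\]]+)\])?\s*$"
)


def parse_frames(bt_text):
    """Return (frame number, function, module) tuples from bt output.

    Lines that are not frame lines (register dumps, the
    "[exception RIP: ...]" line, the prompt) are skipped.
    """
    frames = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            # Assumption: label frames with no module as "kernel".
            frames.append((int(m["idx"]), m["func"], m["module"] or "kernel"))
    return frames


def modules_in_order(frames):
    """Return the distinct modules in the order they first appear."""
    seen = []
    for _, _, module in frames:
        if module not in seen:
            seen.append(module)
    return seen


# A few lines copied from the backtrace above.
SAMPLE = """\
#5 [ffff883f2009d730] page_fault at ffffffff81461665
[exception RIP: cacheAlloc+1207]
#6 [ffff883f2009d988] cacheAllocBufferForUserData at ffffffffa0766c79 [nss]
#7 [ffff883f2009d998] ZFSVOL_VOL_getFileBlk at ffffffffa09fcfdd [nsszlss]
#16 [ffff883f2009dd08] NWALGO_compressStream at ffffffffa089ac93 [nsscomn]
"""

if __name__ == "__main__":
    for idx, func, module in parse_frames(SAMPLE):
        print(f"#{idx:<3} {module:<10} {func}")
```

Run against the full backtrace, this makes the call path easier to read: the faulting nss frame is reached from the nsscomn compression routines (NWALGO_compressStream, CCDCompressFile) through nsszlss.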
Resolution
The fix for this issue was released with the September 2014 Scheduled Maintenance patches.
Cause
The kernel panic was caused by buffer corruption.