PPC64 - After updating to SLES11 SP4 kernel 3.0.101-108.81 system is crashing constantly

  • Document ID: 7023571
  • Creation Date: 10-Dec-2018
  • Modified Date: 10-Dec-2018

Environment

SUSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4)

Situation

After a PPC64 LPAR is updated to kernel 3.0.101-108.81-{bigmem,default}, the system crashes repeatedly with the following console output:

Oops: Bad kernel stack pointer, sig: 6 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: nfsd binfmt_misc nfs fscache lockd auth_rpcgss nfs_acl sunrpc fuse xfs loop ipv6 ipv6_lib ibmveth(X) nx_crypto(X) sg ext3 jbd mbcache dm_mirror dm_region_hash dm_log linear sd_mod crc_t10dif ibmvfc(X) scsi_transport_fc scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_service_time dm_least_pending dm_queue_length dm_round_robin dm_multipath scsi_dh dm_snapshot dm_mod ibmvscsic(X) scsi_transport_srp scsi_tgt scsi_mod
Supported: Yes, External
NIP: 0000000000003494 LR: 00007fff8e6b3e40 CTR: 00007fff7c340298
REGS: c00000001e9bfd40 TRAP: 0300   Tainted: G             X  (3.0.101-108.81-bigmem)
MSR: 8000000000001000 <ME>  CR: 22282484  XER: 20000001
DAR: 00007fff8ca96978, DSISR: 42000000
TASK = c000004de130e160[21452] 'jstart' THREAD: c000004d90db0000 CPU: 24
GPR00: 00007fff7c340298 00007fff8ca8b7a0 000000000000cafe 00007fff8ca8c9b0
GPR04: 0000000000000000 0000000000000010 00007fff8ca8d9a0 0000000000000001
GPR08: ffffffffffffffc0 00007fff8ebcf2dc 00007fff8ec9de70 000000000000c0de
GPR12: 00007fff8ecf9678 00007fff8ca968a0 0000000010f78800 00007fff8ed57b26
GPR16: 00007fff8ca8b810 00000000000003d0 0000000010f79c80 0000000010f798b0
GPR20: 00007fff7c3402ec 0000000010f79890 00007fff8eddc0a8 00007fff8ca8c9b0
GPR24: 0000000010f79840 00007fff7c340280 00007fff7c340298 00007fff7c3402cc
GPR28: 00007fff8eddc0ac 0000000000000080 00007fff8ec2d2e0 00007fff8ca8b7a0
NIP [0000000000003494] 0x3494
LR [00007fff8e6b3e40] 0x7fff8e6b3e40
Call Trace:
Instruction dump:
f98d0098 XXXXXXXX XXXXXXXX XXXXXXXX 7d7a02a6 XXXXXXXX XXXXXXXX XXXXXXXX
7d9b02a6 XXXXXXXX XXXXXXXX XXXXXXXX f92d00d8 XXXXXXXX XXXXXXXX XXXXXXXX
Sending IPI to other cpus...
I'm in purgatory
io_event_irq: No ibm,io-events on system! IO Event interrupt disabled.
doing fast boot
Starting multipathd
Creating device nodes with udev
Out of memory: Kill process 608 (udevadm) score 0 or sacrifice child
Killed process 608 (udevadm) total-vm:3520kB, anon-rss:640kB, file-rss:2112kB
boot/04-udev.sh: line 17:   608 Killed                  /sbin/udevadm settle --timeout=$udev_timeout
Out of memory: Kill process 879 (init) score 0 or sacrifice child
Killed process 880 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 881 (init) score 0 or sacrifice child
Killed process 882 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 883 (init) score 0 or sacrifice child
Killed process 884 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 883 (init) score 0 or sacrifice child
Killed process 883 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 885 (init) score 0 or sacrifice child
Killed process 886 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 894 (init) score 0 or sacrifice child
Killed process 894 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 895 (init) score 0 or sacrifice child
Killed process 895 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 896 (init) score 0 or sacrifice child
Killed process 896 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 897 (init) score 0 or sacrifice child
Killed process 897 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 900 (init) score 0 or sacrifice child
Killed process 901 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 902 (init) score 0 or sacrifice child
Killed process 903 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 902 (init) score 0 or sacrifice child
Killed process 902 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Out of memory: Kill process 904 (init) score 0 or sacrifice child
Killed process 904 (init) total-vm:4992kB, anon-rss:1216kB, file-rss:0kB
Kernel panic - not syncing: Out of memory and no killable processes...


Call Trace:
[c0000000165cf5b0] [c000000008014ae8] .show_stack+0x68/0x1b0 (unreliable)
[c0000000165cf660] [c00000000864b838] .panic+0xe0/0x288
[c0000000165cf700] [c0000000081752a0] .out_of_memory+0x380/0x390
[c0000000165cf7f0] [c00000000817cba4] .__alloc_pages_nodemask+0x9d4/0x9f0
[c0000000165cf9b0] [c0000000081c8a8c] .alloc_pages_vma+0x12c/0x2f0
[c0000000165cfa90] [c0000000081a1aa8] .do_wp_page+0x338/0xc20
[c0000000165cfb90] [c00000000863df14] .do_page_fault+0x3f4/0x770
[c0000000165cfe30] [c000000008006024] handle_page_fault+0x20/0xffc
Rebooting in 180 seconds..

Resolution

Reverting to kernel 3.0.101-108.77.1 or updating to kernel 3.0.101-108.84.1 resolves the panics.
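
To confirm whether a given LPAR is running the affected build before reverting or updating, the check can be sketched in a few lines of Python. This is a minimal illustration, not a SUSE-provided tool: the helper name is_affected is hypothetical, and the sketch assumes that `uname -r` reports the release as 3.0.101-108.81-default or 3.0.101-108.81-bigmem (matching the console output above) and that `uname -m` reports ppc64. It only recognizes the build named in this article and says nothing about any other kernel version.

```python
import platform

# Kernel build and flavors named in this article as crashing on PPC64 LPARs.
AFFECTED_BUILD = "3.0.101-108.81"
AFFECTED_FLAVORS = ("default", "bigmem")


def is_affected(release, machine="ppc64"):
    """Return True if the release string (as printed by `uname -r`) is the
    affected build on ppc64. Hypothetical helper, for illustration only."""
    # Split "3.0.101-108.81-bigmem" into the build and the trailing flavor.
    build, _, flavor = release.rpartition("-")
    return (
        machine == "ppc64"
        and build == AFFECTED_BUILD
        and flavor in AFFECTED_FLAVORS
    )


if __name__ == "__main__":
    running = platform.release()
    arch = platform.machine()
    if is_affected(running, arch):
        print("%s (%s): affected build, revert or update the kernel" % (running, arch))
    else:
        print("%s (%s): not the build named in this article" % (running, arch))
```

The sketch avoids newer language features so that it should also run on the older Python interpreter typically found on SLES 11 systems.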
