Page Fault abend in SCSIHD.CDM

  • 7000022
  • 08-Apr-2008
  • 26-Apr-2012

Environment

Novell NetWare 6.5 Support Pack 7
Hewlett Packard Proliant DL580 G3 Servers
Emulex LPFC controller(s)

Hitachi DF600F SAN

Situation

As part of the SAN Failover and Failback testing procedures, the Fiber Channel cable was pulled from one of the Fiber Channel controllers. This did not result in the expected path Fail-over, but resulted in the server abending after approximately 10 seconds.

Resolution

The problem was identified to be in SCSIHD.CDM that was trying to de-reference a structure for a device that no longer exists.

A fix is currently being tested and is planned to be released for a future Support Pack.

Additional Information

  • Testing various LPFC driver versions and various LPFC controller firmware revisions, the problem was recurring.
  • Testing different SCSIHD.CDM versions and various command line options, the problem was recurring.
  • Testing using Hewlett Packard's CPQSHD.CDM the SAN was not recognized.

When loading SCSIHD.CDM with the /DEBUG parameter the loggerscreen would show information as displayed below :

SCSI Inquiry - deviceid=0 npaBusID=6 flags=3                            
SCSI Inquiry - deviceid=0 npaBusID=E flags=3
SCSI Inquiry - deviceid=0 npaBusID=16 flags=3
Attempting to set mutex. device=0,mutex=FE141D80,mutexName=1515071E
Mutex set. device=0
Retry TUR command succeeded! hacbCC=80000000 dev=0
Queue Frozen in Blocking Execute called from DoReadCapacity.scsi=7B5CA0. dev=0
READCAPACITY SUCCEEDED FOR DEVICE 0
CAPACITY=0xF9FFFFFF sectors * 200
Queue Frozen in Blocking Execute called from ReadAndSetGeometry. scsi=7B5CA0. dev=0
Queue Unfrozen. scsi=7B5CA0. dev=0
Mount ObjUpd! flag=9,status=0,activate=1,device=7B5CA0(0)
Ret Mount ObjUpd! flag=9,status=0,activate=1,device=7B5CA0(0)
Mutex cleared. device=0
Attempting to set mutex. device=1,mutex=FE141E40,mutexName=1515071E
Mutex set. device=1
Retry TUR command succeeded! hacbCC=80000000 dev=1
Queue Frozen in Blocking Execute called from DoReadCapacity.csi=7B5DE0. dev=1
READCAPACITY SUCCEEDED FOR DEVICE 1
CAPACITY=0x3E7FFFFF sectors * 200
Queue Frozen in Blocking Execute called from ReadAndSetGeometry.scsi=7B5DE0. dev=1
Queue Unfrozen. scsi=7B5DE0. dev=1
Mount ObjUpd! flag=9,status=0,activate=1,device=7B5DE0(1)
Ret Mount ObjUpd! flag=9,status=0,activate=1,device=7B5DE0(1)
Mutex cleared. device=1
Attempting to set mutex.
device=10000000,mutex=FE168480,mutexName=1515071E
Mutex set. device=10000000
Retry TUR command succeeded! hacbCC=80000000 dev=10000000
Queue Frozen in Blocking Execute called from DoReadCapacity.scsi=9C0C0. dev=10000000
READCAPACITY SUCCEEDED FOR DEVICE 10000000
CAPACITY=0xF9FFFFFF sectors * 200
Queue Frozen in Blocking Execute called from ReadAndSetGeometry.scsi=9C0C0. dev=10000000
Queue Unfrozen. scsi=9C0C0. dev=10000000
Mount ObjUpd! flag=9,status=0,activate=1,device=9C0C0(10000000)
Ret Mount ObjUpd! flag=9,status=0,activate=1,device=9C0C0(10000000)
Mutex cleared. device=10000000
Attempting to set mutex.
device=10000001,mutex=FE168540,mutexName=1515071E
Mutex set. device=10000001
Retry TUR command succeeded! hacbCC=80000000 dev=10000001
Queue Frozen in Blocking Execute called from DoReadCapacity.scsi=9C200. dev=10000001
READCAPACITY SUCCEEDED FOR DEVICE 10000001
CAPACITY=0x3E7FFFFF sectors * 200
Queue Frozen in Blocking Execute called from ReadAndSetGeometry.scsi=9C200. dev=10000001
Queue Unfrozen. scsi=9C200. dev=10000001
Mount ObjUpd! flag=9,status=0,activate=1,device=9C200(10000001)
Ret Mount ObjUpd! flag=9,status=0,activate=1,device=9C200(10000001)
Mutex cleared. device=10000001
Attempting to set mutex. device=1,mutex=FE141B40,mutexName=1515071E
Mutex set. device=1
Retry TUR command succeeded! hacbCC=80000000 dev=1
Queue Frozen in Blocking Execute called from DoReadCapacity.scsi=FEF680. dev=1
READCAPACITY SUCCEEDED FOR DEVICE 1
CAPACITY=0x10F539BF sectors * 200
Queue Frozen in Blocking Execute called from ReadAndSetGeometry.scsi=FEF680. dev=1
Queue Unfrozen. scsi=FEF680. dev=1
Mount ObjUpd! flag=9,status=0,activate=1,device=FEF680(1)
Ret Mount ObjUpd! flag=9,status=0,activate=1,device=FEF680(1)
Mutex cleared. device=1
GN_CallBack error! Error=80020000.Sense=0.Ctl=2,Dev=1. MSG=8F268D00
Callback. Msg Queued MSG=8F268D00
GN_CallBack error! Error=80070000.Sense=0.Ctl=2,Dev=1. MSG=8975ED80
Callback. Msg Queued MSG=8975ED80
GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F260E80
Callback. Msg Queued MSG=8F260E80
GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8975EB80
Callback. Msg Queued MSG=8975EB80
GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F260E00
Callback. Msg Queued MSG=8F260E00
GN_CallBack error! Error=80070000.Sense=0.Ctl=2,Dev=1. MSG=8F268B00
Callback. Msg Queued MSG=8F268B00
GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F268A80
Callback. Msg Queued MSG=8F268A80
GN_CallBack error! Error=80070000.Sense=0.Ctl=2,Dev=1. MSG=8F260D80
Callback. Msg Queued MSG=8F260D80
GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F268A00
Callback. Msg Queued MSG=8F268A00
GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8975EB00
Callback. Msg Queued MSG=8975EB00
GN_CallBack error! Error=80070000.Sense=0.Ctl=2,Dev=1. MSG=8F265300
Callback. Msg Queued MSG=8F265300
GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F265280
Callback. Msg Queued MSG=8F265280
GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F265200
Callback. Msg Queued MSG=8F265200
GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F265180
Callback. Msg Queued MSG=8F265180
GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F265100
Callback. Msg Queued MSG=8F265100
GN_CallBack error! Error=80070000.Sense=0.Ctl=2,Dev=1. MSG=8F268880
Callback. Msg Queued MSG=8F268880
GN_CallBack error! Error=80070000.Sense=0.Ctl=0,Dev=1. MSG=95999400
Callback. Msg Queued MSG=95999400
AEN Event: Device disappeared. scsi=7B5CA0, dev=0
Queue Unfrozen. scsi=7B5CA0. dev=0
SetStat ObjUpd! flag=2,status=2,activate=0,device=7B5CA0(0)
Ret from SetStat ObjUpd! flag=2,status=2,activate=0,device=7B5CA0(0)
AEN Event: Device disappeared. scsi=7B5DE0, dev=1
Queue Unfrozen. scsi=7B5DE0. dev=1
SetStat ObjUpd! flag=2,status=2,activate=0,device=7B5DE0(1)
CDMAbort.Aborted msg completed! MSG=8975EB00
CDMAbort.Aborted msg completed! MSG=8975EB80
CDMAbort.Aborted msg completed! MSG=8975ED80
CDMAbort.Aborted msg completed! MSG=8F260D80
CDMAbort.Aborted msg completed! MSG=8F260E00
CDMAbort.Aborted msg completed! MSG=8F260E80
CDMAbort.Aborted msg completed! MSG=8F265100
CDMAbort.Aborted msg completed! MSG=8F265180
CDMAbort.Aborted msg completed! MSG=8F265200
CDMAbort.Aborted msg completed! MSG=8F265280
CDMAbort.Aborted msg completed! MSG=8F265300
CDMAbort.Aborted msg completed! MSG=8F268880
CDMAbort.Aborted msg completed! MSG=8F268A00
CDMAbort.Aborted msg completed! MSG=8F268A80
CDMAbort.Aborted msg completed! MSG=8F268B00
CDMAbort.Aborted msg completed! MSG=8F268D00
CDMAbort.Aborted msg completed! MSG=95999400
Ret from SetStat ObjUpd! flag=2,status=2,activate=0,device=7B5DE0(1)
Queue Frozen in Blocking Execute called from DoStartDevice.
scsi=7B5DE0. dev=1
Retry Start command failed! hacbCC=80070000,sense=0,dev=1
SCSI Inquiry - deviceid=10 npaBusID=E flags=2
Deleting SCSI! scsi=7B5DE0, dev=1
Mutex freed for device=1,mutex=FE141E40,mutexName=0
SCSI Inquiry - deviceid=F npaBusID=E flags=2
Deleting SCSI! scsi=7B5CA0, dev=0
Mutex freed for device=0,mutex=FE141D80,mutexName=0
SCSI Inquiry - deviceid=0 npaBusID=E flags=3
Queue Frozen in Blocking Execute called from DoStartDevice.scsi=7B5DE0. dev=1099290
Retry Start command failed! hacbCC=80070000,sense=0,dev=1099290
Queue Frozen in Blocking Execute called from DoStartDevice.scsi=7B5DE0. dev=1099290
Retry Start command failed! hacbCC=80070000,sense=0,dev=1099290



The abend logging as captured in the abend.log showed the following information :

Novell Open Enterprise Server, NetWare 6.5PVER: 6.50.07
Server SERVER1 halted Tuesday, 4 March 2008 13:36:33,309 Abend 1 on P00: Server-5.70.07: Page Fault Processor Exception (Error code 00000000)
Registers:
CS = 0008 DS = 0023 ES = 0023 FS = 0023 GS = 0023 SS = 0010 EAX = 25F44800 EBX = 00209D98 ECX = 25F44800 EDX = 00000000 ESI = 00209D98 EDI = 01E509F5 EBP = 8FA23EFC ESP = 8FA23EF4 EIP = 8A1EDC0B FLAGS = 00010246 8A1EDC0B 8A82DC000000 MOV AL, [EDX+000000DC]=? EIP in SCSIHD.CDM at code start +00001C0Bh Access Location: 0x000000DCThe violation occurred while processing the following instruction:8A1EDC0B 8A82DC000000 MOV AL, [EDX+000000DC]8A1EDC11 8845FC MOV [EBP-04], AL8A1EDC14 8A45FC MOV AL, [EBP-04]8A1EDC17 C9 LEAVE 8A1EDC18 C3 RET 8A1EDC19 0000 ADD [EAX], AL8A1EDC1B 0000 ADD [EAX], AL8A1EDC1D 0000 ADD [EAX], AL8A1EDC1F 005589 ADD [EBP-77], DL8A1EDC22 E58B IN EAX, FFFFFF8B
Running process: Server 16 ProcessThread Owned by NLM: SERVER.NLMStack pointer: 8FA23E30OS Stack limit: 8FA1C040Scheduling priority: 67371008Wait state: 5050090 Wait for interruptStack: --CCCCCCCC ? --CCCCCC00 ? --8FA23F18 ? 8A1ED637 (SCSIHD.CDM|ErrorHandlerEvent+157) --25F44800 ? --CCCCCCCC ? --00000001 (LOADER.NLM|KernelAddressSpace+1) --92C0C600 ? --25F44800 ? --00209DA0 ? 897F9AE5 (NWPA.NLM|NPAThreadRoutine+3D) --25F44800 ? --00000046 (LOADER.NLM|KernelAddressSpace+46) 0010BED6 (LOADER.NLM|kSpinUnlockRestore+26) --00000046 (LOADER.NLM|KernelAddressSpace+46) 897FD78D (NWPA.NLM|FastCompleteHACBRoutine+95) --0020900C ? --00000046 (LOADER.NLM|KernelAddressSpace+46) --00209020 ? --01081A00 ? --00209DA0 ? --01E509F5 ? --00209D98 ? --00209D98 ? 0036CE32 (SERVER.NLM|CallAESRoutineWithEsiSet+A) --00209D98 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --00209DA0 ? 0026B852 (SERVER.NLM|WTDFrontEndHandler+62) --00209D98 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) 0026B7F0 (SERVER.NLM|WTDFrontEndHandler+0) 0036CDA2 (SERVER.NLM|StartWorkToDo+23) --00209DA0 ? --01E509F5 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --FE10EBC0 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) 0021C572 (SERVER.NLM|WorkerThread+50E) --00209DA0 ? --B86673F5 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --928C1360 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --928C1360 ? 00229B78 (SERVER.NLM|TcoNewSystemThreadEntryPoint+40) --928C1360 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --34343434 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --8FA24668 ? --8FA2467C ? --8FA246AA ? --8FA246B6 ? --8FA246F6 ? --8FA246FF ? --8FA24708 ? --8FA24714 ? --8FA24721 ? --8FA2472C ? --8FA24734 ? --8FA2473D ? --8FA24749 ? --8FA24751 ? --8FA2475B ? --8FA247B6 ? --8FA24811 ? --8FA2486F ? --8FA248CE ? --8FA2492B ? --8FA24985 ? --8FA249E0 ? --8FA24A3C ? --8FA24A98 ? --8FA24AA6 ? --8FA24AAE ? --8FA24AC8 ? --8FA24ADE ? --8FA24AF0 ? --8FA24B00 ? --8FA24B12 ? --8FA24B24 ? --8FA24B30 ?Additional Information: The CPU encountered a problem executing code in SCSIHD.CDM. The problem may be in that module or in data passed to that module by a process owned by SERVER.NLM.