Environment
Novell NetWare 6.5 Support Pack 7
Hewlett Packard Proliant DL580 G3 Servers
Emulex LPFC controller(s)
Hitachi DF600F SAN
Hewlett Packard Proliant DL580 G3 Servers
Emulex LPFC controller(s)
Hitachi DF600F SAN
Situation
As part of the SAN Failover and Failback testing procedures, the Fiber Channel cable was pulled from one of the Fiber Channel controllers. This did not result in the expected path Fail-over, but resulted in the server abending after approximately 10 seconds.
Resolution
The problem was identified to be in SCSIHD.CDM that was trying to de-reference a structure for a device that no longer exists.
A fix is currently being tested and is planned to be released for a future Support Pack.
A fix is currently being tested and is planned to be released for a future Support Pack.
Additional Information
- Testing various LPFC driver versions and various LPFC controller firmware revisions, the problem was recurring.
- Testing different SCSIHD.CDM versions and various command line options, the problem was recurring.
- Testing using Hewlett Packard's CPQSHD.CDM the SAN was not recognized.
When loading SCSIHD.CDM with the /DEBUG parameter the loggerscreen would show information as displayed below :
SCSI Inquiry - deviceid=0 npaBusID=6 flags=3 SCSI Inquiry - deviceid=0 npaBusID=E flags=3 SCSI Inquiry - deviceid=0 npaBusID=16 flags=3 Attempting to set mutex. device=0,mutex=FE141D80,mutexName=1515071E Mutex set. device=0 Retry TUR command succeeded! hacbCC=80000000 dev=0 Queue Frozen in Blocking Execute called from DoReadCapacity.scsi=7B5CA0. dev=0 READCAPACITY SUCCEEDED FOR DEVICE 0 CAPACITY=0xF9FFFFFF sectors * 200 Queue Frozen in Blocking Execute called from ReadAndSetGeometry. scsi=7B5CA0. dev=0 Queue Unfrozen. scsi=7B5CA0. dev=0 Mount ObjUpd! flag=9,status=0,activate=1,device=7B5CA0(0) Ret Mount ObjUpd! flag=9,status=0,activate=1,device=7B5CA0(0) Mutex cleared. device=0 Attempting to set mutex. device=1,mutex=FE141E40,mutexName=1515071E Mutex set. device=1 Retry TUR command succeeded! hacbCC=80000000 dev=1 Queue Frozen in Blocking Execute called from DoReadCapacity.csi=7B5DE0. dev=1 READCAPACITY SUCCEEDED FOR DEVICE 1 CAPACITY=0x3E7FFFFF sectors * 200 Queue Frozen in Blocking Execute called from ReadAndSetGeometry.scsi=7B5DE0. dev=1 Queue Unfrozen. scsi=7B5DE0. dev=1 Mount ObjUpd! flag=9,status=0,activate=1,device=7B5DE0(1) Ret Mount ObjUpd! flag=9,status=0,activate=1,device=7B5DE0(1) Mutex cleared. device=1 Attempting to set mutex.
device=10000000,mutex=FE168480,mutexName=1515071EMutex set. device=10000000 Retry TUR command succeeded! hacbCC=80000000 dev=10000000 Queue Frozen in Blocking Execute called from DoReadCapacity.scsi=9C0C0. dev=10000000 READCAPACITY SUCCEEDED FOR DEVICE 10000000 CAPACITY=0xF9FFFFFF sectors * 200 Queue Frozen in Blocking Execute called from ReadAndSetGeometry.scsi=9C0C0. dev=10000000 Queue Unfrozen. scsi=9C0C0. dev=10000000 Mount ObjUpd! flag=9,status=0,activate=1,device=9C0C0(10000000) Ret Mount ObjUpd! flag=9,status=0,activate=1,device=9C0C0(10000000) Mutex cleared. device=10000000 Attempting to set mutex.
device=10000001,mutex=FE168540,mutexName=1515071EMutex set. device=10000001 Retry TUR command succeeded! hacbCC=80000000 dev=10000001 Queue Frozen in Blocking Execute called from DoReadCapacity.scsi=9C200. dev=10000001 READCAPACITY SUCCEEDED FOR DEVICE 10000001 CAPACITY=0x3E7FFFFF sectors * 200 Queue Frozen in Blocking Execute called from ReadAndSetGeometry.scsi=9C200. dev=10000001 Queue Unfrozen. scsi=9C200. dev=10000001 Mount ObjUpd! flag=9,status=0,activate=1,device=9C200(10000001) Ret Mount ObjUpd! flag=9,status=0,activate=1,device=9C200(10000001) Mutex cleared. device=10000001 Attempting to set mutex. device=1,mutex=FE141B40,mutexName=1515071E Mutex set. device=1 Retry TUR command succeeded! hacbCC=80000000 dev=1 Queue Frozen in Blocking Execute called from DoReadCapacity.scsi=FEF680. dev=1 READCAPACITY SUCCEEDED FOR DEVICE 1 CAPACITY=0x10F539BF sectors * 200 Queue Frozen in Blocking Execute called from ReadAndSetGeometry.scsi=FEF680. dev=1 Queue Unfrozen. scsi=FEF680. dev=1 Mount ObjUpd! flag=9,status=0,activate=1,device=FEF680(1) Ret Mount ObjUpd! flag=9,status=0,activate=1,device=FEF680(1) Mutex cleared. device=1 GN_CallBack error! Error=80020000.Sense=0.Ctl=2,Dev=1. MSG=8F268D00 Callback. Msg Queued MSG=8F268D00 GN_CallBack error! Error=80070000.Sense=0.Ctl=2,Dev=1. MSG=8975ED80 Callback. Msg Queued MSG=8975ED80 GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F260E80 Callback. Msg Queued MSG=8F260E80 GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8975EB80 Callback. Msg Queued MSG=8975EB80 GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F260E00 Callback. Msg Queued MSG=8F260E00 GN_CallBack error! Error=80070000.Sense=0.Ctl=2,Dev=1. MSG=8F268B00 Callback. Msg Queued MSG=8F268B00 GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F268A80 Callback. Msg Queued MSG=8F268A80 GN_CallBack error! Error=80070000.Sense=0.Ctl=2,Dev=1. MSG=8F260D80 Callback. Msg Queued MSG=8F260D80 GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F268A00 Callback. Msg Queued MSG=8F268A00 GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8975EB00 Callback. Msg Queued MSG=8975EB00 GN_CallBack error! Error=80070000.Sense=0.Ctl=2,Dev=1. MSG=8F265300 Callback. Msg Queued MSG=8F265300 GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F265280 Callback. Msg Queued MSG=8F265280 GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F265200 Callback. Msg Queued MSG=8F265200 GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F265180 Callback. Msg Queued MSG=8F265180 GN_CallBack error! Error=80070000.Sense=0.Ctl=12,Dev=1. MSG=8F265100 Callback. Msg Queued MSG=8F265100 GN_CallBack error! Error=80070000.Sense=0.Ctl=2,Dev=1. MSG=8F268880 Callback. Msg Queued MSG=8F268880 GN_CallBack error! Error=80070000.Sense=0.Ctl=0,Dev=1. MSG=95999400 Callback. Msg Queued MSG=95999400 AEN Event: Device disappeared. scsi=7B5CA0, dev=0 Queue Unfrozen. scsi=7B5CA0. dev=0 SetStat ObjUpd! flag=2,status=2,activate=0,device=7B5CA0(0) Ret from SetStat ObjUpd! flag=2,status=2,activate=0,device=7B5CA0(0) AEN Event: Device disappeared. scsi=7B5DE0, dev=1 Queue Unfrozen. scsi=7B5DE0. dev=1 SetStat ObjUpd! flag=2,status=2,activate=0,device=7B5DE0(1) CDMAbort.Aborted msg completed! MSG=8975EB00 CDMAbort.Aborted msg completed! MSG=8975EB80 CDMAbort.Aborted msg completed! MSG=8975ED80 CDMAbort.Aborted msg completed! MSG=8F260D80 CDMAbort.Aborted msg completed! MSG=8F260E00 CDMAbort.Aborted msg completed! MSG=8F260E80 CDMAbort.Aborted msg completed! MSG=8F265100 CDMAbort.Aborted msg completed! MSG=8F265180 CDMAbort.Aborted msg completed! MSG=8F265200 CDMAbort.Aborted msg completed! MSG=8F265280 CDMAbort.Aborted msg completed! MSG=8F265300 CDMAbort.Aborted msg completed! MSG=8F268880 CDMAbort.Aborted msg completed! MSG=8F268A00 CDMAbort.Aborted msg completed! MSG=8F268A80 CDMAbort.Aborted msg completed! MSG=8F268B00 CDMAbort.Aborted msg completed! MSG=8F268D00 CDMAbort.Aborted msg completed! MSG=95999400 Ret from SetStat ObjUpd! flag=2,status=2,activate=0,device=7B5DE0(1) Queue Frozen in Blocking Execute called from DoStartDevice.
scsi=7B5DE0. dev=1Retry Start command failed! hacbCC=80070000,sense=0,dev=1 SCSI Inquiry - deviceid=10 npaBusID=E flags=2 Deleting SCSI! scsi=7B5DE0, dev=1 Mutex freed for device=1,mutex=FE141E40,mutexName=0 SCSI Inquiry - deviceid=F npaBusID=E flags=2 Deleting SCSI! scsi=7B5CA0, dev=0 Mutex freed for device=0,mutex=FE141D80,mutexName=0 SCSI Inquiry - deviceid=0 npaBusID=E flags=3 Queue Frozen in Blocking Execute called from DoStartDevice.scsi=7B5DE0. dev=1099290 Retry Start command failed! hacbCC=80070000,sense=0,dev=1099290 Queue Frozen in Blocking Execute called from DoStartDevice.scsi=7B5DE0. dev=1099290 Retry Start command failed! hacbCC=80070000,sense=0,dev=1099290
The abend logging as captured in the abend.log showed the following information :
Novell Open Enterprise Server, NetWare 6.5PVER: 6.50.07
Server SERVER1 halted Tuesday, 4 March 2008 13:36:33,309 Abend 1 on P00: Server-5.70.07: Page Fault Processor Exception (Error code 00000000)
Registers: CS = 0008 DS = 0023 ES = 0023 FS = 0023 GS = 0023 SS = 0010 EAX = 25F44800 EBX = 00209D98 ECX = 25F44800 EDX = 00000000 ESI = 00209D98 EDI = 01E509F5 EBP = 8FA23EFC ESP = 8FA23EF4 EIP = 8A1EDC0B FLAGS = 00010246 8A1EDC0B 8A82DC000000 MOV AL, [EDX+000000DC]=? EIP in SCSIHD.CDM at code start +00001C0Bh Access Location: 0x000000DCThe violation occurred while processing the following instruction:8A1EDC0B 8A82DC000000 MOV AL, [EDX+000000DC]8A1EDC11 8845FC MOV [EBP-04], AL8A1EDC14 8A45FC MOV AL, [EBP-04]8A1EDC17 C9 LEAVE 8A1EDC18 C3 RET 8A1EDC19 0000 ADD [EAX], AL8A1EDC1B 0000 ADD [EAX], AL8A1EDC1D 0000 ADD [EAX], AL8A1EDC1F 005589 ADD [EBP-77], DL8A1EDC22 E58B IN EAX, FFFFFF8B
Running process: Server 16 ProcessThread Owned by NLM: SERVER.NLMStack pointer: 8FA23E30OS Stack limit: 8FA1C040Scheduling priority: 67371008Wait state: 5050090 Wait for interruptStack: --CCCCCCCC ? --CCCCCC00 ? --8FA23F18 ? 8A1ED637 (SCSIHD.CDM|ErrorHandlerEvent+157) --25F44800 ? --CCCCCCCC ? --00000001 (LOADER.NLM|KernelAddressSpace+1) --92C0C600 ? --25F44800 ? --00209DA0 ? 897F9AE5 (NWPA.NLM|NPAThreadRoutine+3D) --25F44800 ? --00000046 (LOADER.NLM|KernelAddressSpace+46) 0010BED6 (LOADER.NLM|kSpinUnlockRestore+26) --00000046 (LOADER.NLM|KernelAddressSpace+46) 897FD78D (NWPA.NLM|FastCompleteHACBRoutine+95) --0020900C ? --00000046 (LOADER.NLM|KernelAddressSpace+46) --00209020 ? --01081A00 ? --00209DA0 ? --01E509F5 ? --00209D98 ? --00209D98 ? 0036CE32 (SERVER.NLM|CallAESRoutineWithEsiSet+A) --00209D98 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --00209DA0 ? 0026B852 (SERVER.NLM|WTDFrontEndHandler+62) --00209D98 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) 0026B7F0 (SERVER.NLM|WTDFrontEndHandler+0) 0036CDA2 (SERVER.NLM|StartWorkToDo+23) --00209DA0 ? --01E509F5 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --FE10EBC0 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) 0021C572 (SERVER.NLM|WorkerThread+50E) --00209DA0 ? --B86673F5 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --928C1360 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --928C1360 ? 00229B78 (SERVER.NLM|TcoNewSystemThreadEntryPoint+40) --928C1360 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --34343434 ? --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --00000000 (LOADER.NLM|KernelAddressSpace+0) --8FA24668 ? --8FA2467C ? --8FA246AA ? --8FA246B6 ? --8FA246F6 ? --8FA246FF ? --8FA24708 ? --8FA24714 ? --8FA24721 ? --8FA2472C ? --8FA24734 ? --8FA2473D ? --8FA24749 ? --8FA24751 ? --8FA2475B ? --8FA247B6 ? --8FA24811 ? --8FA2486F ? --8FA248CE ? --8FA2492B ? --8FA24985 ? --8FA249E0 ? --8FA24A3C ? --8FA24A98 ? --8FA24AA6 ? --8FA24AAE ? --8FA24AC8 ? --8FA24ADE ? --8FA24AF0 ? --8FA24B00 ? --8FA24B12 ? --8FA24B24 ? --8FA24B30 ?Additional Information: The CPU encountered a problem executing code in SCSIHD.CDM. The problem may be in that module or in data passed to that module by a process owned by SERVER.NLM.