Environment
Novell Storage Services (NSS)
Novell NetWare 6.5
Novell Open Enterprise Server 2 (OES 2)
Novell Open Enterprise Server 1 (OES 1)
Novell Open Enterprise Server (Linux based)
Novell Open Enterprise Server (NetWare based)
Situation
During a metamig, or any other process that fully scans a volume, the pool hosting that volume deactivates after a 20444 error on ZID 0. This is followed by several 20206 errors, which indicate that the pool is being disabled.
Apr 3 16:24:09 server01 kernel: NSSLOG ==> [Error] comnPool.c[2520]
Apr 3 16:24:09 server01 kernel: Apr 3, 2009 4:24:09 pm NSS<COMN>-4.10a-xxxx:
Apr 3 16:24:09 server01 kernel: Pool POOL01: System data error 20444(beastTree.c[2180]). Block 0(file block 0)(ZID 0)
Apr 3 16:24:09 server01 kernel: Apr 3, 2009 4:24:09 pm NSS<COMN>-4.10a-xxxx:
Apr 3 16:24:09 server01 kernel: Pool POOL01: System data error 20444(beastTree.c[2180]). Block 0(file block 0)(ZID 0)
Apr 3 16:24:10 server01 kernel: NSSLOG ==> [Error] zfsVolumeData.c[216]
Apr 3 16:24:10 server01 kernel: Apr 3, 2009 4:24:10 pm NSS<ZLSS>-4.10a-1449:
Apr 3 16:24:10 server01 kernel: Error reading VolumeData Block 84156637, status=20206.
Apr 3 16:24:10 server01 kernel: Apr 3, 2009 4:24:10 pm NSS<ZLSS>-4.10a-1449:
Apr 3 16:24:10 server01 kernel: Error reading VolumeData Block 84156637, status=20206.
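When checking whether a server log shows this particular failure, it can help to look for the same ordering seen above: a 20444 error on ZID 0 followed by 20206 status errors. The following is a minimal sketch, assuming the log text matches the sample messages above; the helper name count_20206_after_zid0 is hypothetical and not part of any Novell tool, and message wording may differ between NSS versions.

```python
import re

# Patterns derived only from the sample NSS messages in this article.
# Real message text may differ between NSS versions (assumption).
ZID0_ERROR = re.compile(r"System data error 20444\(.*\)\(ZID 0\)")
STATUS_20206 = re.compile(r"status=20206")


def count_20206_after_zid0(lines):
    """Count 20206 status lines that appear after a 20444 error on ZID 0.

    Hypothetical helper: returns 0 if no 20444 error on ZID 0 was seen.
    """
    seen_zid0 = False
    count = 0
    for line in lines:
        if ZID0_ERROR.search(line):
            seen_zid0 = True
        elif seen_zid0 and STATUS_20206.search(line):
            count += 1
    return count


if __name__ == "__main__":
    # Excerpt copied from the log sample above.
    sample = [
        "Pool POOL01: System data error 20444(beastTree.c[2180]). Block 0(file block 0)(ZID 0)",
        "Pool POOL01: System data error 20444(beastTree.c[2180]). Block 0(file block 0)(ZID 0)",
        "Error reading VolumeData Block 84156637, status=20206.",
        "Error reading VolumeData Block 84156637, status=20206.",
    ]
    print(count_20206_after_zid0(sample))  # prints 2
```

A non-zero result only suggests that the log matches this pattern; it does not replace the hardware and backup checks described under Resolution.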
Resolution
Verify that there is no underlying hardware problem, such as faulty hard drives, a failing controller, a failing motherboard, etc.
Verify that a verified, restorable backup of all volumes in the pool is available.
Once all hardware has been checked and verified to be OK, run an NSS pool rebuild with the rezid option.
The following examples rebuild a pool named POOL01.
On Novell Open Enterprise Server running on the NetWare kernel, first place the pool in maintenance mode:
nss /PoolMaintenance=POOL01
Then perform the rebuild with the /rezid option.
nss /PoolRebuild=POOL01 /rezid
On Novell Open Enterprise Server running on the Linux kernel, first place the pool in maintenance mode:
nsscon /PoolMaintenance=POOL01
Once the pool is in maintenance mode, run the following ravsui command to rebuild it:
ravsui -r 0xefffffff rebuild POOL01
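When several pools have to be handled, it may help to generate the command sequence per platform rather than typing it by hand. The following is a minimal sketch that only builds the command strings, copied verbatim from the steps above, and never executes them; the function name rebuild_commands and the platform values "netware" and "linux" are hypothetical. On NetWare these are server console commands, not shell commands.

```python
from typing import List


def rebuild_commands(pool: str, platform: str) -> List[str]:
    """Return the maintenance and rebuild commands for `pool`.

    Hypothetical helper: the strings mirror the article's steps exactly
    and are only returned for review, never run.
    """
    if platform == "netware":
        # NetWare server console: maintenance mode, then rebuild with /rezid.
        return [f"nss /PoolMaintenance={pool}", f"nss /PoolRebuild={pool} /rezid"]
    if platform == "linux":
        # Linux: maintenance mode via nsscon, then rebuild via ravsui.
        return [f"nsscon /PoolMaintenance={pool}", f"ravsui -r 0xefffffff rebuild {pool}"]
    raise ValueError(f"unknown platform: {platform!r}")


if __name__ == "__main__":
    for command in rebuild_commands("POOL01", "linux"):
        print(command)
```

Keeping the commands as reviewable strings leaves the decision to run each step with the administrator, which matters here because the rebuild should only follow the hardware and backup checks.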
Additional Information
Cause:
The ZID of a file or folder on one of the pool's volumes has become corrupted. The ZID is essentially the pointer in the metadata to the location on disk where the contents of that file or folder reside.
Each time this corrupt ZID is accessed, the pool hits the error and deactivates to prevent further corruption.