NSS volume deactivates due to 20444 error on ZID 0

  • 7003118
  • 27-Apr-2009
  • 27-Apr-2012

Environment

Novell Storage Services (NSS)
Novell NetWare 6.5
Novell Open Enterprise Server 2 (OES 2)
Novell Open Enterprise Server 1 (OES 1)
Novell Open Enterprise Server (Linux based)
Novell Open Enterprise Server (NetWare based)

Situation

During a metamig run, or any other process that fully scans a volume, the hosting pool deactivates after reporting a 20444 error on ZID 0, followed by several 20206 errors. Error 20206 indicates that the pool is being disabled.

Apr 3 16:24:09 server01 kernel: NSSLOG ==> [Error] comnPool.c[2520]
Apr 3 16:24:09 server01 kernel: Apr 3, 2009 4:24:09 pm NSS<COMN>-4.10a-xxxx:
Apr 3 16:24:09 server01 kernel: Pool POOL01: System data error 20444(beastTree.c[2180]). Block 0(file block 0)(ZID 0)
Apr 3 16:24:10 server01 kernel: NSSLOG ==> [Error] zfsVolumeData.c[216]
Apr 3 16:24:10 server01 kernel: Apr 3, 2009 4:24:10 pm NSS<ZLSS>-4.10a-1449:
Apr 3 16:24:10 server01 kernel: Error reading VolumeData Block 84156637, status=20206.
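
To check a saved system log for this error pattern, a short script can search for the two messages shown above. The following Python sketch is illustrative only: the function name scan_nss_errors is hypothetical, and the regular expressions are derived solely from the log excerpt in this article, so other NSS versions may word these lines differently.

```python
import re

# Patterns modelled on the log excerpt in this article (an assumption:
# other NSS builds may format these messages differently).
POOL_ERROR = re.compile(
    r"Pool (?P<pool>\S+): System data error (?P<code>\d+)\(.*?\)\. "
    r"Block \d+\(file block \d+\)\(ZID (?P<zid>\d+)\)"
)
READ_ERROR = re.compile(
    r"Error reading VolumeData Block (?P<block>\d+), status=(?P<status>\d+)"
)


def scan_nss_errors(lines):
    """Return (pools, blocks).

    pools  : pool names that logged a 20444 error on ZID 0
    blocks : block numbers that logged a read error with status 20206
    """
    pools = []
    blocks = []
    for line in lines:
        match = POOL_ERROR.search(line)
        if match and match.group("code") == "20444" and match.group("zid") == "0":
            pools.append(match.group("pool"))
            continue
        match = READ_ERROR.search(line)
        if match and match.group("status") == "20206":
            blocks.append(int(match.group("block")))
    return pools, blocks


# Sample lines copied from the log excerpt above.
sample = [
    "Apr 3 16:24:09 server01 kernel: Pool POOL01: System data error "
    "20444(beastTree.c[2180]). Block 0(file block 0)(ZID 0)",
    "Apr 3 16:24:10 server01 kernel: Error reading VolumeData Block "
    "84156637, status=20206.",
]
pools, blocks = scan_nss_errors(sample)
print(pools, blocks)  # prints: ['POOL01'] [84156637]
```

To scan a real log, pass an open file object instead of the sample list. The location of the log file depends on the system configuration.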
 

Resolution

Verify that there is no underlying hardware problem, such as faulty hard drives, a failing controller, or a faulty motherboard.
Confirm that a verified, restorable backup of all volumes in the pool is available.
If all hardware has been checked and found to be OK, run an NSS pool rebuild with the rezid option.
The following example rebuilds a pool named POOL01.
 
On Novell Open Enterprise Server running the NetWare kernel, first place the pool in maintenance mode:
        nss /PoolMaintenance=POOL01
Then perform the rebuild with the /rezid option.
        nss /PoolRebuild=POOL01 /rezid
 
 
On Novell Open Enterprise Server running the Linux kernel, first place the pool in maintenance mode:
        nsscon /PoolMaintenance=POOL01
After that is accomplished, run the following ravsui command:
        ravsui -r 0xefffffff rebuild POOL01

Additional Information

Cause:

The ZID of a file or folder on a volume in the pool has become corrupted. The ZID is essentially the pointer in the metadata to the location on disk where the contents of that file or folder reside.
Each time the corrupt ZID is used or touched, the pool deactivates to prevent further corruption.