NSS volume deactivates due to error 20444 on ZID 0

  • 7008035
  • 03-Mar-2011
  • 27-Apr-2012

Environment

Novell NetWare 6.5
Novell Open Enterprise Server 2 (OES 2) Linux
Novell Open Enterprise Server (NetWare 6.5)
Novell Storage Services (NSS)

Situation

During a metamig, backup, migration, or any other process that scans a volume in full, the hosting pool deactivates after a 20444 error on ZID 0. Several 20206 errors follow, indicating that the pool is being disabled.

Apr  3 16:24:09 server01 kernel: NSSLOG ==> [Error] comnPool.c[2520]
Apr  3 16:24:09 server01 kernel:      Apr 3, 2009   4:24:09 pm  NSS<COMN>-4.10a-xxxx:
Apr  3 16:24:09 server01 kernel:      Pool POOL01: System data error 20444(beastTree.c[2180]).   Block 0(file block 0)(ZID 0)
Apr  3 16:24:10 server01 kernel: NSSLOG ==> [Error] zfsVolumeData.c[216]
Apr  3 16:24:10 server01 kernel:      Apr 3, 2009   4:24:10 pm  NSS<ZLSS>-4.10a-1449:
Apr  3 16:24:10 server01 kernel:      Error reading VolumeData Block 84156637, status=20206.
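
To check a log for this signature, a small script can look for a 20444 error on ZID 0 and count the 20206 errors that follow it. The following is only a minimal sketch: the function name and the regular expressions are illustrative assumptions derived from the sample messages above, and other NSS versions may word these messages differently.

```python
import re

# Assumption: both patterns are modelled on the sample log in this article.
ZID0_20444 = re.compile(r"Pool (?P<pool>\S+): System data error 20444\(.*\(ZID 0\)")
READ_20206 = re.compile(r"status=20206")


def find_zid0_deactivations(lines):
    """Return (pool, number of following 20206 errors) for each 20444 hit on ZID 0."""
    hits = []
    for line in lines:
        match = ZID0_20444.search(line)
        if match:
            # Start a new record for this pool.
            hits.append([match.group("pool"), 0])
        elif hits and READ_20206.search(line):
            # Attribute the 20206 error to the most recent 20444 hit.
            hits[-1][1] += 1
    return [tuple(hit) for hit in hits]


SAMPLE = """\
Apr  3 16:24:09 server01 kernel:      Pool POOL01: System data error 20444(beastTree.c[2180]).   Block 0(file block 0)(ZID 0)
Apr  3 16:24:10 server01 kernel:      Error reading VolumeData Block 84156637, status=20206.
"""

if __name__ == "__main__":
    for pool, count in find_zid0_deactivations(SAMPLE.splitlines()):
        print(f"{pool}: 20444 on ZID 0 followed by {count} error(s) with status 20206")
```

The function accepts any iterable of lines, so it can be fed the contents of whichever log file holds the kernel messages on the server.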


Cause

The ZID of a file or folder on a volume in the pool has become corrupted. The ZID is essentially the pointer in the metadata to the place on disk where the contents of that file or folder reside.
Each time the corrupt ZID is used or touched, the pool deactivates to prevent further corruption.

For pools that have been in use for a long time, the ZIDs can become "fragmented", with the highest ZID approaching the limit (4 billion, 4E9).


In certain cases, a volume with snapshots enabled at the file level and copy on write enabled can cause this phenomenon.

Resolution

Verify that there is no underlying hardware problem, such as faulty hard drives or a failing controller or motherboard.
Verify that the pool does not contain a volume with snapshots enabled at the file level.
Verify that you have a verified, restorable backup of all volumes in the pool.

If a volume has a snapshot, install the latest NSS code available, or disable the snapshot or copy on write.
If no volume has snapshots enabled and all hardware has been verified as OK, run an NSS pool rebuild with the rezid option.


On Novell Open Enterprise Server running the NetWare kernel, first place the pool in maintenance mode:
nss /PoolMaintenance=[POOL]

Then perform the rebuild with the /rezid option:
nss /PoolRebuild=[POOL] /rezid
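
When the same procedure has to be written up for several pools, the two NetWare console commands above can be generated from the pool name. This is only a sketch: the helper name is hypothetical, it only formats the command strings from this article, and it does not run anything on the server.

```python
def netware_rezid_commands(pool):
    """Return the two NetWare console commands from this article, in order."""
    # Hypothetical sanity check: reject empty names and names containing whitespace.
    if not pool or any(ch.isspace() for ch in pool):
        raise ValueError("pool name must be non-empty and contain no whitespace")
    return [
        f"nss /PoolMaintenance={pool}",    # place the pool in maintenance mode
        f"nss /PoolRebuild={pool} /rezid",  # rebuild the pool with the rezid option
    ]


if __name__ == "__main__":
    for command in netware_rezid_commands("POOL01"):
        print(command)
```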
 

On Novell Open Enterprise Server running the Linux kernel, follow the procedure described in "ReZIDing Volumes" in the documentation.