Environment
Novell Open Enterprise Server 2 (OES 2) Linux
Novell Storage Services (NSS)
Situation
The following errors appear repeatedly in /var/log/messages:
Feb 10 18:28:25 oes_srv1 kernel: NSSLOG ==> [Error] comnPool.c[2554]
Feb 10 18:28:25 oes_srv1 kernel: Feb 10, 2011 6:28:25 pm NSS<COMN>-4.12a-xxxx:
Feb 10 18:28:25 oes_srv1 kernel: Pool CHIFS14: System data error 20407(nameTree.c[499]). Block 0(file block 0)(ZID 0)
Feb 10 18:28:25 oes_srv1 kernel: err=20801 comnVol.c[894]
Feb 10 18:28:25 oes_srv1 kernel: err=20801 comnVol.c[894]
Feb 10 18:28:25 oes_srv1 kernel: err=20801 comnVol.c[894]
Feb 10 18:28:25 oes_srv1 kernel: err=20801 comnVol.c[894]
Feb 10 18:28:28 oes_srv1 kernel: NSSLOG ==> [Error] zlssMSAP.c[538]
Feb 10 18:28:28 oes_srv1 kernel: Feb 10, 2011 6:28:28 pm NSS<ZLSS>-4.12a-xxxx:
Feb 10 18:28:28 oes_srv1 kernel: MSAP: Pool "CHIFS14" read error 20206.
Feb 10 18:28:28 oes_srv1 kernel:
Feb 10 18:28:28 oes_srv1 kernel: NSSLOG ==> [MSAP] comnLog.c[201]
Feb 10 18:28:28 oes_srv1 kernel: Pool "CHIFS14" - Read error(20206).
Feb 10 18:28:38 oes_srv1 afptcpd[17746]: [error] Failed to get file info from ZID for volume - USERS02. Error - 20051 <0x4e53>
Feb 10 18:28:38 oes_srv1 afptcpd[17746]: [error] zAFPOpenRoot: Failed to open by ZID. Error - 20051 <0x4e53>
Feb 10 18:28:38 oes_srv1 afptcpd[17746]: [error] FPGetFileDirParms: Failed to get file object info. Error - 20051 <0x4e53>
Feb 10 18:28:38 oes_srv1 afptcpd[17746]: [error] zAFPOpenRoot: Failed to open by ZID. Error - 20051 <0x4e53>
Feb 10 18:28:38 oes_srv1 afptcpd[17746]: [error] FPGetFileDirParms: Failed to get file object info. Error - 20051 <0x4e53>
Feb 10 18:28:40 oes_srv1 afptcpd[17746]: [error] zAFPOpenRoot: Failed to open by ZID. Error - 20051 <0x4e53>
Feb 10 18:28:40 oes_srv1 afptcpd[17746]: [error] FPGetFileDirParms: Failed to get file object info. Error - 20051 <0x4e53>
Feb 10 18:28:40 oes_srv1 afptcpd[17746]: [error] zAFPOpenRoot: Failed to open by ZID. Error - 20051 <0x4e53>
Feb 10 18:28:40 oes_srv1 afptcpd[17746]: [error] FPGetFileDirParms: Failed to get file object info. Error - 20051 <0x4e53>...etc
Then the pool will deactivate. When the pool is activated, it will deactivate soon after, usually within a few seconds to a couple of minutes.
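To check whether a server shows this pattern, the log can be scanned for the pool-level signatures quoted above. The following is only a minimal sketch: it assumes the messages have the same form as in this excerpt and that the pool name contains only letters, digits, and underscores. The function name count_nss_pool_errors is hypothetical and is not part of any NSS tool.

```shell
#!/bin/sh
# Hypothetical helper (not an NSS utility): count log lines that match the
# pool error signatures quoted in this document for a given pool.
# Usage: count_nss_pool_errors POOLNAME [LOGFILE]
# Assumption: POOLNAME has no regular-expression metacharacters.
count_nss_pool_errors() {
    pool="$1"
    logfile="${2:-/var/log/messages}"
    # Matches, for example:
    #   Pool CHIFS14: System data error ...
    #   Pool "CHIFS14" read error ...
    #   Pool "CHIFS14" - Read error(...)
    grep -E -c "Pool \"?${pool}\"?(: System data error| read error| - Read error)" "$logfile"
}
```

For example, `count_nss_pool_errors CHIFS14` prints the number of matching lines in /var/log/messages. A non-zero count only indicates that these messages were logged; it does not by itself confirm the metadata corruption described in this document.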
Resolution
Fixed in OES 2 SP3 and OES 11.
Workaround: Purge the pool using ravsui; e.g.
ravsui --purge-deleted-files rebuild <poolname>
Additional Information
In this specific case, the errors are caused by an attempt to purge an NSS pool that has a specific type of NSS metadata corruption.
When troubleshooting this problem, first determine whether a purge is running. A purge can be started in several ways:
- By an administrator, via ncpcon
- By a user, via Novell Client
- By the server's auto-purge mechanism: When an NSS Pool becomes full, an automatic purge will start in order to free up space
  - Use nssmu or iManager to establish whether a pool is getting full
  - As a short-term measure, the following NSS parameters can be set within nsscon to delay the point at which the auto-purge starts:
    - nss /PoolHighWaterMark=poolname:Percent/MB/GB
    - nss /PoolLowWaterMark=poolname:Percent/MB/GB
  - For example, the following commands configure auto-purge to start when the p_users pool has only 5% free space and to continue running until there is 10% free space:
    - nss /PoolHighWaterMark=p_users:10%
    - nss /PoolLowWaterMark=p_users:5%
  - For further details, refer to the Novell Documentation at https://www.novell.com/documentation/oes2/
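The interaction of the two water marks described above can be illustrated with a small model. This is a sketch of the behavior as stated in this document, not NSS code: the function name autopurge_state is hypothetical, all values are whole percentages of free space, and treating "at or below" the low water mark as the trigger is an assumption.

```shell
#!/bin/sh
# Hypothetical model of the auto-purge thresholds described in this document.
# Usage: autopurge_state FREE_PCT LOW_PCT HIGH_PCT PURGING
#   FREE_PCT  current free space in the pool, in percent
#   LOW_PCT   PoolLowWaterMark, in percent (purge starts here)
#   HIGH_PCT  PoolHighWaterMark, in percent (purge stops here)
#   PURGING   1 if a purge is already running, otherwise 0
# Prints "purging" or "idle".
autopurge_state() {
    free="$1"; low="$2"; high="$3"; purging="$4"
    if [ "$free" -le "$low" ]; then
        # Assumption: the purge starts at or below the low water mark.
        echo purging
    elif [ "$purging" -eq 1 ] && [ "$free" -lt "$high" ]; then
        # A running purge continues until the high water mark is reached.
        echo purging
    else
        echo idle
    fi
}
```

With the p_users values from the example (low 5, high 10), `autopurge_state 7 5 10 0` prints idle, while `autopurge_state 7 5 10 1` prints purging, because a purge that has already started continues until 10% of the pool is free.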