Environment
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 1
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 2
Servers are running Novell Clustering
Situation
When migrating a resource to another node, the resource goes comatose.
When loading a resource, the resource goes comatose.
The file /var/opt/novell/log/ncs/resource.load.out shows the following error at the end:
Thu Nov 12 21:56:58 CET 2009
++ IP_ADDR=/etc/ha.d/resource.d/IPaddr2
++ FILE_SYSTEM=/etc/ha.d/resource.d/Filesystem
++ OCF_DIR=/usr/lib/ocf/resource.d/heartbeat
++ PATH=/bin:/sbin:/usr/bin:/usr/sbin:/opt/novell/afptcpd/bin/:/opt/novell/bin
+ exit_on_error nss /poolact=G013
+ nss /poolact=G013
Error 20892
Pool state was not changed successfully
+ rc=28
+ '[' '!' 28 -eq 0 ']'
+ exit 28
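The last lines of the trace suggest how the exit_on_error helper behaves: it runs the command, stores the return code in rc, and exits the load script with that code when it is not 0. The following is a minimal sketch reconstructed only from the trace above; the helper actually shipped with Novell Cluster Services may differ.

```shell
# Hypothetical reconstruction of exit_on_error, inferred from the trace lines
#   + rc=28
#   + '[' '!' 28 -eq 0 ']'
#   + exit 28
# It is not the vendor's source code.
exit_on_error() {
    "$@"                        # run the command, e.g. nss /poolact=G013
    rc=$?                       # keep its return code
    if [ ! $rc -eq 0 ]; then
        exit $rc                # abort the load script with the same code
    fi
}
```

With a helper of this shape, a failing nss /poolact command ends the load script immediately with exit code 28, which matches the resource going comatose.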
Resolution
This has been resolved in the novell-ncpserv package from January 29, 2010 or later.
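To check whether a server already has a new enough package, the build time of the installed novell-ncpserv can be compared against the date above. The sketch below assumes that the date refers to the package build time (this article does not say so explicitly) and that midnight UTC is an acceptable cutoff; build_is_fixed and FIX_DATE_EPOCH are hypothetical names.

```shell
# 2010-01-29 00:00:00 UTC as seconds since the epoch (assumed cutoff).
FIX_DATE_EPOCH=1264723200

# Hypothetical helper: succeeds if the given build time (seconds since the
# epoch) is on or after the assumed fix date.
build_is_fixed() {
    [ "$1" -ge "$FIX_DATE_EPOCH" ]
}

# Usage on the server (rpm prints the build time as epoch seconds):
#   build_is_fixed "$(rpm -q --queryformat '%{BUILDTIME}' novell-ncpserv)" \
#       && echo "fix should be included" || echo "update novell-ncpserv"
```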
Additional Information
The problem occurs because ncpcon is not able to dismount the volume when the resource is unloaded. In /var/opt/novell/log/ncs/resource.unload.out the unload looks like this:
CRM: Wed Feb 17 16:29:18 2010
++ IP_ADDR=/etc/ha.d/resource.d/IPaddr2
++ FILE_SYSTEM=/etc/ha.d/resource.d/Filesystem
++ OCF_DIR=/usr/lib/ocf/resource.d/heartbeat
++ PATH=/bin:/sbin:/usr/bin:/usr/sbin:/opt/novell/afptcpd/bin/:/opt/novell/bin
+ ignore_error cluster_afp.sh del BCC1-G013-V 10.11.51.163
+ cluster_afp.sh del BCC1-G013-V 10.11.51.163
+ return 0
+ ignore_error ncpcon unbind --ncpservername=BCC1-G013-V --ipaddress=10.11.51.163
+ ncpcon unbind --ncpservername=BCC1-G013-V --ipaddress=10.11.51.163
... Executing " unbind"
... completed OK [elapsed time = 2 Seconds 126 msecs 263 usecs]
+ return 0
+ ignore_error ncpcon dismount G013
+ ncpcon dismount G013
... Executing " dismount G013"
Because the volume is not dismounted, the NSS pool is not deactivated, and this generates NSS error 20892 on the next attempt to activate the pool. The root cause of the problem is that the ncpcon bind command defined in the load script sometimes ignores the provided device ID. This is visible when running: ncpcon volume volumename
The ID displayed should be the same as in the load script, and the status should show "cluster resource". When this is not the case, every unload of the resource will fail.
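The status check described above can be scripted. The sketch below only looks for the phrase "cluster resource" in whatever ncpcon prints; the exact layout of the ncpcon volume output is not shown in this article, so the match is an assumption, and is_cluster_resource is a hypothetical helper name. The ID still has to be compared with the load script by hand.

```shell
# Hypothetical helper: reads the output of "ncpcon volume <volumename>" on
# stdin and succeeds only if the phrase "cluster resource" appears in it.
# The output layout is assumed; only the phrase itself comes from the text.
is_cluster_resource() {
    grep -qi 'cluster resource'
}

# Usage on the node hosting the resource (G013 is the volume from the logs):
#   ncpcon volume G013 | is_cluster_resource \
#       || echo "G013 is not flagged as a cluster resource; unload will fail"
```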