Environment
Novell Cluster Services
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 2
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 3
Situation
Cluster resources will go comatose because the ncpcon bind command fails to execute.
The /var/opt/novell/log/ncs/resourcename.load.out file will contain an error similar to this:
+ ncpcon bind --ncpservername=RESOURCE --ipaddress=10.10.10.10
... Executing " bind"
... FAILED completion [elapsed time = 20 Seconds 142 msecs 359 usecs]
+ rc=1
If the log level for /var/opt/novell/log/ncpserv.log is set to debug, you may see the following errors in ncpserv.log:
AdvertiseVirtualServer: AdvertiseThruSLP retry count=1
AdvertiseVirtualServer: AdvertiseThruSLP retry count=2
AdvertiseVirtualServer: AdvertiseThruSLP retry count=3
AdvertiseVirtualServer: AdvertiseThruSLP retry count=4
AdvertiseVirtualServer: AdvertiseThruSLP retry count=5
AdvertiseVirtualServer: AdvertiseThruSLP retry failed rc=-255
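To check a large ncpserv.log for this pattern, the debug messages shown above can be matched with a short script. The following is a minimal sketch, not a Novell-supplied tool: the function name summarize_slp_retries is hypothetical, and the message formats are assumed to match the excerpt above exactly.

```python
import re

# Patterns taken from the debug messages quoted in this document.
RETRY_RE = re.compile(
    r"AdvertiseVirtualServer: AdvertiseThruSLP retry count=(\d+)"
)
FAILED_RE = re.compile(
    r"AdvertiseVirtualServer: AdvertiseThruSLP retry failed rc=(-?\d+)"
)


def summarize_slp_retries(lines):
    """Return (highest retry count seen, last failure rc or None).

    Hypothetical helper: scans an iterable of log lines for the
    SLP advertisement retry messages and the final failure message.
    """
    max_retry = 0
    final_rc = None
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            max_retry = max(max_retry, int(match.group(1)))
            continue
        match = FAILED_RE.search(line)
        if match:
            final_rc = int(match.group(1))
    return max_retry, final_rc
```

To use it against the live log, pass an open file object, for example summarize_slp_retries(open("/var/opt/novell/log/ncpserv.log")). A result with a retry count of 5 and an rc of -255 matches the symptom described here.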
Resolution
This is fixed in OpenSLP version 1.2.0-22.36.4 or newer, available in the SLES patch channel.
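To confirm whether a server already has the fix, the installed version-release string can be compared against 1.2.0-22.36.4. The sketch below is a simplified numeric comparison, not RPM's full version-comparison algorithm, and the helper names are hypothetical. It assumes the installed string is obtained separately, for example with rpm -q --queryformat '%{VERSION}-%{RELEASE}' openslp (the package name openslp is an assumption here).

```python
import re

# Fixed version-release as stated in this document.
FIXED_VERSION = "1.2.0-22.36.4"


def version_key(version_release):
    """Split a 'version-release' string into a tuple of integers.

    Simplification: only the numeric fields are compared, in order.
    This is not a full implementation of RPM version comparison.
    """
    return tuple(int(field) for field in re.findall(r"\d+", version_release))


def has_slp_fix(installed, fixed=FIXED_VERSION):
    """Return True if the installed version-release is at or above the fix."""
    return version_key(installed) >= version_key(fixed)
```

For example, has_slp_fix("1.2.0-22.36.1") evaluates to False, which would indicate that the OpenSLP update still needs to be applied.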
Additional Information
A problem was discovered in the way slpd closed sockets. It caused ndsd to reuse a socket that was no longer valid, so attempts to advertise the resource through SLP failed. Because the ncpcon bind and unbind commands require SLP, when the advertisement through SLP fails, ncpcon retries the bind five times and then times out, causing the resource to go comatose.