NCS fails to start with error: "Failed to read node number from NDS"

  • 7024005
  • 12-Jul-2019
  • 05-Aug-2019

Environment

Open Enterprise Server 2018 (OES 2018) Linux Support Pack 1
Open Enterprise Server 2018 (OES 2018) Linux

Situation

NCS fails to start on a new node. The following errors are logged:
Error: "CLUSTER-<FATAL>-<2006>: Failed to read node number from NDS"
Error: "modprobe: ERROR: could not insert 'vll': Operation not permitted"

Resolution

Check the value of the "NCS:Node Isolation Script" attribute on every cluster node object and make sure it is set to "panic". After changing the attribute on any node object, run "/opt/novell/ncs/bin/ncs-configd.py -init" on all cluster nodes, including the problem node.

If NCS still fails to start after the attribute has been corrected, it might be necessary to remove the problem node from the cluster and then reinstall it into the cluster. NCS should then load correctly.

Cause

The creation of /var/opt/novell/ncs/nodes.xml does not complete because the "NCS:Node Isolation Script" attribute contains an incorrect value.
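
As a rough way to see whether nodes.xml is in the state described above, the following Python sketch reports whether the file is missing, empty, malformed, or well-formed XML. The file path comes from this document; the function name check_nodes_xml and its status strings are made up for illustration and are not part of NCS. The layout of nodes.xml is not described here, so the sketch checks only that the XML parses. A file that parses can still be missing node entries.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Path taken from the Cause section of this document.
NODES_XML = Path("/var/opt/novell/ncs/nodes.xml")


def check_nodes_xml(path=NODES_XML):
    """Return a short status string for the given file (hypothetical helper)."""
    path = Path(path)
    if not path.exists():
        return "missing"
    if path.stat().st_size == 0:
        return "empty"
    try:
        # Only checks XML well-formedness; the NCS schema is not validated.
        ET.parse(str(path))
    except ET.ParseError as exc:
        return f"malformed: {exc}"
    return "well-formed"


if __name__ == "__main__":
    print(f"{NODES_XML}: {check_nodes_xml()}")
```

Running this on the problem node before and after "ncs-configd.py -init" gives a quick indication of whether the file was regenerated.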

Additional Information

Complete log messages:
 
2018-09-04T14:33:36.225960+05:30 blr7-169-24 kernel: [10598.400650] CLUSTER-<FATAL>-<2006>: Failed to read node number from NDS
2018-09-04T14:33:36.263269+05:30 blr7-169-24 ldncs[20548]: modprobe: ERROR: could not insert 'vll': Operation not permitted
2018-09-04T14:33:36.264080+05:30 blr7-169-24 systemd[1]: novell-ncs.service: Control process exited, code=exited status=4
2018-09-04T14:33:36.264421+05:30 blr7-169-24 systemd[1]: Failed to start Novell Cluster Services(NCS).
2018-09-04T14:33:36.264720+05:30 blr7-169-24 systemd[1]: novell-ncs.service: Unit entered failed state.
2018-09-04T14:33:36.265012+05:30 blr7-169-24 systemd[1]: novell-ncs.service: Failed with result 'exit-code'.
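
To confirm that a node is affected by this issue, the system log can be searched for the two error strings quoted in this document. The Python sketch below is one way to do that. The helper name find_signatures and the dictionary keys are hypothetical, and the log file location is an assumption (the example uses /var/log/messages, which may differ on a given system).

```python
# Error strings copied from the Situation section of this document.
SIGNATURES = {
    "node_number": "CLUSTER-<FATAL>-<2006>: Failed to read node number from NDS",
    "vll_modprobe": "modprobe: ERROR: could not insert 'vll': Operation not permitted",
}


def find_signatures(lines):
    """Return a dict mapping each signature name to the log lines containing it."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, text in SIGNATURES.items():
            if text in line:
                hits[name].append(line.rstrip("\n"))
    return hits


def scan_file(path="/var/log/messages"):
    """Scan a log file; the default path is an assumption, not taken from NCS."""
    with open(path, errors="replace") as handle:
        return find_signatures(handle)
```

Calling scan_file() as root on the problem node returns the matching lines per signature. If both lists are non-empty, the log matches the situation described here.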