Environment
Novell Cluster Services
Novell Open Enterprise Server 2 (OES 2) Linux
Situation
Adding an additional node into an existing cluster fails.
The YaST installation procedure runs normally up to 55%, then fails with the error: "Failed to add node to cluster."
/var/log/YaST/y2log shows these errors:
2011-06-08 16:49:20 <3> CLFSN5(8367) [YCP] Ncs.ycp:689 NCS ncs_install.py -a failed:
2011-06-08 17:01:29 <3> CLFSN5(8367) [YCP] Report.ycp:484 NCS install failed to add node to cluster.
2011-06-08 17:01:29 <1> CLFSN5(8367) [YCP] Ncs.ycp:691 NCS:Sent up an error report:50
/var/opt/novell/install/ncslog shows this failure:
06/08/11 16:49:20: NCS adding node failed: 'nCSGIPCNodeNumber'
An ndstrace of the LDAP traffic shows valid communication with eDirectory and no failures.
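When the log is long, the failing attribute can be picked out of ncslog with a small script. The following is a minimal sketch only: the pattern is modelled on the single log line quoted above, the function name find_add_node_failures is hypothetical, and other NCS versions may format the message differently.

```python
import re
from typing import Iterable, List, Tuple

# Pattern modelled on the one ncslog line quoted in this document.
# Assumption: other entries of this kind follow the same layout.
FAILURE_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}): "
    r"NCS adding node failed: '(?P<attr>[^']+)'"
)


def find_add_node_failures(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, attribute name) for every add-node failure line."""
    hits = []
    for line in lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            hits.append((match.group("stamp"), match.group("attr")))
    return hits


# Usage with the line from this document; on a real node the lines would
# come from open("/var/opt/novell/install/ncslog") instead.
sample = ["06/08/11 16:49:20: NCS adding node failed: 'nCSGIPCNodeNumber'"]
for stamp, attr in find_add_node_failures(sample):
    print(f"{stamp}: add-node failed on attribute {attr}")
```

The attribute name it reports is the one to inspect on the server objects in the cluster container.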
Resolution
The error in /var/opt/novell/install/ncslog indicates that the cluster container holds one or more extraneous or erroneous server objects.
These server objects either lack the attribute "nCSGIPCNodeNumber", or its value is invalid or duplicates the value on another server object.
Remove from the cluster container any NCP server objects that no longer exist, are not clustered, or have been removed from the cluster. The only server objects in the cluster container should be the current cluster nodes.
If no such objects exist, or removing them doesn't fix the problem, check the "nCSGIPCNodeNumber" attribute on every server object in the cluster container and make sure each value is unique and valid (0 to 31).
The whole cluster needs to be restarted if the node numbers are changed.
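The attribute check described above can be sketched in a few lines. This is an illustration under stated assumptions, not a Novell tool: it assumes the server object names and their raw "nCSGIPCNodeNumber" values have already been read from the cluster container by some other means (for example, copied from iManager), and the function name check_node_numbers and the server names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

VALID_NODE_NUMBERS = range(0, 32)  # 0 to 31, as stated in this document


def check_node_numbers(servers: Dict[str, Optional[str]]) -> List[str]:
    """Check a map of server object name -> raw nCSGIPCNodeNumber value.

    A value of None means the attribute is absent on that object.
    Returns a list of human-readable problems; empty means all is well.
    """
    problems = []
    seen = defaultdict(list)  # node number -> server names using it
    for name, raw in sorted(servers.items()):
        if raw is None:
            problems.append(f"{name}: attribute nCSGIPCNodeNumber is missing")
            continue
        try:
            number = int(raw)
        except ValueError:
            problems.append(f"{name}: value {raw!r} is not an integer")
            continue
        if number not in VALID_NODE_NUMBERS:
            problems.append(f"{name}: value {number} is outside 0 to 31")
            continue
        seen[number].append(name)
    for number, names in sorted(seen.items()):
        if len(names) > 1:
            problems.append(f"node number {number} is shared by {', '.join(names)}")
    return problems


# Hypothetical container contents: two healthy nodes, one duplicate,
# and one stale object without the attribute.
example = {"node1": "0", "node2": "1", "node3": "1", "stale": None}
for problem in check_node_numbers(example):
    print(problem)
```

An object reported as missing the attribute is a likely candidate for the stale server object described above; a shared or out-of-range number points to a value that needs correcting.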
Additional Information
To reduce the chance of data loss when deleting the erroneous server object, do not delete it through "Cluster" > "Cluster Options" in iManager; use "Directory Administration" > "Delete Object" instead.
When creating an NCP object for a POSIX filesystem cluster resource, make sure to create it in the container that contains the cluster container and NCP Server objects for the cluster nodes.
Re-adding a node to the cluster may require some additional steps, described in TID 3131978 "How to completely reinstall a cluster node in OES/Linux", which is still valid for OES 2.