Environment
Novell SUSE Linux Enterprise Server 10
Novell SUSE Linux Enterprise Server 9
Situation
Creating a High Availability (HA) cluster and specifying a fully
qualified domain name for additional nodes leads to duplicate
nodes in the Cluster Information Base (CIB). HA expects the node
names reported by uname -n, not fully qualified names. If you run
the command cl_status listnodes, you will see output like the
following:
han1
han1.provo.novell.com
han2.provo.novell.com
You will also see the duplicate nodes in hb_gui. In addition, the ha.cf file must be the same on all nodes, with the exception of the bcast parameter. If the node list differs between nodes, even more extra nodes appear. If you have nodes han1 and han1.provo.novell.com, heartbeat starts one of them, but the other never starts. This leaves the cluster without quorum. By default, you can only add resources to the cluster when it has quorum (for example, two or more running nodes), so the duplicate node condition prevents you from adding resources to the cluster.
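The duplicate condition described above can be spotted mechanically. The following is a minimal sketch, not part of the original procedure: find_duplicate_nodes is a hypothetical helper name, and it assumes the node list arrives one name per line, as in the sample output shown earlier. It strips any domain suffix and prints the short names that then occur more than once.

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat): read node names on stdin,
# one per line, strip everything from the first dot onward, and print
# each short name that appears more than once.
find_duplicate_nodes() {
    sed 's/\..*$//' | sort | uniq -d
}

# Possible usage on a cluster node:
#   cl_status listnodes | find_duplicate_nodes
```

An empty result only means that no short name is repeated. A fully qualified name that appears once, such as han2.provo.novell.com in the sample above, is not reported by this check and still needs to be corrected.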
Resolution
The ha.cf files need to be corrected, and the extra nodes deleted.
This can be done with the following procedure:
On ALL nodes:
1. Stop heartbeat (rcheartbeat stop)
2. Delete the host cache (rm /var/lib/heartbeat/hostcache)
3. Delete the deleted host cache (rm /var/lib/heartbeat/delhostcache)
On ONE node:
1. Change /etc/ha.d/ha.cf to reference each node only once, without a fully qualified domain name, and do not use autojoin any. Make sure you include all nodes, including the node on which you are editing the ha.cf file. For example:
Correct:
autojoin none
node han1
node han2
Incorrect:
autojoin any
node han1
node han1.provo.novell.com
node han2.provo.novell.com
2. Run /usr/lib/heartbeat/ha_propagate or /usr/lib64/heartbeat/ha_propagate
On ALL nodes:
1. Start heartbeat (rcheartbeat start)
2. Check for duplicate nodes (cl_status listnodes)
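Before propagating the edited ha.cf, it may help to check it against the rules given above. The following is a minimal sketch under stated assumptions: check_ha_cf is a hypothetical helper name, it only inspects lines whose first word is node or autojoin, and it treats any node name containing a dot as fully qualified.

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat): report ha.cf entries that
# conflict with the procedure above. Prints one line per problem and
# returns non-zero if any problem was found.
check_ha_cf() {
    awk '
        # "autojoin any" is what the procedure says to avoid
        $1 == "autojoin" && $2 == "any" { print "autojoin any is set"; bad = 1 }
        # a node line may carry one or more names
        $1 == "node" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /\./) { print "fully qualified node name: " $i; bad = 1 }
                if (seen[$i]++) { print "node listed more than once: " $i; bad = 1 }
            }
        }
        END { exit bad + 0 }
    ' "$1"
}

# Possible usage:
#   check_ha_cf /etc/ha.d/ha.cf && echo "ha.cf node list looks consistent"
```

This only reads the file. It does not replace the final check with cl_status listnodes after heartbeat has been started again.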
Additional Information
Use the node names reported by uname -n to avoid this issue. Do not
use fully qualified node names.
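One way to confirm that a node's own name is listed in ha.cf exactly as uname -n reports it is sketched below. node_in_ha_cf is a hypothetical helper name, and the sketch assumes node names appear on lines whose first word is node.

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat): succeed if the name given
# as $1 appears on a "node" line of the ha.cf file given as $2.
node_in_ha_cf() {
    awk -v name="$1" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# Possible usage on each node:
#   node_in_ha_cf "$(uname -n)" /etc/ha.d/ha.cf || echo "this node is missing from ha.cf"
```

The comparison is exact, so a fully qualified entry does not match a short name.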