Environment
Novell Cluster Services 1.8.4
Novell Cluster Services 1.8.3
Situation
The /var/log/messages files on all cluster nodes are flooded with the following messages:
cl_status: [29975]: ERROR: REASON: hb_api_signon: Can't initiate connection to heartbeat
cl_status: [851]: ERROR: Cannot signon with heartbeat
cl_status: [29975]: ERROR: REASON: hb_api_signon: Can't initiate connection to heartbeat
cl_status: [851]: ERROR: Cannot signon with heartbeat
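To gauge how heavily a node's log is affected, the matching lines can be counted. The following is only a minimal sketch: the function name count_signon_errors is hypothetical and is not part of any Novell or Nagios tooling, and the log path simply defaults to the /var/log/messages file named above.

```shell
# Hypothetical helper: count the cl_status signon errors in a syslog file.
count_signon_errors() {
    # Default to the log file named in this document.
    logfile="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 when nothing matches).
    grep -c 'cl_status.*Cannot signon with heartbeat' "$logfile"
}
```

For example, running count_signon_errors with no argument on an affected node prints the number of "Cannot signon with heartbeat" lines in the current log file.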
Resolution
In this case the customer had implemented Nagios, and the agent script "/usr/bin/check_mk_agent" was the source of the messages. The script executes the "cl_status" command. To stop the messages, comment out the relevant heartbeat entries in the script, or prevent this specific script from running on nodes that belong to a Novell Cluster.
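As an alternative to commenting out the section by hand, the heartbeat block could be skipped whenever Heartbeat is not actually running. The following is a sketch under stated assumptions, not a tested change to the agent: the function name heartbeat_running is hypothetical, and the default pidfile path /var/run/heartbeat.pid is an assumption that should be verified against the local Heartbeat installation.

```shell
# Hypothetical helper: succeed only if the PID recorded in a pidfile
# belongs to a live process. The default path below is an assumption.
heartbeat_running() {
    pidfile="${1:-/var/run/heartbeat.pid}"
    # No readable pidfile means Heartbeat is treated as not running.
    [ -r "$pidfile" ] || return 1
    # Take the first line and strip any surrounding whitespace.
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    # Reject empty or non-numeric contents before probing the process.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it succeeds only if the process exists
    # and the caller is permitted to signal it (run the agent as root).
    kill -0 "$pid" 2> /dev/null
}
```

With such a helper defined near the top of the agent script, the condition that opens the heartbeat section could be extended to "if which cl_status > /dev/null 2>&1 && heartbeat_running; then", so that "cl_status" is never invoked on a node where Heartbeat is installed but stopped.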
Cause
As part of the Novell Cluster Services installation, the Heartbeat package is also installed; however, Novell Cluster Services does not use or require "cl_status". The agent script only checks whether the "cl_status" binary is present, so the check succeeds on these nodes. Because Heartbeat is not actually running on a Novell Cluster Services node, any attempt to execute "cl_status listnodes" produces the error.
Additional Information
The error can easily be reproduced by issuing the following command manually on any Novell Cluster Services node:
cl_status listnodes
The following is the relevant part of the "/usr/bin/check_mk_agent" Nagios agent script:
# Heartbeat monitoring
if which cl_status > /dev/null 2>&1; then
    # Different handling for heartbeat clusters with and without CRM
    # for the resource state
    if [ -S /var/run/heartbeat/crm/cib_ro ]; then
        echo '<<<heartbeat_crm>>>'
        crm_mon -1 -r | grep -v ^$ | sed 's/^\s/_/g'
    else
        echo '<<<heartbeat_rscstatus>>>'
        cl_status rscstatus
    fi
    echo '<<<heartbeat_nodes>>>'
    for NODE in $(cl_status listnodes); do
        if [ $NODE != $(echo $HOSTNAME | tr 'A-Z' 'a-z') ]; then
            STATUS=$(cl_status nodestatus $NODE)
            echo -n "$NODE $STATUS"
            for LINK in $(cl_status listhblinks $NODE 2>/dev/null); do
                echo -n " $LINK $(cl_status hblinkstatus $NODE $LINK)"
            done
            echo
        fi
    done
fi