Summary
Error
------
May 14 15:54:18 gn1s-a-1 heartbeat: [11063]: info: killing /usr/lib/heartbeat/crmd process group 11091 with signal 15
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: crm_shutdown: Requesting shutdown
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: do_shutdown_req: Sending shutdown request to DC: gn1s-a-1
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: do_shutdown_req: Processing shutdown locally
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: handle_shutdown_request: Creating shutdown request for gn1s-a-1
Cause
- while a failover can have several causes, in this case it was triggered by a system shutdown of gateway node gn1s-a-1 due to overheating.
bycast.log
----------
May 14 15:53:15 gn1s-a-1 hpasmxld[3917]: WARNING: System Overheating (Zone 4, Location Ambient, Temperature 41C)
May 14 15:53:15 gn1s-a-1 hpasmxld[3917]: A System Reboot has been requested by the management processor in 60 seconds.
:
May 14 15:54:15 gn1s-a-1 hpasmxld[3917]: A System Reboot has been initiated by the management processor.
May 14 15:54:15 gn1s-a-1 shutdown[7741]: shutting down for system reboot
Three seconds later, heartbeat starts shutting down the cluster services on gn1s-a-1.
ha.log
------
May 14 15:54:18 gn1s-a-1 heartbeat: [11063]: info: killing /usr/lib/heartbeat/crmd process group 11091 with signal 15
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: crm_shutdown: Requesting shutdown
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: do_shutdown_req: Sending shutdown request to DC: gn1s-a-1
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: do_shutdown_req: Processing shutdown locally
May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: handle_shutdown_request: Creating shutdown request for gn1s-a-1
- the customer checked with the data center team, who confirmed a cooling outage on Sunday.
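The correlation between the two logs can be checked with a short script. The following is only a minimal sketch in Python, not a supported tool: the helper names (parse, first_match) and the placeholder year are assumptions made for illustration, and the sample lines are copied from the bycast.log and ha.log excerpts in this article. It measures the time between the overheating warning and the crmd shutdown request.

```python
import re
from datetime import datetime

# Syslog-style prefix seen in both excerpts: "May 14 15:53:15 <host> <message>"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) (.*)$")

# The log lines carry no year, so a placeholder is used (assumption).
PLACEHOLDER_YEAR = 2000


def parse(line):
    """Return (timestamp, host, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    stamp, host, msg = m.groups()
    ts = datetime.strptime(f"{PLACEHOLDER_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
    return ts, host, msg


def first_match(lines, needle):
    """Return the first parsed record whose message contains needle."""
    for line in lines:
        rec = parse(line)
        if rec and needle in rec[2]:
            return rec
    return None


# Sample lines taken from the excerpts above.
bycast_log = [
    "May 14 15:53:15 gn1s-a-1 hpasmxld[3917]: WARNING: System Overheating "
    "(Zone 4, Location Ambient, Temperature 41C)",
    "May 14 15:54:15 gn1s-a-1 shutdown[7741]: shutting down for system reboot",
]
ha_log = [
    "May 14 15:54:18 gn1s-a-1 crmd: [11091]: info: crm_shutdown: Requesting shutdown",
]

overheat = first_match(bycast_log, "System Overheating")
crm_stop = first_match(ha_log, "crm_shutdown: Requesting shutdown")

# Seconds between the overheating warning and the cluster shutdown request.
delay = (crm_stop[0] - overheat[0]).total_seconds()
print(f"{overheat[1]}: crmd shutdown requested {delay:.0f}s after overheating warning")
```

A delay close to the 60-second reboot notice from the management processor, on the same node, supports overheating as the cause rather than a cluster fault.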
Fix
- no workaround or fix was needed; the FSG services failed over from gn1s-a-1 to gn1-a-1
- gn1-a-1 is now the active primary and gn1s-a-1 is the supplementary primary
- reset the failover count from the NMS