On a two-node cluster running Cluster Services 1.7, an abend on one node puts all resources into a comatose state on the second node.

  • 3005414
  • 09-Jan-2007
  • 27-Apr-2012

Environment

NetWare 6.5
NetWare Cluster Services 1.7

Situation

On a two-node cluster running Cluster Services 1.7, an abend on one node puts all resources into a comatose state on the second node.

Resolution

This behavior is as designed. It is part of cascading failover prevention.
Adding the /hmo=off parameter to the clstrlib command disables cascading failover prevention, and with it this behavior.

Additional Information

Preventing Cascading Failovers
A cascading failover occurs when a bad cluster resource causes a server to fail, fails over to another server and causes that server to fail as well, and then continues failing over to and bringing down additional cluster servers until possibly every server in the cluster has failed.

Novell Cluster Services now incorporates functionality that detects whether a node has failed because of a bad cluster resource and prevents that bad resource from failing over to other servers in the cluster.

This functionality is enabled by default when you install Novell Cluster Services. Cascading failover prevention can be disabled by adding the /hmo=off parameter to the clstrlib command in the sys:\system\ldncs.ncf file.

After adding the parameter, the line should appear as follows:

clstrlib /hmo=off

If you disable cascading failover prevention on one cluster server, you must do it on all servers in the cluster.

You must manually unload and reload Novell Cluster Services software on every cluster server in order for this change to take effect. To do this, use the uldncs command to unload cluster software and the ldncs command to reload cluster software.
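
As a sketch of that sequence, assuming the commands are entered at the server console of each cluster server in turn (this document does not specify an order among the servers or a wait between the two commands):

uldncs
ldncs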

NOTE: Cascading failover prevention is not supported on OES Linux, so the /hmo=off option does not apply there. By default, OES Linux allows cascading failover.

Formerly known as TID# 10092217