Cluster node join attempt hangs on cluster with a lot of LUNs

  • 3314872
  • 14-Jan-2008
  • 26-Apr-2012

Environment


Novell NetWare 6.0 Support Pack 5
Novell NetWare 6.0 Support Pack 4
Novell NetWare 6.5 Support Pack 2
Novell NetWare 6.5 Support Pack 3
Novell NetWare 6.5 Support Pack 4
Novell NetWare 6.5 Support Pack 5

Situation

When a node attempts to join the cluster, the join sometimes hangs the system console. The server shows high utilization while the console is hung, and disk I/Os appear to be stuck. Rebooting the node may or may not allow it to join successfully.

The console command process is stuck in a tight loop waiting for the CDMActivateCount to go to null. When a node attempts to join a cluster, it first performs a "scan for new devices" in an effort to find the SBD or Cluster partition. If there are a large number of LUNs, the CDM_Activate_Count may become inaccurate, which causes the hang.
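The failure mode described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual NWPA.NLM logic: the function name, the one-activation-per-LUN model, and the idea that the count goes wrong because a completion is lost are all hypothetical. The real console process spins without a limit; the sketch caps the loop with max_spins so that it terminates.

```python
def wait_for_activations(num_luns, lost_completions=0, max_spins=1000):
    """Return True if the activate count drains, False if the wait gives up.

    Hypothetical model: every name here is illustrative and does not
    come from NetWare source code.
    """
    activate_count = 0

    # Scan phase (assumption): one outstanding activation per LUN scanned.
    for _ in range(num_luns):
        activate_count += 1

    # Completion phase: each completion decrements the counter.
    # lost_completions models the counter becoming inaccurate (assumed cause).
    for _ in range(num_luns - lost_completions):
        activate_count -= 1

    # Wait loop: spins until the counter is back to zero.
    # The cap is an addition of this sketch so the example always ends.
    spins = 0
    while activate_count != 0:
        spins += 1
        if spins >= max_spins:
            return False  # stands in for the hang described in the article
    return True
```

Calling wait_for_activations(8) drains the counter and returns True, while wait_for_activations(256, lost_completions=1) leaves the counter stuck above zero and returns False once the spin cap is reached. Under this model, scanning fewer LUNs means fewer requests in flight and so fewer chances for the count to drift.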

Resolution

The customer can reduce the likelihood of this issue by decreasing the number of I/O requests that are made during a "scan for new devices". In the case of a QLogic driver, you can use the "maxluns" setting when loading the driver to decrease the number of LUNs that must be scanned. We also removed the "inquiry" parameter in an effort to decrease the number of I/O requests being made to the SAN.

The long-term solution is to upgrade the cluster to NetWare 6.5 SP6 or higher. The bug has been fixed in NWPA.NLM version 3.21, dated September 26, 2006, which was included in NetWare 6.5 SP6.