Why does migrating cluster volumes take longer on OES Linux?

  • 7006318
  • 21-Jun-2010
  • 27-Apr-2012

Environment


Novell Open Enterprise Server 1 (OES 1) Linux
Novell Open Enterprise Server 2 (OES 2) Linux

Situation

Is it normal for migration of cluster data resources to take 20 seconds between Linux nodes?

It was observed that in the same cluster, it took just a few seconds to migrate a clustered pool/volume from a NetWare node to another NetWare node.  However, once that resource was moved to a Linux node, the migration to another Linux node took approximately 20 seconds - the majority of this time was during the load script.

Resolution

The short answer is "most likely".  To understand this further, a little bit of background is required.  There are various steps that occur when migrating a clustered pool/volume from one node to another, including:
  1. IP address is released
  2. volume dismounts
  3. pool is deactivated
  4. resource is identified as no longer in use
  5. next node to host is identified
  6. next host ensures resource is not in use & mounts*
  7. pool is activated
  8. volume mounts
  9. IP address is bound
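
As a rough illustration of where the time goes, the nine steps above can be modeled as an ordered list with a duration attached to each. This is only a sketch: the per-step timings below are hypothetical placeholders chosen to add up to about 20 seconds, not measured values, and the `summarize` helper is an invented name rather than part of any Novell tool.

```python
# The nine migration steps, in the order listed above.
MIGRATION_STEPS = [
    "IP address is released",
    "volume dismounts",
    "pool is deactivated",
    "resource is identified as no longer in use",
    "next node to host is identified",
    "next host ensures resource is not in use",
    "pool is activated",
    "volume mounts",
    "IP address is bound",
]


def summarize(durations):
    """Return (total_seconds, slowest_step_number, slowest_step_name)."""
    if len(durations) != len(MIGRATION_STEPS):
        raise ValueError("expected one duration per migration step")
    total = sum(durations)
    slowest = max(range(len(durations)), key=lambda i: durations[i])
    return total, slowest + 1, MIGRATION_STEPS[slowest]


# Hypothetical per-step timings in seconds (assumed for illustration only).
example_linux = [0.5, 0.5, 0.5, 0.5, 0.5, 15.0, 1.0, 1.0, 0.5]
total, step_no, step_name = summarize(example_linux)
print(f"total {total:.1f}s, dominated by step {step_no}: {step_name}")
```

Substituting timings taken from your own cluster logs would show which step dominates in your environment.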

The first 5 steps occur relatively quickly on both the Linux and NetWare kernels.  The clustering & OS modules play a big role in the 6th step, which basically verifies that the disk resource is no longer in use and thereafter mounts the resource.

On the NetWare kernel, Media Manager, Clustering & NSS are tightly integrated and communicate closely with one another.  Migration between NetWare nodes is therefore faster: all nodes reach singularity (a single, agreed view of which node owns the resource) more quickly because of this tight relationship and because these services run in processor ring 0.

On the Linux kernel, some of the players have changed.  EVMS replaces Media Manager.  As EVMS is open source (i.e. not owned by Novell, although Novell contributes to its development), the relationship & communications between EVMS and Clustering/NSS are not as tight.  Therefore, extra care is required to ensure we don't have the same resource mounted on two (or more) nodes.  This is why it takes longer to migrate resources between Linux nodes.



Additional Information

The 18-20 seconds cited above is the duration we measured in our test environment, which included
  • 8 node cluster,
  • Fibre attached SAN &
  • roughly 30 clustered resources
Depending on your hardware configuration, you may see a shorter or longer delay (but roughly in the same ballpark).

Novell (NCP) clients should work fine with the additional time to recover.  Having said this, however, it is recommended that you test your typical client configuration to confirm what your end users will experience if a clustered volume is migrated during various operations: file read, file write, directory navigation, etc.  If errors occur, you may be able to modify the client configuration to address them.
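
One way to run the file read test is a small probe on a client workstation that reads from the clustered volume in a loop while an administrator migrates the resource. The following is a minimal sketch, not a Novell-supplied tool: the `probe_reads` function, its parameters, and the result keys are all hypothetical names, and it assumes the volume is reachable through an ordinary file path on the client. Client-side caching may hide part of the delay, so treat the numbers as indicative only.

```python
import time


def probe_reads(path, duration_s=60.0, interval_s=0.5):
    """Repeatedly read one byte from `path` and report what happened.

    Returns the number of attempts, the number of failed attempts, and
    the longest time any single attempt took (in seconds).
    """
    attempts = 0
    errors = 0
    longest = 0.0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        start = time.monotonic()
        try:
            with open(path, "rb") as handle:
                handle.read(1)
        except OSError:
            # Count the failure; the client could not read during this attempt.
            errors += 1
        longest = max(longest, time.monotonic() - start)
        attempts += 1
        time.sleep(interval_s)
    return {"attempts": attempts, "errors": errors, "longest_read_s": longest}
```

Start the probe against a file on the clustered volume, migrate the resource while it runs, and then compare `errors` and `longest_read_s` with a run where no migration took place. Similar loops for file writes and directory listings would cover the other operations mentioned above.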