When a mirrored SBD partition exists, SBDUTIL -V on the Master shows toggling values for all other nodes

  • 7001848
  • 11-Nov-2008
  • 27-Apr-2012

Environment

Novell Cluster Services 1.8.4
Novell Open Enterprise Server (Linux based)

Situation

When a mirrored SBD partition is used in a Novell cluster environment, running "watch sbdutil -v" on the cluster Master node
shows toggling values for all the other cluster nodes.

Currently, each secondary node writes to its own slot on only one of the SBD partitions, and not to the
mirrored SBD partition.

When an SBD partition is not mirroring properly and access to the current SBD partition is lost, all the
cluster nodes have to read their information from the mirrored SBD partition.

Because the secondary nodes never updated their slots there, the nodes will not see the increasing values they expect, and they will be cast out of the cluster.

Resolution

The fix has been coded for both OES1 and OES2 and has been released to the appropriate update channels for the respective distribution:
  • For OES1 the fix has been released in patch-12247 which is currently available from the OES patch channel.
Note: patch-12266 must already be applied. It provides the latest kernel modules for clustering, adminfs and zapi-shim, but it does not contain the fix for this issue.
  • For OES2 the fix has been released in maintenance patch oes2-novell-nss-5503-0 which is available from the OES2 catalog.

Additional Information

Root cause analysis of the problem has revealed that:
  • Only the cluster node having the Master IP Address is updating the slots on both SBD partitions
  • All cluster slave nodes update their own slot on only one SBD partition, but not on the other
  • Running "sbdutil -v" on the slave nodes will only read from one SBD partition
  • Running "sbdutil -v" on the master 'load-balances' the SBD reads; because every second read goes to the second SBD partition, the 'toggling values' appear
It was found that a problem in the nwraid1 driver caused this specific behavior.
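
The effect described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the class MirroredSBD, the function simulate, the counter values, and the strict alternation between mirror legs are hypothetical and do not reflect the actual SBD on-disk format or the nwraid1 read policy.

```python
# Toy model only: slot layout, counters and read policy are simplified
# assumptions for illustration, not the real Novell Cluster Services format.

class MirroredSBD:
    def __init__(self, node_count):
        # Two mirror legs, each holding one counter slot per node.
        self.legs = [[0] * node_count, [0] * node_count]
        self._next_leg = 0

    def write_slot(self, node, value, both_legs):
        # Always update the first leg; update the mirror only if asked to.
        self.legs[0][node] = value
        if both_legs:
            self.legs[1][node] = value

    def read_balanced(self, node):
        # Alternate between the two legs on every read ("load-balanced").
        leg = self._next_leg
        self._next_leg = 1 - leg
        return self.legs[leg][node]


def simulate(ticks, fixed):
    """Return the values the master reads for one slave node (slot 1)."""
    sbd = MirroredSBD(node_count=2)
    seen = []
    for tick in range(1, ticks + 1):
        sbd.write_slot(0, tick, both_legs=True)    # master writes both legs
        sbd.write_slot(1, tick, both_legs=fixed)   # slave: one leg unless fixed
        seen.append(sbd.read_balanced(1))
    return seen


print(simulate(6, fixed=False))  # [1, 0, 3, 0, 5, 0]  toggling values
print(simulate(6, fixed=True))   # [1, 2, 3, 4, 5, 6]  steadily increasing
```

In this model the stale mirror leg is what a node would have to rely on if the first leg became unavailable, which matches the cast-out scenario described in the Situation section.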


Note: If you still see toggling values on the SBD partition after applying the patches mentioned above, please verify that the node is actually updating its information on the SBD partition properly.
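
One way to make this check less error-prone is to note the value shown for a node over several successive "sbdutil -v" runs and test whether it ever goes backwards. The helper below is a minimal sketch: find_regressions is a hypothetical name, the samples are entered by hand (no assumption is made about the sbdutil output format), and counter wrap-around is not handled.

```python
def find_regressions(samples):
    """Return the indices at which a sampled value is lower than the previous one.

    An empty result means the values only increased (or stayed equal) across
    the samples; any index returned points at a 'toggling' reading.
    """
    return [i for i in range(1, len(samples)) if samples[i] < samples[i - 1]]


# Hypothetical values noted by hand for one node across successive reads.
healthy = [1040, 1041, 1043, 1044]
toggling = [1040, 312, 1043, 312]

print(find_regressions(healthy))   # []
print(find_regressions(toggling))  # [1, 3]
```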

Under normal circumstances, the problem will correct itself after the patch is installed and the server rebooted. Each node will then write to its respective sector on disk, and functionality is restored.

If the node at the offending offset no longer exists, it can no longer write to the SBD partition, and it is then best to recreate the SBD partition.