Cluster resources go comatose on Linux nodes in 'mixed' NCS clusters

  • 7003384
  • 28-May-2009
  • 27-Apr-2012

Environment

Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 1
Novell Open Enterprise Server (NetWare 6.5) Support Pack 8
Novell Cluster Services 1.8.4
Novell Business Continuance Cluster (BCC)

Situation

The environment consists of two geographically dispersed multi-node Novell NetWare 6.5 clusters that are configured as a Business Continuance Cluster (BCC). The BCC software allows cluster-to-cluster failover between the two locations.

In the BCC environment, the cluster nodes are configured to use the OSPF protocol, and the cluster resource load and unload scripts use virtual NICs. Note that migrating BCC-enabled cluster resources between a NetWare cluster and an OES Linux cluster is not supported, but migrating such resources within the same cluster is.

The problem was that BCC-enabled NCS resources that were migrated within the same cluster always ended up in a comatose state when activated on OES Linux nodes.

Resolution

The problem lies in the '/opt/novell/ncs/bin/clstrlib.py' script, which does not translate the required strings properly. A modified version of this file provides more flexibility for the 'in-memory' translation of the script and has resolved the problem. The fix is scheduled to be released in a future update.


Additional Information

When a resource is migrated from a NetWare NCS cluster node to an OES Linux NCS cluster node, the resource load or unload script must be translated 'in-memory', because a script that is valid on the NetWare platform is not recognized on the OES Linux platform.

For example, the following resource load script on NetWare:
    nss /poolactivate=TEST
    mount TEST VOLID=201
    CLUSTER CVSBIND ADD BCC1-TEST-V 10.11.51.201
    NUDP ADD BCC1-TEST-V 10.11.51.201
    ###add secondary ipaddress 10.11.51.201
    ospfconf addr=10.11.51.201 intf=VNIC area=0.0.0.77
    bind IP VNIC mask=255.255.255.255 address=10.11.51.201
    # FTP Server laden
    nwftpd -c TEST:\etc\ftpserv.cfg

will be translated on OES Linux as follows:
    #!/bin/bash
    . /opt/novell/ncs/lib/ncsfuncs
    exit_on_error nss /poolact=TEST
    exit_on_error ncpcon mount TEST=201
    CIFS_IP=10.11.51.201
    ### exit_on_error add_secondary_ipaddress 10.11.51.201
    exit_on_error ncpcon bind --ncpservername=BCC1-TEST-V --ipaddress=10.11.51.201
    ospfconf addr=10.11.51.201 intf=VNIC area=0.0.0.77
    bind IP VNIC mask=255.255.255.255 address=10.11.51.201
    # FTP Server laden
    nwftpd -c TEST:\etc\ftpserv.cfg
    exit 0
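
The translation shown above can be pictured with the following minimal sketch. It is an illustration only and does not reproduce the logic of clstrlib.py: the function name translate_line and the two sed rules are assumptions, derived solely from the first two lines of the example scripts. Lines that match no rule pass through unchanged, which is what happens to the bind line in the example.

```shell
#!/bin/bash
# Hypothetical illustration; NOT the actual clstrlib.py translation logic.
# translate_line is an assumed helper name used only for this sketch.
translate_line() {
    printf '%s\n' "$1" | sed -E \
        -e 's#^nss /poolactivate=(.*)$#exit_on_error nss /poolact=\1#' \
        -e 's#^mount ([^ ]+) VOLID=([0-9]+)$#exit_on_error ncpcon mount \1=\2#'
}

# Lines covered by a rule are rewritten for OES Linux.
translate_line 'nss /poolactivate=TEST'
translate_line 'mount TEST VOLID=201'

# A line without a matching rule is emitted as-is.
translate_line 'bind IP VNIC mask=255.255.255.255 address=10.11.51.201'
```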

The problem here is the bind command, which has a different meaning on Linux; as a result, the resource ends up in a comatose state.
The same problem applies to the cluster resource unload script, which uses the unbind command; unbind is not a valid Linux command.
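
One way to check how a Linux node resolves these two names is sketched below. It assumes that the different meaning refers to the bash builtin bind, which manages readline key bindings; the helper name classify_command is made up for this example.

```shell
#!/bin/bash
# Report how bash resolves a command name (builtin, file, function, ...).
# classify_command is a hypothetical helper used only for this check.
classify_command() {
    # Use a fresh bash so the result does not depend on the calling shell.
    bash -c 'type -t "$1" || echo "not found"' _ "$1"
}

classify_command bind      # in bash this is the readline builtin
classify_command unbind    # expected to be absent on a typical Linux node
```

If bind is reported as a builtin, the translated script calls the shell's key-binding command rather than a network command.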

As a workaround, replace the bind command with the following two commands:
    ignore_error bind IP VNIC mask=255.255.255.255 address=10.11.51.120
    ignore_error exit_on_error ip addr add 10.11.51.120/32 dev VNIC
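
The effect of the two prefixes can be sketched as follows. The functions below are simplified stand-ins written for this illustration; the real helpers are sourced from /opt/novell/ncs/lib/ncsfuncs and may behave differently in detail (for example, by logging).

```shell
#!/bin/bash
# Simplified stand-ins, assumed for illustration only; the real versions
# come from /opt/novell/ncs/lib/ncsfuncs.
ignore_error() {
    "$@" || true    # run the command, discard any failure
    return 0
}

exit_on_error() {
    "$@" || exit 1  # abort the script as soon as the command fails
}

# A failing command behind ignore_error does not stop the script.
ignore_error false
echo "after ignore_error: status $?"

# A failing command behind exit_on_error ends the script; the subshell
# keeps this demonstration from terminating the caller.
if ( exit_on_error false ); then
    echo "script would continue"
else
    echo "script would stop here"
fi
```

Under these assumptions, a command that fails on OES Linux no longer aborts the load script when it is prefixed with ignore_error.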