Environment
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 1
Novell Open Enterprise Server (NetWare 6.5) Support Pack 8
Novell Cluster Services 1.8.4
Novell Business Continuance Cluster (BCC)
Situation
The environment consists of two geographically dispersed multi-node Novell NetWare 6.5 clusters that are configured in a Business Continuance Cluster (BCC) setup. The BCC software allows cluster-to-cluster failover between the two locations.
In the BCC environment, the cluster nodes are configured to use the OSPF protocol, and the cluster resource load and unload scripts use virtual NICs. Note that migrating BCC-enabled cluster resources between a NetWare cluster and an OES Linux cluster is not supported, but migrating such resources within the same cluster is.
The problem encountered was that BCC-enabled NCS resources that were migrated within the same cluster always ended up in a comatose state when activated on OES Linux nodes.
Resolution
The problem lies in the '/opt/novell/ncs/bin/clstrlib.py' script, which does not translate the required strings properly. A modified version of this file provides more flexibility for the 'in-memory' translation of the script and has resolved the problem. The fix is scheduled to be released in a future update.
Additional Information
When a resource is migrated from a NetWare NCS cluster node to an OES Linux NCS cluster node, the resource load or unload script must be translated 'in-memory', because a script that is valid on the NetWare platform is not recognized on the OES Linux platform.
For example, the following resource load script on NetWare:
nss /poolactivate=TEST
mount TEST VOLID=201
CLUSTER CVSBIND ADD BCC1-TEST-V 10.11.51.201
NUDP ADD BCC1-TEST-V 10.11.51.201
###add secondary ipaddress 10.11.51.201
ospfconf addr=10.11.51.201 intf=VNIC area=0.0.0.77
bind IP VNIC mask=255.255.255.255 address=10.11.51.201
# FTP Server laden
nwftpd -c TEST:\etc\ftpserv.cfg
will be translated on OES Linux as follows:
#!/bin/bash
. /opt/novell/ncs/lib/ncsfuncs
exit_on_error nss /poolact=TEST
exit_on_error ncpcon mount TEST=201
CIFS_IP=10.11.51.201
### exit_on_error add_secondary_ipaddress 10.11.51.201
exit_on_error ncpcon bind --ncpservername=BCC1-TEST-V --ipaddress=10.11.51.201
ospfconf addr=10.11.51.201 intf=VNIC area=0.0.0.77
bind IP VNIC mask=255.255.255.255 address=10.11.51.201
# FTP Server laden
nwftpd -c TEST:\etc\ftpserv.cfg
exit 0
The problem here is the bind command, which has a different meaning on Linux; as a result, the resource ends up in a comatose state.
The same problem applies to the cluster resource unload script, which uses the unbind command. unbind is not a valid Linux command.
The workaround is to replace the bind command with the following two commands:
ignore_error bind IP VNIC mask=255.255.255.255 address=10.11.51.120
ignore_error exit_on_error ip addr add 10.11.51.120/32 dev VNIC
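The relationship between the NetWare bind line and the Linux ip line in the workaround can be sketched as a small conversion helper. This is only an illustration: mask_to_prefix and netware_bind_to_ip are hypothetical names, not part of NCS or clstrlib.py, and the sketch assumes a line of the exact form "bind IP <interface> mask=<netmask> address=<ip>" with a contiguous netmask.

```shell
#!/bin/bash
# Hypothetical helper (not part of NCS): convert a dotted-decimal netmask
# into a prefix length. Assumes the mask is contiguous; it only checks
# that each octet is a valid mask octet.
mask_to_prefix() {
    local mask=$1 prefix=0 octet
    local IFS=.
    for octet in $mask; do
        case $octet in
            255) prefix=$((prefix + 8)) ;;
            254) prefix=$((prefix + 7)) ;;
            252) prefix=$((prefix + 6)) ;;
            248) prefix=$((prefix + 5)) ;;
            240) prefix=$((prefix + 4)) ;;
            224) prefix=$((prefix + 3)) ;;
            192) prefix=$((prefix + 2)) ;;
            128) prefix=$((prefix + 1)) ;;
            0)   ;;
            *)   echo "invalid netmask: $mask" >&2; return 1 ;;
        esac
    done
    echo "$prefix"
}

# Hypothetical helper: print the Linux "ip addr add" line that corresponds
# to a NetWare "bind IP <intf> mask=<mask> address=<addr>" line.
# It only prints the command; it does not change any interface.
netware_bind_to_ip() {
    local verb proto intf rest word mask="" addr="" prefix
    read -r verb proto intf rest <<< "$1"
    [ "$verb" = "bind" ] && [ "$proto" = "IP" ] || return 1
    for word in $rest; do
        case $word in
            mask=*)    mask=${word#mask=} ;;
            address=*) addr=${word#address=} ;;
        esac
    done
    [ -n "$mask" ] && [ -n "$addr" ] || return 1
    prefix=$(mask_to_prefix "$mask") || return 1
    echo "ip addr add $addr/$prefix dev $intf"
}
```

Calling netware_bind_to_ip "bind IP VNIC mask=255.255.255.255 address=10.11.51.120" prints the ip addr add line used in the workaround, without the ignore_error and exit_on_error prefixes. The helper only builds the command string, so it can be checked on any system without touching the network configuration.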