Environment
Novell Access Management 3 Linux Access Gateway
Novell Access Management 3 Netware Access Gateway
Novell Access Management 3 Linux Novell Identity Server
Novell Access Manager 3 Support Pack 1 Release Candidate 2 applied
Situation
A working Access Manager setup was upgraded to provide fault
tolerance. As part of this upgrade process, a Linux Access Gateway
(LAG) cluster was created, and a newly installed LAG was added to
this cluster. Each LAG, installed in the DMZ, had two interfaces -
one on the public side, where browser requests would hit the LAG,
and one on the private side, where the Web server traffic
occurred.
After the second LAG was added to the cluster, it appeared as a cluster member, but its health check showed a yellow warning symbol. The Health Check tab reported the following error message:
There exists a configured cluster member that is not active! Expected cluster
members 172.16.200.11 172.16.200.12. Active cluster members: 167.236.200.5
172.16.200.11.
(Required Action) Start the member that is not active in the cluster.
An active cluster member is not in the configured list of cluster members!
Expected cluster members 172.16.200.11 172.16.200.12. Active cluster members:
167.236.200.5 172.16.200.11.
(Required Action) An ip address that is not known to the current configuration
is a member of the current cluster. Ensure that this ip address is a well known
address that should be included in the cluster.
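The two warnings amount to a comparison between the configured member list and the active cluster view. The following is a minimal Python sketch of that comparison, using the addresses from the message above. The function name compare_members is hypothetical and is not part of Access Manager; it only illustrates how to read the message.

```python
def compare_members(expected, active):
    """Return (inactive, unknown) for a configured list and an active view."""
    expected_set, active_set = set(expected), set(active)
    inactive = sorted(expected_set - active_set)  # configured but not active
    unknown = sorted(active_set - expected_set)   # active but not configured
    return inactive, unknown


# Addresses copied from the health check message.
expected = ["172.16.200.11", "172.16.200.12"]
active = ["167.236.200.5", "172.16.200.11"]

inactive, unknown = compare_members(expected, active)
print("Configured but not active:", inactive)
print("Active but not configured:", unknown)
```

One configured address is missing from the view and one unconfigured address is present in it, which suggests that a single box is being seen on an address other than its configured one.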
Resolution
Fix the routing issue between the LAG devices. The cluster
communication protocol opens a socket on one of the NICs. When it
picks the "wrong" one, the cluster view contains an IP address that
is not in the configured ("blessed") list of cluster members, and
the warning is shown.
Additional Information
Here are the details of the NIC setup and routes. Note the change in the routing table required to fix this.
Box 11
168.236.200.4 eth0
172.16.200.11 eth2
Box 12
168.236.200.5 eth0
172.16.200.12 eth2
The routing rules from Box 11 and Box 12 are listed below for reference.
From Box 11, if a request is made to 172.16.200.12, the pattern "172.16.200.0"
will match, and that sets the destination to the default gateway, 168.236.200.1,
so the request ends up connecting to Box 12 on 168.236.200.5 --> This
one does not work
If the same thing is done from Box 12...
From Box 12, if a request is made to 172.16.200.11, the pattern "172.16.0.0"
will match, and that sets the destination to 172.16.200.1, so the request ends
up connecting to Box 11 on 172.16.200.11 --> This one works
Changing the routing table entry from
172.16.44.0 172.16.200.1 255.255.255.0 UG 0 0 0 eth2
in Box 11's routing table to
172.16.0.0 172.16.200.1 255.255.0.0 UG 0 0 0 eth2
will make it work.
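The difference between the two entries can be checked by testing which destination/mask pair covers the peer address 172.16.200.12. This is a sketch using Python's standard ipaddress module. It tests address coverage only and does not model how the kernel chooses among several matching routes. The 255.255.0.0 mask is an assumption taken from the 172.16.0.0 entry in Box 12's routing table.

```python
import ipaddress

# Private-side address of the peer LAG (Box 12).
peer = ipaddress.ip_address("172.16.200.12")

# Destination/Genmask pairs for the entry before and after the change.
old_entry = ipaddress.ip_network("172.16.44.0/255.255.255.0")
new_entry = ipaddress.ip_network("172.16.0.0/255.255.0.0")

print(peer in old_entry)  # False: 172.16.44.0/24 does not cover the peer
print(peer in new_entry)  # True: 172.16.0.0/16 covers the peer
```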
Routing table from 172.16.200.11 (Box 11)
Destination     Gateway         Genmask         Flags MSS Window irtt Iface
168.236.200.0   0.0.0.0         255.255.255.224 U     0   0      0    eth0
172.16.200.0    0.0.0.0         255.255.255.224 U     0   0      0    eth2
172.16.44.0     172.16.200.1    255.255.255.0   UG    0   0      0    eth2
168.236.208.0   172.16.200.1    255.255.248.0   UG    0   0      0    eth2
168.254.0.0     0.0.0.0         255.255.0.0     U     0   0      0    eth0
172.20.0.0      172.16.200.1    255.255.0.0     UG    0   0      0    eth2
10.0.0.0        172.16.200.1    255.0.0.0       UG    0   0      0    eth2
127.0.0.0       0.0.0.0         255.0.0.0       U     0   0      0    lo
0.0.0.0         168.236.200.1   0.0.0.0         UG    0   0      0    eth0
Routing table from 172.16.200.12 (Box 12)
Destination     Gateway         Genmask         Flags MSS Window irtt Iface
168.236.200.0   0.0.0.0         255.255.255.224 U     0   0      0    eth0
172.16.200.0    0.0.0.0         255.255.255.224 U     0   0      0    eth2
172.16.0.0      172.16.200.1    255.255.0.0     UG    0   0      0    eth2
168.254.0.0     0.0.0.0         255.255.0.0     U     0   0      0    eth0
172.20.0.0      172.16.200.1    255.255.0.0     UG    0   0      0    eth2
10.0.0.0        172.16.200.1    255.0.0.0       UG    0   0      0    eth2
127.0.0.0       0.0.0.0         255.0.0.0       U     0   0      0    lo
0.0.0.0         168.236.200.1   0.0.0.0         UG    0   0      0    eth0