"There exists a configured cluster member that is not active!" adding new Access Gateway to cluster group

  • 3122390
  • 14-Sep-2007
  • 26-Apr-2012

Environment


Novell Access Management 3 Linux Access Gateway
Novell Access Management 3 Netware Access Gateway
Novell Access Management 3 Linux Novell Identity Server
Novell Access Manager 3 Support Pack 1 Release Candidate 2 applied

Situation

A working Access Manager setup was upgraded to provide fault tolerance. As part of this upgrade, a Linux Access Gateway (LAG) cluster was created, and a newly installed LAG was added to it. Each LAG, installed in the DMZ, had two interfaces: one on the public side, where browser requests reach the LAG, and one on the private side, which carries the Web server traffic.

After the second LAG was added to the cluster, it appeared as a cluster member, but its health check showed a yellow warning symbol. The health check tab reported the following error messages:

There exists a configured cluster member that is not active! Expected cluster
members 172.16.200.11 172.16.200.12. Active cluster members: 167.236.200.5
172.16.200.11.

(Required Action) Start the member that is not active in the cluster.

An active cluster member is not in the configured list of cluster members!
Expected cluster members 172.16.200.11 172.16.200.12. Active cluster members:
167.236.200.5 172.16.200.11.

(Required Action) An ip address that is not known to the current configuration
is a member of the current cluster. Ensure that this ip address is a well known
address that should be included in the cluster.

Resolution

Fix the routing issue between the LAG devices. The cluster communication protocol opens a socket on one of the NICs. When it picks the "wrong" one, the cluster view contains an IP address that is not in the configured ("blessed") list of cluster members, and the warning is shown.
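
The mismatch described above can be read directly from the health message. The following is a minimal sketch, not the product's own check: the function name compare_members and the parsing approach are assumptions made for illustration. It extracts the expected and active member lists from the message text and reports the differences.

```python
import re

# Matches dotted-quad IPv4 addresses; a trailing sentence period is not included.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def compare_members(message):
    """Return (missing, unexpected) sets of IP addresses from a health message.

    Hypothetical helper for illustration only; it is not part of Access Manager.
    """
    # The message wraps across lines, so collapse all whitespace first.
    text = " ".join(message.split())
    after_expected = text.split("Expected cluster members", 1)[1]
    expected_text, active_text = after_expected.split("Active cluster members:", 1)
    expected = set(IPV4.findall(expected_text))
    active = set(IPV4.findall(active_text))
    return expected - active, active - expected

HEALTH_MESSAGE = """There exists a configured cluster member that is not active! Expected cluster
members 172.16.200.11 172.16.200.12. Active cluster members: 167.236.200.5
172.16.200.11."""

missing, unexpected = compare_members(HEALTH_MESSAGE)
print("configured but not active:", sorted(missing))
print("active but not configured:", sorted(unexpected))
```

For the message reported here, the configured-but-not-active address is 172.16.200.12, and the active-but-not-configured address is the public-side address that the second LAG joined with.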

Additional Information

Here are the details of the NIC setup and routes. Note the change in the routing table required to fix this.

Box 11
168.236.200.4 eth0
172.16.200.11 eth2

Box 12
168.236.200.5 eth0
172.16.200.12 eth2

The routing tables from Box 11 and Box 12 are listed below for reference.

From Box 11, if a request is made to 172.16.200.12, the pattern "172.16.200.0"
will match, and that will set the destination to the default gateway,
168.236.200.1, which will end up connecting to Box 12 on 168.236.200.5 --> This
one does not work

If the same thing is done from Box 12...

From Box 12, if a request is made to 172.16.200.11, the pattern "172.16.0.0"
will match, and that will set the destination to 172.16.200.1, which will end up
connecting to Box 11 on 172.16.200.11 --> This one works


Changing the routing table entry from

172.16.44.0 172.16.200.1 255.255.255.0 UG 0 0 0 eth2

in Box 11's routing table to

172.16.0.0 172.16.200.1 255.255.0.0 UG 0 0 0 eth2

will make it work.
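
As a quick check of why the two tables behave differently, the sketch below tests whether a route entry's destination and genmask cover the peer's private address. It is only an illustration: the helper name covers is hypothetical, the entries are copied from the routing tables in this document, and the check looks at address coverage only. It does not model how the kernel chooses among overlapping routes.

```python
import ipaddress

def covers(destination, genmask, address):
    """Return True if the route (destination/genmask) contains the address.

    Hypothetical helper for illustration; it checks coverage only.
    """
    network = ipaddress.ip_network(f"{destination}/{genmask}")
    return ipaddress.ip_address(address) in network

# Private-side address of Box 12, the peer that Box 11 must reach.
PEER = "172.16.200.12"

# Entry currently in Box 11's routing table (via 172.16.200.1 on eth2).
box11_entry = ("172.16.44.0", "255.255.255.0")
# Corresponding entry in Box 12's routing table (via 172.16.200.1 on eth2).
box12_entry = ("172.16.0.0", "255.255.0.0")

box11_covers = covers(*box11_entry, PEER)
box12_covers = covers(*box12_entry, PEER)
print("Box 11 entry covers peer:", box11_covers)
print("Box 12 style entry covers peer:", box12_covers)
```

The 172.16.44.0 entry does not cover 172.16.200.12, while the 172.16.0.0 entry with the 255.255.0.0 mask does.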

Routing table from 172.16.200.11 (Box 11)

Destination     Gateway         Genmask          Flags  MSS  Window  irtt  Iface
168.236.200.0   0.0.0.0         255.255.255.224  U      0    0       0     eth0
172.16.200.0    0.0.0.0         255.255.255.224  U      0    0       0     eth2
172.16.44.0     172.16.200.1    255.255.255.0    UG     0    0       0     eth2
168.236.208.0   172.16.200.1    255.255.248.0    UG     0    0       0     eth2
168.254.0.0     0.0.0.0         255.255.0.0      U      0    0       0     eth0
172.20.0.0      172.16.200.1    255.255.0.0      UG     0    0       0     eth2
10.0.0.0        172.16.200.1    255.0.0.0        UG     0    0       0     eth2
127.0.0.0       0.0.0.0         255.0.0.0        U      0    0       0     lo
0.0.0.0         168.236.200.1   0.0.0.0          UG     0    0       0     eth0

Routing table from 172.16.200.12 (Box 12)

Destination     Gateway         Genmask          Flags  MSS  Window  irtt  Iface
168.236.200.0   0.0.0.0         255.255.255.224  U      0    0       0     eth0
172.16.200.0    0.0.0.0         255.255.255.224  U      0    0       0     eth2
172.16.0.0      172.16.200.1    255.255.0.0      UG     0    0       0     eth2
168.254.0.0     0.0.0.0         255.255.0.0      U      0    0       0     eth0
172.20.0.0      172.16.200.1    255.255.0.0      UG     0    0       0     eth2
10.0.0.0        172.16.200.1    255.0.0.0        UG     0    0       0     eth2
127.0.0.0       0.0.0.0         255.0.0.0        U      0    0       0     lo
0.0.0.0         168.236.200.1   0.0.0.0          UG     0    0       0     eth0