Environment
Situation
- Go to link https://www.novell.nl/mwp2/faces/confidential/aanmelden.jspx
- Log in as user piet with a valid password
- client redirect: https://logon.novell.nl/nidp/rsg/rsglogin.jsp
- client redirect: https://sso.novell.nl/LAGBroker?c=MC/secure/name/password/uri&%22https://www.novell.nl/mwp2/faces/secure/gotoDashboard
- client redirect: https://www.novell.nl/mwp2/faces/secure/gotoDashboard
- Dashboard, user logged in: https://www.novell.nl/mwp2/faces/secure/dashboard.jspx?sc=0
- Everything works fine; no problems
- Go to link https://www.novell.nl/mwp2/faces/confidential/aanmelden.jspx
- Log in with a valid username and password
- client redirect: https://logon.novell.nl/nidp/rsg/rsglogin.jsp
- client redirect: https://sso.novell.nl/LAGBroker?c=MC/secure/name/password/uri&%22https://www.novell.nl/mwp2/faces/secure/gotoDashboard
- client redirect: https://www.novell.nl/mwp2/faces/secure/gotoDashboard
- client redirect: https://sso.novell.nl/LAGBroker?c=MC/secure/name/password/uri&%22https://www.novell.nl/mwp2/faces/secure/gotoDashboard
- client redirect: https://www.novell.nl/mwp2/faces/secure/gotoDashboard
- client redirect: https://sso.novell.nl/LAGBroker?c=MC/secure/name/password/uri&%22https://www.novell.nl/mwp2/faces/secure/gotoDashboard
- etc.
- The user keeps looping through these redirects and never reaches the application
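A loop like the one in the second trace can be spotted mechanically once the redirect URLs have been captured in order (for example from the browser's developer tools). The following is a minimal sketch under that assumption. It treats any URL that appears twice in the chain as the start of a loop, which is a simplification, because a legitimate flow can revisit a page. The helper name find_redirect_loop is hypothetical.

```python
# Minimal sketch: report the first URL that repeats in a recorded
# redirect chain. Any repeat is treated as a loop (a simplification).
from typing import Dict, List, Optional, Tuple


def find_redirect_loop(chain: List[str]) -> Optional[Tuple[int, int]]:
    """Return (first_index, repeat_index) of the first repeated URL, or None."""
    seen: Dict[str, int] = {}
    for i, url in enumerate(chain):
        if url in seen:
            return seen[url], i
        seen[url] = i
    return None


LOGIN = "https://logon.novell.nl/nidp/rsg/rsglogin.jsp"
LAG = ("https://sso.novell.nl/LAGBroker?c=MC/secure/name/password/uri"
       "&%22https://www.novell.nl/mwp2/faces/secure/gotoDashboard")
GOTO = "https://www.novell.nl/mwp2/faces/secure/gotoDashboard"
DASH = "https://www.novell.nl/mwp2/faces/secure/dashboard.jspx?sc=0"

working = [LOGIN, LAG, GOTO, DASH]            # first trace: reaches the dashboard
looping = [LOGIN, LAG, GOTO, LAG, GOTO, LAG]  # second trace: bounces back and forth

print(find_redirect_loop(working))
print(find_redirect_loop(looping))
```

The first call prints None; the second reports that the LAGBroker URL at position 1 comes back at position 3.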
Resolution
Use the following JGroupsConfiguration context parameter:
<context-param>
<param-name>JGroupsConfiguration</param-name>
<param-value>TCP(start_port=[nidp:ClusterPort];end_port=[nidp:ClusterPort][nidp:IfExternalAddress];external_addr=[nidp:ExternalAddress][nidp:EndIf]):TCPPING(initial_hosts=[nidp:ClusterMembers];port_range=1;timeout=3500;num_initial_members=2;up_thread=true;down_thread=true):MERGE2(min_interval=10000;max_interval=30000):FD_SOCK([nidp:IfExternalAddress]bind_addr=[nidp:ExternalAddress][nidp:EndIf]):FD(shun=true;timeout=5000;max_tries=5;up_thread=true;down_thread=true):VERIFY_SUSPECT(timeout=2000;down_thread=false;up_thread=false):pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):pbcast.STATE_TRANSFER():pbcast.GMS(merge_timeout=10000;join_timeout=5000;join_retry_timeout=2000;shun=true;print_local_addr=[nidp:DebugOn];down_thread=true;up_thread=true)</param-value>
</context-param>
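The param-value packs the whole JGroups protocol stack into a single line, which makes individual settings such as the FD timeout and max_tries hard to read. The following hypothetical helper is not part of Access Manager. It splits the stack into protocols and parameters for review. It assumes the colon-separated PROTOCOL(key=value;...) stack syntax used above and treats the [nidp:...] tokens as opaque text. Conditional tokens such as [nidp:IfExternalAddress] therefore stay unexpanded inside keys or values.

```python
# Hypothetical helper: split a JGroups protocol-stack string into
# (protocol, {parameter: value}) pairs. [nidp:...] template tokens are
# left unexpanded, so conditional tokens may appear inside keys/values.
from typing import Dict, List, Tuple


def split_top_level(text: str, sep: str) -> List[str]:
    """Split text on sep, ignoring separators nested inside () or []."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_stack(stack: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return the protocols in stack order together with their parameters."""
    protocols = []
    for entry in split_top_level(stack, ":"):
        name, _, rest = entry.partition("(")
        body = rest[:-1] if rest.endswith(")") else rest
        params: Dict[str, str] = {}
        for item in split_top_level(body, ";"):
            key, sep, value = item.partition("=")
            if sep:
                params[key] = value
        protocols.append((name, params))
    return protocols


STACK = (
    "TCP(start_port=[nidp:ClusterPort];end_port=[nidp:ClusterPort][nidp:IfExternalAddress];"
    "external_addr=[nidp:ExternalAddress][nidp:EndIf]):"
    "TCPPING(initial_hosts=[nidp:ClusterMembers];port_range=1;timeout=3500;"
    "num_initial_members=2;up_thread=true;down_thread=true):"
    "MERGE2(min_interval=10000;max_interval=30000):"
    "FD_SOCK([nidp:IfExternalAddress]bind_addr=[nidp:ExternalAddress][nidp:EndIf]):"
    "FD(shun=true;timeout=5000;max_tries=5;up_thread=true;down_thread=true):"
    "VERIFY_SUSPECT(timeout=2000;down_thread=false;up_thread=false):"
    "pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):"
    "pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):"
    "pbcast.STATE_TRANSFER():"
    "pbcast.GMS(merge_timeout=10000;join_timeout=5000;join_retry_timeout=2000;shun=true;"
    "print_local_addr=[nidp:DebugOn];down_thread=true;up_thread=true)"
)

for name, params in parse_stack(STACK):
    print(f"{name}: {params}")
```

The depth counter matters because the [nidp:...] placeholders contain colons that must not be treated as protocol separators.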
Cause
- sending are-you-alive msg to 172.26.0.199:7801 (own address=172.26.0.45:7801)
- sending are-you-alive msg to 172.26.0.199:7801 (own address=172.26.0.45:7801)
- sending are-you-alive msg to 172.26.0.199:7801 (own address=172.26.0.45:7801)
- heartbeat missing from 172.26.0.199:7801 (number=1)
- heartbeat missing from 172.26.0.199:7801 (number=1)
- heartbeat missing from 172.26.0.199:7801 (number=1)
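On a busy server it helps to tally these failure-detection (FD) entries per peer rather than reading them one by one. The following is a minimal sketch that assumes only the message wording shown in the excerpt. The regexes use \s+ between words so they also match entries that were wrapped across lines. The function name fd_summary is hypothetical.

```python
# Minimal sketch: count FD "are-you-alive" probes and "heartbeat missing"
# entries per peer address in a JGroups debug log excerpt.
import re
from collections import Counter
from typing import Tuple

SENT_RE = re.compile(r"sending\s+are-you-alive\s+msg\s+to\s+(\S+)")
MISSING_RE = re.compile(r"heartbeat\s+missing\s+from\s+(\S+)")


def fd_summary(log_text: str) -> Tuple[Counter, Counter]:
    """Return (probes sent per peer, missing-heartbeat entries per peer)."""
    return Counter(SENT_RE.findall(log_text)), Counter(MISSING_RE.findall(log_text))


SAMPLE = """\
- sending are-you-alive msg to 172.26.0.199:7801 (own address=172.26.0.45:7801)
- sending are-you-alive msg to 172.26.0.199:7801 (own address=172.26.0.45:7801)
- sending are-you-alive msg to 172.26.0.199:7801 (own address=172.26.0.45:7801)
- heartbeat missing from 172.26.0.199:7801 (number=1)
- heartbeat missing from 172.26.0.199:7801 (number=1)
- heartbeat missing from 172.26.0.199:7801 (number=1)
"""

sent, missing = fd_summary(SAMPLE)
for peer, count in sent.items():
    print(f"{peer}: {count} probes sent, {missing[peer]} heartbeats missing")
```

For the excerpt above this prints "172.26.0.199:7801: 3 probes sent, 3 heartbeats missing". Every probe to that peer is matched by a missing heartbeat.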
<amLogEntry> 2014-02-03T12:49:11Z DEBUG NIDS Application:
Method: DMessageBus.A
Thread: ajp-bio-127.0.0.1-9009-exec-21
DMessageBus Message Response: Elapsed Millis: 15002, Count: 1
Response #0: from member 172.26.0.199. Was Received: false Was Suspected: false
</amLogEntry>
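This entry appears to report that no response arrived from member 172.26.0.199 within about 15 seconds (15002 ms), while that member was not suspected. The following is a minimal sketch for scanning such entries. The field layout is taken from this single entry and may differ in other Access Manager versions. The names parse_response and MemberResponse are hypothetical. The regexes tolerate line breaks inside the entry.

```python
# Minimal sketch: extract the elapsed time and per-member results from a
# "DMessageBus Message Response" log entry (layout assumed from one sample).
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

ELAPSED_RE = re.compile(r"Elapsed\s+Millis:\s*(\d+),\s*Count:\s*(\d+)")
RESPONSE_RE = re.compile(
    r"Response\s+#(\d+):\s+from\s+member\s+([\d.]+)\.\s+"
    r"Was\s+Received:\s+(true|false)\s+Was\s+Suspected:\s+(true|false)"
)


@dataclass
class MemberResponse:
    index: int
    member: str
    received: bool
    suspected: bool


def parse_response(entry: str) -> Tuple[Optional[int], List[MemberResponse]]:
    """Return (elapsed milliseconds or None, list of member responses)."""
    match = ELAPSED_RE.search(entry)
    elapsed = int(match.group(1)) if match else None
    responses = [
        MemberResponse(int(idx), member, rec == "true", sus == "true")
        for idx, member, rec, sus in RESPONSE_RE.findall(entry)
    ]
    return elapsed, responses


ENTRY = """<amLogEntry> 2014-02-03T12:49:11Z DEBUG NIDS Application:
Method: DMessageBus.A
Thread: ajp-bio-127.0.0.1-9009-exec-21
DMessageBus Message Response: Elapsed Millis: 15002, Count: 1
Response #0: from member 172.26.0.199. Was Received: false Was Suspected: false
</amLogEntry>"""

elapsed, responses = parse_response(ENTRY)
unanswered = [r.member for r in responses if not r.received]
print(elapsed, unanswered)
```

For this entry the script prints 15002 followed by a list containing only 172.26.0.199.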
With JGroups debug logging enabled on this server, the log shows that it sends keep-alive (are-you-alive) messages but never receives a response:
1848014 [TimeScheduler.Thread] DEBUG org.jgroups.protocols.FD - sending are-you-alive msg to 172.26.0.199:7801 (own address=172.26.0.45:7801)
1848014 [TimeScheduler.Thread] DEBUG org.jgroups.protocols.FD - heartbeat missing from 172.26.0.199:7801 (number=0)
1848014 [TimeScheduler.Thread] DEBUG org.jgroups.protocols.FD - heartbeat missing from 172.26.0.199:7801 (number=0)
1848014 [TimeScheduler.Thread] DEBUG org.jgroups.protocols.FD - heartbeat missing from 172.26.0.199:7801 (number=0)
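The FD settings in the Resolution configuration (timeout=5000, max_tries=5) govern how long a silent peer goes unnoticed. In JGroups FD, a member sends are-you-alive heartbeats to its neighbour and suspects it after max_tries missed replies. The following is a simplified model of that counter, not JGroups source code. The one-interval-per-timeout granularity and the exact point at which a suspicion is raised are assumptions. In this model, a peer that never answers is suspected after roughly 25 seconds, but a single occasional reply resets the counter.

```python
# Simplified model of heartbeat-based failure detection in the style of
# JGroups FD (NOT JGroups source): count consecutive intervals without an
# ack and suspect the peer once the count reaches max_tries.
from typing import Optional, Sequence

TIMEOUT_MS = 5000  # FD timeout from the configuration in this document
MAX_TRIES = 5      # FD max_tries from the configuration in this document


def first_suspect_interval(acks: Sequence[bool], max_tries: int = MAX_TRIES) -> Optional[int]:
    """Return the 0-based interval in which the peer becomes suspected, or None.

    acks[i] is True if an are-you-alive reply arrived during interval i.
    """
    missed = 0
    for interval, acked in enumerate(acks):
        if acked:
            missed = 0  # any reply resets the miss counter
        else:
            missed += 1
            if missed >= max_tries:
                return interval
    return None


silent = [False] * 10                                     # peer never answers
flaky = [False, False, True, False, False, False, False]  # one reply in between

hit = first_suspect_interval(silent)
print(hit, (hit + 1) * TIMEOUT_MS)  # interval index, approximate elapsed ms
print(first_suspect_interval(flaky))
```

The first line prints "4 25000" and the second prints None, showing how a reset counter can keep a mostly silent peer from ever being suspected in this model.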
Looking at both AG1 and AG7, the JGroups merge does not complete successfully: one of the JGroups instances was not reset, but the other does not show the initialization.
AG1 shows the following merge operation:
1577271 [MERGE2.FindSubgroups thread (channel=cn=SCC13BA39BD9D7F9B8D,cn=cluster,cn=nids,ou=accessManagerContainer,o=novellNIDPMessageBus)] DEBUG org.jgroups.protocols.MERGE2 - initial_mbrs=[[own_addr=172.26.0.45:7801, coord_addr=172.26.0.199:7801, is_server=true], [own_addr=172.26.0.199:7801, coord_addr=172.26.0.199:7801, is_server=true]]
1578806 [TimeScheduler.Thread] DEBUG org.jgroups.protocols.FD - sending are-you-alive msg to 172.26.0.45:7801 (own address=172.26.0.199:7801)
AG7 shows no merge operations at all.
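MERGE2 runs a discovery every min_interval to max_interval milliseconds (10000 to 30000 in the configuration above). According to the JGroups description of MERGE2, it starts a merge only when that discovery finds more than one coordinator. The following minimal sketch assumes that rule and lists the coordinators named in AG1's initial_mbrs line. Both entries name 172.26.0.199:7801, so under that rule this snapshot alone would not lead to a merge. The function name coordinators is hypothetical.

```python
# Minimal sketch: list the distinct coordinators in a MERGE2
# "initial_mbrs=[...]" debug line. Assumption: a merge is only started
# when more than one coordinator is discovered.
import re
from typing import Set

MBR_RE = re.compile(
    r"own_addr=([^,\]]+),\s*coord_addr=([^,\]]+),\s*is_server=(?:true|false)"
)


def coordinators(log_line: str) -> Set[str]:
    """Return the distinct coord_addr values found in an initial_mbrs list."""
    return {coord for _own, coord in MBR_RE.findall(log_line)}


AG1_LINE = (
    "initial_mbrs=[[own_addr=172.26.0.45:7801, coord_addr=172.26.0.199:7801, is_server=true], "
    "[own_addr=172.26.0.199:7801, coord_addr=172.26.0.199:7801, is_server=true]]"
)

coords = coordinators(AG1_LINE)
print(sorted(coords))
print("merge expected" if len(coords) > 1 else "single coordinator: no merge expected")
```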