Linux Access Gateway fails to come up after a restart

  • 7008588
  • 16-May-2011
  • 26-Apr-2012

Environment

Novell Access Manager 3.1 Linux Access Gateway (LAG)
Novell Access Manager 3.1 Support Pack 3 applied
Primary and Secondary Administration Consoles set up
LAG cluster included 10 LAGs
Identity (IDP) Server cluster included 4 IDP servers

Situation

A Linux Access Gateway (LAG) cluster was set up with 10 nodes. All web applications accelerated by the LAGs were working fine, i.e. users could authenticate to the Identity Server when accessing the protected resources, and the web applications displayed the relevant data.

For maintenance purposes, one of the LAGs was rebooted. After it came back up, the LAG registered as down in the Administration Console health check. The ics_dyn log file reported that the service provider (Tomcat) had failed to start. Following this lead, we looked at the catalina.out file and saw the following:

INFO: Starting Coyote HTTP/1.1 on http-172.20.135.9-8443
May 15, 2011 3:38:21 AM org.apache.coyote.http11.Http11BaseProtocol start
INFO: Starting Coyote HTTP/1.1 on http-8445
May 15, 2011 3:38:21 AM org.apache.coyote.http11.Http11BaseProtocol start
INFO: Starting Coyote HTTP/1.1 on http-8446
May 15, 2011 3:38:21 AM org.apache.catalina.storeconfig.StoreLoader load
INFO: Find registry server-registry.xml at classpath resource
May 15, 2011 3:38:22 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 35259 ms
May 15, 2011 6:12:32 AM org.apache.coyote.http11.Http11BaseProtocol pause
INFO: Pausing Coyote HTTP/1.1 on http-172.20.135.9-8080

Typically, after the 'Server startup' message, one would expect to see some of the configuration information being logged, e.g.

NIDPMeEntity.commonInitialize(): Complete! Config Name: lagcluster129
NIDPMeEntity.loadProtocol(): Loaded protocol: com.novell.nidp.liberty.LibertyMeDescriptor:liberty12

Since no such entries appeared in the above case, the assumption was that the configuration had not been read successfully from the config store (Admin Console).
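The check described above (look for configuration messages after the 'Server startup' line) can be sketched as a small script. This is an illustrative sketch only, not a Novell-supplied tool: the function name config_loaded_after_startup is hypothetical, and it assumes that any line containing "NIDPMeEntity." after the last 'Server startup' entry indicates that the configuration was read.

```python
import re

# Matches the Tomcat startup line seen in catalina.out,
# e.g. "INFO: Server startup in 35259 ms".
STARTUP = re.compile(r"INFO: Server startup in \d+ ms")

# Assumption: any NIDPMeEntity message means the configuration was loaded.
CONFIG_MARKER = "NIDPMeEntity."


def config_loaded_after_startup(lines):
    """Report whether configuration messages follow the last startup line.

    Returns True or False, or None if no 'Server startup' line is present.
    """
    last_startup = None
    for index, line in enumerate(lines):
        if STARTUP.search(line):
            last_startup = index
    if last_startup is None:
        return None
    # Only lines logged after the most recent startup are relevant.
    return any(CONFIG_MARKER in line for line in lines[last_startup + 1:])


# Excerpt from the failing catalina.out shown above.
failing = [
    "May 15, 2011 3:38:22 AM org.apache.catalina.startup.Catalina start",
    "INFO: Server startup in 35259 ms",
    "May 15, 2011 6:12:32 AM org.apache.coyote.http11.Http11BaseProtocol pause",
    "INFO: Pausing Coyote HTTP/1.1 on http-172.20.135.9-8080",
]

# The same startup lines followed by the expected configuration message.
healthy = failing[:2] + [
    "NIDPMeEntity.commonInitialize(): Complete! Config Name: lagcluster129",
]

print(config_loaded_after_startup(failing))
print(config_loaded_after_startup(healthy))
```

To run the check against a real log, read the lines of catalina.out from wherever it resides on the LAG and pass them to the function; the file location is not given here, so it is left to the reader.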



Resolution

Restart the Admin Console.

During the service provider restart on the LAG, the ESP configuration and protected resource policy information need to be retrieved from the Admin Console. The fact that all LAGs were reporting the same issue indicated that the problem was not specific to the LAGs, but possibly lay with the configuration store that the LAGs needed to communicate with. A dsrepair may have helped, but restarting the Admin Console server restarted the config store automatically.