Linux Access Gateway restarting after upgrading to Access Manager 3.1 SP1 IR3 (3.1.1-265)

  • Document ID: 7005725
  • Creation Date: 13-Apr-2010
  • Modified Date: 26-Apr-2012

Environment

Novell Access Manager 3.1 Linux Access Gateway
Novell Access Manager 3.1 Support Pack 1 Interim Release 3 applied

Situation

The Linux Access Gateway (LAG) was upgraded from 3.1 SP1 IR2 to IR3a. The upgrade appeared to be successful, but whenever the LAG was rebooted, or the /etc/init.d/novell-vmc services were stopped and started, the proxy automatically restarted once before initialising successfully.
 
The ics_dyn.log file includes the following restart statements, indicating that a delay was detected:
 
Apr  7 18:33:48 savnamag1 vmcontroller: AM#504514000: AMDEVICEID#: AMAUTHID#0: AMEVENTID#0: VM COntroller timer sleep done
Apr  7 18:33:48 savnamag1 vmcontroller: AM#504514000: AMDEVICEID#: AMAUTHID#0: AMEVENTID#0: VM COntroller timer done
Apr  7 18:33:48 savnamag1 :  AM#104500000: AMDEVICEID#ag-: AMAUTHID#0: AMEVENTID#0: Delay detected.. RESTARTING ICS_DYN : 10653
Apr  7 18:33:54 savnamag1 vmcontroller: AM#404514000: AMDEVICEID#: AMAUTHID#0: AMEVENTID#0: VM-0 DOWN, being RESTARTED (Wed Apr  7 18:33:54
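To check whether a LAG is hitting this restart, a small shell function along the lines below can count the "Delay detected" restart entries in an ics_dyn.log excerpt. It is only a sketch: the function name count_delay_restarts is hypothetical, and because this article does not give the location of ics_dyn.log, the log is read from standard input rather than from an assumed path.

```shell
#!/bin/sh
# Hypothetical helper: count health-check-triggered ics_dyn restarts.
# Reads ics_dyn.log content on stdin; no log path is assumed here.
count_delay_restarts() {
    # grep -c prints the number of matching lines (0 if none) but exits 1
    # when nothing matches, so "|| true" keeps a zero count from failing.
    grep -c 'Delay detected.. RESTARTING ICS_DYN' || true
}
```

For example, `count_delay_restarts < ics_dyn.log`, run in the directory holding the log, prints the number of such restarts recorded. Each matching entry corresponds to one automatic restart of the kind described above.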

Resolution

Run the following commands on the LAG console:
 
> chkconfig --del lagmonitor
> cp /chroot/lag/opt/novell/bin/postupgrade.sh /tmp/.
> sed -i -e "/^[[:space:]]*\/etc\/init.d\/lagmonitor [[:space:]]*st/d" /chroot/lag/opt/novell/bin/postupgrade.sh
 
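IR3 enabled a new health check (lagmonitor) to detect possible LAG hangs, and on the system above its default delay timeout was exceeded while the proxy was restarting. This health check should be enabled only on systems that are experiencing the hang problem, not on all systems by default, and it will be removed in future patches. The commands above remove lagmonitor from the services started at boot, save a backup copy of postupgrade.sh in /tmp, and delete the lines in postupgrade.sh that start or stop /etc/init.d/lagmonitor.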
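To see exactly which lines the sed command in the resolution removes, the sketch below applies the same expression to a scratch file. The sample lines are hypothetical stand-ins; the real contents of postupgrade.sh are not reproduced in this article, so this only illustrates how the pattern behaves.

```shell
#!/bin/sh
# Apply the resolution's sed expression to a scratch file, not the real
# postupgrade.sh. The sample lines below are hypothetical.
sample=$(mktemp)
cat > "$sample" <<'EOF'
echo "running post-upgrade tasks"
/etc/init.d/lagmonitor stop
    /etc/init.d/lagmonitor start
/etc/init.d/lagmonitor restart
EOF

# Same expression as in the resolution: delete lines that begin (after any
# leading whitespace) with "/etc/init.d/lagmonitor " followed by "st...".
sed -i -e "/^[[:space:]]*\/etc\/init.d\/lagmonitor [[:space:]]*st/d" "$sample"

result=$(cat "$sample")
rm -f "$sample"
printf '%s\n' "$result"
```

Only the "stop" and "start" lines are deleted. Because the pattern requires the argument to begin with "st" (start, stop, status), a line such as "/etc/init.d/lagmonitor restart" would be left in place.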

Additional Information

The lagmonitor script monitors the health of the ics_dyn process and restarts it if it detects a delay in the ics_dyn service. To check the health of the LAG, it connects to the /aghealth/ URL on any of the services that the LAG has configured.
 
The problem is that, when a restart is required, service initialization can take longer than 40 seconds. When that happens, the monitor thread restarts ics_dyn even though the service would have finished starting on its own.
 
The /etc/init.d/lagmonitor script calls several additional script files; the "Delay detected" statement comes from one of them, laghealthcheck.sh in the /chroot/lag/opt/novell/bin directory.
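The timeout behaviour described above can be illustrated with a minimal shell sketch: run a probe and treat it as a delay if it does not succeed within a fixed limit. This is NOT the contents of laghealthcheck.sh; the function name probe_with_timeout is hypothetical, it uses the coreutils timeout(1) command, and it only reports a failure instead of restarting anything.

```shell
#!/bin/sh
# Hypothetical illustration of a timeout-based health check.
# Not the actual laghealthcheck.sh logic; it reports, it does not restart.

# Usage: probe_with_timeout SECONDS COMMAND [ARGS...]
# Returns 0 if COMMAND succeeds within SECONDS, non-zero otherwise.
probe_with_timeout() {
    limit=$1
    shift
    # timeout(1) kills COMMAND after $limit seconds (exit status 124)
    if timeout "$limit" "$@"; then
        return 0
    fi
    echo "Delay detected: probe failed or ran longer than ${limit}s" >&2
    return 1
}
```

Against a LAG, the probe might be something like `probe_with_timeout 40 curl -s -f -o /dev/null http://lag.example.com/aghealth/`, where the host name and scheme are placeholders. The sketch shows why a slow start trips such a check: if initialization takes longer than the limit (40 seconds in the case described above), the probe fails even though the service would eventually have come up.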