Environment
Novell Access Manager 3.1 Linux Access Gateway
Novell Access Manager 3.1 Support Pack 1 Interim Release 3 applied
Situation
The Linux Access Gateway (LAG) was upgraded from 3.1 SP1 IR2 to IR3a. The upgrade appeared to be successful, but whenever the LAG was rebooted, or the services were stopped and started with /etc/init.d/novell-vmc, the proxy automatically restarted once before initialising successfully.
The ics_dyn.log file included the following entries, in which the restart statement indicates that a delay was detected:
Apr 7 18:33:48 savnamag1 vmcontroller: AM#504514000: AMDEVICEID#: AMAUTHID#0:
AMEVENTID#0: VM COntroller timer sleep done
Apr 7 18:33:48 savnamag1 vmcontroller: AM#504514000: AMDEVICEID#: AMAUTHID#0:
AMEVENTID#0: VM COntroller timer done
Apr 7 18:33:48 savnamag1 : AM#104500000: AMDEVICEID#ag-: AMAUTHID#0:
AMEVENTID#0: Delay detected.. RESTARTING ICS_DYN : 10653
Apr 7 18:33:54 savnamag1 vmcontroller: AM#404514000: AMDEVICEID#: AMAUTHID#0:
AMEVENTID#0: VM-0 DOWN, being RESTARTED (Wed Apr 7 18:33:54
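To check whether a LAG is affected, the restart statement shown above can be counted in the log. The following is a minimal sketch, not a Novell-supplied tool: the function name count_delay_restarts is made up for illustration, and because the location of ics_dyn.log is not given here, the path is passed in as an argument.

```shell
#!/bin/sh
# count_delay_restarts LOGFILE   (hypothetical helper, not part of the product)
# Prints how many log lines contain the restart statement quoted above.
count_delay_restarts() {
    # -F matches the text literally, -c prints the number of matching lines.
    # grep exits with status 1 when the count is 0, so "|| true" keeps the
    # function usable in scripts that run under "set -e".
    grep -cF 'Delay detected.. RESTARTING ICS_DYN' "$1" || true
}
```

Call it as count_delay_restarts followed by the path to ics_dyn.log on the LAG. A count above 0 after a reboot suggests the behaviour described in this document.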
Resolution
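Run the following commands on the LAG console. They remove the lagmonitor service from the runlevels, copy postupgrade.sh to /tmp as a backup, and delete the lines in postupgrade.sh that call /etc/init.d/lagmonitor: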
> chkconfig --del lagmonitor
> cp /chroot/lag/opt/novell/bin/postupgrade.sh /tmp/.
> sed -i -e "/^[[:space:]]*\/etc\/init.d\/lagmonitor [[:space:]]*st/d" /chroot/lag/opt/novell/bin/postupgrade.sh
IR3 enabled a new health check that detects possible LAG hangs, and its default delay timeout was exceeded while the system above was restarting. This health check should be enabled only on systems that run into the hang problem, not on all systems by default. It will be removed in future patches.
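The sed expression in the commands above deletes every line that begins, after optional whitespace, with a call to /etc/init.d/lagmonitor followed by an argument starting with "st" (for example start or stop). As a hedged sketch, the function below applies the same expression without -i, so the result can be previewed before the file is changed. The name preview_without_lagmonitor is illustrative, and the sample file contents are invented rather than taken from the real postupgrade.sh.

```shell
#!/bin/sh
# preview_without_lagmonitor FILE   (hypothetical helper)
# Prints FILE with the lagmonitor calls removed; FILE itself is not modified.
preview_without_lagmonitor() {
    sed -e "/^[[:space:]]*\/etc\/init.d\/lagmonitor [[:space:]]*st/d" "$1"
}

# Dry run on a scratch file with made-up content.
demo=$(mktemp)
cat > "$demo" <<'EOF'
echo "post upgrade steps"
    /etc/init.d/lagmonitor start
/etc/init.d/lagmonitor  stop
# /etc/init.d/lagmonitor start
chkconfig lagmonitor on
EOF
preview_without_lagmonitor "$demo"
rm -f "$demo"
```

With the invented sample, the two start and stop calls are dropped. The commented-out call and the chkconfig line remain, because neither begins with the /etc/init.d/lagmonitor path.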
Additional Information
The lagmonitor script monitors the health of the ics_dyn process and restarts it if a delay is detected in the ics_dyn service. To detect the health of the LAG, it connects to the /aghealth/ URL on any of the services that the LAG has configured.
The problem seen here is that, when a restart is required, service initialization can take longer than 40 seconds. When that happens, the monitor thread restarts ics_dyn.
The /etc/init.d/lagmonitor script calls some additional script files, and the "Delay detected" statement comes from one of them: laghealthcheck.sh in the /chroot/lag/opt/novell/bin directory.
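The kind of check described above can be sketched roughly as follows. This is not the content of laghealthcheck.sh, which is not reproduced here. The function names, the use of curl, and the decision to treat a failed request the same as a slow one are all assumptions made for illustration; only the /aghealth/ URL and the 40-second figure come from the text above.

```shell
#!/bin/sh
# Assumed threshold in seconds, taken from the 40 seconds mentioned above.
DELAY_LIMIT=${DELAY_LIMIT:-40}

# classify_probe CURL_EXIT ELAPSED_SECONDS [LIMIT]   (hypothetical helper)
# Prints "restart" if the request failed or took longer than the limit,
# otherwise prints "ok".
classify_probe() {
    limit=${3:-$DELAY_LIMIT}
    if [ "$1" -ne 0 ] || [ "$2" -gt "$limit" ]; then
        echo restart
    else
        echo ok
    fi
}

# check_once URL   (hypothetical helper)
# Probes URL once and prints the verdict. It only reports; nothing is restarted.
check_once() {
    start=$(date +%s)
    rc=0
    # --max-time makes curl give up after DELAY_LIMIT seconds; -k skips
    # certificate verification, -s silences progress, -o discards the body.
    curl -k -s -o /dev/null --max-time "$DELAY_LIMIT" "$1" || rc=$?
    end=$(date +%s)
    classify_probe "$rc" "$((end - start))"
}
```

For example, check_once https://www.example.com/aghealth/ (a placeholder host) prints ok or restart. A service that is still initialising after 40 seconds would produce restart, which matches the restart on startup described in the Situation section.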