Environment
Novell Access Manager 3.1 Linux Access Gateway
Situation
AM 3.1: How to troubleshoot Linux Access Gateway (LAG) crashes or hangs
=====================================================
Functionality: The goal is to verify whether a crash or hang occurred, and whether it is a known issue.
LAG settings required to view crashes or hangs:
1. Make sure your LAG runs the same build as the system where the crash or hang occurred. If the customer sent in a core from 3.1.1 IR3, you need 3.1.1 IR3 on your system to view it.
2. Copy the generated coredump to the /chroot/lag directory of the LAG.
3. cd to the /chroot/lag directory and run:
# chroot /chroot/lag
# gdb opt/novell/bin/ics_dyn core.<$pid>
4. Run the backtrace command ('bt' at the gdb prompt) to determine which code path was executing when the crash occurred.
5. Search the TIDs for crashes with the same backtrace.
Note: the backtrace from step 4 will not show the real function names unless the correct symbols for that build exist and are loaded on the LAG server you use to view the coredump. However, the backtrace is the same on every server that lacks the symbols, so adding it to a TID still lets customers check whether the defect is known and fixed.
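As a rough sketch of steps 4 and 5: gdb can print the backtrace non-interactively with its -batch and -ex options, and the frames can then be reduced to bare function names that are easier to use in a TID search. The helper name bt_functions is hypothetical (it is not part of the LAG), and the sed pattern assumes the usual '#N  0xADDR in function (...)' frame layout.

```shell
# Hypothetical helper (not part of the LAG): read gdb 'bt' output on stdin and
# print one function name per frame, dropping frame numbers, addresses and arguments.
bt_functions() {
    sed -n -E 's/^#[0-9]+ +(0x[0-9a-fA-F]+ in )?([^ (]+).*/\2/p'
}

# Possible usage inside the chroot (core.<$pid> stands for the real core file name):
#   gdb -batch -ex bt opt/novell/bin/ics_dyn core.<$pid> | bt_functions
```

Without symbols, most frames will probably show as '??', so the raw 'bt' output should still be kept alongside the reduced list.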
LAG crash touch files:
- /tmp/.dumpcore - enables core dumps. This file must exist for the LAG to dump a core (minimum 3 GB free disk space required).
- /tmp/.forcedumpcore - forces the core dump even when less than the minimum 3 GB of free disk space is available.
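A minimal sketch of how the free-space check could be scripted before creating the touch files. Several points are assumptions: that 3 GB means 3 x 1024 x 1024 KB, that the relevant filesystem is the one holding /chroot/lag (the text does not say which filesystem is checked), and that .forcedumpcore is used together with .dumpcore. The function names are hypothetical, and suggest_touch_file only prints a command instead of running it.

```shell
# 3 GB expressed in 1 KB blocks (assumption: GB is treated as GiB).
MIN_FREE_KB=$((3 * 1024 * 1024))

# Print the free space, in KB, of the filesystem holding the given directory.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed when the given KB count meets the 3 GB minimum.
has_min_free() {
    [ "$1" -ge "$MIN_FREE_KB" ]
}

# Dry run: print the touch command that matches the given free space in KB.
suggest_touch_file() {
    if has_min_free "$1"; then
        echo "touch /tmp/.dumpcore"
    else
        # Assumption: the force file is created in addition to .dumpcore.
        echo "touch /tmp/.dumpcore /tmp/.forcedumpcore"
    fi
}

# Possible usage on the LAG:
#   suggest_touch_file "$(free_kb /chroot/lag)"
```

Forcing a core dump on a nearly full disk can fill it completely, so the printed command is left for the engineer to review and run by hand.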
Info to request:
- touch files from /tmp/ directory (ls -l /tmp/.* output)
- ics_dyn.log file from /var/log/ directory
- core.<$pid> files from /chroot/lag
- amdiagcfg.sh output for configuration details
- 'netstat -patune' output from LAG
- 'top' output from LAG
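The list above could be gathered in one pass with a small script such as the following sketch. The function name collect_lag_info and the output file names are hypothetical, and it is an assumption that amdiagcfg.sh is on the PATH and runs without arguments. Core files are only listed here, not copied, because they can be several GB in size.

```shell
# Hypothetical helper: gather the requested information into one directory.
# Failures are tolerated so that one missing tool does not stop the collection.
collect_lag_info() {
    out=$1
    mkdir -p "$out" || return 1
    ls -l /tmp/.* > "$out/touch_files.txt" 2>&1 || true
    cp /var/log/ics_dyn.log "$out/" 2>/dev/null || true
    # List the cores only; transfer the files themselves separately.
    ls -l /chroot/lag/core.* > "$out/core_files.txt" 2>&1 || true
    # Assumption: amdiagcfg.sh is on the PATH; stdin is closed in case it prompts.
    amdiagcfg.sh < /dev/null > "$out/amdiagcfg.txt" 2>&1 || true
    netstat -patune > "$out/netstat.txt" 2>&1 || true
    # -b (batch mode) and -n 1 (one iteration) make top usable in a script.
    top -b -n 1 > "$out/top.txt" 2>&1 || true
}

# Possible usage:
#   collect_lag_info /tmp/lag_info && tar -czf /tmp/lag_info.tar.gz -C /tmp lag_info
```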
What to look for in log files:
1. ics_dyn.log file: Search for the 'ELOG' string, which is logged when the server starts. From there, work your way back up the log file to a signal 11 or restart string. If the server was in debug mode, the preceding lines may show what was happening when it crashed, e.g. that it was in the middle of a formfill operation. That is a hint to look at the formfill policy applied to the protected resource being accessed.
2. Run the 'bt' command within gdb and check whether a TID already reports the same backtrace.
// Internal information
3. Download the symbols for the version of the LAG that you are running. The symbols are available from http://builder.provo.novell.com/artifacts/LinuxAccessGateway/AccessManager3.1_SP<$SP_Number>_<$IR_Number>/. Download the ZIP file and retrieve the Sym_Prod.tar.gz file.
4. Copy the Sym_Prod.tar.gz file to the /tmp directory and run 'tar -zxvf Sym_Prod.tar.gz'.
5. From within /tmp, run the symcopy.sh file to copy the symbols to the appropriate directories.
6. Reload the coredump (exit gdb and re-run the gdb command).
7. Run 'bt' and check whether the backtrace gives any hints about the area of code involved, e.g. FF/Injection/
8. Check whether a defect already exists and create one if needed.
// end of internal
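The log search described in step 1 of this section can be sketched as follows. Both function names are hypothetical, and the literal strings 'ELOG' and 'signal 11' are taken from the description above; the exact wording in a real ics_dyn.log may differ.

```shell
# Hypothetical helper: print numbered log lines that mark a startup ('ELOG')
# or a crash ('signal 11').
find_crash_markers() {
    grep -n -E 'ELOG|signal 11' "$1"
}

# Hypothetical helper: print the given line and up to N lines (default 50) before it.
show_context_before() {
    log=$1
    line=$2
    count=${3:-50}
    start=$((line - count))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    sed -n "${start},${line}p" "$log"
}

# Possible usage:
#   find_crash_markers /var/log/ics_dyn.log
#   show_context_before /var/log/ics_dyn.log <line number of the signal 11 entry>
```

With debug logging enabled, the context lines are where an in-progress operation such as formfill would be expected to show up.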
Useful TIDs:
1. Debugging with gdb - http://www.delorie.com/gnu/docs/gdb/gdb_toc.html
2. Linux Access Gateway crashing in removeFromConnectionList() - https://support.microfocus.com/kb/doc.php?id=7005807&sliceId=1&docTypeID=DT_TID_1_1&dialogID=137411674&stateId=0%200%20137409437