Environment
Legato NetWorker Client version 7.2.1 Patch Level LGTsc02878 Date: 23.3.2007
NetWare 6.5 SP7
Situation
After a few days the tape backup starts failing, and the following error for SAVE.NLM's main thread appears in the smdr debug log:
"9cf5f3a0[SAVE.NLM's main thread ]0001aab9= fffefff7"
After that, the backup makes no further progress: it sits there and never times out. Reloading NetWorker together with TSA/SMDR does not help. The NetWare server keeps working, but the cluster node has to be restarted before the backup reconnects and starts working again. The local volumes on the NetWare server can still be backed up, but the cluster resources cannot. The problem is intermittent; sometimes several days go by before it occurs.
A core dump showed nothing wrong with the SMS modules such as smdr.nlm and tsafs.nlm. They were not hung or in an error condition; they were in fact waiting for work. No tsafs read threads were running. It looked as though a backup had never started, even though the problem occurred right in the middle of one. The cluster resources had not failed over to another node and kept running fine after the problem occurred.
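To check whether a copy of the smdr debug log contains the entry quoted above, the log can be scanned on a workstation. The following is only a minimal sketch: the line layout is inferred from the single sample line in this document, the function name find_thread_entries and the group names in the pattern are hypothetical, and the meaning of the individual hex fields is not documented here.

```python
import re

# Layout inferred from the one sample line quoted in this document:
#   <8 hex>[<thread name> ]<8 hex>= <8 hex>
# The group names are illustrative; the real meaning of each field is unknown.
LINE_RE = re.compile(
    r"(?P<first>[0-9a-fA-F]{8})"
    r"\[(?P<thread>[^\]]*)\]"
    r"(?P<second>[0-9a-fA-F]{8})=\s*"
    r"(?P<value>[0-9a-fA-F]{8})"
)


def find_thread_entries(lines, thread_substring="SAVE.NLM", value="fffefff7"):
    """Return (line_number, thread_name, value) for each matching log line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = LINE_RE.search(line)
        if match is None:
            continue  # not in the assumed layout
        thread = match.group("thread").strip()
        found = match.group("value").lower()
        if thread_substring.lower() in thread.lower() and found == value.lower():
            hits.append((number, thread, found))
    return hits


if __name__ == "__main__":
    sample = [
        "some unrelated debug output",
        "9cf5f3a0[SAVE.NLM's main thread ]0001aab9= fffefff7",
    ]
    for hit in find_thread_entries(sample):
        print(hit)
```

To scan a real log, pass an open file object instead of the sample list; the location and name of the log file depend on how SMDR debugging was enabled on the server.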
Resolution
Apply patch LGTsc06219.zip, which contains new versions of savefs.nlm, save.nlm and nsrexecd.nlm. This update is available from Legato.