Novell NetWare 6.5
HP Data Protector tape backup software
Novell GroupWise 8
HP Data Protector was running on a Windows host, with the Data Protector client component running on the NetWare server. Backups would proceed normally and then stop abruptly at varying points. LAN traces indicated that the problem was on the backup client side, not the backup server side: the agent on the NetWare server simply stopped sending data to the backup server on the Windows host. The traces showed no TCP errors, and the TCP/IP connection between the two hosts was not reset (no RST). Data Protector labs determined that packets on the socket connection between the bdanet on the NetWare host and the bmanet on the Windows host appeared to be dropped.
If pktscan was running on the NetWare server during the backup, the problem did not occur. If the NetWare server was reduced to a single processor, the problem also did not occur. Both observations pointed to a timing problem.
The following NetWare SET parameter was increased to 256 KB, as was the buffer size in the HP software:
set bsd socket default buffer size in bytes
The default for this SET parameter is 32768 bytes (32 KB); the default buffer size in the HP software is 64 KB. The problem only seems to occur when backing up many gigabytes of data. With the parameter left at its default or set to 64 KB, the problem still occurred.
The easiest way forward is to increase the setting to at least 256 KB, so that the common Data Protector (DP) block sizes can be handled without issues. Allowing a small margin above the block size for additional overhead helps ensure that the problem does not recur.
If GroupWise Windows clients become slow when switching between folders, lower the value from 256 KB in 32 KB steps until the slowness goes away. 192 KB may be a better value.
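The step-down values can be worked out ahead of time. The following Python sketch lists the candidate sizes in bytes, starting at 256 KB and dropping 32 KB at a time. It is only an illustration: the function name and the choice to stop above 64 KB (the size at which the backup problem was still seen) are assumptions, as is the "= value" form of the printed SET line, which should be confirmed on the NetWare console before use.

```python
KB = 1024  # "k" is taken to mean 1024 bytes, consistent with the 32768-byte default

# Parameter name as given in this article.
PARAM = "set bsd socket default buffer size in bytes"


def candidate_sizes(start_kb=256, step_kb=32, floor_kb=64):
    """Return buffer sizes in bytes, from start_kb downward in step_kb steps.

    Values at or below floor_kb are left out, because the backup problem
    was reported at 64 KB and at the 32 KB default.
    """
    if step_kb <= 0:
        raise ValueError("step_kb must be positive")
    sizes = []
    kb = start_kb
    while kb > floor_kb:
        sizes.append(kb * KB)
        kb -= step_kb
    return sizes


if __name__ == "__main__":
    for size in candidate_sizes():
        # The "= value" syntax is an assumption; verify it on the server.
        print(f"{PARAM} = {size}   ({size // KB} KB)")
```

The third value produced, 196608 bytes, is the 192 KB setting suggested above.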
A core dump showed that all of the HP Data Protector threads on the NetWare server had been put to sleep. None were running, which is why the backup stopped.