-625 eDir error during sync

  • 7010092
  • 01-Feb-2012
  • 27-Apr-2012

Environment

Novell eDirectory
Novell NetWare
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 3

Situation

Replica not syncing.
 
ndstrace shows the following error:
3907824544 SKLK: [2012/01/27 12:02:32.342] DCRequest failed, transport failure (-625).
 
A LAN trace taken at the same time on the server reporting the -625 showed an NCP Destroy Connection request being sent to another replica holder and that request received a TCP RST in reply.
 
The TCP RST packet appears to come from the other server, which resides on another subnet, but the IP header shows that it originated on the local subnet.
 
The next hop router may be configured to drop idle connections.
(Cisco ASA 5500 devices by default drop connections that have been idle for one hour.)
 
The problem with a 3rd party device dropping connections is that it typically does so without telling either end of the TCP connection that the connection is no longer valid.  Because the end point devices do not know this, they cannot clean up their connection tables until they try to use what they believe is a valid connection and get an error.
 
When an NCP connection used by eDir is reused after a period of being idle and the 3rd party device answers with a TCP RST, eDir reports a -625 (TCP transport error).  That -625 causes the remote server to be placed in the eDir bad address cache, even though the remote server is up and can be reached over TCP.
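The following is a minimal local sketch of how a TCP RST surfaces to an application as a transport error. It is only an illustration: the peer on the loopback interface stands in for the 3rd party device, the reset is sent immediately rather than after an idle period, and the function names are hypothetical. It does not involve eDir or NCP, and the SO_LINGER packing shown assumes a Linux or similar system.

```python
import socket
import struct
import threading


def reset_on_accept(listener):
    """Accept one connection, then abort it so the peer receives a TCP RST."""
    conn, _ = listener.accept()
    # SO_LINGER with l_onoff=1 and l_linger=0 makes close() send RST, not FIN.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()


def observe_reset():
    """Connect to a local listener that resets the connection; report the error seen."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    worker = threading.Thread(target=reset_on_accept, args=(listener,))
    worker.start()

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    try:
        # The client believes the connection is valid until it tries to use it.
        client.recv(1)
        return "no error"
    except ConnectionError as exc:
        return type(exc).__name__
    finally:
        client.close()
        worker.join()
        listener.close()


if __name__ == "__main__":
    print(observe_reset())
```

The client has no indication that anything is wrong until it uses the socket, which mirrors the way eDir only discovers the dropped connection when it reuses it.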

Resolution

There are a few possible solutions to the problem:
 
1) Ask the 3rd party vendor to have the device properly notify both parties on the TCP connection when a connection is dropped.
 
2) Configure the 3rd party device so that it does not drop idle connections.
(On a Cisco ASA 5500, for example, set Configuration > Firewall > Advanced > Global Timeouts > Connection to 0:0:0 to disable the timeout at the connection level.)
 
3) Configure the 3rd party device so that it drops idle connections at a time value greater than either the NCP watchdog or the TCP watchdog value configured on the end point server.
 
Note:  TCP watchdogs default to 2 hours on SLES.  NCP watchdogs may have different values depending on OS and version.  On NetWare the NCP watchdog default value is ~10 minutes.  On Linux the NCP watchdog may be disabled or not implemented.
 
4) Configure the NCP watchdog or TCP watchdog to be less than the 3rd party device's timer that drops the connection.
 
Note: Changing the TCP watchdog values affects all TCP connections, not just the NCP connections that eDir uses.
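The relationship between the timers in options 3 and 4 can be sketched as a simple check. This is an illustration only, under stated assumptions: the function names are hypothetical, a device timeout of 0 is treated as "disabled" (as with the 0:0:0 setting above), and on Linux the TCP keepalive idle time is assumed to be readable in seconds from /proc/sys/net/ipv4/tcp_keepalive_time. NCP watchdog settings are not read here.

```python
from pathlib import Path

# Assumption: Linux exposes the TCP keepalive idle time (in seconds) here.
KEEPALIVE_PATH = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def read_keepalive_seconds(path=KEEPALIVE_PATH):
    """Return the keepalive idle time, in seconds, stored in the given file."""
    return int(Path(path).read_text().strip())


def survives_idle_timeout(keepalive_secs, device_idle_secs):
    """Return True if a watchdog probe is sent before the device drops the connection.

    A device timeout of 0 means the device never drops idle connections.
    A non-positive keepalive value is treated as "no watchdog configured".
    """
    if device_idle_secs == 0:
        return True
    return 0 < keepalive_secs < device_idle_secs


if __name__ == "__main__":
    # Values from this document: 2 hour TCP watchdog, 1 hour device idle timeout.
    print(survives_idle_timeout(2 * 3600, 1 * 3600))  # False: dropped before first probe
    # A ~10 minute watchdog against the same 1 hour device timeout.
    print(survives_idle_timeout(10 * 60, 1 * 3600))   # True
```

With the defaults mentioned in this document, the device's one hour timer expires before the two hour TCP watchdog sends its first probe, which is why either timer has to be changed.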

Additional Information

The issue of 3rd party devices dropping idle connections will also affect Novell Client connections.
See TID#7009860 - Novell Client drive mappings disappear after a period of idleness

Watchdogs are simple probes and will not cause a significant increase in traffic.
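A rough estimate shows the scale of that traffic. The figures below are assumptions for illustration, not measurements: 1000 idle connections, one probe and one reply per interval, roughly 66 bytes per packet on the wire, and a 10 minute interval. The function name is hypothetical.

```python
def watchdog_bytes_per_second(connections, interval_secs,
                              packet_bytes=66, packets_per_probe=2):
    """Estimate average watchdog traffic in bytes per second.

    packet_bytes and packets_per_probe are assumed values (probe plus reply),
    not figures taken from a trace.
    """
    return connections * packets_per_probe * packet_bytes / interval_secs


if __name__ == "__main__":
    # 1000 idle connections probed every 10 minutes (600 seconds).
    print(watchdog_bytes_per_second(1000, 600))  # 220.0
```

Under these assumptions the probes average about 220 bytes per second across all 1000 connections.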