eDirectory connection not clearing on a Linux server after abnormal workstation shutdown

  • 3138614
  • 17-Apr-2007
  • 07-Jun-2013

Environment

Novell Open Enterprise Server 1 (OES 1) Linux
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 1
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 2
Novell eDirectory 8.8 for Linux
Novell eDirectory 8.7.3 for Linux

Situation

If a workstation with an active connection to a Linux eDirectory server crashes or loses power, the stale connection remains on the server until the default timeout (~12-15 min) expires and the connection is cleared.

If concurrent connections is set to 1, either the stale connection must be terminated manually or the user must wait out that timeout before logging back in.

This problem only occurs on Linux. NetWare does not have this issue.

Resolution

When a connection is not closed cleanly, the watchdog process is responsible for cleaning it up. Until the watchdog clears the connection, the user cannot log in if concurrent connections is set to 1. NetWare does not have this issue because the NetWare connection manager does not store the port with the user connection. If the user logs in from the same address, the port is ignored and NetWare treats the incoming connection as the same one.

Fixing this would require architectural changes that are not feasible in the 8.7.3.x code base. It may be addressed in a future release of eDirectory 8.8 or later.

There is a workaround that you can implement at the TCP level. The Linux kernel provides three parameters that control how keepalive probes are sent from the server side.

These parameters are available in the /proc/sys/net/ipv4/ directory.

tcp_keepalive_time: how long a connection must be idle before the first probe is sent, default is 7200 secs (2 hrs)
tcp_keepalive_probes: number of unanswered probes sent before the connection is dropped, default is 9
tcp_keepalive_intvl: how long to wait for a reply on each probe, default is 75 secs. So 9 probes at 75 seconds each will take approximately 11 minutes.
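
The arithmetic behind these three parameters can be sketched as a small shell helper. This is only an estimate, assuming the connection is completely idle when the workstation goes away: the idle time plus the number of probes multiplied by the probe interval. The function name keepalive_detection_secs is hypothetical and used here for illustration only.

```shell
#!/bin/sh
# keepalive_detection_secs is a hypothetical helper, not part of any Novell or kernel tool.
# It estimates how long an idle, dead TCP connection lingers before the kernel drops it.
#   $1 = tcp_keepalive_time, $2 = tcp_keepalive_probes, $3 = tcp_keepalive_intvl
keepalive_detection_secs() {
    echo $(( $1 + $2 * $3 ))
}

# Estimate for the values currently active on this server, if they are readable.
dir=/proc/sys/net/ipv4
if [ -r "$dir/tcp_keepalive_time" ] && [ -r "$dir/tcp_keepalive_probes" ] \
   && [ -r "$dir/tcp_keepalive_intvl" ]; then
    t=$(cat "$dir/tcp_keepalive_time")
    p=$(cat "$dir/tcp_keepalive_probes")
    i=$(cat "$dir/tcp_keepalive_intvl")
    echo "current settings: $(keepalive_detection_secs "$t" "$p" "$i") seconds"
fi

# Estimate for the kernel defaults listed above (2 hours idle plus about 11 minutes of probes).
keepalive_detection_secs 7200 9 75    # prints 7875
```

Running the helper with candidate values before applying them gives a quick check of how long a dead connection would linger.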

You will have to adjust these three parameters so that they solve the problem without generating a lot of extra network traffic.

Test modifications to the TCP stack

A simple test modification could be as follows (about a 3-minute detection time: 120 seconds idle plus 3 probes at 20 seconds each):

sysctl net.ipv4.tcp_keepalive_time=120
sysctl net.ipv4.tcp_keepalive_probes=3
sysctl net.ipv4.tcp_keepalive_intvl=20

Be careful not to make the settings too aggressive, or you may start to clear perfectly valid connections. Once you change these parameters, the settings take effect immediately. There is no need to restart any services. However, the settings are only valid until the next reboot. Once the server is rebooted, the settings revert to the defaults.
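
To confirm which values are actually active, the parameters can be read back from the /proc/sys/net/ipv4/ directory. The following is a minimal sketch; show_keepalive is a hypothetical helper name, and its optional directory argument exists only so the function can be pointed at a different location for testing.

```shell
#!/bin/sh
# show_keepalive is a hypothetical helper, not a Novell or kernel tool.
# It prints the three keepalive parameters in the same key=value form used by sysctl.
# Usage: show_keepalive [DIRECTORY]   (DIRECTORY defaults to /proc/sys/net/ipv4)
show_keepalive() {
    dir=${1:-/proc/sys/net/ipv4}
    for name in tcp_keepalive_time tcp_keepalive_probes tcp_keepalive_intvl; do
        if [ -r "$dir/$name" ]; then
            echo "net.ipv4.$name=$(cat "$dir/$name")"
        else
            # The file is missing or not readable on this system.
            echo "net.ipv4.$name=unreadable"
        fi
    done
}

show_keepalive
```

Running this before and after a change, and again after a reboot, shows whether the server is using the test values or the defaults.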

Making the change permanent

Add the settings to /etc/sysctl.conf:

net.ipv4.tcp_keepalive_time=120
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=20
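
If the file is edited by script rather than by hand, it helps to avoid duplicate entries when the script is run more than once. The following is a minimal sketch under that assumption; persist_sysctl is a hypothetical helper name, and the demonstration writes to a scratch file rather than the real /etc/sysctl.conf.

```shell
#!/bin/sh
# persist_sysctl is a hypothetical helper, not a Novell or procps tool.
# It sets KEY=VALUE in a sysctl-style file, replacing an existing entry instead of duplicating it.
# Usage: persist_sysctl KEY VALUE [FILE]   (FILE defaults to /etc/sysctl.conf)
persist_sysctl() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    # Escape the dots so the key is matched literally in the regular expressions.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        # Key already present: rewrite that line, keeping the original file's permissions.
        sed "s/^[[:space:]]*${pattern}[[:space:]]*=.*/${key}=${value}/" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" && rm -f "$file.tmp"
    else
        # Key not present: append it.
        echo "$key=$value" >> "$file"
    fi
}

# Demonstration on a scratch file. For real use, run as root and omit the file argument.
conf=$(mktemp)
persist_sysctl net.ipv4.tcp_keepalive_time 120 "$conf"
persist_sysctl net.ipv4.tcp_keepalive_probes 3 "$conf"
persist_sysctl net.ipv4.tcp_keepalive_intvl 20 "$conf"
persist_sysctl net.ipv4.tcp_keepalive_time 120 "$conf"   # repeated call adds no duplicate
cat "$conf"
rm -f "$conf"
```

After the real file has been edited, running "sysctl -p /etc/sysctl.conf" loads the settings from it without waiting for a reboot.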

Additional Information

For versions of Novell Open Enterprise Server later than those listed in the Environment section, "TID 7004848 - Usage of the FIRST_WATCHDOG_PACKET NCP parameter on OES Linux" contains additional information and is more appropriate.

Change Log

Dec 6, 2010 - Modified listed sysctl command line.
Jun 7, 2013 - Modified environment & added 'Additional Information' section.