eDirectory becomes unresponsive and new connection requests are being rejected by the server

  • 7010014
  • 17-Jan-2012
  • 24-Jan-2013

Environment

Novell Open Enterprise Server 2 (OES 2) Linux
Novell Open Enterprise Server 11 (OES 11) Linux
NetIQ eDirectory

Situation

On heavily utilized Novell Open Enterprise Servers, over a period of time, the NCP server may become unresponsive once the NDSD process has reached the system's configured open file limits.

Resolution

Assess the server's current file and process limits, and tune them to match the server's utilization.


Before modifying any settings, determine the system-wide maximum number of open files that the kernel can allocate, and the current system utilization.

At a terminal console, type the following :
cat /proc/sys/fs/file-max  or  sysctl -a | grep fs.file-max
(The value displayed here is the maximum number of kernel allocatable file handles/file descriptors)

At a terminal console, type the following :
cat /proc/sys/fs/file-nr  or  sysctl -a | grep fs.file-nr

(The output of the above lists three columns :
i.  : The number of currently allocated file descriptors.
ii. : The number of allocated but unused file descriptors.
iii.: The maximum number of kernel allocatable file descriptors)
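
The three columns can be combined into a single utilization figure. The following is a minimal sketch, assuming a Bourne-compatible shell with awk available; file_handle_usage is a hypothetical helper name, not a Novell tool, and it accepts an alternative input file only so that it can be tried against sample data.

```shell
#!/bin/sh
# Summarize file-handle usage from a file in /proc/sys/fs/file-nr format.
# file_handle_usage is a hypothetical helper name used for illustration.
file_handle_usage() {
    src="${1:-/proc/sys/fs/file-nr}"
    # Columns: allocated, allocated-but-unused, maximum
    awk '{
        in_use = $1 - $2
        printf "allocated=%d unused=%d max=%d in_use=%d pct=%d\n", \
            $1, $2, $3, in_use, (in_use * 100) / $3
    }' "$src"
}
```

Called without an argument, file_handle_usage reads /proc/sys/fs/file-nr. The pct value is the in-use share of the kernel maximum, truncated to a whole percent.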


Verify whether a group- or domain-wide limit is configured, as follows :
cat /etc/security/limits.conf | grep nofile

To verify the maximum number of open file descriptors allowed per process (the soft limit of the current shell and of the processes it starts), type :
ulimit -n



Next, determine the system's in-use file handles :

To determine the current number of system-wide files that are open, type :
lsof | wc -l

To determine the number of files that are opened by, for example, the ndsd process, type :
lsof -p `pgrep ndsd` | wc -l

To determine the number of files that are opened by specific users, type :
lsof -u <UserId> | wc -l
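
Note that ulimit -n reports the limit of the current shell, which is not necessarily the limit a running ndsd process inherited. As a hedged sketch, assuming a Linux /proc filesystem, the descriptor count and the soft limit of a running process can be read directly; fd_usage is a hypothetical helper name.

```shell
#!/bin/sh
# Report open descriptors and the soft "Max open files" limit of one process.
# fd_usage is a hypothetical helper name; it relies on the Linux /proc layout.
fd_usage() {
    pid="$1"
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    # Plain ls is used because ls -l adds a "total" line to the count.
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # The soft limit is the fourth field of the "Max open files" row.
    soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "pid=$pid open_fds=$open soft_limit=$soft"
}

# Example: inspect every running ndsd process (run as root).
for pid in $(pgrep ndsd); do
    fd_usage "$pid"
done
```

When open_fds approaches soft_limit, the process is close to the condition described in the Situation section.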


Making the changes :
Before making any changes, please make sure you understand the impact of the changes you plan to make to the system. Applying the wrong settings may degrade system functionality and/or destabilize the environment.

When the per-process maximum number of open file descriptors is not sufficient, it can be changed for the current shell and the processes started from it as follows :
ulimit -n <new_process_wide_number>

When the group- or domain-wide limit needs to be increased, edit the /etc/security/limits.conf file, locate the 'nofile' line, and increase the value, for example :
*            soft    nofile        <new_value>

To change the kernel's maximum number of open files :
sysctl -w fs.file-max=<New_value>
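
A value set with sysctl -w applies to the running kernel only and is lost at the next reboot. The following is a minimal sketch for recording the setting, assuming the system loads /etc/sysctl.conf at boot; set_file_max_conf is a hypothetical helper name and the default path is an assumption that should be checked on the target server.

```shell
#!/bin/sh
# Record fs.file-max in a sysctl configuration file.
# set_file_max_conf is a hypothetical helper; the default path is an assumption.
set_file_max_conf() {
    value="$1"
    conf="${2:-/etc/sysctl.conf}"
    tmp=$(mktemp) || return 1
    # Drop any existing fs.file-max line, then append the new setting.
    grep -v '^[[:space:]]*fs\.file-max[[:space:]]*=' "$conf" > "$tmp" 2>/dev/null || true
    echo "fs.file-max = $value" >> "$tmp"
    # Write back through cat so the file keeps its owner and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

After calling set_file_max_conf with the new value, sysctl -p reloads the file so the running kernel matches the stored setting.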



Cause

The system is likely running out of available File Descriptors (or File Handles).

Additional Information

Systems affected by this problem may display a range of symptoms in several places :
  • Novell NCP Clients :
    • NCP clients that have previously established server connections may have problems opening existing files, or creating new files, on the network. Existing file operations may also appear to progress extremely slowly.
    • When NCP clients try to create new server connections, permission may be denied.
  • Running ndsstat may fail
Instance at /etc/opt/novell/eDirectory/conf/nds.conf:
node1.OU=clusters.O=Novell.TEST_TREE

Failed to obtain a Novell eDirectory Server connection to node1.OU=clusters.O=Novell.TEST_TREE or Novell eDirectory Server is not running.
  • Running an ldapsearch may fail.
node1:~/log # ldapsearch -x -h xx.xx.xx.xx -D cn=admin,ou=master,o=novell -w novell objectclass=user

ldap_result: Can't contact LDAP server (-1)
  • The ncpcon utility may not be able to perform any commands

Note that while all the symptoms above were observed in our test setup, all the corresponding services were still up and running.


Also note that there is a difference between 'open files' and 'open file descriptors (or file handles)'. Not all open files use a file descriptor; examples are the program itself and its library files, which lsof also lists.

Let's take for example the ndsd process to verify this statement :

node1:/ # ps aux | grep ndsd
root      5000  0.1  1.1 703928 94264 ?        Sl   Mar15  33:17 /opt/novell/eDirectory/sbin/ndsd
node1:/ # lsof | grep 5000 | wc -l
356
node1:/ # ls -l /proc/5000/fd | wc -l
269
node1:/ #