Environment
Novell Open Enterprise Server 2 (OES 2) Linux
Novell Open Enterprise Server 11 (OES 11) Linux
NetIQ eDirectory
Situation
On heavily utilized Novell Open Enterprise Servers, the NCP server may become unresponsive over time, once the ndsd process reaches the system's configured open file limit.
Resolution
Assess the server's current file and process limits, and tune them to match the server's utilization.
Before modifying any settings, determine the system-wide maximum number of open files that the kernel can allocate, and the current system utilization.
At a terminal console, type the following :
cat /proc/sys/fs/file-max
or
sysctl -a | grep fs.file-max
(The value displayed here is the maximum number of kernel allocatable file handles/file descriptors)
At a terminal console, type the following :
cat /proc/sys/fs/file-nr
or
sysctl -a | grep fs.file-nr
(The output of the above lists three columns :
i. : Number of currently allocated file descriptors.
ii. : Number of allocated but unused file descriptors.
iii.: The maximum number of kernel allocatable file descriptors)
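The three columns can be combined into a single utilization figure. The following is a minimal sketch, not part of the original procedure: the function name file_nr_summary is hypothetical, and it assumes the in-use count is the first column minus the second.

```shell
# Summarize a file-nr style line (allocated, unused, maximum).
# Argument: file to read; defaults to /proc/sys/fs/file-nr.
# file_nr_summary is a hypothetical helper name used for illustration.
file_nr_summary() {
    src="${1:-/proc/sys/fs/file-nr}"
    awk '{
        in_use = $1 - $2    # allocated minus allocated-but-unused
        printf "in_use=%d max=%d pct=%d\n", in_use, $3, (in_use * 100) / $3
    }' "$src"
}
```

Calling file_nr_summary without arguments reads the live kernel values. For a line reading "5120 0 102400" it prints in_use=5120 max=102400 pct=5.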
Verify whether a group- or domain-wide limit is configured, as follows :
cat /etc/security/limits.conf | grep nofile
To verify the maximum number of open file descriptors allowed per process in the current shell session, type :
ulimit -n
Next, determine the system's in-use file handles :
To determine the current number of system-wide files that are open, type :
lsof | wc -l
To determine the number of files that are opened by, for example, the NDSD process, type :
lsof -p `pgrep ndsd` | wc -l
To determine the number of files that are opened by specific users, type :
lsof -u <UserId> | wc -l
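The lsof counts above include entries that do not hold a file descriptor. To count only the descriptors of a process, the kernel's /proc/<pid>/fd directory can be listed instead. The following is a minimal sketch under that assumption: the function names fd_count and fd_report are hypothetical, and reading another user's /proc/<pid>/fd normally requires root.

```shell
# Count the file descriptors a process currently holds by listing
# /proc/<pid>/fd (one entry per open descriptor). Prints 0 if the
# directory cannot be read.
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print "<pid> <descriptor count>" for every process whose name matches
# exactly. Defaults to ndsd, the process discussed in this article.
fd_report() {
    for pid in $(pgrep -x "${1:-ndsd}"); do
        echo "$pid $(fd_count "$pid")"
    done
}
```

Running fd_report as root prints one line per ndsd process. Unlike passing the pgrep output directly to lsof -p, the loop also copes with more than one matching PID.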
Making the changes :
Before making any changes, please make sure you understand the impact of the changes you plan to make to the system. Applying the wrong settings may degrade system functionality and/or destabilize the environment.
When the per-process maximum number of open file descriptors is not sufficient, it can be changed for the current shell session as follows :
ulimit -n <new_process_wide_number>
When the group- or domain-wide limit needs to be increased, edit the /etc/security/limits.conf file, look for the 'nofile' line, and set the new domain-wide value as follows :
* soft nofile <new_value>
To change the kernel's maximum number of open files :
sysctl -w fs.file-max=<New_value>
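Limits changed with ulimit or limits.conf generally apply only to processes started after the change, and a value set with sysctl -w does not survive a reboot unless it is also added to /etc/sysctl.conf. To check which limit an already running process is actually subject to, the kernel's /proc/<pid>/limits file can be read. The following is a minimal sketch: the function name open_file_limits is hypothetical.

```shell
# Print the soft and hard "Max open files" limits of a running process,
# as recorded by the kernel in /proc/<pid>/limits.
# open_file_limits is a hypothetical helper name used for illustration.
open_file_limits() {
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "/proc/$1/limits"
}
```

For example, open_file_limits `pgrep -x ndsd` shows the limits of a single running ndsd process. If the values are still the old ones, the process most likely needs to be restarted from a session that has the new limits.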
Cause
The system is likely running out of available File Descriptors (or File Handles).
Additional Information
Systems affected by this problem may display a range of symptoms in several places :
- Novell NCP Clients :
- For NCP clients with previously established server connections, this may show as problems opening existing files, or creating new files, on the network. File operations may also appear to progress extremely slowly.
- When NCP clients try to create new server connections, this may result in permission being denied.
- Running ndsstat may fail
Instance at /etc/opt/novell/eDirectory/conf/nds.conf:
node1.OU=clusters.O=Novell.TEST_TREE
Failed to obtain a Novell eDirectory Server connection to node1.OU=clusters.O=Novell.TEST_TREE or Novell eDirectory Server is not running.
- Running an ldapsearch may fail.
node1:~/log # ldapsearch -x -h xx.xx.xx.xx -D cn=admin,ou=master,o=novell -w novell objectclass=user
ldap_result: Can't contact LDAP server (-1)
- The ncpcon utility may not be able to perform any commands
Note that while all the symptoms above were found in our test setup, all the corresponding services were also up and running.
Also, there is a difference between 'open files' and 'open file descriptors (or file handles)'. Not all open files use a file descriptor; examples are the program itself, library files, etc.
Take the ndsd process as an example to verify this statement :
node1:/ # ps aux | grep ndsd
root 5000 0.1 1.1 703928 94264 ? Sl Mar15 33:17 /opt/novell/eDirectory/sbin/ndsd
node1:/ # lsof | grep 5000 | wc -l
356
node1:/ # ls -l /proc/5000/fd | wc -l
269
node1:/ #
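Part of the gap between the two counts is made up of files that are memory-mapped into the process, such as the program binary and shared libraries. lsof lists these with 'txt' or 'mem' in its FD column rather than a descriptor number. The following is a minimal sketch that counts them from /proc/<pid>/maps: the function name mapped_file_count is hypothetical, and paths containing spaces are counted by their first word only.

```shell
# Count the distinct files a process has memory-mapped (program text,
# shared libraries, ...). These show up in lsof output but do not
# occupy a file descriptor.
# mapped_file_count is a hypothetical helper name used for illustration.
mapped_file_count() {
    awk '$6 ~ /^\// { print $6 }' "/proc/$1/maps" | sort -u | wc -l
}
```

Comparing mapped_file_count 5000 with the output of ls /proc/5000/fd | wc -l on the server above gives a rough idea of how many of the lsof entries are mappings rather than descriptors.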