Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 1
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 2
Commvault tape backup software
This error was returned for certain containers in the NDS tree, while others backed up fine. Tests run with tsatest worked fine: all containers were backed up.
The tape backup software logged errors like the following:
Galaxy.log
32243 b554b6b0 04/14 13:43:53 25650 NWSMTSScanDataSetBegin<edirectory context> returned (90000009), Unknown SMS error - 0x90000009.
32243 b554b6b0 04/14 13:43:53 25650 Failed to scan data set . <edirectory context> Unknown SMS error.
32243 b554b6b0 04/14 13:43:53 25650 Failed to scan . <edirectory context> 100407
The problem occurred only on certain containers in the NDS tree. When these containers were backed up individually, the backup worked fine.
In this case there were around 1000 containers in eDirectory. During its scan of eDirectory, CommVault was making a new connection for each container in the NDS tree, and the machine being backed up had a limit of 1024 open files. The open files limit was raised to 5000 using
'ulimit -n 5000'.
The system-wide maximum was set using
'fs.file-max = 5000000' in /etc/sysctl.conf or 'sysctl -w fs.file-max=5000000'.
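The values currently in effect can be checked before and after the change. The following is a minimal sketch, not part of the original resolution; the function name show_limits is hypothetical. Note that 'ulimit -n' reports the limit of the shell it is run in, so it does not necessarily reflect the limit of an already running service.

```shell
#!/bin/sh
# Print the per-process open-file limit of the current shell and the
# system-wide maximum, so both can be compared against the target values.
# show_limits is a hypothetical helper name used only for illustration.
show_limits() {
    soft=$(ulimit -n)
    echo "per-process open files (ulimit -n): $soft"
    if [ -r /proc/sys/fs/file-max ]; then
        echo "system-wide fs.file-max: $(cat /proc/sys/fs/file-max)"
    else
        # /proc is Linux-specific; report clearly if it is not readable.
        echo "system-wide fs.file-max: unavailable"
    fi
}

show_limits
```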
After making these changes, smdr and eDirectory must be restarted for the changes to take effect.
Do this after hours if possible, or when as few users as possible are on the system, because 'rcndsd restart' unloads and reloads directory services.
Rebooting the host accomplishes the same thing.
After these changes, all containers in the NDS tree could be backed up with Commvault.
If setting the open files limit to 5000 does not resolve the issue, set it to a higher number. This can be necessary if there is a large number of eDirectory containers.
CommVault makes a new connection for every NDS container it backs up, and it opens a socket descriptor for each connection. Socket descriptors count as open files in Linux, so this was a case of running out of open files.
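To see how close a process is to its open files limit while a backup runs, the descriptors it holds can be counted under /proc. This is a rough sketch under stated assumptions: it is Linux only, the function name count_fds is hypothetical, and which process holds the backup connections on a given server must be confirmed locally.

```shell
#!/bin/sh
# Count the file descriptors a process currently holds open by listing
# /proc/<pid>/fd (Linux only). Sockets appear there alongside regular files.
# count_fds is a hypothetical helper name used only for illustration.
# Prints 0 if the PID does not exist or its fd directory is not readable.
count_fds() {
    pid=$1
    ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Example: descriptors held by the current shell.
echo "open descriptors for PID $$: $(count_fds $$)"
```

Passing the PID of the service being backed up instead of $$ and comparing the result with the configured limit shows whether 5000 leaves enough headroom. Reading another user's /proc/<pid>/fd generally requires root.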