Environment
Novell Open Enterprise Server 11 (OES 11) Linux
Novell Open Enterprise Server 2 (OES 2) Linux
Symantec AntiVirus 1.0.14-13
Situation
A single server was displaying very high utilization. Investigation found that:
- This server held a replica of the entire tree
- Over 30 OES servers used this server as their LUM preferred-server.
- Each of those servers was repeatedly issuing the following sequence of LDAP queries:
  - query for the Unix Workstation object (found)
  - query for each LUM-enabled group associated with the workstation (found)
  - sub-tree query from the base-name in nam.conf for a particular uidNumber (not found)
  - for each LUM-enabled group associated with the Unix Workstation object:
    - base query the group for its members
    - base query each member to see if it has the specific uidNumber (not found)
The more LUM-enabled groups and users associated with the Unix Workstation object, the greater the volume of queries each server generates.
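As a rough illustration of that scaling, the sketch below counts the LDAP queries that one unsuccessful uidNumber lookup would produce if it followed the sequence listed above exactly. The function name and the per-group member counts are hypothetical, and a real LUM client may cache or combine queries, so treat this as a model of the listed steps rather than a measurement.

```python
from __future__ import annotations


def queries_per_failed_lookup(members_per_group: list[int]) -> int:
    """Count LDAP queries for one uidNumber lookup that is never found.

    members_per_group[i] is the member count of the i-th LUM-enabled group
    associated with the Unix Workstation object (hypothetical input).
    """
    count = 1                        # query for the Unix Workstation object
    count += len(members_per_group)  # query for each associated LUM-enabled group
    count += 1                       # sub-tree query from the nam.conf base-name
    for members in members_per_group:
        count += 1                   # base query the group for its members
        count += members             # base query each member for the uidNumber
    return count


print(queries_per_failed_lookup([10, 20, 50]))  # 88
```

In closed form this is 2 + 2*G + (total members), where G is the number of groups. With three groups of 10, 20 and 50 members, one failed lookup costs 88 queries, and each of the 30+ servers pointing at the same preferred-server repeats that cost.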
(Note: for a useful ndstrace log, all LDAP information must be enabled. To set this quickly on a server, run:
ldapconfig set "LDAP Screen Level = all"
When prompted for credentials, use ndsd format, for example admin.novell.)
Resolution
A local user had a uid number greater than 65535 (the largest value an unsigned 16-bit integer can hold). Decrease this user's uid number to 65533 or lower, and change the ownership of any files owned by the old uid number to the new one (chown is a good tool for this).
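A minimal sketch for locating accounts like this one, assuming passwd-format input (name:password:uid:gid:gecos:home:shell) such as the contents of /etc/passwd. The helper names uids_too_large and files_owned_by are hypothetical. It flags uids above 65535 (the ones that overflow 16 bits) and lists paths still owned by a given uid so they can be re-owned with chown after the uid is lowered. It only reports; it changes nothing.

```python
from __future__ import annotations

import os

OVERFLOW_LIMIT = 0xFFFF  # 65535, the largest unsigned 16-bit value
MAX_SAFE_UID = 65533     # ceiling for the replacement uid, per the Resolution


def uids_too_large(passwd_text: str, limit: int = OVERFLOW_LIMIT) -> list[tuple[str, int]]:
    """Return (name, uid) for every passwd entry whose uid is above limit."""
    offenders = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip malformed entries and NIS-style "+" lines
        uid = int(fields[2])
        if uid > limit:
            offenders.append((fields[0], uid))
    return offenders


def files_owned_by(root: str, uid: int) -> list[str]:
    """Walk root (without following symlinks) and list paths owned by uid."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_uid == uid:
                    matches.append(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
    return matches


if __name__ == "__main__" and os.path.exists("/etc/passwd"):
    with open("/etc/passwd") as passwd:
        for name, uid in uids_too_large(passwd.read()):
            print(f"{name}: uid {uid} exceeds {OVERFLOW_LIMIT}; "
                  f"choose a new uid <= {MAX_SAFE_UID}")
```

The check uses 65535 rather than 65533 so that standard accounts such as nobody (commonly uid 65534) are not reported; only uids that actually lose bits under 16-bit truncation are flagged.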
Cause
Real-time virus scanning (rtvscand) was active on the server. The *real* uidNumber of a user in /etc/passwd was 80000, which is 0x13880 in hex (17 bits long). rtvscand discarded everything above the lowest 16 bits and searched for 0x3880 in hex (14464 in decimal).
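The truncation arithmetic can be checked directly. This is not rtvscand's code, which the source does not show; it simply keeps the lowest 16 bits of the uid, the way a C cast to an unsigned 16-bit type would. ctypes.c_uint16, which does no overflow checking, is used as a second way to mirror that cast.

```python
import ctypes

UID_16BIT_MASK = 0xFFFF  # keeps only the lowest 16 bits


def truncate_to_16_bits(uid: int) -> int:
    """Drop every bit above the lowest 16, as a 16-bit uid field would."""
    return uid & UID_16BIT_MASK


real_uid = 80000
seen_uid = truncate_to_16_bits(real_uid)

print(hex(real_uid), real_uid.bit_length())  # 0x13880 17
print(hex(seen_uid), seen_uid)               # 0x3880 14464
print(ctypes.c_uint16(real_uid).value)       # 14464, same result via a C-style cast
```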
Additional Information
The only way to tell which process was requesting the bad uidNumber was trial and error. The first step was to take the wrong uidNumber (14464) and add 65536 to it repeatedly until the result matched a uidNumber in use (compare against the output of getent passwd). Here, 14464 + 65536 = 80000, the local user's real uid.
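That search can be sketched as follows. Because adding 65536 never changes the lowest 16 bits, every real uid that truncates to the observed value is the observed value plus a multiple of 65536. The function name candidate_real_uids and the 32-bit upper bound are assumptions, and pwd.getpwall() stands in for getent passwd; it only sees users that the name service will enumerate.

```python
from __future__ import annotations

import pwd

STEP = 1 << 16  # 65536; adding it leaves the lowest 16 bits unchanged


def candidate_real_uids(seen_uid: int, in_use: set[int], max_uid: int = 0xFFFFFFFF) -> list[int]:
    """Add 65536 to the truncated uid repeatedly, keeping values that are in use."""
    matches = []
    candidate = seen_uid + STEP  # the truncated value itself was not found
    while candidate <= max_uid:
        if candidate in in_use:
            matches.append(candidate)
        candidate += STEP
    return matches


if __name__ == "__main__":
    # Roughly the uids that `getent passwd` would list on this host.
    local_uids = {entry.pw_uid for entry in pwd.getpwall()}
    print(candidate_real_uids(14464, local_uids))
```

On the server described here, this should have surfaced 80000. More than one match is possible on a host with many high uids, which is why the remaining step still relied on stopping services and tracing.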
Backtracking, we found that a service had created the user with that uid number. Once we knew which service it was, we identified two processes that might request this uidNumber: the service itself and rtvscand. Stopping the service stopped the queries. However, stopping rtvscand while leaving the original service running also stopped the queries.
Running ltrace against each process showed that rtvscand was the one making the call for the improper uidNumber.