Environment
Novell Open Enterprise Server 11 (OES 11)
Novell Storage Services (NSS) on Linux
NetIQ eDirectory (on Linux)
Situation
Some eDirectory servers (OES or non-OES) sporadically become unresponsive (aka "lock up"). A packet trace and/or ndstrace shows many name resolve requests, along with -603 errors when the uidNumber attribute is sought.
Resolution
Two set parameters in ncpcon control the following:
- whether LUM UID information is sought, and
- if so, how frequently the lookup occurs
UID_UPDATE_ENABLED=[0-2, default=1]
0 = uid update is off
1 = uid update is done periodically
2 = uid update is triggered for immediate update and then set off
e.g. ncpcon set UID_UPDATE_ENABLED=1
UID_UPDATE_PERIOD=[# of hours, 0.5 or greater, default=0.5]
Note: this requires UID_UPDATE_ENABLED=1
NOTE: The default settings above keep the code working as it always has, so they *WILL* need to be modified to correct this problem. If trustee assignments for LUM-enabled users are updated infrequently, these are good starting values for the parameters:
ncpcon set UID_UPDATE_ENABLED=1
ncpcon set UID_UPDATE_PERIOD=24
If you find that trustee assignments (aka Rights) aren't available in a timely manner, then decrease the UID_UPDATE_PERIOD parameter.
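The parameter rules above can be checked before anything is typed into the console. The following is a minimal sketch in Python, assuming a hypothetical helper named build_uid_update_commands that is not part of any Novell tool. It only builds the command strings shown in this document and never runs ncpcon itself.

```python
def build_uid_update_commands(enabled=1, period_hours=None):
    """Return the ncpcon command lines for the two UID update parameters.

    Hypothetical helper: it validates the values against the ranges given
    in this document and returns strings; it does not execute ncpcon.
    """
    if enabled not in (0, 1, 2):
        raise ValueError("UID_UPDATE_ENABLED must be 0, 1, or 2")
    commands = [f"ncpcon set UID_UPDATE_ENABLED={enabled}"]
    if period_hours is not None:
        if enabled != 1:
            # The period is only used when periodic updates are enabled.
            raise ValueError("UID_UPDATE_PERIOD requires UID_UPDATE_ENABLED=1")
        if period_hours < 0.5:
            raise ValueError("UID_UPDATE_PERIOD must be 0.5 hours or greater")
        # The :g format prints 24.0 as "24" and 0.5 as "0.5".
        commands.append(f"ncpcon set UID_UPDATE_PERIOD={period_hours:g}")
    return commands


if __name__ == "__main__":
    for line in build_uid_update_commands(enabled=1, period_hours=24):
        print(line)
```

Called with enabled=1 and period_hours=24, the helper returns the two starting-value commands listed above. A period below 0.5, or a period combined with UID_UPDATE_ENABLED set to 0 or 2, is rejected.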
Cause
As indicated above, traces show many name resolve requests for objects and their uidNumber attribute, and these correlate to NSS trustee assignments. Each request requires an ndsd thread on an eDirectory replica holder in order to perform the search for the attribute. If a server does not hold a replica of an object (i.e. an ExRef object/server), it forwards the request to a replica-holding server. This uses an ndsd thread on both of the servers involved.
The target (replica-holding) servers process these requests as quickly as they can. However, if too many requests are received, a server can run out of available ndsd threads (default tuning=128, max=512). At this point, the requesting server receives a “server busy” response (seen in a LAN trace), and the requests are retried after a slight delay. Any additional requests are queued on the requesting server. The number of queued requests is displayed in the ncpcon threads output, in the Async section under Number of Queued Requests.
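The thread arithmetic described above can be illustrated with a small model. This is a rough sketch, not a description of how ndsd schedules work. It assumes every forwarded request is in flight at the same moment, that the target server has no other load, and that one ndsd thread is held on each server per request being served. The function name forwarded_load is hypothetical.

```python
DEFAULT_NDSD_THREADS = 128  # default tuning quoted in this document
MAX_NDSD_THREADS = 512      # maximum quoted in this document


def forwarded_load(in_flight_requests, target_threads=DEFAULT_NDSD_THREADS):
    """Model uidNumber lookups forwarded from an ExRef server to a replica holder.

    Simplifying assumptions (not taken from ndsd internals): all requests
    arrive at once and the target's thread pool is otherwise idle.
    """
    if in_flight_requests < 0:
        raise ValueError("in_flight_requests must not be negative")
    if not 1 <= target_threads <= MAX_NDSD_THREADS:
        raise ValueError("target_threads must be between 1 and 512")
    served = min(in_flight_requests, target_threads)
    return {
        # Requests the replica holder can work on right now.
        "served": served,
        # Requests that get "server busy" and wait on the requesting server.
        "busy_or_queued": in_flight_requests - served,
        # One thread on the requester plus one on the target per served request.
        "threads_in_use_both_servers": 2 * served,
    }


if __name__ == "__main__":
    print(forwarded_load(200))
```

Under these assumptions, 200 simultaneous requests against the default of 128 threads leave 72 requests waiting on the requesting server, which is the situation that Number of Queued Requests reflects.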
Additional Information
Further info on new settings:
The UID_UPDATE_PERIOD setting is re-read each time the current period expires. So if you change it from the default of 0.5 to 24, the new value of 24 will only take effect *after* the 0.5 has expired and the update is triggered.
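The timing can be worked through with a short calculation. This sketch assumes that the new interval is measured from the moment the old period expires. That is an interpretation of the behavior described above, not a documented guarantee. The function name period_change_timeline is hypothetical.

```python
from datetime import datetime, timedelta


def period_change_timeline(last_update, old_period_hours, new_period_hours):
    """Return (when the new value is picked up, when the following update runs).

    Assumption: the update at the end of the old period still runs, and the
    new period is counted from that point.
    """
    reread_at = last_update + timedelta(hours=old_period_hours)
    next_update = reread_at + timedelta(hours=new_period_hours)
    return reread_at, next_update


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0)
    reread_at, next_update = period_change_timeline(last, 0.5, 24)
    print("new value read at:", reread_at)
    print("following update at:", next_update)
```

With the last update at 12:00 and a change from 0.5 to 24, one more update still runs at 12:30, and the 24-hour spacing applies only from then on.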
Background Information:
This situation is most frequently seen on ExRef servers that do not hold
replicas of any objects. As such, these servers need to get eDirectory/ndsd
information from a replica holder. Some telltale signs of this issue are:
LAN traces show thousands of resolve name requests
ndstrace/dstrace logs with +RSLV +AREQ enabled on the target server show name resolves for objects, followed by requests for the uidNumber attribute. A large number of these result in a -603 (NO_SUCH_ATTRIBUTE) because the user is not LUM-enabled. In some cases, -603 returns can exceed 99% of these requests.
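To judge whether a server matches this pattern, the share of -603 results among the uidNumber lookups can be computed once the result codes have been collected from a trace. The sketch below works on a plain list of integer result codes. How those codes are extracted from ndstrace output is left out on purpose, and the function name is hypothetical.

```python
NO_SUCH_ATTRIBUTE = -603  # error code named in this document


def no_such_attribute_share(result_codes):
    """Return the fraction of lookups that ended in -603 (0.0 for no data)."""
    codes = list(result_codes)
    if not codes:
        return 0.0
    misses = sum(1 for code in codes if code == NO_SUCH_ATTRIBUTE)
    return misses / len(codes)


if __name__ == "__main__":
    # 99 failed lookups and 1 successful one (0 used here to mean success).
    sample = [NO_SUCH_ATTRIBUTE] * 99 + [0]
    print(f"{no_such_attribute_share(sample):.0%} of lookups returned -603")
```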
The uidNumber requests are generated by servers with NSS volumes, where NSS trustee assignments are made either:
- directly to a user
- to a group of which a user is a member
- to one eDirectory object, with another user/group then made security equal to the original object