Novell eDirectory 8.7
Novell eDirectory 8.8
SUSE Linux Enterprise Server 9
Server recently added to tree
Server does not hold a replica of its NCP server object
Symptoms:
Server experiences high utilization
Server is slow to respond to authentication requests
Servers in the replica ring become unresponsive
NDS thread usage is high
See additional notes
The resolution is to place NCP server objects in appropriately
partitioned containers so that the server does not need to do so
much tree-walking. In the customer's case, adding a replica of the
Org container/partition to the server resolved the issue.
It is still a good idea to add replicas from the bottom level up
as needed. However, if the server responds to many
authentication requests and/or holds many connections, first
add a replica of the partition where the server object resides,
and then proceed from the bottom up.
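
The ordering described above can be sketched in a few lines of
Python. This is only an illustration, not a Novell tool: the
function name and the partition names are hypothetical, and depth
is assumed to be the number of dot-separated components in the
partition name.

```python
def replica_order(partitions, server_partition):
    """Return partitions in the order replicas would be added.

    The partition holding the server's own NCP server object comes
    first; the rest follow from the bottom of the tree upward.
    All names here are hypothetical examples.
    """
    if server_partition not in partitions:
        raise ValueError("server partition is not in the partition list")

    def depth(name):
        # Assumption: deeper partitions have more dotted components.
        return len(name.split("."))

    rest = [p for p in partitions if p != server_partition]
    rest.sort(key=depth, reverse=True)  # deepest partitions first
    return [server_partition] + rest


if __name__ == "__main__":
    tree = ["O=Org", "OU=Sales.O=Org", "OU=East.OU=Sales.O=Org"]
    for name in replica_order(tree, "O=Org"):
        print(name)
```

With the hypothetical tree shown, the Org partition is listed first
because it holds the server object, followed by the deepest
partition and then its parent.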
In this customer's case, the major replica
servers were all in the Org container. Unusually, the
servers in this one Org container were in separate geographical
locations: one in Europe, one in the US, and one in
Asia. Novell typically recommends that servers separated by WAN
links, especially high-latency links, be partitioned off
separately. This would entail placing the NCP server objects
in separate containers/partitions.
The problem started after the server in Asia was added. Soon
after it was added to the tree, this server began
experiencing the problem, which caused some serious network
outages. The server was intended to become an
NDS replica server. A common practice when adding replicas is
to start at the bottom of the tree and work upwards. This is
done so that the system does not have to add all of the
subordinate replicas automatically, a process that can be
difficult to troubleshoot if there is a network
problem or DS issue of some kind. Starting at the bottom avoids
that process. This is what the customer was doing, and as soon as
the issue started occurring they stopped adding replicas. As a
result, they never added the replica of the Org container, where
the server object was located.
Because this server was placed where it could
respond to authentication requests, and it held most of the replicas
where user objects reside, it had many connections.
The problem was a result of the number of connections the server
held, both user connections and server-to-server connections. A
background process periodically expires the security
equivalence information on connections; by default this happens
every 30 minutes. When it does, the server must
recalculate the security equivalence vector of each
connection so that any trustee changes are
kept current. As part of this step, a check is performed to
see whether the user is equivalent to the server object
(supervisor). In this customer's case the server did not
hold a copy of its own NCP server object, so it had to find another
server that did, which meant walking across a high-latency WAN
link to a server in Europe. A combination of poor server
placement, tree design, and replica placement led to the
problem.
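
A rough back-of-the-envelope calculation shows why this becomes
expensive. The sketch below is an illustration only: the connection
count and round-trip time are hypothetical values, not measurements
from the customer's tree, and it assumes one remote lookup per
connection performed sequentially. Only the 30-minute interval
comes from the description above.

```python
def recalc_seconds(connections, rtt_ms, round_trips_per_lookup=1):
    """Estimate seconds spent on remote lookups in one expiry cycle.

    Assumption: each connection needs round_trips_per_lookup WAN
    round trips, and the lookups are not done in parallel.
    """
    return connections * round_trips_per_lookup * rtt_ms / 1000.0


def busy_fraction(connections, rtt_ms, interval_minutes=30):
    """Fraction of each expiry interval spent waiting on the WAN."""
    return recalc_seconds(connections, rtt_ms) / (interval_minutes * 60)


if __name__ == "__main__":
    # Hypothetical figures: 2000 connections, 300 ms round trip.
    print(recalc_seconds(2000, 300))  # 600.0 seconds per cycle
    print(busy_fraction(2000, 300))   # about one third of the interval
```

With a local replica of the partition that holds the server object,
the round trip in this model drops to roughly zero, which is
consistent with the resolution described above.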