Improper NCP server object and replica placement can lead to high utilization and poor performance

  • 3955264
  • 31-Oct-2006
  • 08-Nov-2012

Environment

Novell eDirectory 8.7
Novell eDirectory 8.8
SUSE Linux Enterprise Server 9
Server recently added to tree
Server does not hold a replica of its NCP server object

Situation

Symptoms:
Server in high utilization
Server slow to respond to authentication requests
Servers in replica ring become unresponsive
NDS thread usage is high
See Additional Information

Resolution

The resolution is to place NCP server objects in appropriately partitioned containers so that the server does not have to do as much tree-walking. In the customer's case, adding a replica of the Org container/partition to the server resolved the issue.

It is still a good idea to add replicas from the bottom level up as needed. However, if the server responds to many authentication requests or has many connections, first add a replica of the partition where the server object resides, and then proceed from the bottom up.
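The ordering described above can be sketched in a few lines of Python. This is only an illustration of the ordering rule, not a Novell tool: the function name `replica_order`, the `parents` mapping, and the partition names are all hypothetical, and the sketch assumes the list contains only the partitions the server is meant to hold.

```python
def replica_order(parents, server_partition):
    """Return partitions in the order replicas would be added.

    parents maps each wanted partition to its parent partition
    (None for the topmost one). Names are hypothetical examples.
    """
    def depth(partition):
        # Count the steps from this partition up to the topmost one.
        steps = 0
        while parents[partition] is not None:
            partition = parents[partition]
            steps += 1
        return steps

    rest = [p for p in parents if p != server_partition]
    # Deepest partitions first (bottom-up); ties broken by name.
    rest.sort(key=lambda p: (-depth(p), p))
    # The partition holding the server's own NCP object always goes first.
    return [server_partition] + rest


# Hypothetical tree: the server object lives in the O=ACME partition.
parents = {
    "O=ACME": None,
    "OU=EU.O=ACME": "O=ACME",
    "OU=ASIA.O=ACME": "O=ACME",
    "OU=USERS.OU=ASIA.O=ACME": "OU=ASIA.O=ACME",
}
order = replica_order(parents, "O=ACME")
print(order)
# ['O=ACME', 'OU=USERS.OU=ASIA.O=ACME', 'OU=ASIA.O=ACME', 'OU=EU.O=ACME']
```

The only deviation from a pure bottom-up order is that the server's own partition is moved to the front, which is the point of the recommendation.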

Additional Information

In this customer's case, the major replica servers were all in the Org container. What was unusual was that the servers in this one Org container were in separate geographical locations: one in Europe, one in the US, and one in Asia. Novell typically recommends that servers separated by WAN links, especially high-latency links, be partitioned off separately. This would entail placing the NCP server objects in separate containers/partitions.

The problem started after the server in Asia was added. Soon after it was added to the tree, it began experiencing the symptoms listed above, which caused serious network outages. The server was intended to become an NDS replica server. A common practice when adding replicas is to start at the bottom of the tree and work upwards. This avoids having the system add all of the subordinate replicas automatically, a process that can be harder to troubleshoot if there is a network problem or DS issue of some kind. The customer was following this practice and stopped adding replicas as soon as the issue appeared. As a result, they never added a replica of the Org container, where the server object was located.

Because this server was placed where it could respond to authentication requests, and held most of the replicas containing user objects, it had many connections.
The problem resulted from the number of connections it had, both from users and from other servers. A background process periodically expires the security equivalence information on connections; by default this happens every 30 minutes. When it does, the server must recalculate the security equivalence vector of each connection, which ensures that any trustee changes are kept current. As part of this step, a check is performed to see whether the user is equivalent to the server object (supervisor). In this customer's case, the server did not hold a copy of its own NCP server object, so it had to find another server that did, walking across a high-latency WAN link to a server in Europe. A combination of poor server placement, tree design, and replica placement therefore led to the problem.
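To see why the connection count matters, the remote lookups can be put into a rough back-of-the-envelope model. The sketch below is a deliberate simplification and not a description of eDirectory internals: it assumes one remote round trip per connection in each refresh cycle, with no caching and no parallelism. The function name and the figures (2000 connections, 300 ms round trip) are hypothetical and were not measured at the customer site.

```python
def refresh_cost_seconds(connections, round_trip_ms, has_local_replica):
    """Estimate time spent on remote lookups in one refresh cycle.

    Assumes one remote round trip per connection, done one after
    another, when the server lacks a replica holding its own NCP object.
    """
    if has_local_replica:
        # The server object can be read locally, so no WAN round trips.
        return 0.0
    return connections * round_trip_ms / 1000.0


# Hypothetical figures, chosen only to illustrate the scale.
without_replica = refresh_cost_seconds(2000, 300, has_local_replica=False)
with_replica = refresh_cost_seconds(2000, 300, has_local_replica=True)
print(without_replica)  # 600.0
print(with_replica)     # 0.0
```

Under these assumptions the server would spend about 10 minutes of every 30-minute cycle waiting on the WAN link, and that cost disappears once it holds a replica of the partition containing its own server object.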