Environment
NetIQ eDirectory 8.8 SP8 running on RHEL 6.6
Situation
These are some symptoms of this particular problem:
- Replica synchronization reports error -625 for some servers, even though those servers are reachable.
- In iMonitor -> Agent Activity, you see a thread acquire the Write lock and never release it.
- Requests to the affected server may work or may get stuck.
- The affected server becomes unresponsive.
- After restarting ndsd, the problem is resolved, at least for some time (a few hours or a few days).
- If you use the gstack utility to list the running threads, the lock is released and the server returns to normal.
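Because listing the threads with gstack can itself release the lock, it may be worth saving the output the first time it is run. The following is a minimal sketch, not a NetIQ-supplied tool: the function names are hypothetical, it assumes Python 3.7 or later, and it assumes gstack is on the PATH (it is typically installed with the gdb package) and is invoked as gstack followed by the process ID of ndsd.

```python
import shutil
import subprocess


def gstack_command(pid):
    """Build the argument list for `gstack <pid>` (hypothetical helper)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return ["gstack", str(pid)]


def dump_thread_stacks(pid, timeout=60):
    """Run gstack against a process and return its output as text."""
    if shutil.which("gstack") is None:
        raise RuntimeError("gstack was not found on the PATH")
    result = subprocess.run(
        gstack_command(pid),
        capture_output=True,  # keep the stack listing instead of printing it
        text=True,
        timeout=timeout,
        check=True,           # raise CalledProcessError if gstack fails
    )
    return result.stdout
```

Calling dump_thread_stacks with the process ID of ndsd and writing the returned text to a file keeps a record of the thread stacks from the moment the server was stuck.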
Resolution
Error -625 indicates that a server failed to respond in a timely manner. Other conditions can also cause this error, such as high utilization or a server trying to write a very large number of attributes for a particular object. In those scenarios, though, the issue reappears soon after the ndsd process is restarted.
This particular problem is caused by a Linux kernel bug that mostly affects Red Hat Enterprise Linux 6.6, 7.0 and 7.1 (running kernel versions 2.6.32-504 up to and including 2.6.32-504.12.2), in particular servers on version
For more information from Red Hat:
https://access.redhat.com/solutions/1386323
To avoid this issue, make sure that the latest patches are applied on your Red Hat Enterprise Linux server and that the kernel version is later than the ones mentioned above.
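As a rough way to check a server against the range above, the following sketch compares a kernel release string with the two versions quoted in this document. The function and constant names are hypothetical. It only covers the 2.6.32-504 range stated here, so it says nothing about other kernel series, and the Red Hat article remains the authoritative list of affected and fixed kernels.

```python
import re

# Affected range as stated in this document (inclusive on both ends).
FIRST_AFFECTED = (2, 6, 32, 504)
LAST_AFFECTED = (2, 6, 32, 504, 12, 2)


def parse_kernel_release(release):
    """Return the leading numeric fields of a release string as a tuple of ints.

    "2.6.32-504.12.2.el6.x86_64" becomes (2, 6, 32, 504, 12, 2).
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    base = [int(match.group(i)) for i in (1, 2, 3)]
    build = [int(part) for part in match.group(4).split(".")]
    return tuple(base + build)


def is_affected_kernel(release):
    """True if the release falls inside the range quoted in this document."""
    return FIRST_AFFECTED <= parse_kernel_release(release) <= LAST_AFFECTED
```

On the server itself, the release string can be obtained with platform.release() from the Python standard library (the same value that uname -r prints) and passed to is_affected_kernel.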