Environment
Novell NetWare 6.5
NetWare OpenSSH (SSHD, SFTP)
Situation
SFTP or SSH authentication is slow.
SFTP or SSH authentication times out before completing.
Some users, servers, or volumes are not found.
Sys:etc/ssh/logs/sshd.log might contain timeout errors, similar to:
Search for user brubble in context ou=users,o=flintstones failed with error Timed out, continuing search
edir_build_server_list() Search for servers in context ou=users,o=flintstones failed with error Timed out, continuing search
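To gauge how often these timeouts occur and which contexts they involve, the log can be scanned with a short script. The following is a minimal sketch, not a Novell-supplied tool. The function name find_timeouts and the regular expression are illustrative assumptions; the pattern is built only from the two sample messages above, so other message forms in a real sshd.log may not match.

```python
import re

# Assumption: pattern derived only from the two sample log lines in this
# document; real sshd.log entries may take other forms.
TIMEOUT_RE = re.compile(
    r"Search for (?P<target>user \S+|servers) in context (?P<context>.+?) "
    r"failed with error Timed out"
)


def find_timeouts(lines):
    """Return a (target, context) pair for each timed-out search found."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group("target"), match.group("context")))
    return hits


# The two sample messages quoted above.
sample = [
    "Search for user brubble in context ou=users,o=flintstones "
    "failed with error Timed out, continuing search",
    "edir_build_server_list() Search for servers in context "
    "ou=users,o=flintstones failed with error Timed out, continuing search",
]

if __name__ == "__main__":
    for target, context in find_timeouts(sample):
        print(f"{target}: {context}")
```

To use it against a real log, copy sshd.log to a workstation and pass the open file object to find_timeouts() in place of the sample list.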
Resolution
1. This first section describes what is considered the "true solution" to this issue.
During the authentication phase of an SSH or SFTP session, SSHD does LDAP searches of the contexts specified by one or more eDirNameContext lines in the sys:/etc/ssh/sshd_config file. In the case of SFTP sessions, searches are done for NCP Server objects and Volume objects as well.
Replica servers suffering performance problems can cause SSHD.NLM to time out on eDirectory searches for users or servers. Very large containers or trees could be slow to return results as well. Searches will be fastest if eDirectory indexes exist for the attributes being searched.
The first step is to identify which replica server (or servers) are being queried by the SSHD server. As mentioned above, the eDirNameContext settings in sys:/etc/ssh/sshd_config specify the contexts being searched. If the server running SSHD.NLM has local replicas containing the contexts it searches, it will use those local replicas. If it does not have local replicas, it will probably use a nearby replica server, but determining which nearby replica server is being used (if several exist) may be tricky. One approach is to index all nearby replica servers, or even all replica servers in the applicable replica rings. Another approach is to tell SSHD exactly which server to use, by pointing sshd_config to an LDAP server on a known replica server. KB 3783350 discusses how to point to a particular LDAP server, though there are disclaimers in that TID because it is not a thoroughly tested approach.
Once you identify the replica servers where indexes would be beneficial, you can check or add the appropriate indexes on each of those servers. User search delays and server or volume search delays have to be addressed by separate indexes.
When searching for user objects, timeouts are possible because the uniqueID attribute might not be indexed by default. Whether or not this index already exists may depend upon the eDir version in use. Indexes can be checked or added via ConsoleOne by opening the properties of the server object and going to the Index tab. For fast user searching by SSHD.NLM, the best solution is to have an index for the "uniqueID" attribute. The index rule should be "value".
When searching for NCP server objects or for Volume objects, timeouts are possible because "Object Class" is not indexed by default. For this, it would be best to create a value index on the "Object Class" attribute. (Object Class isn't always thought of as an attribute, but it is considered an attribute for index purposes.)
Keep in mind that even if certain of the partitions being searched do not contain some of these types of objects, it will still speed up the searching to build the indexes. For example, an NCP Server search in a context which only contains users is not expected to return any results; however, the Object Class search still gets performed, and will benefit from having an index present.
2. Additional factors that may be of use as "workarounds."
In a healthy tree with good indexes in all the necessary places, these LDAP searches should complete very quickly, even if the tree is very large. If, after adding the right indexes to the right servers, the searches are still timing out, it may be useful to lengthen the amount of time SSHD is willing to wait before declaring that the search has timed out. Note that lengthening the timeout is just a "band-aid" for the problem. It does not address why the searches are taking abnormally long. Index placement and healthy responsiveness of replica servers should be all that is needed to overcome this issue. However, if the timeouts persist, the following additional methods may be helpful:
SSHD considers LDAP searches to have timed out after getting no response for 10 seconds. SSHD can be told to wait longer before it considers a timeout to have occurred. This will not fix slow eDir responses, but it will allow SSHD to be more patient with eDir.
Here are the steps to try this out:
a. The ability to set the SSHD "ldaptimeout" is relatively new. It was first publicly introduced in NetWare 6.5 SP8. If the system is using SP7 or below, the easiest way to get the updates is to obtain and apply NWsshd8a.zip from support.novell.com (downloads tab).
b. With the new sshd materials installed, a timeout can be configured by adding the following line to sys:/etc/ssh/sshd_config :
ldaptimeout nn
where "nn" can be set to a number of seconds, anywhere from 10 (the default) to 45. Once this is set, it should be put into effect by unloading and reloading SSHD.NLM. Note that while this might allow the searches to finish before a "timeout" is declared, it could cause ssh or sftp logins to take an excessive amount of time. There are 4 sets of LDAP searches that occur during each SFTP login attempt, and each "set" could be doing more than one search, depending on how many "eDirNameContext" lines there are in the sshd_config file. This leads to the final points:
c. In sys:/etc/ssh/sshd_config, the "eDirNameContext" settings will influence how many searches are being done, and how big each search is going to be. Often, systems are searching a larger section of the tree than they need to, or are redundant in their searches. For example, consider the following:
eDirNameContext o=organization?scope=subtree
eDirNameContext ou=users,o=organization?scope=subtree
This means that SSHD is doing LDAP searches for o=organization and everything beneath it, then it is also doing another search of ou=users,o=organization and everything beneath that.
Problems and possible resolutions to this are:
i. The second search is already part of the first search. The second line could be removed, and this change put into effect by unloading and reloading SSHD.NLM. This would not stop timeouts from occurring, but it would speed up the overall process because SSHD would do only half as many searches before finally finishing the sequence.
ii. The searches as configured above are "subtree" searches. Such searches can require more time, and they may even be searching through many contexts unnecessarily. These could be changed to individual (flat) context searches. This is done by removing the "?scope=subtree" from those lines. Then, if necessary, add additional eDirNameContext lines to cover remaining contexts that need to be searched. Reload SSHD to put this into effect.
This could create shorter searches and may allow them to complete without registering a timeout. However, if too many additional eDirNameContext lines are needed to cover the individual contexts, this could lengthen the overall process. Environments differ as to whether it is more efficient or successful to do one or two large subtree searches, or several small individual context searches.
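The points in steps b and c can be checked with a short script before editing sshd_config. The following is a minimal sketch, not a Novell-supplied tool. The function names are illustrative. It handles only the "eDirNameContext <context>[?scope=subtree]" form shown above, compares contexts as plain lowercase strings, and flags only contexts that sit beneath another subtree-scoped context (exact duplicates are not detected). The time estimate assumes that each of the 4 search sets performs one search per eDirNameContext line and that every search runs to the full timeout; that is an assumption for estimating a worst case, not documented SSHD behavior.

```python
SEARCH_SETS_PER_SFTP_LOGIN = 4  # figure stated in step b above


def parse_contexts(config_text):
    """Return (context, is_subtree) pairs from eDirNameContext lines."""
    contexts = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line.lower().startswith("edirnamecontext"):
            continue  # skips comments and unrelated settings
        value = line[len("eDirNameContext"):].strip()
        dn, _, options = value.partition("?")
        is_subtree = options.strip().lower() == "scope=subtree"
        contexts.append((dn.strip().lower(), is_subtree))
    return contexts


def redundant_contexts(contexts):
    """Return contexts already covered by another subtree-scoped context."""
    redundant = []
    for i, (dn, _) in enumerate(contexts):
        for j, (other, other_is_subtree) in enumerate(contexts):
            if i != j and other_is_subtree and dn.endswith("," + other):
                redundant.append(dn)
                break
    return redundant


def worst_case_login_seconds(context_lines, ldaptimeout=10):
    """Estimate the longest SFTP login if every search times out.

    Assumption: one search per eDirNameContext line in each search set.
    """
    if not 10 <= ldaptimeout <= 45:
        raise ValueError("ldaptimeout must be between 10 and 45 seconds")
    return SEARCH_SETS_PER_SFTP_LOGIN * context_lines * ldaptimeout


# The example configuration from step c.
SAMPLE_CONFIG = """\
eDirNameContext o=organization?scope=subtree
eDirNameContext ou=users,o=organization?scope=subtree
"""

if __name__ == "__main__":
    parsed = parse_contexts(SAMPLE_CONFIG)
    print("Redundant:", redundant_contexts(parsed))
    print("Worst case (s):", worst_case_login_seconds(len(parsed)))
```

Under these assumptions, the two-line example above could take up to 80 seconds at the default timeout if every search timed out, and removing the redundant second line would halve that figure.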
Additional Information
Formerly known as KB 10101193.