SIGPIPE using nscd socket - commands don't seem to run

  • 7003590
  • 19-Jun-2009
  • 08-Nov-2012

Environment


SUSE Linux Enterprise Server 10 Service Pack 2
SUSE Linux Enterprise Server 10 Service Pack 1

Situation

  • Occasionally, commands such as tar and ls return to the shell prompt without printing anything to STDOUT or STDERR
  • The return code for these commands is 141 (128 + 13, meaning the process was killed by signal 13, SIGPIPE)
  • An strace of a failing command shows writes to the nscd socket failing with -1 EPIPE (Broken pipe), followed by a SIGPIPE
  • nscd appears to be leaking sockets: running netstat -a | grep /var/run/nscd/socket while the problem is occurring returns hundreds or thousands of entries
  • Restarting the nscd daemon resolves the symptom
  • The computer uses LDAP authentication (nss_ldap)
  • /var/log/messages shows several "nss_ldap: reconnecting to LDAP server" messages
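The exit status and socket checks in the list above can be scripted. The following is a minimal sketch that assumes bash; decode_exit_status and count_nscd_sockets are hypothetical helper names, not SUSE tools. It relies on the shell convention that a process killed by signal N reports status 128 + N, and on SIGPIPE being signal 13 on Linux.

```shell
#!/bin/bash
# Hypothetical diagnostic helpers for the nscd SIGPIPE symptoms (sketch, bash assumed).

# Decode a shell exit status: values above 128 mean "killed by signal (status - 128)".
decode_exit_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        local sig=$((status - 128))
        # bash's "kill -l N" prints the signal name without the SIG prefix.
        echo "$status = 128 + $sig (SIG$(kill -l "$sig"))"
    else
        echo "$status (normal exit)"
    fi
}

# Count netstat lines that mention the nscd socket path (the leak check above).
count_nscd_sockets() {
    netstat -a 2>/dev/null | grep -c /var/run/nscd/socket
}

decode_exit_status 141   # prints: 141 = 128 + 13 (SIGPIPE)
```

Run count_nscd_sockets while the problem is occurring; a count in the hundreds or thousands matches the socket leak described above.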

Resolution

This issue can be triggered by LDAP server or network outages. The problem appears to lie in the way libldap uses libssl. With the default TCP keepalive time of two hours (7200 seconds), the nscd daemon can remain hung even after the LDAP server becomes available again.

The solution is to lower the TCP keepalive time by setting net.ipv4.tcp_keepalive_time via sysctl to a value much smaller than two hours, such as a few minutes.
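A minimal sketch of that change, assuming bash, root privileges for the apply step, and 5 minutes as an example value (pick one that suits your site). keepalive_conf_line and apply_keepalive are hypothetical helper names; sysctl -w and /etc/sysctl.conf are the standard Linux mechanisms, with the latter applied by the startup scripts at boot.

```shell
#!/bin/bash
# Sketch: lower net.ipv4.tcp_keepalive_time from its 7200 s default.

# Build the sysctl.conf line; the kernel parameter is expressed in seconds.
keepalive_conf_line() {
    local minutes=$1
    echo "net.ipv4.tcp_keepalive_time = $((minutes * 60))"
}

# Hypothetical helper: apply now and persist for future boots (run as root).
apply_keepalive() {
    local minutes=$1
    # Takes effect immediately but is lost on reboot.
    sysctl -w net.ipv4.tcp_keepalive_time=$((minutes * 60))
    # Persist the setting so it is applied again at boot.
    keepalive_conf_line "$minutes" >> /etc/sysctl.conf
}

# Show the current value without changing anything.
if [ -r /proc/sys/net/ipv4/tcp_keepalive_time ]; then
    echo "current: $(cat /proc/sys/net/ipv4/tcp_keepalive_time) s"
fi
keepalive_conf_line 5   # prints: net.ipv4.tcp_keepalive_time = 300
```

As root, apply_keepalive 5 makes the change; restarting nscd afterwards (for example with /etc/init.d/nscd restart) clears any state left over from the outage, as noted in the Situation section.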
