SendBroadcastPing: sendto() err:32, conn:xxx errno:32, errmsg:Broken pipe

  • Document ID: 7007885
  • Creation Date: 13-Feb-2011
  • Modified Date: 04-Sep-2012

Environment

Novell eDirectory 8.8 for Linux
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 1
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 2
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 3

Situation

Symptoms:
  • ndsd experiencing high CPU utilization
  • "SendBroadcastPing: sendto() err:32, conn:xxx errno:32, errmsg:Broken pipe" repeating in ncpserv.log
  • ncpserv.log growing at a very fast rate
  • Server becomes unresponsive

Resolution

This issue is resolved by applying the May 2011 maintenance patches or later. The best option is to ensure the server is registered through the Novell Customer Center and then apply the latest patches. This issue has not been observed in OES 11.
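As a rough sketch of that procedure (OES 2 is based on SLES 10, where registration and patching are typically done with suse_register and rug; the registration arguments below are illustrative placeholders, and the exact steps vary by service pack):

# register the server with Novell Customer Center (codes are placeholders)
suse_register -a email=admin@example.com -a regcode-sles=XXXXXXXX
# refresh the patch channels, then apply all available updates
rug refresh
rug update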

Additional Information

This problem was isolated by collecting output from ps, gstack, and ncpserv.log. For more information on troubleshooting ndsd becoming unresponsive on OES, see TID 7007332.
ps -C ndsd -L -o pid,tid,nlwp,pcpu,pmem,vsz,stat
Output from this command follows; the line for thread ID 4179 is the problem thread:
PID TID NLWP %CPU %MEM VSZ STAT
 
4132 4132 86 0.0 2.4 693756 Sl
4132 4164 86 0.0 2.4 693756 Sl
4132 4166 86 0.0 2.4 693756 Sl
4132 4167 86 0.0 2.4 693756 Sl
4132 4169 86 0.0 2.4 693756 Sl
4132 4170 86 0.0 2.4 693756 Sl
4132 4171 86 0.0 2.4 693756 Sl
4132 4172 86 0.0 2.4 693756 Sl
4132 4173 86 0.0 2.4 693756 Sl
4132 4174 86 0.0 2.4 693756 Sl
4132 4175 86 0.0 2.4 693756 Sl
4132 4176 86 0.0 2.4 693756 Sl
4132 4179 86 27.6 2.4 693756 Rl
4132 4181 86 0.0 2.4 693756 Sl
4132 4187 86 0.0 2.4 693756 Sl
4132 4188 86 0.0 2.4 693756 Sl
4132 4314 86 0.0 2.4 693756 Sl
4132 4315 86 0.0 2.4 693756 Sl
4132 4316 86 0.0 2.4 693756 Sl
4132 4317 86 0.0 2.4 693756 Sl
4132 4318 86 0.0 2.4 693756 Sl
4132 4319 86 0.0 2.4 693756 Sl
4132 4320 86 0.0 2.4 693756 Sl
4132 4321 86 0.0 2.4 693756 Sl
4132 4322 86 0.0 2.4 693756 Sl
4132 4326 86 0.0 2.4 693756 Sl
4132 4327 86 0.0 2.4 693756 Sl
4132 4328 86 0.0 2.4 693756 Sl
4132 4391 86 0.0 2.4 693756 Sl
4132 4469 86 0.0 2.4 693756 Sl
4132 4850 86 0.0 2.4 693756 Sl
4132 4851 86 0.0 2.4 693756 Sl
4132 5055 86 0.0 2.4 693756 Sl
4132 5057 86 0.0 2.4 693756 Sl
4132 5061 86 0.0 2.4 693756 Sl
4132 5062 86 0.0 2.4 693756 Sl
4132 5063 86 0.0 2.4 693756 Sl
4132 5064 86 0.0 2.4 693756 Sl
4132 5065 86 0.0 2.4 693756 Sl
4132 5066 86 0.0 2.4 693756 Sl
4132 5067 86 0.0 2.4 693756 Sl
4132 5282 86 0.0 2.4 693756 Sl
4132 5283 86 0.0 2.4 693756 Sl
4132 6013 86 0.0 2.4 693756 Sl
4132 6920 86 0.0 2.4 693756 Sl
4132 7201 86 0.0 2.4 693756 Sl
4132 8048 86 0.0 2.4 693756 Sl
4132 22148 86 0.0 2.4 693756 Sl
4132 9246 86 0.0 2.4 693756 Sl
4132 7601 86 0.0 2.4 693756 Sl
4132 18933 86 0.0 2.4 693756 Sl
4132 19073 86 0.0 2.4 693756 Sl
4132 19085 86 0.0 2.4 693756 Sl
4132 19841 86 0.0 2.4 693756 Sl
4132 21351 86 0.0 2.4 693756 Sl
4132 2865 86 0.0 2.4 693756 Sl
4132 11750 86 0.0 2.4 693756 Sl
4132 11853 86 0.2 2.4 693756 Sl
4132 15395 86 0.0 2.4 693756 Sl
4132 30187 86 0.0 2.4 693756 Sl
4132 30318 86 0.0 2.4 693756 Sl
4132 3448 86 0.0 2.4 693756 Sl
4132 3697 86 0.0 2.4 693756 Sl
4132 8913 86 0.0 2.4 693756 Sl
4132 15140 86 0.0 2.4 693756 Sl
4132 20276 86 0.0 2.4 693756 Sl
4132 20305 86 0.0 2.4 693756 Sl
4132 21910 86 0.0 2.4 693756 Sl
4132 21933 86 0.0 2.4 693756 Sl
4132 23395 86 0.0 2.4 693756 Sl
4132 24450 86 0.0 2.4 693756 Sl
4132 26244 86 0.0 2.4 693756 Sl
4132 26246 86 0.0 2.4 693756 Sl
4132 26247 86 0.0 2.4 693756 Sl
4132 26268 86 0.0 2.4 693756 Sl
4132 26287 86 0.0 2.4 693756 Sl
4132 26312 86 0.0 2.4 693756 Sl
4132 26318 86 0.0 2.4 693756 Sl
4132 26323 86 0.0 2.4 693756 Sl
4132 26325 86 0.0 2.4 693756 Sl
4132 27005 86 0.0 2.4 693756 Sl
4132 27006 86 0.0 2.4 693756 Sl
4132 27065 86 0.0 2.4 693756 Sl
4132 27184 86 0.0 2.4 693756 Sl
4132 27209 86 0.0 2.4 693756 Sl
4132 27212 86 0.0 2.4 693756 Sl
 
 
 
You can see that all the other threads are near 0 percent CPU, except thread 4179, which is at 27.6 percent. Another hint is the state column: all the other threads are in a sleeping (Sl) state, while this thread is in a running (Rl) state.
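A quicker way to surface the busy thread is to let ps sort by CPU usage, a minor variation on the command above (columns trimmed here for brevity):

# list ndsd threads sorted by CPU, busiest first
ps -C ndsd -L -o pid,tid,pcpu,stat --sort=-pcpu | head -n 5

With that sorting, the problem thread (here 4179) lands on the first data row.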
 
To get more information on this thread, we can look at the gstack output taken at the same time. Searching for "LWP 4179", we find the following stack trace:
Thread 74 (Thread 0x4170f940 (LWP 4179)):
#0 0x00002b13e1b7d18c in sendto () from /lib64/libpthread.so.0
#1 0x00002aaaaaae4d6c in NCPServer::SendBroadcastPing(int, char, unsigned int) () from /opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#2 0x00002aaaaaaedb77 in PingL1Oplock(int, int, unsigned int, CacheEntry*) ()
#3 0x00002aaaaaaf3ff7 in HandleLeaseSignal(int) ()
#4 0x00002aaaaab076fd in DircacheMaintenanceEvent() ()
#5 0x0000000000414561 in PoolWorker(void*) ()
#6 0x00002b13e1b76193 in start_thread () from /lib64/libpthread.so.0
#7 0x00002b13e1e9ff0d in clone () from /lib64/libc.so.6
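 
If you need to capture this yourself, a minimal sketch (gstack ships with gdb; the output path and LWP number here are just from this example):

# dump the stacks of all ndsd threads, then pull the one for the busy LWP
gstack $(pidof ndsd) > /tmp/ndsd-gstack.txt
grep -A 8 "LWP 4179" /tmp/ndsd-gstack.txt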
 
In this case the thread was checking the status of an open file, and the sendto() call kept failing with error 32 (Broken pipe). Since the library in frame #1 is libncpengine, we know this thread is related to NCP file services. Turning on full NCP logging as described in TID 7007332 (e.g. "ncpcon log everything"), we saw the following in /var/opt/novell/log/ncpserv.log:
 
[! 2011-02-11 09:50:53] SendBroadcastPing: sendto() err:32, conn:555 errno:32, errmsg:Broken pipe
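 
To reproduce this capture, the logging command quoted above plus a standard tail is enough (the log path is the one shown in this TID):

# enable full NCP logging and watch for the repeating error
ncpcon log everything
tail -f /var/opt/novell/log/ncpserv.log | grep SendBroadcastPing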
 
This message was scrolling over and over, and was growing ncpserv.log by about 100 MB every 5 minutes. Based on this message, we could see the error was coming from connection 555. Running "ncpcon connection 555" provided more specific information about that connection, such as the workstation IP address, the logged-in user, and the files that were open.
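Putting the last two steps together, here is a small sketch, assuming the log format shown above, that extracts the offending connection number and queries it (the sed pattern and CONN variable are illustrative, not part of any Novell tool):

# pull the connection number from the first SendBroadcastPing error in the log
CONN=$(grep -m1 'SendBroadcastPing' /var/opt/novell/log/ncpserv.log | sed 's/.*conn:\([0-9]*\).*/\1/')
# show details for that connection (workstation IP, user, open files)
ncpcon connection "$CONN"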