VMware ESX Server 3.5
XenApp Presentation Server
- Physical servers running Novell NetWare 6.5
- VMware ESX Server 3.5, running virtualized instances of Novell NetWare 6.5, with the desktop presented to the end user via Citrix MetaFrame or XenApp Presentation Server.
At irregular intervals the NetWare servers abend with a Page Fault and a stack similar to the one listed below:
Current EIP: 85B7DF35 LIBNSS.NLM|LB_memcpy+15
8002BD4C 86158023 COMN.NSS|COMN_Write+B23
8002BDF8 85E12134 NSS.NLM|MSG_Call+B4
8002BE10 86009C3C NWSA.NSS|ZH_WriteFile64+EC
8002BEE4 86009D05 NWSA.NSS|ZH_WriteFile+55
8002BF10 8BB91F4E NCPIP.NLM|ProcessNCPRequest+8E
8002BF38 00369912 SERVER.NLM|StartWorkToDo+23
8002BF50 0022EB3B SERVER.NLM|kWorkerThread+DF
8002BF68 002285D8 SERVER.NLM|TcoNewSystemThreadEntryPoint+40
The server abended while users were working on MS Office Access/Excel files residing in their home directories, mostly during file write operations.
The root cause turned out to be corrupted NCP packets traversing the wire, in which the offset of the bytes to be written was out of range. When the server detected the corruption, it abended in order to maintain data integrity.
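The kind of range check described above can be pictured with a minimal sketch. It does not show how NSS or NCPIP.NLM validate requests, which this document does not describe. The function name and parameters are hypothetical and only illustrate why an out-of-range offset has to be rejected before any bytes are copied.

```python
def write_span_is_valid(packet_len: int, data_offset: int, data_len: int) -> bool:
    """Return True if the declared data span lies entirely inside the packet.

    Hypothetical illustration only: the names and the layout are assumptions,
    not the real NCP request format.
    """
    # Negative values can only come from a corrupted or mis-parsed request.
    if packet_len < 0 or data_offset < 0 or data_len < 0:
        return False
    # The span [data_offset, data_offset + data_len) must fit in the packet.
    return data_offset + data_len <= packet_len
```

A request that fails such a check must not be passed on to a memory copy, which is where the stack above shows the fault (LB_memcpy).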
Further troubleshooting showed that, on at least one occasion, a Windows 2000 Server that was part of the Citrix farm had a faulty network card, which caused the NCP packet corruption.
Another customer reported that TCP Segment Offload was enabled on a network card in one of the Citrix servers; after it was disabled, the abend problems were resolved.
Still another customer had changed the "Minimum NCP TCP receive window to advertise" setting on the server from the default value of 4096 to the maximum value of 16,384. This exposed a TCP windowing defect where the TCP receive window on a connection would go to 0. If the NCP request was an NCP write and the server advertised a small window (for example, 31 bytes), the workstation would send exactly that number of bytes. If this data was not enough for a complete NCP header, it could result in an invalid value for the NCP write. The server would abend to avoid data corruption.
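To illustrate why a segment shorter than a complete header is dangerous, the sketch below buffers incoming TCP data and parses a request only once the full header and payload have arrived. The 6-byte header layout (4-byte offset, 2-byte length) and the class name are assumptions made for the example. This is not the NCP wire format and not the NetWare implementation.

```python
import struct

# Hypothetical header: 4-byte write offset + 2-byte payload length, big-endian.
# This is an assumed layout for illustration, NOT the real NCP header.
HEADER = struct.Struct(">IH")


class RequestAssembler:
    """Accumulates TCP segments and yields only fully received requests."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, segment: bytes):
        """Add one segment; return a list of (offset, payload) tuples."""
        self._buf.extend(segment)
        complete = []
        while True:
            # Never interpret header fields until the whole header is present.
            if len(self._buf) < HEADER.size:
                break
            offset, length = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            # Wait for the rest of the payload before handing the request on.
            if len(self._buf) < end:
                break
            complete.append((offset, bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]
        return complete
```

Parsing fields from a partial header, rather than waiting as above, is one way to end up with an invalid write value of the kind described.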
Troubleshooting methods include, but are not limited to:
- analyzing LAN traces
- checking server NIC driver statistics for suspicious numbers of retransmits and/or other errors
- checking networking components for abnormal numbers of packet (re)transmits or excessive failures
- checking for firmware updates for your infrastructure/networking components that address communication issues
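When comparing NIC or switch counters, it can help to look at the change between two samples rather than at absolute totals. The sketch below assumes the counters have already been read by hand into dictionaries. The key names "sent" and "retransmitted" are hypothetical, and no particular driver or tool output format is implied.

```python
def retransmit_ratio(before: dict, after: dict) -> float:
    """Fraction of packets retransmitted between two counter samples.

    Both arguments are hypothetical dictionaries with the keys
    "sent" and "retransmitted", filled in from whatever statistics
    the NIC driver or network device exposes.
    """
    sent = after["sent"] - before["sent"]
    retransmitted = after["retransmitted"] - before["retransmitted"]
    # No traffic in the interval (or counters were reset): nothing to report.
    if sent <= 0:
        return 0.0
    return retransmitted / sent
```

What counts as a suspicious ratio depends on the environment; comparing the suspect server against a known-good one on the same network is usually more informative than a fixed threshold.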
Cisco has released a software update that registered users can download from the Wide Area Application Services (WAAS) area. The customer confirmed that after installing software version v4.0.15 and re-enabling full optimization, the reported problem no longer occurred.