Environment
Novell Access Manager 3.1 Linux Access Gateway
Situation
An Access Manager 3 setup with a Novell Identity Server on the same host as the Administration Console, and a Linux Access Gateway (LAG) proxying requests to back-end Web servers. Authentication and single sign-on worked fine in this environment. Some users, however, started complaining that they were randomly unable to:
- access certain large PDF files through the LAG
- cleanly download large files through the LAG
- access applications that send large amounts of data back to the browser
Forcing HTTP 1.0 to the origin server and disabling persistent connections to both the browser and the Web server did not resolve the issue. The problem never occurred when accessing the Web server directly.
Resolution
Run "ethtool -K eth0 tso off" to disable TCP segmentation offload (TSO) on the NIC. It appears that the NIC driver that shipped with SLES 9 SP3 (the base OS for the Linux Access Gateway) has a problem handling TCP segments. As soon as we did this, everything started working perfectly.
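To confirm whether TSO is actually on before and after running the command above, the current offload settings can be read with "ethtool -k eth0" (lowercase -k shows the settings; uppercase -K changes them). The Python below is a minimal sketch, not part of the original resolution, that parses that text output. The function names are hypothetical, and it assumes lines of the form "feature: on|off", possibly followed by " [fixed]"; because older ethtool versions print "tcp segmentation offload" while newer ones print "tcp-segmentation-offload", spaces in feature names are normalised to hyphens.

```python
import re


def parse_offload_settings(text: str) -> dict:
    """Parse 'ethtool -k <iface>' output into {feature_name: enabled}.

    Assumption: each setting is printed as 'name: on' or 'name: off',
    optionally followed by extra text such as ' [fixed]'.
    """
    settings = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Za-z0-9 _-]+):\s*(on|off)\b", line)
        if match:
            # Normalise old-style names ("tcp segmentation offload")
            # to new-style names ("tcp-segmentation-offload").
            name = match.group(1).strip().lower().replace(" ", "-")
            settings[name] = match.group(2) == "on"
    return settings


def tso_enabled(text: str) -> bool:
    """True if the parsed output reports TCP segmentation offload as on."""
    return parse_offload_settings(text).get("tcp-segmentation-offload", False)
```

On the gateway the input text could come from, for example, subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout; after "ethtool -K eth0 tso off", tso_enabled() on fresh output would be expected to return False.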
The network driver was bnx2. The NIC is a Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X. The hardware is an HP ProLiant DL380 G5 Dual Core X5160 3GHzx1 1x4MB L2 2GB. The issue has also been seen with several newer LAN drivers, all of which offer onboard TCP checksumming.
From /var/log/messages:
May 4 11:10:25 PIOJPPSNACMSU3 kernel: bnx2: module not supported by Novell, setting U taint flag.
May 4 11:10:25 PIOJPPSNACMSU3 kernel: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.3.29 (October 6, 2005)
May 4 11:10:25 PIOJPPSNACMSU3 kernel: eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem f8000000
The LAG shipping with 3.1 Support Pack 2 will be based on SLES 11, which does not have this issue, so the workaround above will not be required there. The problem only occurs with the combination of these newer NICs and the older SLES 9 SP3 kernel.
Additional Information
Traces indicated that not all of the data requested from the Web server was being sent back to the browser. For some reason, the response from the Access Gateway to the browser was missing TCP data, and tcpdump captures showed TCP checksum errors when viewed in Wireshark (it is not certain whether this was a Wireshark decode error).
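A "TCP checksum error" in a capture means that the verification sketched below failed for a segment. The Python is a minimal illustration of the IPv4 TCP checksum (the RFC 1071 one's-complement sum over the pseudo-header plus the TCP segment), with hypothetical function names. One point that may bear on the uncertainty above: when tcpdump runs on a host whose NIC performs transmit checksum offload, outgoing segments can be captured before the NIC fills in the checksum, so Wireshark may flag them as bad even though the frames on the wire are correct. The traces described here do not establish which case applied.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum of data, as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a trailing zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def ipv4_pseudo_header(src_ip: bytes, dst_ip: bytes, tcp_length: int) -> bytes:
    """Source IP, destination IP, zero byte, protocol 6 (TCP), TCP length."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, 6, tcp_length)


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum value for the header; the segment's checksum field must be zero."""
    pseudo = ipv4_pseudo_header(src_ip, dst_ip, len(segment))
    return ~ones_complement_sum16(pseudo + segment) & 0xFFFF


def tcp_checksum_valid(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """A segment verifies when the sum including its checksum field is 0xFFFF."""
    pseudo = ipv4_pseudo_header(src_ip, dst_ip, len(segment))
    return ones_complement_sum16(pseudo + segment) == 0xFFFF
```

A sender computes tcp_checksum() with the checksum field zeroed and writes the result into bytes 16 and 17 of the TCP header; a receiver (or Wireshark) runs the equivalent of tcp_checksum_valid() on the segment as captured.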
Running "ethtool -S eth0" to dump the NIC statistics, we could see some transmit and receive error counters incrementing regularly on the NIC connecting the proxy to the browser. Suspecting the NIC, we replaced it and could no longer reproduce the issue.
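Watching for incrementing error counters can be scripted. The Python below is a minimal sketch, assuming two snapshots of "ethtool -S eth0" output taken some time apart: it parses the "name: value" lines and reports every counter whose name contains "err" and whose value grew between the snapshots. Counter names differ from driver to driver, so the "err" substring filter and the function names are illustrative assumptions, not the exact bnx2 statistics.

```python
import re


def parse_nic_stats(text: str) -> dict:
    """Parse 'ethtool -S <iface>' output into {counter_name: value}.

    Assumption: each counter is printed as 'name: <integer>'; the
    'NIC statistics:' header line has no value and is skipped.
    """
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([\w.\[\]-]+):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def growing_error_counters(before: str, after: str) -> dict:
    """Return {counter_name: increase} for error-like counters that grew."""
    old = parse_nic_stats(before)
    new = parse_nic_stats(after)
    deltas = {}
    for name, value in new.items():
        # Heuristic filter (an assumption): treat any counter whose
        # name contains "err" as an error counter.
        if "err" in name.lower() and name in old and value > old[name]:
            deltas[name] = value - old[name]
    return deltas
```

A non-empty result across repeated snapshots on the browser-facing interface corresponds to the symptom described above; widening the filter (for example to counters containing "drop") is a judgment call for the specific driver.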