Java process high utilisation on Linux Access Gateway generating Error: "System low on memory. New user requests being rejected!"

  • 7007317
  • 06-Dec-2010
  • 18-Mar-2015

Environment

Novell Access Manager 3.1 Linux Access Gateway
Access Manager 3.1 Support Pack 2 Interim Release 3 applied
SLES9 Based Linux Access Gateway
NetIQ Access Manager 3.2

Situation

A SLES9 based Linux Access Gateway was set up to protect a large number of Web server resources. Of the hundreds of protected resources, many have policies enabled that reference LDAP attributes. For performance reasons, the best practices guide at https://www.novell.com/communities/node/9321/how-configure-access-gateway-embedded-service-provider-reduce-access-gateway-load-and-impr was implemented so that all referenced LDAP attributes were sent in a SAML assertion between the Identity (IDP) Server and the Linux Access Gateway (LAG) server. The total number of attributes sent exceeded 100.

User session timeouts were set very high (18 hours), and each Linux Access Gateway box would have 3000-4000 user sessions active at all times.

After the LAG servers had been up for a few days, one or two of them would become sluggish and many users would fail to access protected resources through these LAGs. The top output on the sluggish LAGs showed the java process running at 100% CPU utilisation. Any policy evaluation or authentication request that needed Java would fail. As a result, users would get blank pages or standard browser connection errors.

Resolution

Upgrade to SLES11 based LAGs and allocate 2048 MB to the Java heap in the tomcat5.xml file (https://www.novell.com/documentation/novellaccessmanager31/identityserverhelp/?page=/documentation/novellaccessmanager31/identityserverhelp/data/bpty25o.html). Note that in 3.2, -Xmx and -Xms should be set to the same value (minimum 2048 MB) because of a problem allocating memory after startup (fixed in post-3.2 builds).
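
To confirm that a heap setting has taken effect, the JVM can be asked what maximum heap it sees. The following is a minimal sketch, not part of Access Manager: the class name HeapCheck and its helper methods are hypothetical, and the 2048 MB threshold is taken from the recommendation above. It relies only on the standard Runtime.maxMemory() call.

```java
// Hypothetical helper, not shipped with Access Manager.
// Reports the maximum heap the running JVM will attempt to use.
final class HeapCheck {

    // Recommended minimum from this document, in MiB.
    static final long REQUIRED_MIB = 2048;

    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // True when the given maximum heap (in bytes) is at least requiredMiB.
    static boolean meetsMinimum(long maxBytes, long requiredMiB) {
        return toMiB(maxBytes) >= requiredMiB;
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() returns the upper heap limit in bytes.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap seen by JVM: " + toMiB(maxBytes) + " MiB");
        System.out.println(meetsMinimum(maxBytes, REQUIRED_MIB)
                ? "Heap limit is at or above " + REQUIRED_MIB + " MiB"
                : "Heap limit is below " + REQUIRED_MIB + " MiB");
    }
}
```

Running it with the same options, for example java -Xms2048m -Xmx2048m HeapCheck, shows what a JVM started with those flags reports. Depending on the garbage collector, the reported figure can be slightly lower than the -Xmx value, so treat a small shortfall as expected rather than as a misconfiguration.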

With SLES9 based LAGs, the OS could only access ~3 GB of RAM. With the default 1024 MB allocated to Java, and given the number of user sessions and attributes saved per user, the java process would run out of memory. On SLES11 based LAGs the OS can access more RAM, and therefore allocating 2 GB to Java addressed the issue.
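
The way session count and attribute count multiply into heap demand can be sketched with some rough arithmetic. The session and attribute counts below come from the Situation section; the per-attribute size is purely an assumed figure for illustration, not a measured value, and the class HeapEstimate is hypothetical. Real usage also includes session objects, policy data, and JVM overhead that this sketch ignores.

```java
// Hypothetical back-of-the-envelope estimate; not a measurement.
final class HeapEstimate {

    // Assumed average in-memory size of one stored attribute, in bytes.
    // This number is an illustration only; measure your own deployment.
    static final long ASSUMED_BYTES_PER_ATTRIBUTE = 2048;

    // Total bytes held for attributes across all sessions.
    static long estimateBytes(long sessions, long attributesPerSession, long bytesPerAttribute) {
        return Math.multiplyExact(
                Math.multiplyExact(sessions, attributesPerSession), bytesPerAttribute);
    }

    // Convert bytes to mebibytes as a fractional value.
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long attributes = 100; // "exceeded 100" in the Situation section
        for (long sessions : new long[] {3000, 4000}) {
            long bytes = estimateBytes(sessions, attributes, ASSUMED_BYTES_PER_ATTRIBUTE);
            System.out.println(String.format(
                    "%d sessions x %d attributes: about %.1f MiB for attribute data",
                    sessions, attributes, toMiB(bytes)));
        }
    }
}
```

Under these assumptions, attribute data alone takes up a large share of a 1024 MB heap, which is consistent with the exhaustion described above. Changing ASSUMED_BYTES_PER_ATTRIBUTE shows how sensitive the estimate is to that guess.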