Tomcat becomes unresponsive and/or CPU remains at 100%

  • 7019816
  • 06-Apr-2016
  • 07-Aug-2017

Environment

Retain 4.x

Situation

Tomcat periodically stops responding: I cannot log in to Retain, nothing gets written to the logs, and there do not seem to be any signs of life. Alternatively, CPU utilization spikes to 100% and stays there.

Resolution

CPU, disk I/O, and RAM are the three key resources.  In isolated cases, VMware resource limits have also come into play (some admins have been surprised to find that they were set).  A problem in any of these areas can create a bottleneck that degrades the performance of your Retain system, especially indexing and searches, and can even cause the server to become unresponsive.

CPU

See our article: Improving Indexing Performance By Adding Threads And Cores

Disk I/O

See our article: Understanding Disk I/O in Relation to Retain Performance

RAM / Swap Space

Memory is the biggest key.  The indexer creates a MemoryMap on disk and caches the entire index into it.  It will want to put as much of that MemoryMap into RAM as possible.  When it doesn't have enough RAM, we've seen systems become unresponsive.  On Linux systems, we have seen the load average climb as high as 112.  Bottom line:  increase your system's RAM.  Depending on the size of your indexes ([base storage directory]/index/solrhome/retaincore/data/index), 32 GB of RAM may not be enough.  Give it as much as you can.
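To get a rough sense of how the index size compares with installed RAM, something like the following Python sketch can help. It is only an illustration: the function names are hypothetical, the RAM lookup relies on os.sysconf values that are available on Linux but not on Windows, and the index path must be substituted with your own [base storage directory].

```python
import os

def directory_size_bytes(path):
    """Sum the sizes of all files found under path (recursively)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # File vanished or is unreadable (e.g. a segment merged mid-scan); skip it.
                pass
    return total

def physical_ram_bytes():
    """Total physical RAM in bytes. Assumption: Linux sysconf names are available."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

def index_fits_in_ram(index_bytes, ram_bytes):
    """True if the whole index could, in principle, be held in RAM."""
    return index_bytes <= ram_bytes
```

For example, call directory_size_bytes() on your index directory and pass the result, together with physical_ram_bytes(), to index_fits_in_ram(). The comparison is deliberately generous: Tomcat's Java heap and the operating system also need memory, so an index that only just fits is still a sign that more RAM is warranted.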

If you can't increase RAM, we have found that increasing the swap space available to the server can help prevent Tomcat from becoming unresponsive, although we cannot guarantee it.  The following virtual (swap) memory size recommendations are preliminary, as we have not isolated all variables, but they will improve matters in many cases.  On Linux, we have seen the following entry in /var/log/messages when swap was the issue:

Apr  2 13:23:26 wapvo-retain01 nagios: SERVICE ALERT: localhost;Swap Usage;CRITICAL;SOFT;1;SWAP CRITICAL - 0% free (0 MB out of 4094 MB)

Dedicate 50-100 GB of swap space.
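On Linux, the current swap figures can be read from /proc/meminfo. The Python sketch below is a hypothetical helper, not part of Retain: it parses the SwapTotal and SwapFree fields and reports whether the configured swap reaches the lower end of the range above. The 50 GB default simply mirrors that recommendation.

```python
def parse_meminfo(text):
    """Return the numeric value of each /proc/meminfo field (most are in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values

def swap_report(meminfo_text, min_swap_gb=50):
    """Summarize swap size and free percentage from /proc/meminfo contents."""
    info = parse_meminfo(meminfo_text)
    total_kb = info.get("SwapTotal", 0)
    free_kb = info.get("SwapFree", 0)
    total_gb = total_kb / (1024 * 1024)
    # Avoid dividing by zero on systems with no swap configured.
    percent_free = 100.0 * free_kb / total_kb if total_kb else 0.0
    return {
        "total_gb": total_gb,
        "percent_free": percent_free,
        "meets_minimum": total_gb >= min_swap_gb,
    }
```

To use it, read /proc/meminfo into a string and pass that string to swap_report(). A server in the state shown in the Nagios alert above (4094 MB of swap, 0% free) would come back with meets_minimum set to False and percent_free at 0.0.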

Changing Swap Space on Windows
Changing Swap Space on Linux

VM Resource Limits

See "Retain Tomcat becomes unresponsive after several mailbox searches" for a VMware resource limit setting that could cause Tomcat to become unresponsive.

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 2780.