Memory Fragmentation Issue with NetWare 6.0 and 6.5

  • 3920657
  • 11-Oct-2007
  • 08-Feb-2013

Environment

Novell NetWare 6.5
Novell NetWare 6.5 SP1a
Novell NetWare 6.5 SP2
Novell NetWare 6.5 SP3
Novell NetWare 6.5 SP4a
Novell NetWare 6.5 SP5
Novell NetWare 6.5 SP6
Novell NetWare 6.5 SP7
Novell NetWare 6.0 SP5
Novell NetWare 5.1 SP8
View a multi-media tutorial for this TID at: https://support.novell.com/additional/tutorials/tid10091980/

Situation

Server runs out of memory after a few days of normal operation
Short Term Memory Allocation errors
Cache Memory Allocator errors
Server runs out of memory running a backup

Resolution

The steps below outline specific actions to take if your NetWare server is experiencing problems with memory. They should be applied in the order presented for best results. In our experience, nearly all problems with memory fragmentation will be addressed by applying the suggestions in the first 4 steps. Steps 5 and 6 can be followed if there's still trouble, but Novell recommends stopping after Step 4 and letting the server run for a few days before applying Step 5 and (especially) Step 6.

STEP 1: Update your server to the latest NetWare Support Pack

All of the fixes and updates for this issue are incorporated in the latest NetWare Support Packs. Novell strongly recommends that customers experiencing memory fragmentation issues have the latest support pack (from CSP11 or later) installed on their servers. For customers running NetWare 6.5 Support Pack 5, please update the server with the patch NW65SP5UPD1.EXE.

STEP 2: If the module TSAFS.NLM is running on the server

TSAFS can be limited in the amount of cache it requests. To do this, unload the module TSAFS.NLM then re-load it with the following command-line switch:

Load TSAFS /CacheMemoryThreshold=1

Explanation: This is the percentage of the server's free memory that TSAFS will use (as determined at run time). If this is a machine with 4 GB of RAM, the '1' represents 1% or up to 40 MB that TSAFS can allocate and use for its cache; if it is not set it will default to 10% or 400 MB in a 4 GB server. 1% is the lowest that TSAFS will allow for this setting. You must have TSA5UP15.EXE or NetWare 6.5 Support Pack 2 or later to set this parameter to 1%. (With older versions of TSAFS.NLM, the lowest allowed setting was 10%). Let the /ReadBufferSize stay at 64K. This will cause TSAFS.NLM to allocate memory in smaller amounts.
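The percentage arithmetic in this explanation can be checked with a short sketch. The function name is hypothetical, and using the server's total RAM as a stand-in for the free memory that TSAFS measures at run time is an assumption made for illustration only.

```python
def tsafs_cache_ceiling_mb(memory_mb: float, threshold_percent: int = 10) -> float:
    """Approximate ceiling on TSAFS cache, in MB, for a threshold percentage.

    Assumption: memory_mb stands in for the free memory TSAFS sees when it
    loads. 1 is the lowest threshold the TID says TSAFS accepts; the upper
    bound of 100 is only a sanity check on a percentage.
    """
    if not 1 <= threshold_percent <= 100:
        raise ValueError("threshold must be between 1 and 100 percent")
    return memory_mb * threshold_percent / 100


if __name__ == "__main__":
    ram_mb = 4 * 1024  # a 4 GB server
    print("default (10%):", tsafs_cache_ceiling_mb(ram_mb), "MB")
    print("threshold 1% :", tsafs_cache_ceiling_mb(ram_mb, 1), "MB")
```

The results land close to the rounded 400 MB and 40 MB figures quoted above; the small difference comes from counting 4 GB as 4096 MB rather than 4000 MB.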

This same change can be made in SYS:\ETC\SMS\TSA.CFG under the parameters "Cache Memory Threshold" and "Read Buffer Size". In either case (loaded with the command-line switch, or via the .CFG file change), the setting is persistent and will be in effect the next time the NLM is loaded. These parameters can be set higher if needed, but the lowest settings are the best place to start. It is rare that they would need to be increased.

Note: The older module TSA600.NLM should not be used in place of TSAFS.NLM to address fragmentation problems on the server. Novell no longer supports TSA600.NLM. In our experience, fragmentation issues can be resolved by following the instructions in this document; in most cases just the first 4 steps outlined here.

STEP 3: Set a hard limit on the amount of RAM that DS.NLM uses

Although various recommendations exist for eDirectory cache tuning, the following is a general rule that may need some custom adjustment. Customers should apply eDirectory 8.7.3.6 or higher; earlier versions do not retain the hard limit settings. Please check other TIDs, Solutions, or documentation that describe how this setting is adjusted.

Do not use dynamic caching; set a hard cache limit. Do not exceed 1 GB on a 4 GB system. There is no proven benefit to more than 512 MB on a NetWare server, as the overhead of maintaining a larger cache is greater than the benefit from faster reads and writes. Start with a small cache size (200 MB on a 500 MB DIB) and work higher. Use the iMonitor Database cache hit statistics to determine when a good value has been achieved. When the cache hit rate for entry and block is no longer improving, there is no further benefit to increasing the cache; at that point you can tweak the balance between entry and block. The cache hard limit may be a lot lower than expected, as eDirectory is efficient at data access.
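The "start small and work higher" approach can be sketched as follows. This is only an illustration: the function name, the single combined hit percentage per observation, and the one-point cut-off for "not improving" are all assumptions, and the readings would have to be noted by hand from iMonitor after each change to the hard limit.

```python
from typing import List, Tuple


def pick_cache_limit(observations: List[Tuple[int, float]],
                     min_gain: float = 1.0) -> int:
    """Return the smallest cache size (MB) after which the hit rate plateaus.

    observations: (cache_size_mb, hit_percent) pairs recorded manually.
    min_gain: hypothetical cut-off, in percentage points, below which a
              larger cache is treated as giving no further benefit.
    """
    if not observations:
        raise ValueError("need at least one observation")
    ordered = sorted(observations)
    chosen_size, chosen_hit = ordered[0]
    for size_mb, hit in ordered[1:]:
        if hit - chosen_hit < min_gain:
            break  # no meaningful improvement: stop growing the cache
        chosen_size, chosen_hit = size_mb, hit
    return chosen_size


if __name__ == "__main__":
    # Illustrative readings only, not measurements from a real server.
    readings = [(200, 88.0), (300, 94.0), (400, 94.5), (512, 94.6)]
    print("suggested hard limit (MB):", pick_cache_limit(readings))
```

With the illustrative readings shown, the step from 300 MB to 400 MB gains less than one point, so the sketch settles on 300 MB.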

Explanation: The idea here is that by limiting the amount of RAM owned by DS.NLM, it can leave more for the operating system and other NLMs to use. Additionally, the lower memory for DS forces DS.NLM to flush its memory sooner with newer data, thereby releasing memory alloc nodes sooner.

A Solution that may be used to adjust DS memory consumption is:
 
Other solutions that may be of use are:
 

Although the last solution focuses on high utilization, it speaks of the cache setting for eDirectory in an LDAP heavy environment.

eDirectory 8.7.3 IR6 and beyond include code that allows eDirectory to pre-allocate cache memory from the operating system. With this code addition, memory fragmentation for the server is dramatically reduced. To implement this feature, please consult the eDirectory documentation and see the following solution:

 

STEP 4: Set the File Cache Maximum Size parameter

For customers running NetWare 6.5 Support Pack 4 or greater on servers with 3 GB of RAM or more, use "SET File Cache Maximum Size = 2147483648". For customers using other versions of NetWare, follow the instructions for this step as outlined below.

Explanation: This hidden set parameter (available in the server.exe starting in NetWare 6.5 Support Pack 2 and NetWare 6.0 Support Pack 5) allows for the granular size adjustment of the logical memory pools. It is very important to get this number right; count the digits to make sure you've got 10 digits total. One more or less will make a big difference.

The minimum for this set parameter is normally 1 GB (1073741824) and the maximum is around 3 GB (3087007744) on servers where no other "tuning" for memory has been done. These numbers are not identical on every server, and they can vary depending upon the size of the User Address Space (as set with the "server -u" parameter, described in Step 6 below). If the User Address Space is larger, the maximum number for this set parameter will be smaller; and conversely, if the User Address Space is smaller, then the maximum will be larger.
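Because a single missing or extra digit changes the value by a factor of ten, it can help to validate the number before typing it at the console. The sketch below is a hypothetical helper, not a Novell tool; it uses the typical minimum and maximum quoted above as defaults, and these should be replaced with the limits the server itself reports.

```python
DEFAULT_MIN = 1073741824  # 1 GB, the usual minimum quoted in this TID
DEFAULT_MAX = 3087007744  # about 3 GB, the usual maximum quoted in this TID


def build_file_cache_command(value: int,
                             minimum: int = DEFAULT_MIN,
                             maximum: int = DEFAULT_MAX) -> str:
    """Check a File Cache Maximum Size value and return the SET command text.

    Assumption: minimum and maximum are the limits for this particular
    server; the defaults are only the typical figures and may not apply.
    """
    if len(str(value)) != 10:
        raise ValueError(f"{value} has {len(str(value))} digits, expected 10")
    if not minimum <= value <= maximum:
        raise ValueError(f"{value} is outside {minimum}..{maximum}")
    return f"SET File Cache Maximum Size = {value}"


if __name__ == "__main__":
    print(build_file_cache_command(2147483648))
```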

To display what the server will currently allow for this SET parameter's minimum and maximum settings, type "SET File Cache Maximum Size" at the server console (without specifying a value) and press Enter.

If more memory is needed for the File System cache pool, this parameter can be set to a larger value. Use the minimum setting for a while, though, to verify that the problems with memory fragmentation are addressed first.

New to NetWare 6.5 Support Pack 3 is the set parameter "set auto tune server memory". This parameter is ON by default and will automatically set the file cache maximum size appropriately for the server during operation. When this feature is enabled, messages will be printed on the system console screen notifying the administrator of this activity. Novell recommends that this parameter stay at its default setting of ON.

STOP HERE and re-boot the server

Most issues with fragmentation of logical memory will be resolved by following the first 4 steps above. Re-boot the server and let it run for a while (a few days, including if possible any automatic back-up jobs and a cycle or two of peak user activity). If necessary, apply the next two steps (one at a time) to make the final adjustment.

STEP 5: Set a hard limit on the amount of RAM that NSS can have

For customers running NetWare 6.5 Support Pack 4 or greater, do not hard set NSS memory. Allow NSS to use a cache-balanced amount of memory (the default is 85%). For customers using other versions of NetWare, follow the instructions for this step as outlined below.

Important: Novell has released an update for NSS that directly impacts this step. For NetWare 6.0 Support Pack 5 customers, please apply NW6NSS5B.EXE. For NetWare 6.5 Support Pack 2 customers, please apply N65NSS2B.EXE. Customers who are not on the latest support packs (CSP11 or above) are strongly encouraged to move to them so that these updates can be implemented, or to not use the suggestions in this step until this update can be applied. If the suggestions for Step 5 are followed without these updates, NSS can over time starve itself of RAM by giving memory back to the OS. This starvation can lead to server sluggishness, and other performance related issues.

(5a) For NetWare 6.x servers do the following:

In the file c:\nwserver\nssstart.cfg put the following lines:

/nocachebalance

/minbuffercachesize=

Note: If the nssstart.cfg file does not exist, create one using any text editor. Make sure there are no typos in the nssstart.cfg file. If there are, NSS will not load. There are no spaces between the equal sign and the number. If spaces are put in, or the switch is misspelled, NSS will not interpret the command correctly. Double-check the spelling and be sure. If NSS does not have enough RAM, high utilization and severe sluggish performance on the server can result.

Explanation: These settings tell NSS to turn off cache balancing between the OS cache pool and the NSS cache pool, and to allow NSS to allocate only a specific number of cache buffers for file system caching. Each cache buffer is 4096 bytes, so specifying a value of 102400, for example, results in 400 MB of RAM for NSS.
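The conversion between cache buffers and megabytes is simple arithmetic, sketched here with hypothetical helper names. The only fact taken from this document is the 4096-byte buffer size; counting a megabyte as 1024 x 1024 bytes is an assumption that matches the 102400-buffer example.

```python
CACHE_BUFFER_BYTES = 4096  # size of one NSS cache buffer, per this TID
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def buffers_to_mb(buffers: int) -> float:
    """Convert a count of NSS cache buffers to megabytes."""
    return buffers * CACHE_BUFFER_BYTES / BYTES_PER_MB


def mb_to_buffers(megabytes: int) -> int:
    """Convert megabytes to a whole number of NSS cache buffers."""
    return megabytes * BYTES_PER_MB // CACHE_BUFFER_BYTES


if __name__ == "__main__":
    print(buffers_to_mb(102400), "MB for 102400 buffers")
    print(mb_to_buffers(400), "buffers for 400 MB")
```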

Be aware that even though you may not have NSS volumes on the server, NSS is still loaded and requires RAM to operate, and can cause issues if that RAM is not available. Some customers have made the mistake of removing all of the RAM from NSS because the server has no NSS volumes, only to find that the server experiences high utilization, hangs, etc.

To know how many cache buffers NSS currently has, type "nss /status" at the system console prompt and look for the statistic named Current Buffer Cache Size, at the top of the list. The number in parentheses is the number of cache buffers currently allocated to NSS. Use this number as the starting point for NSS. This number can be decreased if you believe NSS has too much file cache.

Use "nss /cache" at the system console prompt to check the caching performance of NSS on the system. Be sure to give NSS enough memory; however, if memory can be removed (i.e., if the percentages on the caching statistics are high), you can go ahead and give that memory back to the system to be used for other NLMs. (A smaller number for /minbuffercachesize is how to adjust this.)

(5b) NetWare 5.1 customers need only add the two parameters to the load line of NSS in the autoexec.ncf file:

nss /nocachebalance /minbuffercachesize=

NSS has other settings that can also be added to this line, and many customers use this load line to implement those settings; just make one big load line for NSS.

After making these changes, the server must be restarted for the new NSS settings to take effect.

Suggestions for hard setting NSS memory:

  1. Do not hard set NSS memory below 400 MB (102400 cache buffers). NSS requires memory to operate and if that memory isn't present, it can have a detrimental effect on the server's overall performance. This is important whether or not there are NSS volumes on the server.
  2. Until NetWare 6.5 Support Pack 4, do not hard set NSS memory over 800 MB (204800 cache buffers). If more memory than this is required, use the NSS cache balance feature. Hard setting NSS memory over 800 MB can initially deplete the file system cache pool of memory and temporarily cause "cache memory allocator" error messages on the system console prompt.
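The two suggestions above amount to a range check on the buffer count. The sketch below, with a hypothetical function name, rejects values outside 102400 to 204800 and returns the two lines described in Step 5a for nssstart.cfg. It applies only to servers earlier than NetWare 6.5 Support Pack 4, where hard setting NSS memory is still suggested, and it does not write any file.

```python
from typing import List

MIN_BUFFERS = 102400  # 400 MB floor from suggestion 1
MAX_BUFFERS = 204800  # 800 MB ceiling from suggestion 2 (pre-SP4 servers)


def nssstart_lines(buffers: int) -> List[str]:
    """Return the nssstart.cfg lines for a hard-set NSS cache size.

    Raises ValueError when the buffer count falls outside the suggested
    range. No space is placed around the equal sign, as the TID requires.
    """
    if buffers < MIN_BUFFERS:
        raise ValueError("below 400 MB: NSS may be starved of memory")
    if buffers > MAX_BUFFERS:
        raise ValueError("above 800 MB: use NSS cache balancing instead")
    return ["/nocachebalance", f"/minbuffercachesize={buffers}"]


if __name__ == "__main__":
    for line in nssstart_lines(153600):  # 600 MB, an illustrative value
        print(line)
```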

STEP 6: Adjust the size of the User Address Space

If Steps 1-5 have been applied, and the server has run for several days or weeks and is still exhibiting signs of logical memory fragmentation, you can alter the default size used for the User Address Space with a server startup command line switch (issued from the DOS prompt or added to the Autoexec.bat line that loads the server). This step, which makes use of a new feature in NetWare Remote Manager (NRM) to get a recommended value for this setting, should be taken ONLY at the time the server is having problems, not when it has been recently re-booted or when it is running smoothly.

Use "server -u" to give the memory configuration just what the server needs for the User Address Space, and not more.

Please be careful with this setting: We have seen customers set this too low and have problems like high CPU utilization, and programs not loading or running in protected memory correctly. Count the digits in the number you're providing in this switch, and then double-check the number, since it can make a dramatic difference in the server behavior.

Included in NetWare 6.5 Support Pack 2 (and later) is a new feature in the NetWare Remote Manager (NRM) that calculates a recommended value for the "server -u" switch, customized specifically to the conditions and activity on the current server. It is important that this value not be calculated when the server is freshly re-booted; the most accurate calculation can be done only after the server has been running for a while, including if possible a period of peak activity and a back-up cycle or any other intensive operation.

To access this configuration help, open up NetWare Remote Manager (logging in as Admin), and click on "View Memory Config" in the left pane of the main window. From there, click on "Tune Logical Address Space." This opens a screen displaying configuration recommendations from the kernel developers at Novell. The recommended settings are calculated specific to the current server's running condition, and include information on how big to set the User Address Space size and the File System Cache Pool. (The NetWare kernel now stores the maximum amount of memory used by these pools over time, and can recommend optimal settings for them.) This will improve how the server uses memory because unused memory in one pool can automatically be given to the correct pool at boot time.

Most customers at this point can and should make the change recommended with the -u setting for server.exe (server -u). The recommended numbers displayed by the NRM utility for the File Cache Maximum Size may seem a little high, but they should nonetheless be applied as shown, if the server has been running for a while. The number calculated for the File Cache Maximum Size represents what the largest logical space will be for the file cache pool; even if the server's physical RAM doesn't match this size. In the case where the RAM installed is smaller than the number displayed, the size will automatically be adjusted to the amount of RAM in the server.

Conclusion

Novell Technical Support has noticed that some customers are using the settings described in the steps above improperly for some configurations. We have included all 6 Steps above in an attempt to describe all possible factors contributing to or aggravating the problem of memory fragmentation. However, the inclusion of all these steps does not mean that every step is recommended for every customer. The steps outlined above should be followed in the order presented.

If you are still having trouble with memory after following these steps

The steps above outline server settings and NLM settings for TSAFS, eDirectory, and NSS, which represent by far the most common candidates for memory tuning.

A very few customers have noticed that over the space of a month or more, the memory on their server steadily declines even after tuning the server with these settings, and the server eventually has to be re-booted. In these cases, it appears that the memory fragmentation problem is not resolved.

Novell recommends that these servers be closely inspected for the use of NLM memory. Usually in a case such as this, the NLMs are consuming more memory than the size of the logical mapping space. This causes the VM cache pool to borrow memory from the File System cache pool, which leads to the slow and steady decline of memory and the eventual re-boot of the server to reclaim that memory.

Controlling the memory that NLMs can use in the cache pool has proven to be successful with virtually all customers. Steps 2, 3 and 5 above detail three of the most prevalent examples of controlling memory used by specific NLMs. Other modules loaded on the NetWare server, from Novell or from 3rd party vendors, may require scrutiny and adjustments to regulate their role in consuming memory on the server. The NetWare Remote Manager (Module Listing) and other tools can be used to monitor memory consumption over time on a per-module basis on the server.

If further problems and/or questions about this issue arise, please contact Novell Technical Support.

Status

Top Issue

Additional Information

NetWare is a 32-bit operating system. Although NetWare can handle up to 64 GB of physical RAM, Intel's 32-bit architecture limits any OS to a 4 GB area for mapping Logical Memory. (The memory above 4 GB must be accessed by mapping pages in and out of the 4 GB space.) Because most applications run in the "Kernel Space" or "Ring 0" in NetWare (as opposed to "User Space" or "Ring 3" in other operating systems), all NLMs running in the kernel have a finite amount of RAM to work with.

Depending on the applications and NLMs running on the server, it can become necessary to adjust the settings on servers to reduce the incidence and frequency of memory problems.

NetWare 6.5 Support Pack 5 and later, and NetWare 6.0 Support Pack 5, contain the latest fixes and updates specific to the memory fragmentation issue. Novell strongly recommends that customers use the latest support packs for this issue. Additional improvements continue to be made in NetWare's memory management system to better accommodate newer hardware configurations and different combinations of modules running on NetWare servers. Customers running NetWare 6.5 with Support Pack 5 installed should apply NW65SP5UPD1.EXE if they experience memory issues.

Identifying Memory Fragmentation on a NetWare server:

To identify logical memory fragmentation on a NetWare server, Novell has provided statistics via the NetWare Remote Manager (NetWare 6.5 Support Pack 2 and later): Open up NRM (logging in as Admin), click on "View Memory Config" and scroll down to the "Logical Address Space" section.

If the value for "Fragmented Kernel Space" continues to increase over time, or if the percentage of fragmentation (the red wedge/block on the graph) continues to grow, then tuning the server according to the steps in the Resolution section above is recommended. If the fragmentation value holds steady over several hours or days, however, even at a level such as 50%, the tuning steps should not be applied, because the server is not being compromised by fragmentation of logical memory.
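The distinction between growing and steady fragmentation can be sketched as a check over readings noted by hand from the NRM page. The function name, the minimum of three readings, and the tolerance parameter are assumptions for illustration; NRM does not export the data in this form.

```python
from typing import Sequence


def fragmentation_is_growing(samples: Sequence[float],
                             tolerance: float = 0.0) -> bool:
    """Return True when fragmentation readings keep rising over time.

    samples: "Fragmented Kernel Space" readings in chronological order.
    tolerance: hypothetical allowance for small fluctuations, in the same
               unit as the readings.
    """
    if len(samples) < 3:
        raise ValueError("need at least three readings to judge a trend")
    never_drops = all(later >= earlier - tolerance
                      for earlier, later in zip(samples, samples[1:]))
    net_rise = samples[-1] - samples[0] > tolerance
    return never_drops and net_rise


if __name__ == "__main__":
    # Illustrative percentages only.
    print(fragmentation_is_growing([20, 25, 31, 38]))  # rising readings
    print(fragmentation_is_growing([50, 50, 50, 50]))  # high but steady
```

A server that sits at a constant 50% is reported as not growing, which matches the guidance that steady fragmentation does not call for tuning.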

Any NLM on the server could "leak" memory or otherwise consume excessive memory; this behavior is not the same as fragmenting logical memory, but still warrants investigation. Any of these cases could result in a steady decrease in the number of Cache Buffers reported in Monitor.nlm, and could result in a dramatic decrease in the amount of memory available on the NetWare server, independent of fragmentation.

A memory leak is caused by a module that, due to a coding defect, allocates memory and never returns (or "frees") it to the system. Alternatively, a module may allocate a great deal of memory by design or as a result of programmatic miscalculation. Both of these situations are problematic, although they may not contribute directly to memory fragmentation.

Please also refer to KB 3529262: "How to Read SEGSTATS.TXT" This document is quite useful in troubleshooting memory fragmentation.

Link to TID:  https://support.microfocus.com/kb/doc.php?id=3529262

Using the NetWare Remote Manager (NetWare 6.0 and NetWare 6.5) to diagnose memory problems:

The NRM utility is the best way to watch for memory leaks: Under the Manage Applications heading, click on List Modules, which displays a list of all the NLMs running on the server and amount of memory each NLM is using (broken down into different categories). If you observe a module steadily climbing the list of modules (sorted by memory usage), that NLM is a suspect for a memory leak. To address NLMs with a memory leak, the latest version of that NLM is needed. If the latest version does not fix this issue, please contact Novell Technical Support for further help.
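Watching for a module that steadily climbs the list can be approximated by comparing several snapshots of per-module memory use. The sketch below assumes the figures are copied by hand from the List Modules page into dictionaries. The data format, the function name, and EXAMPLE.NLM are hypothetical, and a module flagged this way is only a suspect, not a confirmed leak.

```python
from typing import Dict, List


def leak_suspects(snapshots: List[Dict[str, int]]) -> List[str]:
    """Return modules whose memory use rose in every consecutive snapshot.

    snapshots: chronological list of {module name: memory in bytes}.
    Only modules present in all snapshots are considered.
    """
    if len(snapshots) < 3:
        raise ValueError("need at least three snapshots to judge a trend")
    common = set(snapshots[0]).intersection(*snapshots[1:])
    return sorted(
        name for name in common
        if all(later[name] > earlier[name]
               for earlier, later in zip(snapshots, snapshots[1:]))
    )


if __name__ == "__main__":
    # Illustrative values in MB, not measurements from a real server.
    history = [
        {"DS.NLM": 200, "EXAMPLE.NLM": 10},
        {"DS.NLM": 200, "EXAMPLE.NLM": 15},
        {"DS.NLM": 198, "EXAMPLE.NLM": 22},
    ]
    print(leak_suspects(history))
```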

If there are NLMs listed that take up a great deal of RAM, consider the following:

NLM Category of Memory        How to Deal with It
Peak NLM Memory > 1.5 GB      Tune Server Memory using the steps in the Resolution section above
Peak NLM Memory > 1.0 GB      SET File Cache Maximum Size to its minimum (1073741824)
Peak NLM Memory > 700 MB      SET File Cache Maximum Size to 1500000000
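The table can be read as a lookup from peak NLM memory to an action, checked from the largest threshold down. The sketch below is one interpretation: the function name is hypothetical, and treating GB and MB as binary units is an assumption the table does not state.

```python
from typing import Optional

MB = 1024 ** 2  # assumption: binary units
GB = 1024 ** 3


def recommend_for_peak_nlm_memory(peak_bytes: int) -> Optional[str]:
    """Map peak NLM memory to the action listed in the table, if any."""
    if peak_bytes > 1.5 * GB:
        return "Tune server memory using the steps in this document"
    if peak_bytes > 1.0 * GB:
        return "SET File Cache Maximum Size = 1073741824"
    if peak_bytes > 700 * MB:
        return "SET File Cache Maximum Size = 1500000000"
    return None  # below every threshold in the table


if __name__ == "__main__":
    print(recommend_for_peak_nlm_memory(1200 * MB))
```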


Formerly known as TID# 10091980