Troubleshooting Cache memory allocator out of available memory.

  • 10024177
  • 1.0.47239376.2480199
  • 03-Jan-2000
  • 11-Feb-2003

Archived Content: This information is no longer maintained and is provided 'as is' for your convenience.

Goal

Troubleshooting Cache memory allocator out of available memory.

Fact

Novell NetWare 4.11

Novell NetWare 4.2

Novell NetWare 5.0

Formerly KB 2924988

Symptom

Cache Memory Allocator Error Messages.

Cache memory allocator out of available memory.

Cache memory allocator exceeded the minimum cache buffer limit.

Short term memory allocator is out of memory.

Error: <X> number of attempts to get more memory failed.

<X> is the cumulative number of attempts to free memory; it increments each time the operating system attempts to free memory.

Cause

NetWare is designed to give a requesting process or nlm more memory than it requests. For example, if xyz.nlm requests 2 KB from the OS, NetWare will attempt to give xyz.nlm 4 KB (a 2x over-commitment). If this initial request fails (multiple times), the "Cache memory allocator out of available memory" message is displayed. The OS then attempts to give the requesting nlm or resource the exact amount of memory requested. If the attempts to find this smaller contiguous space also fail, the "Short term memory allocator is out of memory" message is displayed.
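The two-stage fallback described above can be illustrated with a small, purely hypothetical model. It is not NetWare's allocator: free memory is represented as a list of contiguous free block sizes in bytes, allocation is simple first-fit, the repeated retries are omitted, and the names `allocate` and `_take_contiguous` are invented for illustration. It shows how a request can fail even when total free memory exceeds the request.

```python
from typing import List, Optional

# Hypothetical model of the fallback described above; NOT NetWare's allocator.
CACHE_MSG = "Cache memory allocator out of available memory."
SHORT_TERM_MSG = "Short term memory allocator is out of memory."


def _take_contiguous(free_blocks: List[int], size: int) -> int:
    """First-fit: carve `size` bytes out of the first free block that is
    large enough. Returns `size` on success, 0 if no single block fits."""
    for i, block in enumerate(free_blocks):
        if block >= size:
            free_blocks[i] = block - size
            return size
    return 0


def allocate(free_blocks: List[int], requested: int, log: List[str]) -> Optional[int]:
    """Try a 2x over-commitment first, then the exact size.

    Returns the number of bytes granted, or None if both attempts fail.
    Console messages the model would print are appended to `log`.
    """
    granted = _take_contiguous(free_blocks, 2 * requested)
    if granted:
        return granted
    log.append(CACHE_MSG)  # the over-committed request failed

    granted = _take_contiguous(free_blocks, requested)
    if granted:
        return granted
    log.append(SHORT_TERM_MSG)  # even the exact-size request failed
    return None


if __name__ == "__main__":
    blocks = [3072, 1024, 1024]  # 5 KB free in total, largest block 3 KB
    log: List[str] = []
    print(allocate(blocks, 2048, log), log)  # 4 KB fails, 2 KB succeeds
    print(allocate(blocks, 2048, log), log)  # 3 KB free in total, but no 2 KB block
```

In the second call, 3 KB is free in total, yet no single block can hold 2 KB, so both messages are logged.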

These error messages appear on the console after several failed attempts to obtain a contiguous block of memory for program execution. A check in monitor.nlm may show that the server has plenty of cache buffers. However, this figure does not reveal the true state of memory or whether it is fragmented: the cache buffer count reports only the total amount of free memory and takes no account of fragmentation. Each memory request by the server, or by a requesting nlm, needs a contiguous block of memory. If memory is severely fragmented and cannot be consolidated into a contiguous block, the error messages occur.
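To see why a healthy-looking cache buffer count can coexist with these errors, compare total free memory with the largest contiguous free run. The sketch below is a simplified, hypothetical model: memory is a list of equal-sized pages marked used or free, and `free_memory_stats` and `can_allocate` are illustrative names, not NetWare APIs.

```python
from typing import List, Tuple


def free_memory_stats(pages: List[bool]) -> Tuple[int, int]:
    """Return (total free pages, largest contiguous run of free pages).

    `pages[i]` is True when page i is in use. The first number is roughly
    what a free-buffer count tells you; the second is what a request that
    needs contiguous memory actually depends on.
    """
    total = largest = run = 0
    for in_use in pages:
        if in_use:
            run = 0  # a used page breaks the current free run
        else:
            total += 1
            run += 1
            largest = max(largest, run)
    return total, largest


def can_allocate(pages: List[bool], request_pages: int) -> bool:
    """A request succeeds only if a single free run is long enough."""
    _, largest = free_memory_stats(pages)
    return largest >= request_pages


if __name__ == "__main__":
    fragmented = [True, False] * 8        # every other page in use
    compact = [False] * 8 + [True] * 8    # same amount free, in one run
    print(free_memory_stats(fragmented), can_allocate(fragmented, 2))
    print(free_memory_stats(compact), can_allocate(compact, 2))
```

Both layouts have the same total free memory, but only the compact one can satisfy a two-page request.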

Often these error conditions do not appear until a couple of days or weeks after the server is first brought up. NetWare seems to run fine (without errors) for a period of time, and then, almost suddenly, these error messages appear. This pattern indicates that memory is actually "leaking" from the server.

The term "leaking" is used because the server normally runs fine for a couple of days; after that, the error messages appear and do not go away until the server is downed and brought back up. In the interim, an nlm or set of nlms has requested memory from the server. After using this memory, the nlm(s) should return it to the operating system; however, they do not, even though they behave as if they have. Therefore, when a certain routine in the nlm runs again, the nlm asks for the memory needed to complete the task, not knowing that it already possesses it. The operating system grants the requested memory, and the cycle continues. Depending on the amount of memory requested each time, the leak may be slow or fast, and depending on how much memory the server has, the error messages may appear after a day or longer.
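The leak pattern can be modeled in a few lines. In this hypothetical sketch, `MemoryPool`, `WellBehavedModule`, `LeakyModule` and `days_until_exhausted` are invented names and the numbers are arbitrary; the point is only that a module which requests fresh memory on every run, without returning the previous buffer, drains the pool at a rate set by the size and frequency of its requests.

```python
from typing import List


class MemoryPool:
    """Hypothetical stand-in for the server's free memory."""

    def __init__(self, free_bytes: int) -> None:
        self.free = free_bytes

    def grant(self, size: int) -> None:
        if size > self.free:
            raise MemoryError("out of available memory")
        self.free -= size

    def give_back(self, size: int) -> None:
        self.free += size


class WellBehavedModule:
    def run_routine(self, pool: MemoryPool, size: int) -> None:
        pool.grant(size)      # borrow memory for the task...
        pool.give_back(size)  # ...and return it when done


class LeakyModule:
    """Asks for new memory on every run and never returns the old buffer."""

    def __init__(self) -> None:
        self.owned: List[int] = []  # memory it still holds but has "forgotten"

    def run_routine(self, pool: MemoryPool, size: int) -> None:
        pool.grant(size)
        self.owned.append(size)  # never given back to the pool


def days_until_exhausted(free_bytes: int, leak_per_run: int, runs_per_day: int) -> float:
    """Rough estimate of how long a steady leak takes to use up free memory."""
    return free_bytes / (leak_per_run * runs_per_day)


if __name__ == "__main__":
    pool = MemoryPool(10_000)
    good, leaky = WellBehavedModule(), LeakyModule()
    for _ in range(5):
        good.run_routine(pool, 1_000)
        leaky.run_routine(pool, 1_000)
    print(pool.free, sum(leaky.owned))  # only the leaky module's memory is gone
    print(days_until_exhausted(64 * 1024 * 1024, 4096, 1024))
```

The estimate makes the "slow or fast" point concrete: doubling either the leak size or the number of runs per day halves the time before the errors appear.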

Some memory leak problems have been attributed to Novell's CLIB. As far as Novell is concerned, these have all been addressed in the latest CLIB; if the latest CLIB is not installed on the server, installing it should be a priority. Other problems have been attributed to the nlms themselves. Most of the nlms in which leaks have been seen are 3rd party nlms, and the 3rd party vendor can work with Novell to determine whether its nlm is at fault.

Fix

Check for the following items when troubleshooting this error:

1. Make sure that there are no DOS device drivers in the CONFIG.SYS and AUTOEXEC.BAT files on the server. Loaded DOS drivers can take memory away from NetWare's memory pool and cause this error message. Remove them if they exist; a simple way to do this is to rename the CONFIG.SYS file to "CONFIG.BAK". An AUTOEXEC.BAT file is fine, as long as it does not load MSCDEX.EXE or run any other commands; a command to start SERVER.EXE is fine.

If a DOS memory manager is loaded, NetWare's loader does not query the BIOS for the amount of memory in the server; instead, it asks the DOS memory manager. If the memory manager reports memory incorrectly to LOADER.EXE, NetWare will have less memory to work with. This is why HIMEM.SYS and EMM386.EXE should NOT be loaded from the CONFIG.SYS file on a server's DOS partition.

2. Memory registration. If memory is registered manually in NetWare, the error messages listed above are possible. NetWare needs contiguous memory, and registering memory manually breaks up the continuity of that memory. Manual registration can be identified by a "register memory" statement in the STARTUP.NCF or AUTOEXEC.NCF file. With NetWare 4.x and 5.x, memory registration should be handled automatically by the server. Apply the latest OS patch kit (410ptX.exe, NW4SPX.exe or NW5SPX.EXE, depending on the operating system used). If memory still does not register automatically after applying the patch, make sure the BIOS is the latest version. These patches include a change to SERVER.EXE that allows it to query INT 15 subfunction E8, in effect asking, "BIOS, how much memory do you have?" The BIOS returns a number to NetWare, and that amount of memory is registered automatically. If the BIOS cannot return this value correctly, memory will not be registered correctly; therefore, make sure the BIOS is updated.

3. If there are many deleted files on the volume that have not been purged, they can consume a great deal of memory when the volume is mounted (in the FAT and hashing tables). In NetWare 4.x, check this condition by using servman.nlm | Volume Information. Highlight a particular volume and press <ENTER>. A screen is displayed with a list of parameters and settings. Look for the row labeled "Freeable limbo blocks". The number to the right of this label multiplied by the block size is the amount of space that a complete volume purge would recover. In NetWare 5, check this condition by using MONITOR.NLM.
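The three checks above can be partly scripted when copies of the server's CONFIG.SYS, AUTOEXEC.BAT and .NCF files are available as text. The sketch below is a hypothetical helper, not a Novell tool. Its rules are a heuristic reading of the advice above and are assumptions: it flags any DEVICE, DEVICEHIGH or INSTALL line in CONFIG.SYS, anything in AUTOEXEC.BAT other than a few housekeeping commands (REM, ECHO, CD) and the command that starts SERVER.EXE, and any REGISTER MEMORY statement. The limbo-block figure is simply the product described in step 3, with the block size in bytes.

```python
import re
from typing import List

# Heuristic rules (assumptions) based on the checks described above.
DOS_DRIVER = re.compile(r"^\s*(DEVICE|DEVICEHIGH|INSTALL)\s*=", re.IGNORECASE)
MANUAL_REGISTRATION = re.compile(r"^\s*REGISTER\s+MEMORY\b", re.IGNORECASE)
ALLOWED_AUTOEXEC = {"REM", "ECHO", "@ECHO", "CD", "SERVER", "SERVER.EXE"}


def dos_driver_lines(config_sys: str) -> List[str]:
    """Lines in CONFIG.SYS that load a DOS driver or TSR (e.g. HIMEM.SYS)."""
    return [ln.strip() for ln in config_sys.splitlines() if DOS_DRIVER.match(ln)]


def suspicious_autoexec_lines(autoexec_bat: str) -> List[str]:
    """Lines in AUTOEXEC.BAT that do something other than start SERVER.EXE."""
    flagged = []
    for ln in autoexec_bat.splitlines():
        stripped = ln.strip()
        if not stripped:
            continue
        command = stripped.split()[0].upper()
        command = command.rsplit("\\", 1)[-1]  # C:\NWSERVER\SERVER.EXE -> SERVER.EXE
        if command not in ALLOWED_AUTOEXEC:
            flagged.append(stripped)  # e.g. a line loading MSCDEX.EXE
    return flagged


def manual_registration_lines(ncf_text: str) -> List[str]:
    """REGISTER MEMORY statements found in STARTUP.NCF or AUTOEXEC.NCF text."""
    return [ln.strip() for ln in ncf_text.splitlines() if MANUAL_REGISTRATION.match(ln)]


def purgeable_bytes(freeable_limbo_blocks: int, block_size: int) -> int:
    """Space a complete volume purge would recover (step 3 above)."""
    return freeable_limbo_blocks * block_size


if __name__ == "__main__":
    # Sample file contents, invented for illustration.
    print(dos_driver_lines("DEVICE=C:\\DOS\\HIMEM.SYS\nFILES=40\n"))
    print(suspicious_autoexec_lines("@ECHO OFF\nC:\\DOS\\MSCDEX.EXE /D:MSCD001\nSERVER\n"))
    print(manual_registration_lines("LOAD MONITOR\nREGISTER MEMORY 1000000 7000000\n"))
    print(purgeable_bytes(2500, 64 * 1024))
```

Anything the script flags still needs a human decision; it only narrows down where to look.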

If the three checks above do not fix the problem, or if there are no drivers in the DOS configuration files and memory is not being registered manually, a memory leak may be causing NetWare to run low on available memory and report that it is out of memory. The following steps help determine which module(s) are "leaking" memory from the server:

1. Load ALL of the latest patches on the server. This may seem obvious, but MANY people skip it. Do not skip this step. Go to the minimum patches page at https://support.novell.com and get the latest patches for the server. This page lists the patches by operating system, so it should be relatively easy to identify which patches apply to which OS.

2. Restart the server to get a fresh memory map and to be sure that everything has been loaded cleanly. This is a priority because this state of the server is used as the baseline for memory usage. All of the memory needs to be freed and then assigned again, and the only way to do this is to restart NetWare.

3. Load monitor.nlm and go into Memory utilization. This option lists every nlm loaded on the server under the heading "System Modules". Use the arrow keys to move up and down the list of nlms, and press <ENTER> to select a specific nlm for a closer look.

Also in monitor.nlm, go into Resource utilization. The screen lists the resources that NetWare tracks. Highlight the second option (Alloc Memory) and press <ENTER>. A list of every resource tag that has requested memory from the server is displayed; press <ENTER> on a resource tag to display the memory it has in use. This list also includes SERVER.NLM and all of its resource tags. The recording steps below apply to individual resource tags in the same way as to individual nlms; the differences in the displayed statistics should be self-explanatory.

4. Usually, customers have an idea of which programs may be responsible for a memory leak, which helps narrow down the list of nlms to examine. Highlight a suspected nlm and press <ENTER>. The amount of system RAM this nlm is using is displayed under "Memory bytes in use". Write this number down; it is the baseline memory usage for this nlm. Repeat this process for the other suspected nlms, or for all of them if the situation warrants it. If no nlm stands out as a suspect, make an educated guess or record all of them.

5. Each selected nlm also lists "Memory bytes free". This statistic shows how much memory the nlm has requested but is not actually using at the time. Pressing <F3> performs a manual garbage collection, which may return to the operating system some of the memory the nlm requested but is not currently using. Some nlms readily free some of their memory after <F3> is pressed, and others release none. Do not be alarmed if no memory is released; this may be normal and means that this nlm requires the memory for proper operation. It is a good idea to check whether modules have requested memory they no longer need, so run this check on the nlms as a sanity check of the operating system's garbage collection routine, which by default runs every 15 minutes.

Also, if the garbage collection routine does not free an nlm's memory but <F3> does, do not be alarmed; this may be normal. The garbage collection routine is designed to take freed memory and return it to a contiguous space that the OS can use again. If it cannot successfully free the memory and incorporate it back, it leaves the memory with the nlm.

6. After each suspected nlm has been baselined, wait for the "cache memory allocator ..." error messages to occur. This may take 2 or 3 days, but it will happen if there is a memory leak. When the messages appear, go back into Memory utilization and compare the current values for each nlm against those recorded earlier to see whether their memory usage has grown. Usually one or two nlms will have grown to a "wild" amount of memory; this is the leak. For instance, suppose xyz.nlm was using 259,045 bytes of memory immediately after the server was booted, and a couple of days later the same nlm shows 2,234,611 bytes, none of which can be freed. This should be a red flag.
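Steps 4 through 6 amount to recording a baseline and comparing it with later readings. A minimal sketch, assuming the "Memory bytes in use" figures have been copied by hand from monitor.nlm into a dictionary keyed by nlm name. The 2x growth threshold, the function name `find_suspects` and the OTHER.NLM entry are hypothetical choices for illustration; the XYZ.NLM values reuse the figures from step 6.

```python
from typing import Dict, List, Tuple


def find_suspects(baseline: Dict[str, int], current: Dict[str, int],
                  growth_factor: float = 2.0) -> List[Tuple[str, int, int]]:
    """Return (nlm, baseline bytes, current bytes) for every nlm whose
    'Memory bytes in use' has grown by at least `growth_factor`,
    largest absolute growth first."""
    suspects = []
    for name, before in baseline.items():
        after = current.get(name)
        if after is None or before <= 0:
            continue  # unloaded since the baseline, or nothing to compare against
        if after / before >= growth_factor:
            suspects.append((name, before, after))
    suspects.sort(key=lambda s: s[2] - s[1], reverse=True)
    return suspects


if __name__ == "__main__":
    baseline = {"XYZ.NLM": 259_045, "OTHER.NLM": 500_000}  # right after restart
    later = {"XYZ.NLM": 2_234_611, "OTHER.NLM": 510_000}   # when the errors appear
    for name, before, after in find_suspects(baseline, later):
        print(f"{name}: {before:,} -> {after:,} bytes ({after / before:.1f}x)")
```

A ratio threshold is used so that small modules that balloon are not hidden behind large modules with modest, normal growth; the sort then puts the biggest absolute consumers first.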

Following these steps is the best way to track down a memory leak. Again, if the memory leak is with a NetWare module, Novell needs to hear about it. If the memory leak is with a 3rd party module, that specific vendor needs to be contacted.

If the above steps do not resolve the problem, the server may actually be low on memory. Check the memory status on the server in monitor.nlm by dividing Total Cache Buffers by Original Cache Buffers. Novell recommends that the result be above 70%. A result below 50% is dangerous; add RAM. NetWare does not use "virtual disk space" (yet), so if the server runs out of RAM, it will abend and data can be corrupted (NetWare will not "swap" to keep the server up). If the result is between 50% and 70%, keep an eye on it; depending on which end of the range it is closer to, the situation may or may not be serious.
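The rule of thumb above can be written as a small function. A sketch, assuming the two counts are read by hand from monitor.nlm; the function name and the status strings are invented, and the boundaries follow the text (above 70% is healthy, below 50% is dangerous, anything in between should be watched).

```python
from typing import Tuple


def cache_buffer_health(total_cache_buffers: int,
                        original_cache_buffers: int) -> Tuple[float, str]:
    """Classify Total Cache Buffers / Original Cache Buffers per the thresholds above."""
    if original_cache_buffers <= 0:
        raise ValueError("Original Cache Buffers must be positive")
    ratio = total_cache_buffers / original_cache_buffers
    if ratio > 0.70:
        status = "ok"
    elif ratio < 0.50:
        status = "dangerous: add RAM"
    else:
        status = "watch"  # between 50% and 70%
    return ratio, status


if __name__ == "__main__":
    for total in (8_000, 6_000, 4_000):
        ratio, status = cache_buffer_health(total, 10_000)
        print(f"{total}/10000 = {ratio:.0%} -> {status}")
```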

Summary:
============================
1. Determine which modules are causing the memory leak and either update them, or remove them from the server.
2. Down the server and bring it back up on a regular schedule to force the memory to be released and reassigned. This solution should only be used if the nlm that is causing the problem cannot be fixed.
3. Remove DOS drivers from the CONFIG.SYS and do not manually register memory; let NetWare take care of this. (These steps are described earlier in the Fix section.)
4. A large number of deleted but unpurged files can consume memory. Purge them.
5. You may actually be out of memory. Add RAM to the server.
