BorderManager Proxy and Cache Performance and Tuning

  • 3321740
  • 13-Feb-2007
  • 26-Apr-2012

Environment

Novell BorderManager 2.1
Novell BorderManager 3.0
Novell BorderManager 3.5
Novell BorderManager 3.6
Novell BorderManager 3.7
Novell BorderManager 3.8
Novell BorderManager 3.9
Novell NetWare 4.11
Novell NetWare 5.0
Novell NetWare 5.1
Novell NetWare 6.5
Formerly TID 2949807

Situation

BorderManager Proxy and Cache Performance and Tuning

Resolution

This document provides some general performance and tuning guidelines, as well as information to help you better decide how to customize the tuning for your BorderManager Proxy Cache installation.

Each BorderManager installation is unique and used differently. Consequently, many of the tuning suggestions will have to be modified to best meet the needs and requirements of your installation. The following guidelines have been proven in many installations around the world.

---------------------------------------------------------------------------
TABLE OF CONTENTS
---------------------------------------------------------------------------

1. Patches and Drivers

2. Cache Volumes

2.1. Keep Cache Volumes Separate
2.2. Have Multiple Cache Volumes
2.3. Turn Compression Off
2.4. Turn Suballocation Off
2.5. 16k Block Size
2.6. DOS Name Space Only

3. NSS Volumes for Caching

4. Name Resolution Files

4.1. NETDB.NLM /N
4.2. HOSTS
4.3. RESOLV.CFG
4.4. PXYHOSTS.*

5. Server Set Parameters

5.1. Communications

Maximum Physical Receive Packet Size = 4224
Maximum Packet Receive Buffers = 10000
Minimum Packet Receive Buffers = 5000
New Packet Receive Buffer Wait Time = 0.1 sec
Maximum Interrupt Events = 50

5.2. Memory

Garbage Collection Interval = 5

5.3. File Caching

Read Ahead Enabled = on
Maximum Concurrent Disk Cache Writes = 750
Dirty Disk Cache Delay Time = 0.1 sec

5.4. Directory Caching

Dirty Directory Cache Delay Time = 0.1 sec
Maximum Concurrent Directory Cache Writes = 125
Directory Cache Allocation Wait Time = 0.1 sec
Directory Cache Buffer NonReferenced Delay = 30 min
Minimum Directory Cache Buffers = 1000
Maximum Directory Cache Buffers = 4000
Maximum Number of Internal Directory Handles = 500

5.5. File System

Immediate Purge of Deleted Files = on
Enable File Compression = off

5.6. Locks

Maximum File Locks = 100000

5.7. Disk

Enable Hardware Write Back = on
Enable Disk Read After Write Verify = off

5.8. Miscellaneous

Worker Thread Execute In A Row Count = 15
Pseudo Preemption Count = 200
Minimum Service Processes = 500
Maximum Service Processes = 1000
New Service Process Wait Time = 0.3 sec (see Additional Information at bottom of TID)

5.9. Additional SET parameters

SET TCP IP MAXIMUM SMALL ECBS = 512 - 65534 (1024 default)
SET NCP EXCLUDE IP ADDRESSES=

6. NetWare Administrator (NWADMN32) Settings

6.1. Maximum Hot Unreferenced Time = 30
6.2. Cache Hash Table Size = 256
6.3. Maximum Number of Hot Nodes = 50000
6.4. Number of Directories = 128
6.5. DNS Transport Protocol = UDP

7. Memory Considerations

---------------------------------------------------------------------------
1. Patches and Drivers
---------------------------------------------------------------------------

Get the latest patches and drivers. Pay particular attention to the current NetWare and BorderManager Support Packs, CLIB updates, LAN drivers, and disk drivers. Many of the files you need can be found at https://support.novell.com/. You may need to go to your hardware vendor's web site to get the latest LAN and disk drivers.

Whenever you apply new patches or drivers, read the "readme" that accompanies them. Be familiar with any limitations or special considerations the new files may bring. For example, the readme for a LAN driver may recommend a higher Maximum Physical Receive Packet Size than this document recommends.

---------------------------------------------------------------------------
2. Cache Volumes
---------------------------------------------------------------------------

2.1. Keep Cache Volumes Separate

The cache should always use a volume of its own (nothing else using it). This will help ensure that the maximum amount of space is available for caching. It will help reduce the frequency of cache LRU purging which occurs when available disk space becomes too low. Finally, it will allow you to delete and recreate the cache volume when you want to change the block size.

2.2. Have Multiple Cache Volumes

It is a good idea, especially if you have a large drive array, to have multiple volumes for the cache. For example, if you have a 35 GB array, you may want to break it up into a 4 GB SYS:, 12 GB CACHE1: and 12 GB CACHE2:. It is not recommended to have a cache volume larger than 15 GB.

You define the cache directory and volumes in the NetWare Administrator (NWADMN32). Open the details for the server object, open the BorderManager Setup page, select the Caching button, and select the Cache Location tab.

Whatever path you specify will be used on every cache volume. Since only the cache uses the caching volumes, you can set the path to the volume root by entering a single slash (/). Then list all your cache volumes in the volume list. After saving these changes by clicking "OK" until you are back at the NDS tree view, unload and reload the proxy.

If the cache volumes listed in the NetWare Administrator BorderManager Setup are not mounted, Proxy will fail to load.

2.3. Turn Compression Off

Compression is counterproductive in an environment where you want instant access to the files, and the files are expected to be short lived. When a cache hit occurs that requires the file to be read from the disk, you don't want the server to have to spend its time decompressing the file. You also don't want the CPU to waste time compressing files which are going to be deleted and purged when they are refreshed. So, when you create your cache volumes, disable compression.

2.4. Turn Suballocation Off

Suballocation is very CPU intensive when garbage collection occurs on the volume. Instead of simply purging a deleted file, with suballocation the block has to be read in to determine what portion of the block to purge, then rewritten to the disk with a portion of it purged. In a cache server, files are constantly being created and purged, and suballocation will cause garbage collection to be a huge performance drain. So, when you create your cache volumes, disable suballocation.

2.5. 16k Block Size

With suballocation disabled there will be a great amount of wasted space if block size is too large. Your block size should be just bigger than the average cached file size. Start with 8k blocks. Then, after your cache has been running for some time, calculate the average size of your cache files. Set the block size to the next block size above that average size. (This is when you will be grateful you have separate cache volumes that you can simply recreate.)
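
The block size rule above can be sketched in a few lines of Python. This is only an illustration, not a Novell tool: the function name is hypothetical, the list of file sizes is assumed to have been collected by whatever means you use to inspect the cache volume, and the candidate block sizes are assumed to be the 4k to 64k sizes offered for traditional NetWare volumes.

```python
# Candidate block sizes for a traditional NetWare volume, in KB (assumed list).
BLOCK_SIZES_KB = [4, 8, 16, 32, 64]


def suggest_block_size_kb(file_sizes_bytes):
    """Return the smallest block size (in KB) above the average cached file size."""
    if not file_sizes_bytes:
        raise ValueError("need at least one cached file size")
    average = sum(file_sizes_bytes) / len(file_sizes_bytes)
    for size_kb in BLOCK_SIZES_KB:
        if size_kb * 1024 > average:
            return size_kb
    # The average is larger than every candidate, so fall back to the largest.
    return BLOCK_SIZES_KB[-1]


if __name__ == "__main__":
    sample_sizes = [2000, 9000, 16000]  # made-up sizes, average 9000 bytes
    print(suggest_block_size_kb(sample_sizes))
```

With the made-up sample sizes the average is 9000 bytes, so the sketch suggests a 16k block.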

2.6. DOS Name Space Only

Each cache volume should only have the DOS name space loaded on it. The cache does not use other name spaces. Regardless of the actual name of the object being cached, it is stored under an 8.3 filename created using a hashing algorithm. Loading unnecessary name spaces wastes directory entries and other resources. Do NOT remove LONG namespace from the SYS: volume.

On NetWare 6.0 or 6.5, removing LONG_NAME support from a volume is not supported. If LONG_NAME support is removed from a volume, the server may abend during various actions in NetWare Remote Manager or iManager, such as Inventory Report. Novell Core OS Engineering has confirmed this under BUG# 173762.

---------------------------------------------------------------------------
3. NSS Volumes for Caching
---------------------------------------------------------------------------

NSS cache volumes are not supported with Novell BorderManager.

BorderManager FastCache was designed for optimal performance with the traditional NetWare File System (NWFS). FastCache makes low-level calls to NWFS that allow maximum performance and scalability of BorderManager's proxy-cache services. Comparable support for additional file systems would require an all-new proxy engine. Future releases of Novell BorderManager may make use of a cache-optimized file system such as Cache Object Store (COS).

Cache volumes must use NWFS (non-NSS) for optimal performance and reliability, especially in medium to high traffic environments.
This should not pose an issue with clusters, since shared media (which requires NSS) is unnecessary for cache volumes. Cache data is expendable and need not be redundant.

See TID 10082486 for details and tips on converting NSS volumes to NWFS.

Note:

NetWare 6.5 creates SYS: as an NSS volume. To reduce the memory used by NSS and give more memory to your traditional cache volumes, set NSS /cachebalance=15 (that is, 15%). This can be the most important setting for speed in this entire tuning TID. It can be set in the server's Monitor console under Server Parameters, Novell Storage Services, NSS Cache Balance Percent. To make the setting persistent, add it to the c:\nwserver\nssstart.cfg file, which NSS reads on startup. This gives most of the memory to caching traditional volumes, so use it on dedicated BorderManager servers only.

There is a utility to create a traditional NetWare volume for the cache, but it creates the volume with the LONG name space. Use VREPAIR to remove it. Dismount the volume first, then select VREPAIR option 2 to remove the LONG name space, and option 1 to repair the volume.

The utility runs automatically when BorderManager is installed, and it can be run again later. More details are in the \unsupported\CCRT directory on the BM38 main CD.

---------------------------------------------------------------------------
4. Name Resolution Files
---------------------------------------------------------------------------

4.1. NETDB.NLM /N

BorderManager does not use NETDB except for DNS name resolution. Proxy does its own name resolution. However, while Proxy is loading it does check NETDB to learn what NETDB has already resolved, and to fill its own Proxy DNS cache. So, when BorderManager is installed it does not create any Unix services handler.

By default, NETDB checks every 10 seconds for the Unix services handler in NDS. This operation calls NDS and creates/deletes a temporary file. If there is no Unix services handler, this process is a waste of valuable resources, and some benefit can be gained by avoiding it. If you are not using NetWare NFS, or Unix Print Services, load NETDB in the AUTOEXEC.NCF file *AFTER LOADING INITSYS.NCF*, and use a /N parameter to avoid that 10 second polling process.

NOTE: NETDB should not be loaded prior to loading TCPIP.NLM. If NETDB.NLM is loaded prior to TCPIP.NLM, the server may appear to stop responding to IP traffic after a while.
NOTE: Since XCONSOLE.NLM autoloads NETDB.NLM, if INETCFG has been configured for Remote Console Access via a telnet connection, then XCONSOLE.NLM will load during INITSYS.NCF (which is in the AUTOEXEC.NCF). If Remote Console Access is required via telnet, AND NETDB must be loaded with the /N switch, then INETCFG cannot be configured to allow Remote Console Access via telnet. Instead, load NETDB /N and XCONSOLE in the AUTOEXEC.NCF after INITSYS.NCF.

Please see the following TIDs regarding NETDB:
2937176 NetDB library loaded without logging into NDS
2936136 NetDB - What is it?

4.2. HOSTS

As proxy loads, it reads the SYS:\ETC\HOSTS file and fills its own DNS cache with the entries. Clean up this file to eliminate hosts that you don't access (many hosts are listed as defaults for example purposes). The file should contain at least an entry for the localhost loopback and an entry for the server itself. With some versions of proxy, if this file doesn't exist the proxy will fail to load.

For example, if your hostname is "server1" and your IP address is 10.0.0.5, your HOSTS file may look like:

127.0.0.1 localhost loopback
10.0.0.5 server1

4.3. RESOLV.CFG

Proxy uses the RESOLV.CFG file to know which DNS nameservers to contact for name resolution. Clean up the SYS:\ETC\RESOLV.CFG file. It should contain one line for your domain, and one line for each nameserver. Delete any nameserver lines that don't specify a valid nameserver. The nameservers listed here need to be quick, reliable servers.

Proxy checks the servers in this list starting with the first one listed. The nearest, quickest, most reliable server should be listed first. The farthest, slowest, or most unreliable server should be listed last (or eliminated).
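
As a rough aid for the cleanup described above, the following Python sketch checks RESOLV.CFG-style text for exactly one domain line and for nameserver lines that carry a valid IP address. It is an illustration only: the function name is hypothetical, and the "domain" and "nameserver" keywords are assumed to be the ones used in your file. It cannot tell whether a listed nameserver is quick or reliable; that still has to be judged by hand.

```python
import ipaddress


def check_resolv_cfg(text):
    """Return (nameservers_in_listed_order, problems) for RESOLV.CFG-style text."""
    domains, nameservers, problems = [], [], []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        keyword = fields[0].lower()
        value = fields[1] if len(fields) > 1 else ""
        if keyword == "domain":
            domains.append(value)
        elif keyword == "nameserver":
            try:
                ipaddress.ip_address(value)  # raises ValueError if not an IP address
                nameservers.append(value)
            except ValueError:
                problems.append(f"invalid nameserver line: {raw.strip()}")
    if len(domains) != 1:
        problems.append(f"expected one domain line, found {len(domains)}")
    if not nameservers:
        problems.append("no valid nameserver lines")
    return nameservers, problems


if __name__ == "__main__":
    sample = "domain example.com\nnameserver 10.0.0.2\nnameserver bogus\n"
    servers, issues = check_resolv_cfg(sample)
    print(servers)
    print(issues)
```

The returned list keeps the order in which the nameservers appear, which is the order Proxy queries them in.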

4.4. PXYHOSTS.*

Every 10 minutes of operation, the proxy will save a SYS:\ETC\PROXY\PXYHOSTS file. This file is a dump of what it currently has in its DNS cache. The next time proxy loads it reads this file to rebuild the DNS cache in memory. This file is not read during the operating life of the proxy; it is only read when proxy loads.

If you suspect that the DNS cache is becoming corrupt, you may need to unload proxy, delete this file, and reload proxy. But only do that when necessary since much of the DNS lookup time is saved in DNS caching. You don't want the server to have to rebuild this cache.

---------------------------------------------------------------------------
5. Server Set Parameters
---------------------------------------------------------------------------

The file server uses settable parameters to configure features, limit buffers, set timeouts, etc. The way you would tune a server for general file access is very different from the way you would tune a server for caching. With a caching server you want to give as much resource to the cache as possible.

Please remember, this is a place to start and will require customization on your part to fit it to your requirements and configuration.

Some versions of NetWare 4 or 5 may have different maximum or minimum values, or even different tunable parameters, than those listed here. When setting these parameters, especially for NetWare 5, it is recommended that you use Monitor. This will help ensure they are set properly and in the proper place.

NOTE: If you are using NSS volumes for your cache volumes, you will realize a greater benefit by concentrating less on increasing the file and disk cache set parameters mentioned below and more on increasing the NSS parameters.

5.1. Communications

Maximum Physical Receive Packet Size = 4224

This value corresponds to the largest possible packet for the interfaces directly connected to the server. This is the number of bytes used for each packet receive buffer. It also corresponds to the largest size of contiguous memory created for DMA transfers by the network interface card. If you are certain, from information provided by your network interface card manufacturer, that you can safely reduce this, then you may save memory by reducing this value. BUT BE CERTAIN. Specifying too small a value can degrade performance and cause other unpredictable behavior. If, for example, you know your card will still function optimally with a maximum physical receive packet size of 1514, you will save 2710 bytes of memory per buffer. If you have 1000 packet receive buffers allocated, that is about 2.6 MB of RAM you would be saving.

Maximum Packet Receive Buffers = 10000

This sets the maximum number of packet receive buffers that can be allocated. Each packet receive buffer is the size you have set as the maximum physical receive packet size. So, with the maximum physical receive packet size set to 1514, and this parameter set to 10000, if it ever reaches the maximum, that is 15,140,000 bytes of memory (nearly 15 MB) used for packet receive buffers.
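
The buffer arithmetic in this section can be reproduced with a short Python sketch. The function names are hypothetical and the figures are the ones quoted in this document; megabytes are computed here as 1,048,576 bytes, which is an assumption about how the rounded figures were derived.

```python
def receive_buffer_bytes(packet_size, buffer_count):
    """Memory used when buffer_count packet receive buffers are allocated."""
    return packet_size * buffer_count


def to_megabytes(byte_count):
    """Convert bytes to megabytes of 1,048,576 bytes, rounded to one decimal."""
    return round(byte_count / (1024 * 1024), 1)


if __name__ == "__main__":
    # Saving from lowering the packet size from 4224 to 1514 on 1000 buffers.
    saving = receive_buffer_bytes(4224, 1000) - receive_buffer_bytes(1514, 1000)
    print(saving, to_megabytes(saving))

    # Worst case at Maximum Packet Receive Buffers = 10000 for both packet sizes.
    for packet_size in (1514, 4224):
        total = receive_buffer_bytes(packet_size, 10000)
        print(packet_size, total, to_megabytes(total))
```

Note that at the recommended packet size of 4224, reaching the 10000 buffer maximum would use 42,240,000 bytes rather than the 15,140,000 bytes of the 1514 example.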

Minimum Packet Receive Buffers = 5000

This is the number of packet receive buffers initially allocated on startup. After setting this, watch the packet receive buffers on your server. If the number of allocated packet receive buffers climbs to some peak beyond this, try increasing this value to the nearest 500 above that peak the next time you reboot the server. If your server never reaches this number of allocated packet receive buffers, try decreasing it by 500. Keep doing this until you have it set to within 500 above the server's peak.
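
One way to read "the nearest 500 above that peak" is the next multiple of 500 that is strictly greater than the observed peak. The Python sketch below follows that reading; the function name is hypothetical and the peak is assumed to be a value you noted from Monitor.

```python
def round_up_above_peak(observed_peak, step=500):
    """Return the smallest multiple of step that is strictly above observed_peak."""
    return (observed_peak // step + 1) * step


if __name__ == "__main__":
    # A peak of 5230 allocated buffers suggests a minimum of 5500.
    print(round_up_above_peak(5230))
    # A peak of 3120 suggests lowering the minimum to 3500.
    print(round_up_above_peak(3120))
```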

New Packet Receive Buffer Wait Time = 0.1 sec

This is the amount of time the server will wait to see if a packet receive buffer becomes free before it will allocate a new one. In a high performance, caching environment you don't want to wait for this, so decrease it to this minimum.

Maximum Interrupt Events = 50

This is the maximum number of interrupt time events allowed before guaranteeing that a thread switch has occurred. (I have had one customer respond that they set this as high as 170.)

5.2. Memory

Garbage Collection Interval = 5

In a caching environment, waiting too long for garbage collection could cause you to not have memory available when needed, even though it has already been freed. But, setting this too short will cause garbage collection to happen too often and waste CPU time.

5.3. File Caching

Read Ahead Enabled = on

This should already be set by default.

Maximum Concurrent Disk Cache Writes = 750

If you notice that the Dirty Cache Buffers statistic in Monitor climbs and does not decrease itself quickly to a value proportional to the Current Disk Requests statistic, try increasing this value.

Dirty Disk Cache Delay Time = 0.1 sec

This allows the dirty disk cache to be written to disk quicker.

5.4. Directory Caching

Dirty Directory Cache Delay Time = 0.1 sec

This allows the dirty directory cache to be written to disk quicker.

Maximum Concurrent Directory Cache Writes = 125

Add an additional 100 for each additional drive in the array, up to a maximum of 500. If Concurrent Disk requests are disproportionately higher than your Dirty Cache Buffers, try decreasing this parameter. Do not go below 75.
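
The starting value for this parameter can be worked out from the number of drives in the array. The Python sketch below is an illustration with a hypothetical function name; it assumes that the base value of 125 applies to a single drive.

```python
def starting_directory_cache_writes(drive_count):
    """Starting value: 125 for one drive, plus 100 per additional drive, capped at 500."""
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    return min(125 + 100 * (drive_count - 1), 500)


if __name__ == "__main__":
    for drives in (1, 2, 4, 6):
        print(drives, starting_directory_cache_writes(drives))
```

Any later decrease based on the Monitor statistics is a manual judgment and should stay at or above the floor of 75 given above.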

Directory Cache Allocation Wait Time = 0.1 sec

This reduces the delay in allocating directory cache buffers.

Directory Cache Buffer NonReferenced Delay = 30 min

In a caching environment, you want the directory cache to stay in memory for a good long time to avoid the possibility of having to go to disk the next time a duplicate request comes through.

Minimum Directory Cache Buffers = 1000

These directory cache buffers are not allocated on startup like the packet receive buffers are. Instead, when a new directory cache buffer is required, if the number of currently allocated directory cache buffers is below this number, the wait time is eliminated. After running for a while, set this to what you determine is your typical peak number of directory cache buffers in use.

Maximum Directory Cache Buffers = 4000

If the number of allocated directory cache buffers approaches this maximum, try increasing it by 500. If this maximum is never reached, try decreasing it. Continue doing this until the maximum is within 500 above the actual peak achieved.

Maximum Number of Internal Directory Handles = 500

Internal directory handles are used by the cache as well as by other NLMs running on the server.

5.5. File System

Immediate Purge of Deleted Files = on

If the server is only being used for BorderManager, then enable this parameter. If it is being used by other applications, or for file access, disable this parameter.

Enable File Compression = off

If the server is only being used for BorderManager, then disable this parameter. If it is being used by other applications, or for file access, enable this parameter.

5.6. Locks

Maximum File Locks = 100000

This includes file locks used by the cache and other NLMs.

5.7. Disk

Enable Hardware Write Back = on

If your hardware will support it, this will improve write performance.

Enable Disk Read After Write Verify = off

This is typically not needed with today's hardware, and enabling this causes a large performance drain.

5.8. Miscellaneous

Worker Thread Execute In A Row Count = 15

This is the number of times in a row the scheduler will dispatch new work to do before allowing other threads to run.

Pseudo Preemption Count = 200

The number of read or write calls threads are allowed to make before they are forced to relinquish control.

Minimum Service Processes = 500

Each request that the proxy receives spawns a WorkToDo thread, which uses a service process. You want to allocate enough service processes initially to cover the peak. So start with 500. If you see the peak rise above that, try increasing this to 50 above that peak. If you never see this number increase after a peak usage time, try decreasing this by 100. Keep doing this until you are confident that you have it set to within 50 more than the actual peak service processes that are allocated by your server during normal peak time operation.
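
A single round of that adjustment can be sketched in Python as follows. The function name is hypothetical, and the peak is assumed to be the highest number of allocated service processes you observed in Monitor after a period of peak usage. Because the server never allocates fewer than the minimum, a peak equal to the current minimum is treated as "never increased".

```python
def next_minimum_service_processes(current_minimum, peak_allocated):
    """Return the Minimum Service Processes value to try after one observation period."""
    if peak_allocated > current_minimum:
        # The peak rose above the minimum: set the minimum 50 above that peak.
        return peak_allocated + 50
    # The allocated count never rose above the minimum: try 100 lower.
    return current_minimum - 100


if __name__ == "__main__":
    print(next_minimum_service_processes(500, 620))
    print(next_minimum_service_processes(500, 500))
```

Repeating the step over several observation periods converges on a value just above the real peak, which is the goal stated above.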

Maximum Service Processes = 1000

This is the maximum settable on the server. In a web cache environment, where threads are constantly allocated, and so service processes are constantly being used, you don't ever want to run out of service processes. Running out of service processes will cause abend and/or hung server conditions.

New Service Process Wait Time = 0.3 sec

If you notice that your service processes periodically spike, and come close to the maximum during that spike, try increasing this parameter slightly.

5.9. Additional SET Parameters

SET TCP IP MAXIMUM SMALL ECBS = 512 - 65534 (1024 default)

This parameter sets the size of the small ECB pool. By default, 1024 buffers are preallocated for IP applications. These small ECBs are 256 bytes in size, as opposed to the main ECBs, which are the size of the maximum physical receive packet buffer (default 4224). They are most commonly used for IP fragmentation, sending ICMP ECHO requests, the BorderManager/ICS DNS resolver code, and tunneling fragmented packets with VPN.

The default should be enough, but if the application or system seems to slow down at different times, try allocating more small ECBs using this SET command. At the same time, verify that the in-use count under MONITOR -> System Resources -> Alloc Memory -> "TCPIP Small ECBs" is at, or very close to, the value it had when the system came up (for example, 143,360 by default on Ethernet). A higher value may indicate that there are not enough preallocated small ECBs to handle the number of requests being made. To test, double the value in the SET command above.

NOTE: With current Proxy patches applied you should NOT need to change this SET parameter to improve Proxy performance or to resolve hang conditions.

SET NCP EXCLUDE IP ADDRESSES=

This parameter disables NCP services on all specified addresses and prevents those addresses from being bound and propagated as SLP service addresses throughout the tree. If the PUBLIC address of the server is bound to SLP and is not accessible to all servers in the tree, the impact can be devastating. Random problems may occur throughout the network, including servers that won't down, slow logins, -625 NDS errors, and time sync problems. See KB 10014127 for details. The server must be patched to a minimum of Support Pack 5 for NetWare 5.0 and Support Pack 1 for NetWare 5.1 to enable this SET parameter.

SET TCP DELAYED ACK = OFF

SET TCP DELAYED ACK is enabled by default on TCPIP.NLM versions 5.32y and above. The switch was enabled to help with bandwidth issues but will adversely affect PROXY performance when enabled. PROXY.NLM in any BorderManager patch dated 7/16/01 or later will disable the delayed ack for PROXY.NLM only and the SET parameter should not be used. Using the SET parameter is a global setting for the whole server.

---------------------------------------------------------------------------
6. NetWare Administrator (NWADMN32) Settings
---------------------------------------------------------------------------

Change the following BorderManager settings in NWADMN32. Some of these parameters may be unique to different versions of BorderManager.

Please remember, this is a place to start and will require customization on your part to fit it to your requirements and configuration.

6.1. Maximum Hot Unreferenced Time = 30

This may be tuned depending on your installation. Increasing this will keep the object in hot cache longer. I have set this to match my setting for Directory Cache Buffer NonReferenced Delay (see section 5.4 above).

6.2. Cache Hash Table Size = 256

Increasing the Cache Hash Table Size may improve performance on extremely busy sites with large cache volumes. Try setting it to 256k.

6.3. Maximum Number of Hot Nodes = 50000

Increasing the Maximum Number of Hot Nodes to 50000 may improve performance on extremely busy sites with plenty of memory.

6.4. Number of Directories = 128

Increasing the Number of Directories may improve performance. As individual cache directories accumulate a large number of files, it takes longer to hash through the table for that directory. It is better to have more directories with fewer files than more files in fewer directories. Try increasing this to 128 directories per cache volume. For example, with two cache volumes the setting should be 256. With three cache volumes the setting would be 384, and so on.

6.5. DNS Transport Protocol = UDP

The default setting for the DNS Transport Protocol is UDP. There should be no reason to change this to TCP. The option to use TCP is provided for installations where the DNS server or firewall only allows DNS requests over TCP. Before setting this to TCP you should determine why, and if TCP can at all be avoided. UDP for the DNS Transport will provide far better performance since the Proxy is resolving individual names, and not requesting entire DNS domain downloads at a time.

---------------------------------------------------------------------------
7. Memory Considerations
---------------------------------------------------------------------------

Memory is critical to a Proxy server. If you run low on memory, what might otherwise have been in hot cache (in memory) may instead be found in the disk cache. Having to go to the disk cache is faster than having to go to the Internet, but it is slower than only having to go to memory to fill the request.

First, eliminate any NLMs that you don't need. Typically, you put a caching server in place to provide performance. Move other things off of the proxy server (such as your email server, your web server) and put them on servers of their own. This allows the server to dedicate its processing time and memory to the proxy, and will improve performance for the proxy as well as for the other services you moved.

Watch the LRU Sitting Time in Monitor (in the Cache Utilization Statistics window). After the server has been up for a while, this should be at least 15 minutes. If it falls below 15 minutes, add more memory to the server. Also, watch the Available Cache Buffers in Monitor (in the Server Memory Statistics window). This should be 30% or greater. If it falls below this, add more memory to the server.
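
The two thresholds above can be expressed as a simple check. This Python sketch is illustrative only: the function name is hypothetical, and both inputs are assumed to be values you read from Monitor by hand after the server has been up for a while.

```python
def memory_warnings(lru_sitting_minutes, available_cache_buffers_percent):
    """Return the reasons, if any, that this document gives for adding memory."""
    warnings = []
    if lru_sitting_minutes < 15:
        warnings.append("LRU Sitting Time is below 15 minutes")
    if available_cache_buffers_percent < 30:
        warnings.append("Available Cache Buffers is below 30%")
    return warnings


if __name__ == "__main__":
    print(memory_warnings(42, 55))   # healthy values, no warnings
    print(memory_warnings(9, 22))    # both thresholds missed
```

An empty list means neither threshold was missed for the values supplied.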

The number and sizes of buffers that your server allocates can have an effect on memory. Allocating more memory to directory cache buffers and packet receive buffers than is actually needed is a waste of memory. Each service process will take 16k of memory. Each directory cache buffer will take 4k of memory. Each packet receive buffer will take 1514 bytes of memory (or whatever you specified as the size of the packet receive buffer).
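
To see how these per-item costs add up, the following Python sketch totals the memory for a given set of counts. The function name is hypothetical; the 16k and 4k figures are the ones quoted above, taken here as 16,384 and 4,096 bytes.

```python
SERVICE_PROCESS_BYTES = 16 * 1024        # 16k per service process
DIRECTORY_CACHE_BUFFER_BYTES = 4 * 1024  # 4k per directory cache buffer


def buffer_memory_bytes(service_processes, directory_cache_buffers,
                        packet_receive_buffers, packet_size=1514):
    """Total memory taken by service processes, directory cache and packet buffers."""
    return (service_processes * SERVICE_PROCESS_BYTES
            + directory_cache_buffers * DIRECTORY_CACHE_BUFFER_BYTES
            + packet_receive_buffers * packet_size)


if __name__ == "__main__":
    # Starting minimums from section 5 with the recommended 4224 byte packet size.
    print(buffer_memory_bytes(500, 1000, 5000, packet_size=4224))
    # The maximums from section 5, if every limit were reached at once.
    print(buffer_memory_bytes(1000, 4000, 10000, packet_size=4224))
```

Comparing the two totals gives a sense of how much memory the section 5 settings can claim between their starting minimums and their maximums.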

The more name spaces you have loaded on the volume, the more directory cache buffers are required to cache the information for that file. For this reason, you only want to have the DOS name space loaded on the cache volume(s). The filenames generated by the cache hashing algorithms and used to save the cached objects do not require LONG name space.

(If you find any discrepancies with this TID or have any suggestions for improving this TID, please use the feedback feature found at the bottom of the TID on the support.novell.com site. I appreciate any feedback, real-world experiences, or criticism to improve this document.)

Additional Information

NEW SERVICE PROCESS WAIT TIME is no longer an adjustable SET parameter in Novell NetWare 6.0 SP 3 and beyond. On new server installs (SP 3 and beyond), this parameter no longer shows up under MONITOR at all. Servers that are patched to SP 3 from a previous support pack will continue to show the setting under MONITOR as it was set before the support pack. HOWEVER, that setting is no longer valid. The new default setting, DYNAMIC, is in effect, and it cannot be changed, even though the old value is still displayed. BorderManager will be optimized automatically with this new setting. It does not hurt anything to have the parameter show up as it does under MONITOR.

Formerly known as TID# 10018669