Off-Site Reload Server / Slow or Failing Backups Over Slow WAN Links

  • 7019594
  • 25-May-2017
  • 29-Aug-2017

Environment

Reload (All versions)

Situation

Reload server "B" regularly fails when trying to pull GroupWise databases across from Reload server "A", where "B" is the off-site (or secondary) server and "A" is the on-site (primary) server; or, Reload server "B" is pulling a backup across a VPN or slow WAN link and the backups are failing or taking over 24 hours to run.

Resolution

The link speed and line quality between the two Reload servers are the greatest hindrances to getting a successful backup.

With Reload 5 and newer, switch to a "Linked Paired Collector". The process for setting this up can be found in the Reload admin guide. The linked collector will be able to pass the data much more efficiently than the transfer process used in previous versions.

The very first Reload backup of a post office requires that 100% of the size of the post office is replicated. This is the same for both an on-site and an off-site Reload server. After the first Reload backup, Reload pulls across approximately 12% of the size of the post office in order to perform a backup. For example, if the post office is 100 gigabytes in size, then about 12 gigabytes of data must be replicated over the network between site A and site B. This amount of data needs to be replicated each time a backup is performed.
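To make the arithmetic above concrete, here is a rough sketch in Python. The 12% figure comes from this article; the link speed, the efficiency factor, and the function names are illustrative assumptions, not Reload settings or measured values.

```python
def incremental_backup_gb(post_office_gb, change_ratio=0.12):
    """Approximate data pulled per backup after the first full copy.

    change_ratio defaults to the roughly 12% figure quoted in this article.
    """
    return post_office_gb * change_ratio


def transfer_hours(data_gb, link_mbps, efficiency=0.7):
    """Estimate the hours needed to move data_gb over a link of link_mbps.

    efficiency is an ASSUMED fraction of the nominal bandwidth that is
    actually usable (protocol overhead, competing traffic, line quality).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000  # decimal gigabytes -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical example: 100 GB post office over a 2 Mbps WAN link.
daily_gb = incremental_backup_gb(100)
print(f"{daily_gb:.1f} GB per backup, about {transfer_hours(daily_gb, 2):.1f} hours")
```

Under these assumptions the 12 GB backup takes roughly 19 hours on a 2 Mbps link, which illustrates how a slow link can push a single backup toward the 24-hour mark described in the Situation section.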

The following are a few things that can be done to help resolve the issue:


A) Upgrade the WAN link:

The speed and quality of the WAN link between the on-site and off-site Reload servers is the greatest contributor to whether or not Reload can back up data across that WAN link. Some customers have needed to upgrade their WAN link in order to allow backups to work. Other customers have been able to configure Reload to use fewer network resources, which is discussed in section B below.

However, the lowest common denominator in getting data to replicate across a WAN link is whether DBCOPY can individually replicate the largest files in the GroupWise message store across the WAN. If this cannot happen, then the only solution is to upgrade the WAN link between the on-site and the off-site Reload servers.
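One way to get a feel for this limit is to find the largest file in the message store and estimate how long that single file would take to cross the link. The sketch below is only an aid for that estimate: the function names, the efficiency factor, and the placeholder path are assumptions, and this article does not state how long a single file copy may take before DBCOPY fails.

```python
import os


def largest_file(root):
    """Return (path, size_in_bytes) of the largest file under root.

    Returns (None, 0) if no readable file is found.
    """
    biggest_path, biggest_size = None, -1
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size > biggest_size:
                biggest_path, biggest_size = path, size
    return biggest_path, max(biggest_size, 0)


def minutes_to_copy(size_bytes, link_mbps, efficiency=0.7):
    """Estimate minutes to copy one file; efficiency is an assumed factor."""
    bits = size_bytes * 8
    seconds = bits / (link_mbps * 1_000_000 * efficiency)
    return seconds / 60


if __name__ == "__main__":
    # Placeholder path: substitute the real post office directory.
    path, size = largest_file("/path/to/post_office")
    if path is not None:
        print(path, size, f"{minutes_to_copy(size, 2):.1f} min at 2 Mbps")
```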

B) Tune Reload for Slower WAN Links:

By default, Reload is configured to perform "High Performance Backups" and to use multiple threads. This model causes Reload to back up post offices quickly. However, in many WAN environments this default configuration will not work.

Furthermore, Reload is configured to use NFS connectivity that is tuned for faster link speeds. The following steps and guidance describe how to ratchet down Reload's high-performance settings so that it works better in WAN environments. If changing the settings in this section still does not allow backups to complete successfully, then upgrading the WAN link is the best solution.


Steps to Disable High Performance Backups on a Reload Profile:

  1. On the off-site Reload server, open the Reload administration console.
  2. Edit the profile that represents the off-site Reload profile that is pulling backups from the on-site Reload server.
  3. Select Standard | Advanced | High and set "High Performance Standard Backups" to "Disabled".

Steps to Disable Threading on a Reload Profile:

  1. On the off-site Reload server, in the Reload administration console, edit the profile that represents the off-site Reload profile that is pulling backups from the on-site Reload server.
  2. Select Standard | BLOBS and set the DBCOPY-Threads and the BLOBS-Threads to 1.
  3. Go back a screen, and select Database and set the DBCOPY-Threads to 1.

Steps to Tune NFS Connectivity on a Reload Profile:

  1. On the off-site Reload server, launch the Reload administration console and edit the profile that represents the off-site Reload profile. This is the profile on the Reload server that is pulling backups from the on-site Reload server.
  2. Select Advanced | MTU.

    This "MTU" setting sets the NFS rsize and wsize values, which are its read and write buffer sizes. It should not be confused with the NIC's MTU (maximum transmission unit). Data is stored in buffers before it is sent over the network. The size of the buffer affects the way the data is transmitted across the network.

    "If the buffer is too large, the kernel or hardware may spend too much time splitting it into MTU-sized chunks.  If it is too small, there will be overhead involved in sending a very large number of small packets." (https://wiki.archlinux.org/index.php/NFS_Troubleshooting, "Buffer Cache Size and MTU", bottom of page).

    Change the MTU size to 4 bytes below the Maximum Transmission Unit between the two Reload servers. For WAN links this is often 1496. Please consult with your WAN specialist.

    See also, "How to Disable Reload's "MTU" Setting".
     
  3. Also in the Advanced menu, change the "Speed" to "Slow".
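The value for the "MTU" setting in step 2 can be worked out as in the following minimal sketch. It assumes you already know the MTU of each link between the two Reload servers (for example, from your WAN specialist); the effective MTU between the servers is the smallest of these. The function name and the example link values are hypothetical.

```python
def reload_mtu_setting(link_mtus, margin=4):
    """Return the value for Reload's "MTU" field.

    link_mtus: MTU of each link on the path between the two Reload servers.
    margin: the 4 bytes this article says to subtract from the path MTU.
    """
    if not link_mtus:
        raise ValueError("at least one link MTU is required")
    path_mtu = min(link_mtus)  # the smallest link MTU limits the whole path
    if path_mtu <= margin:
        raise ValueError("path MTU is too small")
    return path_mtu - margin


print(reload_mtu_setting([1500]))              # 1496
print(reload_mtu_setting([1500, 1492, 1500]))  # 1488
```

The first call reproduces the 1496 value mentioned above for a path whose MTU is 1500; the second shows how one smaller link on the path lowers the result.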

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 1894.