VMWare job data contains gaps (NETIQKB72388)

  • 7772388
  • 08-Jun-2010
  • 26-Jul-2010

Environment

NetIQ AppManager 7.0.x
NetIQ AppManager for VMWare v7.7

Situation

VMWare job data contains gaps
VMWare job data streams are interrupted or contain gaps in data.

Resolution

Actions you can take to reduce or eliminate the occurrences of data gaps in VMWare job data include:

Reducing the Agent's Logging Level

NOTE: This change disables all logging for the Agent.  If an issue on that Agent needs to be troubleshot in the future, no log of that issue will exist; logging will likely have to be re-enabled and the issue reproduced before corrective action can be determined.  Make this change only with that understanding.

To disable logging on Agents that are used to monitor VMWare metrics, perform the following steps:

On 32-bit Windows Agents

  1. Launch Regedit
  2. Navigate to the following registry path:

    HKLM | Software | NetIQ | AppManager | 4.0 | NetIQmc | Tracing

  3. Change the value for TraceMC from '1' to '0'.
  4. Change the value for TraceKS from '1' to '0'.
  5. Stop and re-start the NetIQ AppManager Client Resource Monitor Service.

A re-start of the server is NOT necessary.

On 64-bit Windows Agents

  1. Launch Regedit
  2. Navigate to the following registry path:

    HKLM | Software | Wow6432Node | NetIQ | AppManager | 4.0 | NetIQmc | Tracing

  3. Change the value for TraceMC from '1' to '0'.
  4. Change the value for TraceKS from '1' to '0'.
  5. Stop and re-start the NetIQ AppManager Client Resource Monitor Service.

A re-start of the server is NOT necessary.
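
The Regedit steps above could also be scripted.  The following is a minimal Python sketch, not a NetIQ-supplied tool.  The function names are hypothetical, and the script assumes it is run as an administrator with a Python build that matches the operating system's bitness.  Because this article does not state the registry type of TraceMC and TraceKS, the sketch reads the existing type and writes '0' back in that same type.  The NetIQ AppManager Client Resource Monitor Service must still be stopped and re-started afterwards.

```python
# Sketch only: automates the Regedit steps described in this article.
TRACE_VALUES = ("TraceMC", "TraceKS")


def tracing_key_path(is_64bit_windows: bool) -> str:
    """Return the Tracing key path (relative to HKLM) given in this article."""
    prefix = r"Software\Wow6432Node" if is_64bit_windows else "Software"
    return prefix + r"\NetIQ\AppManager\4.0\NetIQmc\Tracing"


def disable_agent_tracing(is_64bit_windows: bool) -> None:
    """Set TraceMC and TraceKS to 0, keeping each value's existing type."""
    import winreg  # Windows-only standard library module

    access = winreg.KEY_QUERY_VALUE | winreg.KEY_SET_VALUE
    path = tracing_key_path(is_64bit_windows)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path, 0, access) as key:
        for name in TRACE_VALUES:
            _current, value_type = winreg.QueryValueEx(key, name)
            # Assumption: the value is either a string or a DWORD.
            new_value = "0" if value_type == winreg.REG_SZ else 0
            winreg.SetValueEx(key, name, 0, value_type, new_value)
```

For example, calling disable_agent_tracing(True) on a 64-bit Windows Agent targets the Wow6432Node path, and disable_agent_tracing(False) targets the 32-bit path.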

The following additional changes can also help reduce the occurrence of gaps in VMWare data.  These changes can typically be made in conjunction with each other to further improve the integrity of data collection for VMWare Module jobs:


Eliminate the HostMonitor and VMMonitor scripts from use

The VMWare_HostMonitor and VMWare_VMMonitor KSs are no longer supported as of AppManager for VMware version 7.7.  They should not be used, and should be deleted.

The functionality of the VMWare_HostMonitor KS has been replaced by the following individual scripts:

  • HostCPUusage
  • HostMemoryUsage
  • HostDiskIO
  • HostNetworkIO

The functionality of the VMWare_VMMonitor KS has been replaced by the following individual scripts:

  • VmCPUusage
  • VmMemoryUsage
  • VmDiskIO
  • VmNetworkIO

Change the PIOC Map File sizes to 20 MB, from the default of 5 MB

On the Management Server that is acting as Primary MS for the VirtualCenter Server's Agent, increase the PIOC Map File size for Data, Events, and JobStat to 20 MB, from the default of 5 MB.  To do so, perform the following steps on that Management Server:

  1. Open Regedit on the Management Server.
  2. Navigate to the following registry node:

    HKLM | Software | NetIQ | AppManager | 4.0 | NetIQms | Config

  3. Change each of the following three registry values to 20 (Decimal value, not Hex value):

    PIOC Data Map File Size MB
    PIOC Event Map File Size MB
    PIOC JobStat Map File Size MB

  4. Re-start the NetIQ AppManager Management Service.

A re-start of the server is NOT necessary.
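
The "Decimal value, not Hex value" warning in step 3 matters because the same typed digits are stored as a different number depending on which base is selected in Regedit.  The short Python sketch below illustrates the arithmetic only; it does not touch the registry, and the helper names are hypothetical.

```python
# Registry value names from step 3 of this article.
PIOC_VALUE_NAMES = (
    "PIOC Data Map File Size MB",
    "PIOC Event Map File Size MB",
    "PIOC JobStat Map File Size MB",
)
TARGET_MB = 20


def stored_value(typed_text: str, base: int) -> int:
    """Number that is stored when typed_text is entered with the given base."""
    return int(typed_text, base)


def describe(value_mb: int) -> str:
    """Show a size in both decimal and hexadecimal notation."""
    return f"{value_mb} decimal = {value_mb:#x} hex"


for name in PIOC_VALUE_NAMES:
    print(f"{name}: {describe(TARGET_MB)}")
```

Typing 20 with Decimal selected stores 20, which Regedit displays as 0x14.  Typing 20 with Hexadecimal selected would store 32 instead, that is, stored_value("20", 16).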


Use a dedicated Management Server (MS)

You can help to reduce load on existing Management Servers which are processing data for all other Agents by providing a separate, dedicated Management Server for the Agent running on the VirtualCenter Server.  Please refer to the AppManager Installation instructions for the steps involved.

Cause

Unlike most AM modules, the VMware module has the potential to collect a very large amount of data with a single job.  For example, in a given iteration on a VMWare environment with 2000 VMs and 10 memory metrics per VM, there could be as many as 20,000 datapoints generated and sent from the monitoring Agent to the Repository Database (QDB) on every iteration.

By default the interval for most KSs in the VMware module will be 15 minutes or more, and typically this is enough time for the Agent to collect and send all of the appropriate metric data.  However, customers may want to run the VMWare jobs on a shorter interval.  If the timeframe between iterations is not sufficient for the Agent to collect and deliver all of the collected data, then gaps may occur in that data as iterations may be missed, interrupted, or significantly delayed by previous iterations.
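
The arithmetic behind this example can be sketched in a few lines of Python.  The VM count, metric count, and 15-minute interval come from the text above; the 5-minute interval is a hypothetical shorter setting used only for comparison, and the function names are illustrative.

```python
def datapoints_per_iteration(vm_count: int, metrics_per_vm: int) -> int:
    """Upper bound on datapoints a single job iteration can generate."""
    return vm_count * metrics_per_vm


def datapoints_per_hour(vm_count: int, metrics_per_vm: int,
                        interval_minutes: float) -> float:
    """Datapoints sent per hour when the job iterates every interval_minutes."""
    iterations_per_hour = 60 / interval_minutes
    return datapoints_per_iteration(vm_count, metrics_per_vm) * iterations_per_hour


# Example from the article: 2000 VMs, 10 memory metrics per VM.
print(datapoints_per_iteration(2000, 10))
# Default-style 15-minute interval versus a hypothetical 5-minute interval.
print(datapoints_per_hour(2000, 10, 15))
print(datapoints_per_hour(2000, 10, 5))
```

Shortening the interval from 15 to 5 minutes triples the hourly volume while also cutting the time available to deliver each iteration's data to one third.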

In addition to shorter iteration intervals, it has been determined that gaps can also occur in VMWare data under a few other conditions:

  • If basic Agent logging is enabled, even if the VMWare jobs are run at their default intervals.  Specifically, the TraceMC and TraceKS values in the registry, which by default are both set to "basic" logging, can delay jobs that collect a very large amount of data enough to cause gaps in VMWare job data.  It is important to note that this delay has only been observed for jobs collecting a very large amount of data per iteration.  Typical AppManager jobs would not be significantly impacted by this delay.

  • The VMWare_HostMonitor and VMWare_VMMonitor KSs combined the data collection of multiple other KSs into a single KS.  Unfortunately this has led to very large amounts of data being generated and sent through the Agent to the MS, every time one of these jobs iterates.  It is as a result of this unintended limitation that both KSs were discontinued in the VMWare for Windows v7.7 Module.

  • If the PIOC Map Files on the Management Server are running at their default size (5 MB).  The Management Service uses the PIOC Map Files as a temporary buffer in the event that Agents send more data than the MS can fit into the QDB at that moment in time.  The Management Server that is handling data coming from the Agent on the VirtualCenter Server can frequently find itself having to buffer data temporarily when the VMWare jobs iterate.  At the default size, data gaps can occur if the Management Server falls behind on delivering the large amounts of VMWare data.

  • If the Agent on the VirtualCenter Server is sharing a Management Server with other Agents in the AppManager environment.

Additional Information

Formerly known as NETIQKB72388