Orphaned Jobs Cause the Operator Console to Open and Perform Slowly (NETIQKB41599)

  • 7741599
  • 02-Feb-2007
  • 03-Feb-2011

Environment

NetIQ AppManager 6.x
NetIQ AppManager 7.0.x

Situation

Performance Issues in the Operator Console due to Orphaned Jobs in the Repository

The Operator Console is slow to launch

Resolution

The first step is to determine whether your QDB 'datarejected' table is filling up with data from orphaned jobs or simply contains old accumulated data. This is a two-step process: establish the current size of the 'datarejected' table, then empty the table and monitor how quickly it repopulates.

To determine the size of the QDB 'datarejected' table, perform the following query against the QDB in SQL Query Analyzer:

select count(*) from datarejected
This returns the number of rows currently in the 'datarejected' table. Counts in the thousands can cause significant slowness when starting the Operator Console, increase network traffic, and divert Management Server and QDB server resources from processing data for valid jobs.

To determine whether the accumulated data in the 'datarejected' table is new or old:
Warning: This SQL Query directly accesses your database and updates/deletes data. If you perform the query incorrectly it can cause irreparable harm to the database and may result in loss of data. If you are unfamiliar with SQL or how to run a SQL Query please contact NetIQ Technical Support directly.

Clear the 'datarejected' table by running the following query against the QDB in SQL Query Analyzer:

truncate table datarejected
Wait 5 - 10 minutes for the environment to repopulate the 'datarejected' table with rejected data.
Run the following query against the QDB in the SQL Query Analyzer:

select count(*) from datarejected
If the count is very low (less than 1000) or zero after 5 to 10 minutes, you do not need to proceed any further, although it is still recommended that you clear out your 'datarejected' table occasionally.
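The decision above can be expressed as a small Python sketch. It assumes you have already run the two count queries and noted the second result yourself. The function name assess_rejected_growth is hypothetical, and the 1000-row default simply mirrors the rule of thumb in this article rather than any limit built into AppManager.

```python
def assess_rejected_growth(count_after_wait, threshold=1000):
    """Interpret the datarejected row count taken 5 to 10 minutes after truncating it.

    The 1000-row default follows this article's rule of thumb; it is not an AppManager setting.
    """
    if count_after_wait < 0:
        raise ValueError("a row count cannot be negative")
    if count_after_wait < threshold:
        # Slow or no repopulation: the earlier count was mostly old accumulated data.
        return "no orphaned jobs indicated; clear the table occasionally"
    # Fast repopulation: one or more orphaned jobs are still sending data.
    return "orphaned jobs likely; run NetIQSync"


print(assess_rejected_growth(0))
print(assess_rejected_growth(4500))
```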

However, if your 'datarejected' table has filled up with thousands of new entries, you have one or more orphaned jobs running on one or more AM agent systems that need to be deleted.

The simplest method for deleting orphaned jobs is running the NetIQSync tool (NetIQSync.exe) on your AppManager environment.  This should clean out any orphaned jobs that may be running.  You can find this tool on your AppManager Install CD in the \AppManager\extras\utilities\appmanager_tools directory.

To run the NetIQSync tool:

Copy the file to your AppManager Management Server.
From the command line, run the following command:

netiqsync
The following are the usage instructions for the NetIQSync tool:


Syntax:


netiqsync -s NetIQms -q sql_server:qdb_name:sql_user_name:sql_user_password [-c NetIQmc] [-x] [-n] [-l] [-h]

Options:

-s NetIQms host name or IP address.
-q Repository (QDB) database/login information.
-c List one or more NetIQmc host names or IP addresses, separated by ':'. If this option is not specified, this task is performed on all managed computers that have been discovered in the repository (QDB).
-x Stop orphaned jobs.
-n Do not generate events for jobs stopped in the QDB.
-l Place the log and report in the NetIQ log directory. This option only works if you run this utility on the same computer as the MS or MC.
-h The usage instructions.

Square brackets mark optional switches; do NOT type the brackets themselves.
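The -q value packs four fields into one colon-separated string, and -c takes a colon-separated list, so it can help to assemble the command line programmatically. The following Python sketch is hypothetical (build_netiqsync_args is not part of AppManager) and only builds the argument list from the syntax documented above. It adds -x by default because stopping orphaned jobs is the goal here. It also rejects -q and -c values that themselves contain a colon, on the assumption that NetIQSync cannot distinguish such a colon from a field separator.

```python
def build_netiqsync_args(ms, sql_server, qdb_name, sql_user, sql_password,
                         agents=None, stop_orphans=True, no_events=False,
                         local_log=False):
    """Assemble a NetIQSync argument list following the documented syntax.

    Hypothetical helper: it only builds the list and does not run anything.
    """
    qdb_fields = [sql_server, qdb_name, sql_user, sql_password]
    # ':' separates the -q fields and the -c host names, so it cannot appear inside a value.
    for value in qdb_fields + list(agents or []):
        if ":" in value:
            raise ValueError("values passed to -q and -c must not contain ':'")

    args = ["netiqsync", "-s", ms, "-q", ":".join(qdb_fields)]
    if agents:
        args += ["-c", ":".join(agents)]  # limit the run to these managed computers
    if stop_orphans:
        args.append("-x")  # stop orphaned jobs
    if no_events:
        args.append("-n")  # no events for jobs stopped in the QDB
    if local_log:
        args.append("-l")  # log and report in the NetIQ log directory
    return args


# Placeholder names for illustration only.
example = build_netiqsync_args("MS01", "SQL01", "QDB", "netiq_user", "secret",
                               agents=["AGENT1", "AGENT2"])
print(" ".join(example))
```

To run it on the Management Server, you could pass the list to Python's subprocess.run(example, check=True). The host, database, and account names above are placeholders.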

This should delete any orphaned jobs running in your AppManager environment. You can verify this by purging your QDB's 'datarejected' table again, waiting 5 to 10 minutes, and then checking the row count on that table. A relatively small count is normal and expected. If the count has already climbed past a few hundred and into the thousands, you may still have orphaned jobs in your environment. You can either run the NetIQSync tool again, or proceed to the following instructions to locate the individual server on which each remaining job is running and delete it.

To find the agent system on which each remaining orphaned job is running and delete the job, perform the following steps:

Determine if the rejected data in your 'datarejected' table is all being generated by one job on one server, or several jobs on several servers, by running the following query against your QDB in the SQL Query Analyzer:

select JobID from datarejected
You will get a list of the JobIDs whose data is being rejected and written to your 'datarejected' table. For each distinct JobID, you will need to determine which agent system is running the orphaned job.
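The query returns one row per rejected record, so the same JobID can appear thousands of times. If you save the results as text (for example, Query Analyzer's text output saved to a file), a short Python sketch can tally the rows per JobID. This shows whether one job or several are responsible and which contribute the most. The file-based workflow, the function names, and the sample values are assumptions for illustration. The sketch keeps only lines that consist entirely of digits, which skips the column header, the dashed separator line, and the row-count footer.

```python
from collections import Counter

def parse_job_ids(lines):
    """Keep only lines that are a bare integer JobID; skip headers, separators, and footers."""
    return [int(line.strip()) for line in lines if line.strip().isdecimal()]

def summarize_job_ids(job_ids):
    """Return (JobID, row count) pairs, busiest job first."""
    return Counter(job_ids).most_common()


# Made-up sample of saved query output.
sample_output = ["JobID", "-----------", "1042", "1042", "877", "1042", "", "(4 row(s) affected)"]
for job_id, rows in summarize_job_ids(parse_job_ids(sample_output)):
    print(f"JobID {job_id}: {rows} rejected rows")
```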

On your AppManager Management Server, use Windows Explorer to browse to the ...\NetIQ\temp\Netiq_Debug\<servername> directory, where <servername> is the name of your Management Server.
In this directory, locate and open the RPLIB.log file.
Starting at the end of the file, look for entries that resemble the following example:
07/15/04 16:48:30 (NetIQms-4.6.2000.93 pid 1188, th# 1436) SQLSTATE=37000, native error=50000, msg='[Microsoft][ODBC SQL Server Driver][SQL Server]IncrementEventCount: invalid child eventid 1297385'

The specific information you need is the "invalid child eventid #######."
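Scanning a large RPLIB.log by eye is tedious. The following Python sketch performs the same search programmatically. It reads the log from the end, matches the "invalid child eventid" text shown in the example entry above, and returns each event ID once, most recent first. The regular expression is based only on that single example line, so treat it as an assumption and adjust it if your log entries differ. The file path in the usage comment is a placeholder.

```python
import re

# Based on the example RPLIB.log entry: "...IncrementEventCount: invalid child eventid 1297385'"
EVENT_ID_PATTERN = re.compile(r"invalid child eventid (\d+)")

def find_invalid_event_ids(log_lines):
    """Return unique event IDs from 'invalid child eventid' entries, newest first."""
    found = []
    seen = set()
    for line in reversed(log_lines):  # start at the end of the file, as the steps advise
        match = EVENT_ID_PATTERN.search(line)
        if match:
            event_id = int(match.group(1))
            if event_id not in seen:
                seen.add(event_id)
                found.append(event_id)
    return found


# Usage (placeholder path; substitute the RPLIB.log in your Netiq_Debug\<servername> directory):
# with open(r"C:\path\to\RPLIB.log", encoding="utf-8", errors="replace") as log:
#     print(find_invalid_event_ids(log.read().splitlines()))
```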

Using the eventid info retrieved from your RPLIB.log file, run the following query against the QDB in the SQL Query Analyzer:
select * from event where eventid=#######
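When the log yields several event IDs, you can look them all up in one query with an IN list instead of running the query once per ID. The Python sketch below (build_event_query is a hypothetical helper) produces the query text to paste into SQL Query Analyzer. Converting each ID with int() raises an error for non-numeric strings, so stray text copied from the log cannot end up in the SQL.

```python
def build_event_query(event_ids):
    """Return the event-table lookup for one or more event IDs."""
    # int() raises ValueError for non-numeric strings, so only numbers reach the SQL text.
    ids = [int(event_id) for event_id in event_ids]
    if not ids:
        raise ValueError("at least one event ID is required")
    if len(ids) == 1:
        return f"select * from event where eventid={ids[0]}"
    id_list = ", ".join(str(event_id) for event_id in ids)
    return f"select * from event where eventid in ({id_list})"


print(build_event_query(["1297385", "1297390"]))
```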

Note: In some circumstances, entries in the event table can be deleted almost as quickly as they are written. As a result, the query above may return no results. If the query still returns no results after several attempts, re-run the NetIQSync tool to attempt to eliminate all orphaned jobs.

The information returned by this query will include the name of the computer from which this orphaned job data is being sent.

Using the computer name you identified in the previous step, manually delete the orphaned job from the AppManager Agent system:

On the agent system that is running one or more orphaned jobs, go into the Services applet and stop both the NetIQ AppManager Client Communication Manager and the NetIQ AppManager Client Resource Monitor services.
Once both services are stopped, restart them manually using the -oa parameter.  The -oa parameter will clear the agent's local repository.  When the services are restarted, they will query the MS system for job info, the QDB will send out a list of jobs that should be running, and those jobs will be repopulated to the agent system's local repository.
In your Operator Console, you may see unusual errors or events on jobs that were running on the agent system as a result of stopping and restarting the AM services.  Restart any jobs that should be running but did not restart on their own.
Stop and restart any jobs that are giving unusual events or errors.
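If you prefer the command line to the Services applet, the stop-and-restart sequence can be scripted with the Windows sc.exe utility. Its start command passes any extra arguments to the service as start parameters, just as the Start parameters field in the applet does. The Python sketch below only builds the commands. It takes the service names as input because this article gives only the display names (NetIQ AppManager Client Communication Manager and NetIQ AppManager Client Resource Monitor), so look up the actual service key names on your agent, for example in each service's Properties dialog. The helper name and the example service names are hypothetical.

```python
def build_agent_reset_commands(service_names):
    """Return sc.exe commands: stop every agent service, then start each with -oa.

    service_names must be the Windows service key names, supplied by the caller.
    """
    if not service_names:
        raise ValueError("supply the agent service names")
    stop_commands = [["sc", "stop", name] for name in service_names]
    # Extra arguments after the service name are passed to the service as start parameters.
    start_commands = [["sc", "start", name, "-oa"] for name in service_names]
    return stop_commands + start_commands


# "ServiceA" and "ServiceB" are placeholders, not the real AppManager service names.
for command in build_agent_reset_commands(["ServiceA", "ServiceB"]):
    print(" ".join(command))
```

Note that sc stop returns as soon as the stop request is accepted, not when the service has finished stopping. Confirm that both services report STOPPED (for example with sc query followed by the service name) before running the start commands.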


Close and re-open the Operator Console, and test the performance differences.

Cause

The QDB 'datarejected' table is filling up with rejected data from an orphaned job running on one or more AM Agent systems.

Additional Information

Formerly known as NETIQKB41599