Troubleshooting Server Workflow Errors in the Application Log as Related to Performance in DRA

  • 7714569
  • 09-May-2013
  • 13-May-2013

Environment

NetIQ Directory & Resource Administrator 8.x

Situation

Troubleshooting Server Workflow Errors in the Application Log as Related to Performance in Directory and Resource Administrator.

The following event may appear in the application log at intermittent times and after different types of DRA operations:

Type: Information
Category: ServerWorkflow
Event: 13102
Description: The Operation : <operation> submit took more than two minutes to complete.

Resolution

General guidelines to correct the situations that could cause this error are as follows:

  1. Network issues are generally due to an inefficient configuration of the network or poor overall network performance.  Additional network diagnostics should be performed to determine the exact cause of the issue.  NetIQ's AppManager ResponseTime for Networks monitors a network's ability to move critical application data.
  2. To alleviate congestion on a DRA server, these measures are recommended to address the issue:
    1. Deploy secondary DRA servers and configure Assistant Admins to connect to these secondaries as appropriate to distribute the load.
    2. Review the configuration of your Accounts Cache Refresh schedule.  To do so, launch the DRA MMC interface, expand the Configuration snap-in node and select Managed and trusted domains.  The cache refresh schedule is accessed by highlighting the managed domains and clicking Properties.  Reduce the frequency of incremental cache refreshes, if possible, and ensure that full cache refreshes are not running more often than necessary, especially during normal working hours.  If the account information in a trusted domain is not required for Assistant Admins' day-to-day tasks in DRA, disable all accounts cache refresh operations for those domains.
  3. Query managed domains using the CLI commands or scripts through the DRA ADSI provider during hours of lower network demand.  These list enumerations can be responsible for poor performance in the following cases:
    1. Queries to retrieve non-cached properties from the Active Directory or SAM take longer than those gathering cached properties.  For instance, samAccountName is a cached property and will enumerate more quickly than the homeDirectory property, which is not cached.  The documentation provided with the DRA Software Development Kit contains more information regarding cached and non-cached properties.  An excellent resource is the schema.mdb Access database file provided with the SDK.
    2. Queries to retrieve large lists of objects from the managed domains, such as a request for all accounts ...with name matching *... from multiple managed domains, can take an extended period of time.
    3. When using a script that calls the ADSI provider, the first call for an object type using the Get method implicitly executes the GetInfo method, which reads all properties for all objects matching the call criteria. Because an Active Directory object has many properties, you should not retrieve all available properties unnecessarily as this will reduce the DRA server's ability to process other requests in a timely manner.  Consider using the GetInfoEx method instead, which allows you to retrieve specific properties from Active Directory by limiting the scope of subsequent Get operations.  For more information on the GetInfoEx method of retrieving properties, please refer to the online help section pertaining to the DRA SDK.
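
The GetInfoEx recommendation in item 3.3 can be sketched as follows.  This is a minimal illustration in Python rather than a tested DRA script.  The helper name read_properties is hypothetical, and the object passed to it is assumed to be an ADSI object already bound through the DRA ADSI provider (for example, via pywin32's win32com.client.GetObject on a Windows host).  The argument shape follows the standard ADSI GetInfoEx method (an array of property names plus a reserved value of 0); confirm against the DRA SDK online help that the DRA provider accepts the same arguments.

```python
def read_properties(ads_object, property_names):
    """Load only the named properties into the object's cache, then read them.

    ads_object is assumed to expose the standard ADSI methods
    GetInfoEx(names, reserved) and Get(name).
    """
    names = list(property_names)
    # Calling GetInfoEx first avoids the implicit GetInfo that the first
    # Get would otherwise trigger, so only the listed properties are fetched.
    # The second argument is reserved in standard ADSI and must be 0.
    ads_object.GetInfoEx(names, 0)
    return {name: ads_object.Get(name) for name in names}
```

Requesting only samAccountName, for instance, keeps the call limited to a cached property, whereas adding homeDirectory would require a read of a non-cached property for each object.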

Cause

There are multiple causes for this error, some of which are outlined below, along with possible troubleshooting steps and solutions.

  1. One common cause of these errors is network availability.  Low bandwidth or unstable connectivity can impair the ability of DRA to communicate with other computers, including domain controllers.  A basic test to help troubleshoot network availability issues is Microsoft's Ping command.
    1. Use the Ping command to verify the response time between the DRA server and several domain controllers.  If the response time is high (greater than 100ms) or if Ping statistics reports high packet loss, the cause of the error is likely network related.
  2. Another possible cause could be congestion on the DRA server. When this occurs, the McsAdminSvc process consistently utilizes a high percentage of CPU time and of memory resources.  To confirm this as the cause, please refer to the following troubleshooting steps:
    1. Utilize Performance Logs and Alerts to monitor the Processor, Thread Count and Memory performance objects used by the McsAdminSvc process.  Task Manager can be used for rough monitoring of resource utilization, as well.
    2. The LockDiagnostic utility, in the C:\Program Files\Netiq\DRA directory by default, displays statistics pertaining to the read and write locks currently in progress with and awaiting processing by the McsAdminSvc process.  Clicking the Get Lock Diagnostics button displays the number of clients with read and write requests open to the specified DRA server.  If more than 20 readers or writers are waiting in the RW Lock Diagnostics... section, this is an indication that the DRA server is congested and is unable to process all of the requests in a timely manner.  Finally, clicking the Get APJS Diagnostics button displays the status of scheduled jobs initiated by DRA, such as the cache refreshes, last logon statistic collection, and resource connections.
  3. Customizations, such as custom views, scripts, pre- and post-task triggers and DRA policies, can contribute to poor DRA server performance, as well.  In DRA version 6.60, an Application log event (Event ID: 18802) is generated to indicate when a customization is causing a given operation to take longer than 20 seconds to complete.  To configure this time to a value other than 20 seconds, the following registry key can be modified accordingly: HKLM\Mission Critical Software\OnePoint\Administration\Data\Modules\Policy\AppLogWarningTimer
  4. Customizations to the DRA MMC or Web Console involving enumeration of lists containing non-cached properties will cause performance problems.  When this occurs, the Administration server component must read any non-cached property from the domain controller for every account in the list.
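
The Ping check described in item 1.1 can be automated along the following lines.  This is a sketch under stated assumptions, not a NetIQ-supplied tool: the function names are hypothetical, the parser assumes the English-language output of the Windows ping command, and the 5 percent packet-loss limit is an illustrative default because the article does not define "high" packet loss.  The 100 ms latency limit is taken from item 1.1.

```python
import re
import subprocess


def run_ping(host, count=4):
    """Run the Windows ping command against host and return its text output."""
    completed = subprocess.run(
        ["ping", "-n", str(count), host],
        capture_output=True,
        text=True,
    )
    return completed.stdout


def parse_ping(output):
    """Return (average_ms, loss_percent); each is None if not present in output."""
    avg = re.search(r"Average = (\d+)ms", output)
    loss = re.search(r"\((\d+)% loss\)", output)
    return (
        int(avg.group(1)) if avg else None,
        int(loss.group(1)) if loss else None,
    )


def looks_network_related(output, latency_limit_ms=100, loss_limit_percent=5):
    """Apply the article's rule of thumb to one ping result.

    latency_limit_ms comes from the article; loss_limit_percent is an
    assumed value and should be adjusted for the environment.
    """
    avg, loss = parse_ping(output)
    slow = avg is not None and avg > latency_limit_ms
    lossy = loss is not None and loss > loss_limit_percent
    return slow or lossy
```

Calling looks_network_related(run_ping(name)) from the DRA server for each of several domain controllers gives a quick first indication of whether the event is likely network related.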