Alerting, Logging and Monitoring Access Governance Suite

  • 7011014
  • 07-Mar-2011
  • 19-Oct-2012

Resolution

Many clients want to know how they can set up alerting and monitoring for Access Governance Suite within their environment. You may also wish to know whether NetIQ provides an interface to certain types of log monitoring systems, or provides any standard error messages or other guidance for System Administrators. Until the System Administrator's Guide is available, this article offers advice on alerting, logging and monitoring.

First, it is important to understand that Access Governance Suite uses log4j, a standard Java logging framework, to produce log output. Logging is configured and described in the log4j.properties file, which should be deployed within your instance when Access Governance Suite is deployed. An example log4j.properties file can be found in the NetIQ Support Guide online, and should be kept handy by every System Administrator who supports an instance of Access Governance Suite.

The standard log4j.properties file comes with preset logging levels. These are the normal logging levels and should not be modified unless Support advises you to do so, or during development on a development server, where a greater level of detail can help determine the root cause of an error much faster.

In a production environment, too much logging can be problematic, especially if storage is at a premium or if the log output might fill the temp space of the file system. Production logging should be modified only on the advice of Support and only to the level specified, and it should be returned to normal levels once debugging is complete. Remember that changes to the logging levels require a web application server restart to take effect.

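When adjusting the log4j.properties file, remember that there are several possible levels of logging, listed here from most to least verbose. A logger set to a given level also writes messages at every more severe level.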

1) DEBUG - The DEBUG level designates fine-grained informational events that are most useful to debug an application.
2) INFO - The INFO level designates informational messages that highlight the progress of the application at a coarse-grained level.
3) WARN - The WARN level designates potentially harmful situations.
4) ERROR - The ERROR level designates error events that might still allow the application to continue running.
5) FATAL - The FATAL level designates very severe error events that will presumably lead the application to abort.
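
The ordering of these levels is what matters when a level is set in log4j.properties. The following is a minimal Java sketch of that threshold behavior. It does not use the log4j library itself; the class and method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LevelThreshold {
    // The five levels listed above, ordered from most verbose to most severe.
    enum Level { DEBUG, INFO, WARN, ERROR, FATAL }

    // A message is written only if it is at least as severe as the configured threshold.
    static boolean isLogged(Level threshold, Level message) {
        return message.compareTo(threshold) >= 0;
    }

    // Lists the levels that would appear in the log for a given threshold.
    static List<Level> loggedAt(Level threshold) {
        List<Level> logged = new ArrayList<>();
        for (Level level : Level.values()) {
            if (isLogged(threshold, level)) {
                logged.add(level);
            }
        }
        return logged;
    }

    public static void main(String[] args) {
        // Print, for each possible threshold, which message levels get through.
        for (Level threshold : Level.values()) {
            System.out.println(threshold + " -> " + loggedAt(threshold));
        }
    }
}
```

In this sketch a threshold of WARN lets WARN, ERROR and FATAL messages through and suppresses DEBUG and INFO, which is why lowering a production logger to DEBUG can increase log volume so sharply.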

You should review the NetIQ pre-configured levels in the log4j.properties file. They are normally commented out, and are turned on or off depending on whether they are needed to debug a problem occurring in production, and at what stage that problem occurs.
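
As an illustration of what "commented out" means in practice, the sketch below loads a small properties excerpt and lists only the logger entries that are active. It assumes the log4j 1.x properties syntax (log4j.rootLogger and log4j.logger.<name> keys). The logger names are hypothetical placeholders, not the actual entries shipped with Access Governance Suite.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class ActiveLoggers {
    // Hypothetical excerpt; the logger names are placeholders, not real product categories.
    static final String SAMPLE =
        "log4j.rootLogger=WARN, file\n"
        + "#log4j.logger.com.example.aggregation=DEBUG\n"
        + "log4j.logger.com.example.workflow=INFO\n";

    // Returns the logger entries that are active, i.e. not commented out with '#'.
    static Map<String, String> activeLoggers(String text) {
        Properties props = new Properties();
        try {
            // Properties.load skips lines whose first non-blank character is '#' or '!'.
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> active = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.equals("log4j.rootLogger") || key.startsWith("log4j.logger.")) {
                active.put(key, props.getProperty(key));
            }
        }
        return active;
    }

    public static void main(String[] args) {
        activeLoggers(SAMPLE).forEach((name, level) -> System.out.println(name + " = " + level));
    }
}
```

Removing the leading '#' from the second line of the excerpt would make that logger appear in the result, which mirrors how a pre-configured level is switched on in the real file.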

Most problems within Access Governance Suite occur because data does not conform to the standards agreed upon for the application. In other words, the application uses rule-sets to take data in and then build an identity, and it expects that data to arrive in a certain format. What usually causes issues is unclean data, such as null values that nobody anticipated and added checks for in a rule-set, or a logical error in a plan that went unnoticed until it was executed.

Data can also come in as "bad" when a new set of data is merged into the application that "should" be the same as the old, but is not. All of these things can lead to the most commonly seen error in the GUI, a "Null Pointer Exception". This usually means that somewhere in the application a null was passed where an actual value was expected, and the operation parsing the data had no "check for null" built in.
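
To make the "check for null" idea concrete, here is a minimal Java sketch. It is not taken from any Access Governance Suite rule; the record layout, the "email" attribute and the method names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class NullCheckSketch {
    // Unsafe: throws NullPointerException if the "email" attribute is missing or null.
    static String normalizeUnsafe(Map<String, String> record) {
        return record.get("email").trim().toLowerCase(Locale.ROOT);
    }

    // Safe: checks for null before using the value and falls back to an empty string.
    static String normalizeSafe(Map<String, String> record) {
        String email = record.get("email");
        if (email == null) {
            return "";
        }
        return email.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        record.put("name", "jdoe"); // hypothetical record with no "email" entry

        System.out.println("safe result: '" + normalizeSafe(record) + "'");
        try {
            normalizeUnsafe(record);
        } catch (NullPointerException e) {
            System.out.println("unsafe version threw NullPointerException");
        }
    }
}
```

Whether an empty string, a default value or a skipped record is the right fallback depends on the data standards agreed upon for your deployment.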

The Null Pointer Exception (NPE) is not the only error, of course, but it is the one most commonly reported. Tracking down the actual cause of an NPE may require some adjustment of the log4j.properties file, depending on the operation underway when the NPE occurred. Note that an NPE can also occur during an Identity Refresh, during a workflow, or in any other activity where data is parsed and evaluated and a null is found where a value was expected.

Support can and will advise you of what to modify within log4j.properties and to what level. They may advise you to add parameters if the proper parameter is not present, and they may request a reset and retest of the operation that went awry to capture the data to pinpoint the errors that occurred.

One thing you can do to be proactive, however, is to export your logs to a system monitoring tool of your choice and review them for any FATAL and ERROR messages. If you note the time of those messages and gather the corresponding logs, you can send them to Support for review to determine whether further work is needed to resolve any ongoing issues in your system.
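
If no monitoring tool is in place yet, a small filter can serve as a starting point. The Java sketch below keeps only the lines that contain ERROR or FATAL as a whole word, with their timestamps intact. The log line layout in the sample is an assumption, since the actual layout depends on the pattern configured in your log4j.properties, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LogAlertFilter {
    // Matches ERROR or FATAL as a whole word anywhere in the line.
    private static final Pattern ALERT = Pattern.compile("\\b(ERROR|FATAL)\\b");

    // Returns the lines that contain an ERROR or FATAL marker, in their original order.
    static List<String> alertLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (ALERT.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // With a file argument, scan that log file; otherwise use a made-up sample.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : Arrays.asList(
                "2011-03-07 10:15:00,123 INFO  main Task started",
                "2011-03-07 10:15:02,456 ERROR main java.lang.NullPointerException",
                "2011-03-07 10:15:03,789 FATAL main Unable to connect to database");
        alertLines(lines).forEach(System.out::println);
    }
}
```

The timestamps on the matching lines give you the time window to quote when you gather logs for Support.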

These messages will often provide clues about your environment as well, such as JDBC connection failures (network or database issues), authentication failures (perhaps a password reset on a connection?) and other issues that can be resolved internally.

If you do see FATAL and ERROR messages that relate to the Access Governance Suite application itself, gather them and contact NetIQ Support for assistance.