Using Assignment Requests to Analyze General Zone Performance

  • 7004531
  • 17-Apr-2012
  • 01-Aug-2019

Environment

ZENworks Configuration Management 2017
ZENworks Configuration Management 11
ZENworks Configuration Management 10

Situation

How to determine whether back-end performance issues are related to the Primary Server, the Database Server, or the LDAP User Source.

Resolution

One of the key items to examine on a Primary Server, when determining whether one of the three back-end components (Primary Server, Database Server, or LDAP Server) has a load issue, is how long it takes to process assignment requests for users and devices.  Tracking and analyzing spikes in the time required to process requests can help determine whether one of these components is experiencing load issues at particular times of the day.  Since each environment is different, it is better to compare peak times against off-peak times than to rely on fixed timings to judge load issues.  Once a load issue is isolated, steps can be taken to address the performance of that specific area.  This article focuses on identifying potential areas of concern rather than on resolutions for each of the three areas.

 

Assignment requests can be broken down into “User Assignment Requests” and “Device Assignment Requests”.  If a user assignment request takes a long time, it indicates a possible load issue with the Primary Server, the Database Server, or the LDAP Server.  If a device assignment request takes a long time, it indicates a possible load issue with the Primary Server or the Database Server; in general, LDAP source issues should not directly impact device assignment requests.

If the delay occurs only in user assignment requests and not in device assignment requests, it normally indicates a bottleneck with the LDAP user source.  If there are multiple Primary Servers and the performance issue tends to occur on some servers but not others, it tends to indicate a load issue or other problem on those servers rather than on the database, since requests from the other servers are handled promptly.  If both user and device requests are slow on both high-use and low-use servers, it generally indicates a database performance issue.  (Note: User requests should always tend to have a slightly higher average, since resolving user assignments requires querying LDAP as well as the database, whereas device requests query only the database.)
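
The rules of thumb above can be written down as a small decision helper. This is only a rough sketch: the function name, its three boolean inputs, and the returned labels are hypothetical, and the rules are applied in the order the article gives them. It does not replace comparing real peak and off-peak timings.

```python
def likely_bottleneck(user_slow, device_slow, all_servers_affected):
    """Map the article's rules of thumb to a likely problem area.

    All three arguments are booleans the administrator decides on after
    reviewing the logs; they are illustrative inputs, not ZENworks settings.
    """
    if not user_slow and not device_slow:
        return "none"
    # Slow user requests with normal device requests point at the LDAP source.
    if user_slow and not device_slow:
        return "ldap"
    # Device requests are slow: if only some Primary Servers show it,
    # suspect those servers rather than the shared database.
    if not all_servers_affected:
        return "primary server"
    # Both request types slow on every server points at the database.
    if user_slow and device_slow:
        return "database"
    # Only device requests slow on every server: the article narrows this
    # to the Primary Server or the database, but no further.
    return "primary server or database"
```

For example, `likely_bottleneck(True, True, True)` returns `"database"`, matching the case where both request types are slow on every server.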

 

The Primary Server's services-messages.log can be used to find out how long assignment requests take to be fulfilled.  Search the log for the phrase “getAllAssignments complete, time:” and examine how long each request took.  If the value for “object:” on that line contains a “~” separating two strings, the object is a user.  If “object:” does not contain a “~”, the object is a device.  Below is an example of a device entry followed by a user entry:

 

[4/14/12 7:59:10 AM] [Assignment Web Service] [getAllAssignments complete, time: 232ms, object: 5b39027b57e2c14d8b31ff2e89c89bea]

[4/14/12 7:59:11 AM] [Assignment Web Service] [getAllAssignments complete, time: 895ms, object: e87125e4f07e7588a1c4d283bafb2594~7006cf90e2a6de118b3c001e4f2ba32a]
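
The search described above can be scripted. The following is a minimal sketch that assumes log lines look exactly like the two sample entries; the pattern and the function name `parse_assignment_line` are illustrative, and other ZCM versions may log a slightly different format.

```python
import re
from datetime import datetime

# Pattern modeled on the two sample entries above (an assumption, not a
# documented log format).
LINE_RE = re.compile(
    r"\[(?P<stamp>\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}:\d{2} [AP]M)\]"
    r".*?getAllAssignments complete, time: (?P<ms>\d+)ms, "
    r"object: (?P<obj>[0-9a-fA-F~]+)\]"
)

def parse_assignment_line(line):
    """Return (timestamp, kind, milliseconds), or None for unrelated lines."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group("stamp"), "%m/%d/%y %I:%M:%S %p")
    # A "~" separating two strings marks a user object; otherwise a device.
    kind = "user" if "~" in match.group("obj") else "device"
    return stamp, kind, int(match.group("ms"))
```

Applied to the two sample entries, this sketch classifies the first as a device request of 232 ms and the second as a user request of 895 ms.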

 

In general, as long as an assignment request does not exceed two minutes, it should not cause any issues beyond slowness on a client device.  Assignment requests exceeding two minutes may in some cases result in an agent not receiving all assignments due to timeouts on the client side.  However, if a large number of requests take an excessive amount of time to complete, it could result in a backlog and a cascading performance issue.  Expected performance results can vary between implementations depending on the complexity of the ZCM design, hardware, database type, LDAP type, and numerous other factors.  Comparing peak and off-peak performance results is the best way to determine peak-load performance degradation.
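
A peak versus off-peak comparison can be sketched as follows. The input is assumed to be a list of `(timestamp, milliseconds)` pairs already extracted from the log; the function name, the `peak_hours` parameter, and the result keys are hypothetical. The two-minute constant comes from the paragraph above.

```python
from datetime import datetime
from statistics import mean

TWO_MINUTES_MS = 120_000  # client-side timeout risk described above

def compare_peak(entries, peak_hours):
    """Compare mean request time inside and outside the given peak hours.

    entries: iterable of (datetime, milliseconds) pairs.
    peak_hours: set of hours (0-23) the administrator treats as peak.
    """
    entries = list(entries)
    peak = [ms for stamp, ms in entries if stamp.hour in peak_hours]
    off_peak = [ms for stamp, ms in entries if stamp.hour not in peak_hours]
    return {
        "peak_mean_ms": mean(peak) if peak else None,
        "off_peak_mean_ms": mean(off_peak) if off_peak else None,
        # Requests long enough that an agent may miss assignments.
        "over_two_minutes": sum(1 for _, ms in entries if ms > TWO_MINUTES_MS),
    }
```

Which hours count as peak is site-specific, so the hours are passed in rather than hard-coded.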
 
See TID#7003865 - "Configuring and Optimizing User Source/LDAP configurations" for LDAP performance issues.
See TID#7005560 - "Improving ZCM Database performance by Monitoring and Limiting Table growth" to address database performance issues related to database growth.
See TID#7005606 - "Improving ZCM Database Performance via Configurable ZENworks Options" for additional methods of increasing database performance beyond table growth management.

Additional Information

SMParse.exe, an unsupported utility available on the Novell Forum Blogs, helps parse the services-messages.log file.
SMParse.exe parses the services-messages.log from a server and breaks down assignment requests into device requests and user requests, along with the general amount of time it took to resolve those requests.
 
 
To use SMParse.exe, copy the services-messages.log into the same directory as SMParse.exe and launch the program. A file called "summary.txt" will be created as the log file is parsed. Below is a sample "summary.txt" file:

 

Total Device Assignment Requests: 32163

Number Completed in Excess of 1 second: 1234

Number Completed in Excess of 5 seconds: 97

Number Completed in Excess of 10 seconds: 34

Number Completed in Excess of 20 seconds: 4

Number Completed in Excess of 30 seconds: 0

Number Completed in Excess of 60 seconds: 0

 

Total User Assignment Requests: 22833

Number Completed in Excess of 1 second: 2250

Number Completed in Excess of 5 seconds: 288

Number Completed in Excess of 10 seconds: 101

Number Completed in Excess of 20 seconds: 24

Number Completed in Excess of 30 seconds: 7

Number Completed in Excess of 60 seconds: 0
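
For readers who cannot run the utility, a summary of the same shape can be approximated in a few lines. This is not SMParse.exe's implementation; it is a sketch that assumes the counts are cumulative (a request over 10 seconds is also counted as over 1 and 5 seconds), which is consistent with the sample above. The function names are hypothetical.

```python
THRESHOLDS_SECONDS = (1, 5, 10, 20, 30, 60)

def summarize(durations_ms):
    """Count how many durations exceed each threshold (cumulative counts)."""
    durations_ms = list(durations_ms)
    return {
        limit: sum(1 for ms in durations_ms if ms > limit * 1000)
        for limit in THRESHOLDS_SECONDS
    }

def format_summary(label, durations_ms):
    """Render the counts in the same wording as the sample summary."""
    durations_ms = list(durations_ms)
    lines = [f"Total {label} Assignment Requests: {len(durations_ms)}"]
    for limit, count in summarize(durations_ms).items():
        unit = "second" if limit == 1 else "seconds"
        lines.append(f"Number Completed in Excess of {limit} {unit}: {count}")
    return "\n".join(lines)
```

Calling `format_summary("Device", durations)` and `format_summary("User", durations)` on the two groups of request times yields text laid out like the sample.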

 

Additional logs named UserX.txt and DeviceX.txt will also be created, where X is 1, 5, 10, 20, 30, or 60, for events taking longer than that many seconds. These log entries include the date and time of each event, to help determine whether the slowdowns occur at any particular time of the day. The contents of “User30.txt”, showing all user queries that took over 30 seconds, are shown below:

 

[11/21/11 7:02:47 AM] 523cb70e2763b93088b9a1b0c17eda09~9b9853279602a141b723ddefc9716b21, 32432 ms

[11/21/11 7:54:03 AM] 523cb70e2763b93088b9a1b0c17eda09~0a2ddabe4ca47b488a8f81daec818574, 33338 ms

[11/21/11 7:54:12 AM] 523cb70e2763b93088b9a1b0c17eda09~7aca36fa8e2f09409161c3ab346fe8fc, 33974 ms

[11/21/11 7:54:30 AM] 523cb70e2763b93088b9a1b0c17eda09~c36a0cc516978f499373daf7e8f84790, 47351 ms

[11/21/11 7:54:48 AM] 523cb70e2763b93088b9a1b0c17eda09~090deeb543515a4c93d12b288a2bc3bd, 31535 ms

[11/21/11 7:57:58 AM] 523cb70e2763b93088b9a1b0c17eda09~48bd285e55af75448b4985e9d30ede24, 42957 ms

[11/21/11 7:58:13 AM] 523cb70e2763b93088b9a1b0c17eda09~97714f7bd62c43498f0858e93f906b7f, 32512 ms
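
To see when slow requests cluster, the entries in such a file can be tallied by hour. The sketch below assumes each line follows the layout of the sample above (bracketed timestamp, object ID, a comma, and a duration in ms); the pattern and function name are illustrative.

```python
import re
from collections import Counter
from datetime import datetime

# Layout assumed from the sample "User30.txt" entries above.
ENTRY_RE = re.compile(r"\[(?P<stamp>[^\]]+)\]\s+(?P<obj>\S+),\s+(?P<ms>\d+) ms")

def slow_requests_per_hour(lines):
    """Return a Counter mapping hour of day (0-23) to number of slow requests."""
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.search(line)
        if match is None:
            continue  # skip blank or unrecognized lines
        stamp = datetime.strptime(match.group("stamp"), "%m/%d/%y %I:%M:%S %p")
        counts[stamp.hour] += 1
    return counts
```

Run against the seven sample entries, every request falls in the 7 AM hour.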

 

Since the services-messages.log in this case covered a 24-hour period, it is clear that the heaviest load occurs just prior to 8 AM. However, the small number of spikes, combined with the fact that the spikes were not extremely high, indicates that the Primary Servers, the LDAP source, and the database were all handling the load well during peak periods. (Note: Consider configuring the servers for a daily rollover pattern, or combining and splitting rollover logs so that each covers a midnight-to-midnight period, to ease analysis.)
 
Note: Version 2.0 of the tool also identifies LDAP Connection Exceptions, which indicate an LDAP load issue that needs to be addressed, and has better log handling to detect slight format changes between different versions of ZCM.