Troubleshooting Exchange Performance

  • 7019122
  • 28-Oct-2015
  • 07-Aug-2017

Environment


Retain 3.x/4.x
Exchange Module

Situation


Why do my Exchange archive jobs run so slowly? My throughput is far less than 3 messages per second.

Resolution


In general, we have found that acceptable throughput is in the range of 3-5 messages per second. In well-designed systems with sufficient hardware resources we have seen throughput above 10 messages per second.
There is definitely an issue if the throughput is less than 3 messages per second, and we have seen instances of less than 0.1.
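
As a rough illustration of the arithmetic, the following Python sketch turns a message count and an elapsed time into a throughput figure and compares it with the ranges above. The function names and the sample numbers are hypothetical; take the real message count and duration from your own job statistics.

```python
def throughput(messages, elapsed_seconds):
    """Return messages per second for a finished (or partial) job."""
    return messages / elapsed_seconds


def assess(mps):
    """Compare a throughput figure with the ranges quoted in this article."""
    if mps < 3:
        return "issue"            # below 3 messages per second
    if mps <= 5:
        return "acceptable"       # the usual 3-5 range
    return "above typical range"  # well-resourced systems can exceed 10


# Hypothetical example: 5,400 messages archived in one hour.
rate = throughput(5400, 3600)
print(f"{rate:.2f} messages per second: {assess(rate)}")
```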

The first place to look is the worker log.

Mailbox Delays

Here we measure how long Retain takes to log in to each mailbox. The enterMailbox line marks the start of the login, and the Discovered endpoint line shows that Retain has found the EWS endpoint and entered the mailbox.

Search the log for lines containing:

enterMailbox
Discovered endpoint

Compare the timestamps of these two lines for the same mailbox. The difference should be less than 2 seconds. If it is significantly longer, the most likely cause is that DNS is not properly serving autodiscover. For example:

2015-09-25 12:00:07,256 TRACE [RTWQuartzScheduler_Archive_Worker-1] com.gwava.caapi.MailboxArchivingStats: enterMailbox: JDoe@RETAIN.GWAVAUTAH
2015-09-25 12:02:14,177 DEBUG [RTWQuartzScheduler_Archive_Worker-1] com.gwava.ews.archiveimpl.process.ExchangeUser: Discovered endpoint: https://ad.test.sys/ews/exchange.asmx

In this example the gap is just over 2 minutes, which indicates an issue with how autodiscover is configured in DNS. It may need an SCP or SRV record.
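
Comparing timestamps by hand becomes tedious on a large job. The Python sketch below pairs each enterMailbox line with the next Discovered endpoint line on the same worker thread and reports the gap in seconds. It assumes every log line uses the layout shown in the sample above (timestamp, level, thread name in brackets, message); the function name is hypothetical and this is not a Retain tool.

```python
import re
from datetime import datetime

# Timestamp, log level, thread name in brackets, then the rest of the line.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \w+ +\[([^\]]+)\] (.*)$"
)


def mailbox_delays(lines):
    """Yield (mailbox, seconds) for each enterMailbox / Discovered endpoint pair."""
    pending = {}  # thread name -> (mailbox, time of enterMailbox)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # stack traces and other continuation lines
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        thread, rest = m.group(2), m.group(3)
        if "enterMailbox:" in rest:
            mailbox = rest.split("enterMailbox:", 1)[1].strip()
            pending[thread] = (mailbox, stamp)
        elif "Discovered endpoint:" in rest and thread in pending:
            mailbox, start = pending.pop(thread)
            yield mailbox, (stamp - start).total_seconds()


sample = [
    "2015-09-25 12:00:07,256 TRACE [RTWQuartzScheduler_Archive_Worker-1] "
    "com.gwava.caapi.MailboxArchivingStats: enterMailbox: JDoe@RETAIN.GWAVAUTAH",
    "2015-09-25 12:02:14,177 DEBUG [RTWQuartzScheduler_Archive_Worker-1] "
    "com.gwava.ews.archiveimpl.process.ExchangeUser: Discovered endpoint: "
    "https://ad.test.sys/ews/exchange.asmx",
]
for mailbox, seconds in mailbox_delays(sample):
    flag = "SLOW" if seconds > 2 else "ok"
    print(f"{mailbox}: {seconds:.1f} s ({flag})")
```

To run it against a real worker log, pass an open file object instead of the sample list.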

Message Delays

Also search for connection failures and retries. The retry delay increases after each failure and can add up to 4 minutes. Search the log for lines containing:

Software caused connection abort: recv failed
EWS request failed: null. Will retry after

2015-07-22 00:25:25,056 TRACE [Thread-1341102] com.gwava.ews.RetainExchangeWebserviceFactory: retry, exception :
javax.xml.ws.WebServiceException: java.net.SocketException: Software caused connection abort: recv failed
    at com.sun.xml.ws.transport.http.client.HttpClientTransport.readResponseCodeAndMessage(Unknown Source)
...
    at com.gwava.ews.archiveimpl.process.CursorFetchThread.run(CursorFetchThread.java:1334)
Caused by: java.net.SocketException: Software caused connection abort: recv failed
    at java.net.SocketInputStream.socketRead0(Native Method)
...
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
    ... 27 more
2015-07-22 00:25:25,056 DEBUG [Thread-1341102] com.gwava.ews.RetainExchangeWebserviceFactory: EWS request failed: null. Will retry after 2 seconds

Retain retries a few times with increasingly long delays until it aborts. Here the connection to the Exchange server is lost while Retain is already in a mailbox. This can indicate a problem with a message attachment, or that the web server on the Exchange or CAS server is unable to serve the item at that time. Open the message in Outlook or OWA to see whether it can be accessed.

If the message can be accessed, export it as a .pst and use the PST Importer to bring it into Retain.
If the message cannot be accessed, it will have to be deleted.
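
To see how much time a job loses to retries, the Python sketch below counts the "Will retry after" lines per thread and adds up the announced waits. It assumes the retry message always has the form shown in the sample above, with the delay given in whole seconds; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Thread name in brackets, then the retry message with its delay in seconds.
RETRY_RE = re.compile(
    r"\[([^\]]+)\].*EWS request failed: .*Will retry after (\d+) seconds"
)


def retry_waits(lines):
    """Return {thread: (retry_count, total_wait_seconds)}."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = RETRY_RE.search(line)
        if m:
            entry = totals[m.group(1)]
            entry[0] += 1                 # one more retry on this thread
            entry[1] += int(m.group(2))   # seconds Retain said it would wait
    return {thread: tuple(values) for thread, values in totals.items()}


sample = [
    "2015-07-22 00:25:25,056 DEBUG [Thread-1341102] "
    "com.gwava.ews.RetainExchangeWebserviceFactory: "
    "EWS request failed: null. Will retry after 2 seconds",
]
for thread, (count, waited) in retry_waits(sample).items():
    print(f"{thread}: {count} retries, {waited} s waiting")
```

Threads with many retries or long total waits are the ones to trace back to a specific mailbox and message.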

Exchange Health

You may also want to check the health of the Exchange server itself.

Performance Monitor

The first thing to check is the performance of the server. Open Performance Monitor and see whether CPU, memory, disk, or network utilization is above 80%. If any of them is consistently high, use the various server health, monitoring, and performance cmdlets to pinpoint the issue.

Queues

Next, check the queues, which are how Exchange handles mail. You can view them in Exchange Toolbox/Queue Viewer. The number of messages in each queue should be low. If a queue holds hundreds or thousands of messages and is not draining, it may contain a stuck message that needs to be cleared.

You can also use the Exchange Management Shell (EMS) to check the status of the queues:

Get-Queue

Mailboxes

Also check the mailboxes. Performance can degrade if a mailbox has too many messages (~100k). The number of messages is more important than the size of the messages. For large systems, redirect the output to a file, since it can exceed the EMS buffer.

Get-Mailbox | Get-MailboxStatistics > c:\mailboxstat.txt

If there is a specific mailbox with issues you may need to repair the mailbox.
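
The statistics are easier to scan when exported as CSV. The Python sketch below lists mailboxes at or above the ~100k item mark, largest first. It assumes the statistics were exported with Export-Csv -NoTypeInformation rather than plain redirection, so that the file has a header row with DisplayName and ItemCount columns; the function name and the sample data are hypothetical.

```python
import csv
import io


def large_mailboxes(csv_file, threshold=100_000):
    """Return (DisplayName, ItemCount) pairs at or above threshold, largest first."""
    rows = []
    for row in csv.DictReader(csv_file):
        count = int(row["ItemCount"])
        if count >= threshold:
            rows.append((row["DisplayName"], count))
    return sorted(rows, key=lambda pair: pair[1], reverse=True)


# Made-up sample data standing in for a real export.
sample = io.StringIO(
    '"DisplayName","ItemCount"\n'
    '"Mailbox A","250000"\n'
    '"Mailbox B","1200"\n'
    '"Mailbox C","100000"\n'
)
for name, count in large_mailboxes(sample):
    print(f"{name}: {count} items")
```

For a real export, open the CSV file and pass the file object in place of the sample.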

Server Health

You can get a quick overview of an Exchange server's health by running this EMS cmdlet:

Get-ServerHealth -Identity server1 | Sort-Object AlertValue | ft Name, AlertValue


See Also:

Autodiscover: How Retain Connects to Your Exchange Mailboxes
Slow Exchange Jobs because of long enterMailbox waits
Creating a DNS SRV record for Exchange

 

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 2648.