Why do my Exchange archive jobs run so slowly? My throughput is far less than 3 messages per second.
In general, we have found that acceptable throughput is in the range of 3-5 messages per second. In well-designed systems with sufficient hardware resources, we have seen throughput above 10 messages per second.
There is definitely an issue if the throughput is less than 3 messages per second, and we have seen instances of less than 0.1.
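If you only know how many messages a job archived and how long it ran, the throughput is a simple division. The following Python sketch is illustrative only: the function names and the example figures are made up, and where you get the message count and run time from depends on your own job reports.

```python
def messages_per_second(message_count, elapsed_seconds):
    """Return the average archive throughput for one job run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return message_count / elapsed_seconds


def needs_investigation(rate, minimum=3.0):
    """True when the rate is below the 3 messages per second floor given above."""
    return rate < minimum


# Illustrative figures only: 5,400 messages archived in a one-hour run.
rate = messages_per_second(5400, 3600)
print(rate, needs_investigation(rate))
```

A job that archived 18,000 messages in the same hour would sit at the upper end of the acceptable range.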
The first place to look is the worker log.
We are looking for how long it takes Retain to log in to each mailbox. The point at which it finds the endpoint tells us it has entered the mailbox.
Search the log for lines containing:
enterMailbox
Discovered endpoint
Compare the timestamps of these two lines. The difference should be less than 2 seconds. If it is significantly longer than 2 seconds, the most likely cause is DNS not properly serving autodiscover.
2015-09-25 12:02:14,177 DEBUG [RTWQuartzScheduler_Archive_Worker-1] com.gwava.ews.archiveimpl.process.ExchangeUser: Discovered endpoint: https://ad.test.sys/ews/exchange.asmx
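On a large log, the timestamp comparison can be scripted. The following Python sketch is not part of Retain. It assumes that every relevant line starts with a timestamp, a log level, and a bracketed thread name, as in the line above, and that both lines for a mailbox are written by the same thread. The enterMailbox line in the sample is made up for illustration, since its exact wording may differ in your log.

```python
import re
from datetime import datetime

# Timestamp, log level, [thread name], rest of the message.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \w+ +\[([^\]]+)\] (.*)$"
)


def slow_logins(lines, threshold_seconds=2.0):
    """Return (thread, seconds) for each enterMailbox -> Discovered endpoint
    gap that is longer than threshold_seconds."""
    entered = {}  # thread name -> time of its most recent enterMailbox line
    slow = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # stack traces and other unstamped lines
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        thread, message = match.group(2), match.group(3)
        if "enterMailbox" in message:
            entered[thread] = stamp
        elif "Discovered endpoint" in message and thread in entered:
            gap = (stamp - entered.pop(thread)).total_seconds()
            if gap > threshold_seconds:
                slow.append((thread, gap))
    return slow


# The first line is HYPOTHETICAL (logger name and wording are assumptions);
# the second is the sample line from this article.
SAMPLE_LOG = """\
2015-09-25 12:02:02,051 DEBUG [RTWQuartzScheduler_Archive_Worker-1] com.example.Hypothetical: enterMailbox
2015-09-25 12:02:14,177 DEBUG [RTWQuartzScheduler_Archive_Worker-1] com.gwava.ews.archiveimpl.process.ExchangeUser: Discovered endpoint: https://ad.test.sys/ews/exchange.asmx
"""

for thread, seconds in slow_logins(SAMPLE_LOG.splitlines()):
    print(thread, seconds)
```

To run it against a real log, pass an open file object for your worker log instead of SAMPLE_LOG.splitlines().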
Also search for connection failures and retries. The retry delay increases after each failure, and the delays can add up to 4 minutes. Search the log for lines containing:
Software caused connection abort: recv failed
EWS request failed: null. Will retry after
javax.xml.ws.WebServiceException: java.net.SocketException: Software caused connection abort: recv failed
at com.sun.xml.ws.transport.http.client.HttpClientTransport.readResponseCodeAndMessage(Unknown Source)
Caused by: java.net.SocketException: Software caused connection abort: recv failed
at java.net.SocketInputStream.socketRead0(Native Method)
... 27 more
2015-07-22 00:25:25,056 DEBUG [Thread-1341102] com.gwava.ews.RetainExchangeWebserviceFactory: EWS request failed: null. Will retry after 2 seconds
Retain will retry a few times with longer delays until it aborts. Here we are losing the connection to the Exchange server while already in a mailbox. This can indicate an issue with a message attachment, or that the web server on the Exchange or CAS servers is unable to serve the item at this time. Open the message in Outlook or OWA and see if it can be accessed.
If the message can be accessed successfully, export it as a .pst and use the PST Importer to bring it into Retain.
If the message cannot be accessed, it will have to be deleted.
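To gauge how much time the retries cost, you can total the delays announced in the "Will retry after" lines. The following Python sketch is not part of Retain. It assumes the retry lines keep the format shown in the log excerpt above and that all retries for an item are logged by the same thread. The 4 and 8 second lines in the sample are made up to illustrate growing delays; the actual delay values may differ.

```python
import re
from collections import defaultdict

# Capture the bracketed thread name and the announced delay in seconds.
RETRY_RE = re.compile(
    r"\[([^\]]+)\].*EWS request failed.*Will retry after (\d+) seconds?"
)


def retry_cost(lines):
    """Return {thread: (retry_count, total_delay_seconds)}."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            entry = totals[match.group(1)]
            entry[0] += 1
            entry[1] += int(match.group(2))
    return {thread: tuple(entry) for thread, entry in totals.items()}


# The first line is the sample from this article; the other two are
# HYPOTHETICAL follow-up retries with made-up timestamps and delays.
SAMPLE_LOG = """\
2015-07-22 00:25:25,056 DEBUG [Thread-1341102] com.gwava.ews.RetainExchangeWebserviceFactory: EWS request failed: null. Will retry after 2 seconds
2015-07-22 00:25:27,310 DEBUG [Thread-1341102] com.gwava.ews.RetainExchangeWebserviceFactory: EWS request failed: null. Will retry after 4 seconds
2015-07-22 00:25:31,702 DEBUG [Thread-1341102] com.gwava.ews.RetainExchangeWebserviceFactory: EWS request failed: null. Will retry after 8 seconds
"""

for thread, (count, seconds) in retry_cost(SAMPLE_LOG.splitlines()).items():
    print(thread, count, seconds)
```

Threads with many retries point to the mailboxes and items worth opening in Outlook or OWA first.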
You may also want to check the health of the Exchange server itself.
First check the performance of the server: open Performance Monitor and see whether CPU, memory, disk, or network utilization is above 80%. If any of them is consistently high, use the various server health, monitoring, and performance cmdlets to pinpoint the issue.
Also check the queues. The mail queues are how Exchange handles mail. You can view them in Exchange Toolbox/Queue Viewer. The number of messages in each queue should be low. If a queue holds hundreds or thousands of messages and they are not being cleared, that queue may have a stuck message, which would need to be cleared.
You can also use the Exchange Management Shell (EMS) to check the status of the queues, for example with the Get-Queue cmdlet.
Also check the mailboxes. Performance can degrade if a mailbox has too many messages (~100k). The number of messages matters more than the size of the messages. On large systems, redirect the output of the command below to a file, since it can exceed the EMS buffer.
Get-Mailbox -ResultSize Unlimited | Get-MailboxStatistics > c:\mailboxstat.txt
If there is a specific mailbox with issues you may need to repair the mailbox.
You can get a quick overview of an Exchange server's health by running this EMS cmdlet:
Get-ServerHealth -Identity server1 | Sort-Object AlertValue | ft Name, AlertValue