Configuring initial O365/Exchange job to process one month at a time for each mailbox (automated)

  • 7019123
  • 12-Oct-2015
  • 01-Sep-2017

Environment

Retain 4.x
Exchange/O365

Situation

I have a new Retain system, and I want to run an initial archive job of all messages in the O365 system up to the current date. However, O365 periodically errors out and drops the connection (SocketClosedException, AccessDenied, SOAPFaultException, etc.). Is there a way around this? Because of these errors, I cannot get even a single mailbox processed all the way through.

Resolution

First, there are some things to understand about Retain:

  • Retain records the date/time (in Unix time) of the last item successfully archived from each mailbox in the t_abook table of the Retain database (ts_item field). The web administration UI refers to this value as the Item Store Flag.
  • Due to how the Exchange API works, Retain has to request items folder by folder.
  • If Retain cannot process all of the folders (for example, the job aborts the mailbox before it has seen the contents of some folders), it does not update the item store flag; it leaves the flag at the value it started with. On a first-time archive job, this means that if O365 connection problems keep Retain from finishing a mailbox, the flag is never set. Every subsequent run reprocesses the same items from the beginning and remains just as exposed to O365 connection problems before it can finish the mailbox.
  • O365 connection issues have become more frequent recently.
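To check where a mailbox's flag currently stands, you can convert the raw ts_item value into a readable date. This is a minimal Python sketch, not part of Retain: `ts_item_to_utc` is a hypothetical helper, and it assumes the value is in seconds because the article says only "Unix time". If your values are 13-digit numbers, they are most likely milliseconds, so pass `millis=True`.

```python
from datetime import datetime, timezone


def ts_item_to_utc(ts_item: int, millis: bool = False) -> datetime:
    """Convert a raw t_abook.ts_item value (Unix time) to a UTC datetime.

    ASSUMPTION: seconds by default; the article does not state the unit.
    Pass millis=True for 13-digit millisecond values.
    """
    seconds = ts_item / 1000 if millis else ts_item
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Example value only; substitute the ts_item you read from t_abook.
    print(ts_item_to_utc(1444608000).isoformat())  # 2015-10-12T00:00:00+00:00
```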

Retain 4 works around this O365 problem with a setting that makes it process each mailbox in smaller time slices. When the setting is enabled, Retain processes items dated before 2010 one year at a time; starting with 2010, it processes one month at a time. It enters mailbox 1 and processes the current time slice. If that succeeds (which it usually does for these smaller intervals), it advances the item store flag as described above and logs out. It then processes mailbox 2 the same way and continues through all of the mailboxes for that time slice. After the last mailbox, it cycles back to mailbox 1 and processes the next time slice (from 2010 onward, the next month). It repeats this cycle until every item in every mailbox has been archived.
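The slicing schedule can be sketched in Python as follows. This is an illustration under stated assumptions, not Retain's code: `date_slices` and `next_slice_end` are hypothetical names, and the sketch assumes slices end on calendar-year and calendar-month boundaries, which the article does not document.

```python
from datetime import date
from typing import Iterator, Tuple

# Per the article: one-year slices before 2010, one-month slices from 2010 on.
SWITCH_YEAR = 2010


def next_slice_end(start: date) -> date:
    """Return the exclusive end of the slice that begins at `start`.

    ASSUMPTION: slices end on calendar-year or calendar-month boundaries.
    """
    if start.year < SWITCH_YEAR:
        # Yearly slice: run up to January 1 of the following year.
        return date(start.year + 1, 1, 1)
    # Monthly slice: run up to the first day of the following month.
    if start.month == 12:
        return date(start.year + 1, 1, 1)
    return date(start.year, start.month + 1, 1)


def date_slices(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield consecutive (slice_start, slice_end) pairs covering [start, end)."""
    current = start
    while current < end:
        stop = min(next_slice_end(current), end)
        yield current, stop
        current = stop


if __name__ == "__main__":
    for s, e in date_slices(date(2008, 1, 1), date(2010, 3, 1)):
        print(s, "->", e)
```

With these example dates, the sketch yields two yearly slices (2008 and 2009) followed by monthly slices for January and February 2010.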

This is especially useful because O365 is prone to connection failures. If Retain loses the connection while processing a mailbox and cannot reconnect to O365 within a few minutes, it aborts that mailbox and moves on. Because it did not fully process the mailbox, it cannot advance the flag, so the same mailbox can fail over and over without ever being processed all the way through.

Processing one month at a time lets Retain finish a mailbox for that month and advance the flag. Retain therefore inches forward, slice by slice, until all data has been archived.
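The following Python sketch models why slicing lets Retain make progress despite dropped connections. It is hypothetical throughout: `archive_round_robin` is an illustrative name, `archive_slice` stands in for Retain archiving one mailbox for one time slice and is assumed to raise `ConnectionError` on an O365 failure, and the per-mailbox flag is modeled as a date rather than the real ts_item value. The key behavior matches the article: the flag advances only when a whole slice succeeds, and each pass gives every unfinished mailbox one slice.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

Slice = Tuple[date, date]


def archive_round_robin(
    mailboxes: List[str],
    slices: List[Slice],
    archive_slice: Callable[[str, date, date], None],
    max_passes: int = 100,
) -> Dict[str, date]:
    """Cycle through mailboxes one slice at a time; return each mailbox's flag.

    HYPOTHETICAL model: `slices` must be non-empty and contiguous, and
    `archive_slice` raises ConnectionError when the (simulated) O365 link drops.
    """
    flags = {mb: slices[0][0] for mb in mailboxes}
    for _ in range(max_passes):
        pending = [mb for mb in mailboxes if flags[mb] < slices[-1][1]]
        if not pending:
            break
        for mb in pending:
            # The next slice to process is the one starting at this mailbox's flag.
            start, end = next(s for s in slices if s[0] >= flags[mb])
            try:
                archive_slice(mb, start, end)
            except ConnectionError:
                continue  # abort this mailbox for now; its flag stays put
            flags[mb] = end  # whole slice succeeded: advance the flag
    return flags


if __name__ == "__main__":
    slices = [(date(2010, 1, 1), date(2010, 2, 1)), (date(2010, 2, 1), date(2010, 3, 1))]
    attempts: List[Tuple[str, date]] = []

    def flaky(mailbox: str, start: date, end: date) -> None:
        # Simulate one dropped connection the first time bob's February slice runs.
        attempts.append((mailbox, start))
        if mailbox == "bob" and start == date(2010, 2, 1) and attempts.count((mailbox, start)) == 1:
            raise ConnectionError("SocketClosedException (simulated)")

    print(archive_round_robin(["alice", "bob"], slices, flaky))
```

In the example, bob's February slice fails once, his flag stays at February 1, and he finishes on the next pass; without slicing, the same failure would have sent the whole mailbox back to the start.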

Enabling the Setting

  1. Stop Tomcat.
  2. Edit the file exchange.properties found in .../RetainWorkerN/WEB-INF/classes/config (where N is the Worker number if multiple Workers have been installed). NOTE: If multiple Workers are installed, make this change on every Worker that runs jobs you want it to affect.
  3. Change the following two settings to "1" to enable the feature:

exchange.dateFlag.advanceProgressive=1
exchange.o365.dateslice=1

  4. Start Tomcat.
  5. Edit the profile associated with the job(s) that you want to run this way. Under the Scope tab, set the Duplicate Check setting to "Ignore all messages older than item store flag (fast)".
  6. Start your archive job.
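If several Workers need the change, the properties edit can be scripted. This Python sketch is illustrative only: `enable_date_slicing` is a hypothetical helper, and it uses a simplified parser that recognizes only `key=value` and `key:value` lines (no backslash line continuations, escaped separators, or whitespace-only separators). It sets both settings to 1, rewriting every existing occurrence and appending any that are missing; comment lines are left untouched.

```python
import re

# The two settings named in the article; "1" enables the feature.
SETTINGS = {
    "exchange.dateFlag.advanceProgressive": "1",
    "exchange.o365.dateslice": "1",
}

# Simplified .properties matcher: captures the key of a "key = value" or
# "key:value" line. Comment lines (starting with # or !) never match.
_KEY_LINE = re.compile(r"^\s*([^#!=:\s]+)\s*[=:]")


def enable_date_slicing(text: str) -> str:
    """Return properties text with both date-slice settings set to 1.

    Line endings in the result are normalized to "\n".
    """
    seen = set()
    out = []
    for line in text.splitlines():
        m = _KEY_LINE.match(line)
        if m and m.group(1) in SETTINGS:
            key = m.group(1)
            # Rewrite every occurrence, since a later duplicate would win on load.
            line = f"{key}={SETTINGS[key]}"
            seen.add(key)
        out.append(line)
    # Append any setting that was not already present in the file.
    out.extend(f"{k}={v}" for k, v in SETTINGS.items() if k not in seen)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# Retain Exchange settings\nexchange.o365.dateslice=0\n"
    print(enable_date_slicing(sample))
```

To apply it, stop Tomcat, back up each Worker's exchange.properties (the file under .../RetainWorkerN/WEB-INF/classes/config), write back the output of `enable_date_slicing` applied to its contents, and then continue with step 4.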

If you log in to the Worker web UI, the date range currently being processed for a given mailbox appears next to "Current mailbox".

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 2629.