Error: Worker idling (server down, transient error, bad http, or in maintain mode)

  • 7019184
  • 10-Feb-2015
  • 07-Aug-2017

Environment


Retain 3.x
Linux

Situation


I'm seeing many archive job errors across different mailboxes. The Worker log shows entries like these:

2015-02-09 18:46:01,860 INFO  [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.RetainServerCommunication: A possibly transient error was reported at the server... might retry
2015-02-09 18:46:01,860 INFO  [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.RetainServerCommunication: Will idle and retry
2015-02-09 18:46:01,860 DEBUG [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.RetainServerCommunication: Worker idling (server down, transient error, bad http, or in maintain mode)
2015-02-09 18:46:03,770 DEBUG [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.HTTPWrapper2: using ApacheClient with TimedHttpMethodRetryHandler...
2015-02-09 18:46:03,771 DEBUG [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.HTTPWrapper2: headerstream.length: 1300 dataStream: 317
2015-02-09 18:46:03,783 INFO  [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.RetainServerCommunication: A possibly transient error was reported at the server... might retry
2015-02-09 18:46:03,783 INFO  [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.RetainServerCommunication: Will idle and retry
2015-02-09 18:46:03,783 DEBUG [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.RetainServerCommunication: Worker idling (server down, transient error, bad http, or in maintain mode)
2015-02-09 18:46:05,723 DEBUG [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.HTTPWrapper2: using ApacheClient with TimedHttpMethodRetryHandler...
2015-02-09 18:46:05,723 DEBUG [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.HTTPWrapper2: headerstream.length: 1300 dataStream: 317
2015-02-09 18:46:05,734 INFO  [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.RetainServerCommunication: A possibly transient error was reported at the server... might retry
2015-02-09 18:46:05,734 INFO  [RTWQuartzScheduler_Archive_Worker-8] com.gwava.http.RetainServerCommunication: Giving up...too many retries!
2015-02-09 18:46:05,734 TRACE [RTWQuartzScheduler_Archive_Worker-8] com.gwava.caapi.process.ArchiveAttachment: Send a nice healthy blob:Archive: ERROR: Fatal Error Result=AddedEMails: 0, emailID=null, parentID=null
2015-02-09 18:46:05,734 ERROR [RTWQuartzScheduler_Archive_Worker-8] com.gwava.caapi.Archiver:
com.gwava.archive.exceptions.RetainWorkerException: Server-ERROR encountered on item: 5244CBBC.[customer domain].[customer PO].200.[GroupWise message ID]

Resolution


Because the Worker was reporting a "RetainServerCommunication" error (server down, transient error, ...), we checked the RetainServer log at the time the transient error was first reported and found entries like these:

2015-02-10 13:28:16,475 [Store_1423603696474_12583] INFO  com.gwava.archive.StoreEmail - #begin archiving: 2B2D491BE72AE9D5AA146D26590A1A75 525418BE.[customer domain].[customer PO].200.20000EC.1.EAD.1 5E0425EFAC8BC9D83280A16231D558B06D67593022624504DB7EECE2BC3B60A6<===== Note from KB author: This is the file hash. Its first 6 characters (5E0425) determine the archive directory path for this file on the Retain server.
2015-02-10 13:28:16,475 [Store_1423603696474_12583] INFO  com.gwava.archive.StoreEmail - userID: [userid]
2015-02-10 13:28:16,475 [Store_1423603696474_12583] TRACE com.gwava.archive.StoreEmail - file received: "Header" (723 bytes)
2015-02-10 13:28:16,475 [Store_1423603696474_12583] TRACE com.gwava.archive.StoreEmail - file received, ContentLength: 1899 bytes
2015-02-10 13:28:16,475 [Store_1423603696474_12583] TRACE com.gwava.archive.StoreEmail - request.getContentLength() - (metaDataLength+2) = 473
2015-02-10 13:28:16,475 [Store_1423603696474_12583] TRACE com.gwava.archive.StoreEmail - asDoc.compressedSize = 473
2015-02-10 13:28:16,476 [Store_1423603696474_12583] TRACE com.gwava.caching.PartitionCache - Def Partition Path: PARTITION:id=0;name=default;path=/storage/archive;bt=1423603696;et=0
2015-02-10 13:28:16,476 [Store_1423603696474_12583] DEBUG com.gwava.engine.standard.ArchiveFile - PATH FOR ARCHIVE FILE: /storage/archive  <===== Note from KB author:  archive directory path for this customer
2015-02-10 13:28:16,477 [Store_1423603696474_12583] INFO  com.gwava.archive.StoreEmail - This Exception is transient and will be re-tried. Therefore error-count will not be increased
2015-02-10 13:28:16,478 [Store_1423603696474_12583] ERROR com.gwava.utils.ServerErrorHandlerStrategy - reportError: StoreEmail :: com.gwava.archive.StoreEmail.handleArchiveExceptions:604 :: EXCEPTION : com.gwava.datastore.exceptions.DataStoreException: java.io.IOException: Permission denied com.gwava.datastore.exceptions.DataStoreException: java.io.IOException: Permission denied
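
What ties the "Permission denied" error to a directory is the file hash on the preceding "#begin archiving" line from the same Store thread. The Python sketch below does that pairing for a RetainServer log. It assumes the line layout shown in the excerpt above (date, time, [thread], level, logger, " - ", message) and that the hash is the only 64-hex-character token on the "#begin archiving" line; the function name denied_hashes is hypothetical and not part of Retain.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Assumed RetainServer log layout, based on the excerpt above:
#   <date> <time> [<thread>] <LEVEL> <logger> - <message>
LINE_RE = re.compile(
    r"^\S+ \S+ \[(?P<thread>[^\]]+)\] (?P<level>\w+)\s+\S+ - (?P<msg>.*)$"
)
# Assumption: the file hash is the single 64-hex-character token on the line.
HASH_RE = re.compile(r"\b([0-9A-Fa-f]{64})\b")


def denied_hashes(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (thread, file_hash) for each 'Permission denied' error line.

    The hash is taken from the most recent '#begin archiving' line
    logged by the same thread.
    """
    last_hash: Dict[str, str] = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # continuation lines, stack traces, etc.
        thread, msg = m.group("thread"), m.group("msg")
        if msg.startswith("#begin archiving"):
            h = HASH_RE.search(msg)
            if h:
                last_hash[thread] = h.group(1).upper()
        elif "Permission denied" in msg and thread in last_hash:
            yield thread, last_hash[thread]
```

To use it, open the RetainServer log (its location depends on your installation) and iterate over denied_hashes(log_file); each hash it prints points at a directory worth checking.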

Based on the "Permission denied" error, we followed the directory path given by the hash, starting at the first-level directory "5E" (/storage/archive/5E). A directory listing there showed that the first 7 subdirectories (00, 01, 02, 03, 04, 05, 06) were owned by "108" rather than "tomcat".
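
The walk from hash to directory can be sketched in Python as follows. It assumes, consistent with the KB author's note in the log excerpt and the two-character directory names seen here, that the first six characters of the hash are split into three two-character levels under the partition path (/storage/archive in this case). The names path_levels and archive_dir_for_hash are hypothetical helpers, not Retain functions.

```python
import os
from typing import List


def path_levels(archive_root: str, file_hash: str, levels: int = 3) -> List[str]:
    """Return each directory level for a hash, from the top level down.

    Assumption: every level is named by the next two characters of the hash
    (5E0425... -> 5E, 5E/04, 5E/04/25).
    """
    if len(file_hash) < 2 * levels:
        raise ValueError("hash too short for the requested number of levels")
    parts = [file_hash[i : i + 2] for i in range(0, 2 * levels, 2)]
    return [os.path.join(archive_root, *parts[:n]) for n in range(1, levels + 1)]


def archive_dir_for_hash(archive_root: str, file_hash: str, levels: int = 3) -> str:
    """Return the deepest directory that should hold the file for this hash."""
    return path_levels(archive_root, file_hash, levels)[-1]
```

For the hash in this case, path_levels yields /storage/archive/5E, then /storage/archive/5E/04, then /storage/archive/5E/04/25; checking the owner of each in turn mirrors walking down the path by hand, and the second level (04) falls within the 00 to 06 range found above.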

The customer had moved Retain to another server. During that move, they discovered that when Tomcat was installed on the new server, UID "108" was already assigned to another user, so the tomcat user was given a different UID (step #6 in "Moving Retain to New Server (Linux to Linux)" explains this). They attempted to correct the ownership, but in the process missed the 5E directory and its subdirectories. As a result, whenever a file hash mapped to one of those 5E subdirectories, the file system returned "Permission denied" and Retain could not write the message attachment to disk.

We used the chown command to make tomcat the owner of 5E/00 through 5E/06, which resolved the issue. From the 5E directory, we ran: chown -R tomcat 00 01 02 03 04 05 06. We then ran a new archive job against the mailbox that had been failing, and the errors no longer occurred.
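
Because a single missed directory was enough to cause these errors, it can help to scan the whole archive tree after a migration for anything not owned by the Tomcat user. The Python sketch below lists such entries. It assumes the service account is named "tomcat" (the name may differ by distribution) and that the archive root is /storage/archive, as in this case; wrong_owner and report are hypothetical names. Run it as root, since os.walk silently skips directories it cannot read.

```python
import os
import pwd
from typing import Iterator, Tuple


def wrong_owner(root: str, expected_uid: int) -> Iterator[Tuple[str, int]]:
    """Yield (path, uid) for root and every entry below it whose owner
    is not expected_uid. Symlinks are checked, not followed."""
    st = os.lstat(root)
    if st.st_uid != expected_uid:
        yield root, st.st_uid
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            uid = os.lstat(path).st_uid
            if uid != expected_uid:
                yield path, uid


def report(root: str = "/storage/archive", user: str = "tomcat") -> None:
    """Print every entry under root not owned by user (assumed 'tomcat')."""
    uid = pwd.getpwnam(user).pw_uid  # raises KeyError if the user does not exist
    for path, owner in wrong_owner(root, uid):
        print(f"{path}\towned by UID {owner}")
```

Comparing numeric UIDs rather than user names avoids depending on how, or whether, a leftover UID such as 108 resolves to a name on the new server.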

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 2463.