Where Data Is Stored In Retain

  • 7020625
  • 17-Dec-2014
  • 07-Aug-2017

Environment


Retain 3.x

Situation

Where is my data stored in Retain?  This helps me know where to troubleshoot issues.

Resolution


There are three key data storage areas in Retain.

  • The Retain database
  • The Lucene indexes
  • The archive directory

All three are critical to back up, especially the database and the archive files on disk.  If either of those is gone, the data is lost altogether.  You can have the files on disk, but if the database does not know which files belong to which messages, they are as good as lost: there is no way to rebuild that information into the database from the files on disk.  Conversely, if the files on disk are missing, all you will have is metadata about the messages and their attachments, not their content.  See KB, "Backing Up Retain".

If the indexes are missing, that is still a serious problem, but they can be rebuilt.  While they are being rebuilt, search results will be incomplete until all items have been indexed.  See KB, "How to Rebuild Indexes".

Retain Database

The Retain database is stored in your database management system (DBMS) of choice.  Retain supports MySQL, MS SQL, Oracle, and Postgres.  Most Retain customers use MySQL because it is simple, open source, and free.  However, customers that already use MS SQL or Oracle for other database applications typically create their Retain database within their existing database system.  All of them work equally well; Retain is largely database agnostic.

All message metadata (data about the data) is stored in the Retain database, as is most of the configuration data that the admin user enters in the RetainServer web interface (jobs, workers, profiles, schedules, etc.).  Archive job data and other data used by the Reporting & Monitoring Server are stored in the database as well.  It is safe to assume that almost everything other than the actual message content and message attachments is stored in the database.

When the user is in his/her Retain mailbox and has the "Browse" tab selected, the user is looking at data stored in the database.  If information on the Browse tab screen is missing or inaccurate, then troubleshooting the database would be appropriate.  This is extremely rare when a message has been archived successfully without errors.

Lucene Indexes

All messages, their metadata, and their attachments are indexed.  This means that the Lucene indexer that comes with Retain attempts to index every word in a message, in the message metadata (sender, recipients, etc.), and in any attachments that are in supported file formats.  Those indexes are stored in the Retain storage area configured in the RetainServer web interface under Server Configuration | Storage.

Figure 2 - Storage Locations

When a user is in his/her Retain mailbox, has the "Search" tab selected, and is performing a search for messages, Retain gets the result list from the Lucene indexes.  If items are missing from the result set, then indexing is the area to troubleshoot.

Archive Directory

All message content and attachments to messages are stored on disk in the Retain storage area in a directory off of the "archive" directory (see the figure under "Lucene Indexes" to find the archive directory location).  Every message and attachment gets assigned a "hash".  Because the byte count of every message and file will be unique, its hash value will be unique.  This is how Retain Server determines whether a message and/or attachment has already been processed and stored on disk when an archive job runs.  That file's hash value is stored in the Retain database in the t_document and t_attachment tables.
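The duplicate check described above can be sketched as follows.  This is only an illustration of the idea.  The article does not say which hash algorithm Retain uses, so SHA-256 is an assumption here, and content_hash, ArchiveIndex, and store_if_new are hypothetical names standing in for the lookup against the hash values kept in the database.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return an uppercase hex digest for an item.

    SHA-256 is an assumption for illustration; it is not
    documented as the algorithm Retain actually uses.
    """
    return hashlib.sha256(data).hexdigest().upper()


class ArchiveIndex:
    """Toy stand-in for the hash values recorded in the database."""

    def __init__(self):
        self.known = set()

    def store_if_new(self, data: bytes) -> bool:
        """Record the item and return True only if its hash is new."""
        digest = content_hash(data)
        if digest in self.known:
            return False  # already processed, nothing to store again
        self.known.add(digest)
        return True


index = ArchiveIndex()
first = index.store_if_new(b"quarterly report attachment")
second = index.store_if_new(b"quarterly report attachment")
print(first, second)
```

In this sketch the second call returns False because the same bytes produce the same hash, which mirrors how a repeated archive job could skip items it has already stored.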

The archive directory uses a load balancing strategy on disk.  Thus, off the archive directory you'll find 256 subdirectories with two-character hexadecimal names:  00 through FF.  Each of those directories has its own set of 256 directories using the same naming sequence (00 through FF), and each of those in turn has its own set of 256 subdirectories.  Thus, if my filename were B4F05EECB7B21D9014A86C32291C913D190C33394365AC79ED3E1F6849532, I would find it under .../archive/B4/F0/5E.
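The mapping from a filename to its directory can be written out as a short sketch.  The function name shard_path and the archive root /retain/archive are hypothetical; the only rule taken from the article is that the first three character pairs of the filename name the three directory levels.

```python
import string
from pathlib import PurePosixPath


def shard_path(archive_root: str, file_name: str) -> PurePosixPath:
    """Return the directory expected to hold file_name.

    The first three two-character pairs of the name select
    the three directory levels under the archive root.
    """
    prefix = file_name[:6]
    if len(prefix) < 6 or not all(c in string.hexdigits for c in prefix):
        raise ValueError("expected a name starting with six hex characters")
    return PurePosixPath(archive_root, prefix[0:2], prefix[2:4], prefix[4:6])


# Filename from the article; the archive root is a made-up example.
example = "B4F05EECB7B21D9014A86C32291C913D190C33394365AC79ED3E1F6849532"
path = shard_path("/retain/archive", example)
print(path)

# Three levels of 256 directories each give this many leaf directories.
leaf_dirs = 256 ** 3
print(leaf_dirs)
```

For the article's example this prints /retain/archive/B4/F0/5E, followed by 16777216 as the number of possible third-level directories.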

So, when a user clicks on a message link in the Retain mailbox - whether from the Browse tab or the Search tab's search result list - Retain finds the file on disk and places the contents in the message window.  If the original message was known to have text and the message window comes up blank, the file is missing from the location where Retain expects it.  This is extremely rare and usually only happens as a result of moving the archive directory to a new location.  In such cases, we find that either the files did not all copy over properly from the old location or the administrator forgot to tell Retain where the new location is.
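After an archive directory has been moved, a check along the following lines could confirm whether specific files landed where they are expected.  This is a sketch under assumptions, not a Retain tool: expected_file and find_missing are hypothetical helpers, the short filenames are invented for the demo, and in practice the list of filenames would have to come from the hash values recorded in the database.

```python
import tempfile
from pathlib import Path


def expected_file(archive_root: Path, file_name: str) -> Path:
    """Build the full path where file_name should live under archive_root."""
    return (archive_root / file_name[0:2] / file_name[2:4]
            / file_name[4:6] / file_name)


def find_missing(archive_root: Path, file_names: list) -> list:
    """Return the names that have no file at their expected location."""
    return [name for name in file_names
            if not expected_file(archive_root, name).is_file()]


# Demo against a throwaway directory with invented filenames.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "archive"
    present = "B4F05EAA01"
    absent = "00FF10BB02"
    target = expected_file(root, present)
    target.parent.mkdir(parents=True)
    target.write_bytes(b"message body")
    missing = find_missing(root, [present, absent])

print(missing)
```

Here only 00FF10BB02 is reported, because no file was created for it.  Any name reported this way against a real archive would point to an incomplete copy or to Retain looking in the wrong storage location.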

See KB, "How to Find An Archived Message's Corresponding File on Disk" for more information on locating message files on disk that correspond to specific messages in the mailbox.

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 2425.