Disk Space Not Decreasing Much After Running a Large-Scope Deletion Job

  • 7020698
  • 27-Feb-2015
  • 07-Aug-2017

Environment

Retain 3.x

Situation

I'm running a deletion job whose scope covers several years.  I expected it to delete a lot of messages and free up needed disk space.  However, after it ran, not much disk space was recovered.  Is Retain deleting the files?

Resolution


It is important to understand how Retain stores data.  Retain is a highly efficient single-instance storage solution: a message can be sent to multiple recipients across multiple mail servers, but Retain stores only one instance of that message.  The message content itself is stored on disk in the archive directory (see "Where Data Is Stored In Retain").

Retain stores the metadata about the message in the Retain database.  Every Retain user involved with that message will get a pointer in the database pointing to that same file on disk.

Why do I need to know that?

If a deletion job is run on a single mailbox, that file on disk may not get deleted even though the message is removed from the user's Retain mailbox.  Other users still have that message in their mailboxes.  If Retain deleted the file (which contains the message content), those users would see blank content when they tried to open the message.

What are other factors that could affect whether a file gets deleted? 
  • One or more of those users forwarded that message or its attachment at a later date that falls outside the scope of the deletion job.  In that case the file also has to remain.
  • One or more of those users are on Litigation Hold (see the Litigation Hold tab in Deletion Management).
  • One or more of those users forwarded that message or its attachment to a user on Litigation Hold.
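The rule these factors add up to can be modeled in a few lines.  This is only an illustrative sketch of the behavior described in this article, not Retain's actual implementation; the MessageRef structure, its field names, and file_can_be_deleted are hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class MessageRef:
    """One database pointer to a shared file on disk (hypothetical structure)."""
    user: str
    timestamp: int                    # Unix time of the date field the job filters on
    on_litigation_hold: bool = False  # True if this user is on Litigation Hold


def file_can_be_deleted(refs, scope_start, scope_end):
    """Return True only if every message pointing at the file allows deletion.

    A single reference that is outside the date scope, or that belongs to a
    user on Litigation Hold, is enough to keep the file on disk.
    """
    for ref in refs:
        in_scope = scope_start <= ref.timestamp <= scope_end
        if not in_scope or ref.on_litigation_hold:
            return False
    return True


# Two recipients inside the scope, plus a later forward outside the scope.
refs = [
    MessageRef("alice", 1_300_000_000),
    MessageRef("bob", 1_300_000_000),
    MessageRef("carol", 1_450_000_000),  # forwarded later, outside the scope
]
print(file_can_be_deleted(refs, 1_200_000_000, 1_400_000_000))  # False
```

In this model the messages for alice and bob would be removed from their mailboxes, but the shared file stays because carol's later copy still points to it.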
Thus, you'll need to track down which messages are associated with a file on disk and compare their dates against the scope of the deletion job.  If even ONE of those messages falls outside the scope, the file does not get deleted.  You also need to ensure that none of the users associated with those messages is on Litigation Hold.
  1. Open the RetainServer log.
  2. Search for the string hash=

That search should bring you to an area of the log where you see lines that read like this:

2015-02-27 12:02:28,564 [DumpsterThread] TRACE com.gwava.message.dao.DeleteDao - Deleting com.gwava.dao.social.Document id=2513, hash=19EB293A13C11FFA29C70E852A2C8B38FAB7FE6807E8B11948DC5E529530E897
2015-02-27 12:02:28,570 [DumpsterThread] TRACE com.gwava.message.dao.DeleteDao - Deleting com.gwava.dao.social.Document id=2514, hash=AF18D2B5739217F2A25AABD5A9943078BC24887DC6232071EABD31B7DCF14223
2015-02-27 12:02:28,576 [DumpsterThread] TRACE com.gwava.message.dao.DeleteDao - Deleting com.gwava.dao.social.Document id=2515, hash=B4712D3696DED8D03EFEF4C2182EA31258F019682DC63E20DB2E7C81AFB7635D
2015-02-27 12:02:28,582 [DumpsterThread] TRACE com.gwava.message.dao.DeleteDao - Deleting com.gwava.dao.social.Document id=2516, hash=1BF4603450817A4861582DE82DC939101A4650479503CB99CC4534F36BE65748
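If the log is large, pulling the hashes out with a short script can be easier than searching by hand.  The following is a minimal sketch that assumes the log lines look like the excerpt above; the function name and the regular expression are illustrative and not part of Retain.

```python
import re

# Matches the "Deleting <class> id=<n>, hash=<hex>" part of DeleteDao lines
# like the ones in the excerpt above (assumed format).
HASH_LINE = re.compile(r"Deleting \S+ id=(\d+), hash=([0-9A-Fa-f]+)")


def extract_hashes(log_lines):
    """Return (document id, hash, first six hash characters) for each match."""
    results = []
    for line in log_lines:
        match = HASH_LINE.search(line)
        if match:
            doc_id, file_hash = match.groups()
            # The first six characters indicate the storage directory; see
            # "Where Data Is Stored In Retain" for the exact layout.
            results.append((int(doc_id), file_hash, file_hash[:6]))
    return results


sample = (
    "2015-02-27 12:02:28,564 [DumpsterThread] TRACE "
    "com.gwava.message.dao.DeleteDao - Deleting com.gwava.dao.social.Document "
    "id=2513, hash=19EB293A13C11FFA29C70E852A2C8B38FAB7FE6807E8B11948DC5E529530E897"
)
for doc_id, file_hash, prefix in extract_hashes([sample]):
    print(doc_id, prefix, file_hash)
```

To run it against a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.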

  1. As stated in "Where Data Is Stored In Retain", the first six characters of the hash indicate the directory in which the file is stored on disk in the archive directory.  Change to that directory and look for the file.  If it does not exist, it has been deleted.  Choose another hash and repeat until you find a file that still exists on disk.
  2. Once you have a file that was not deleted, use its hash in the following SQL query.  The query is written for the MySQL command-line client, where the trailing \G prints each row vertically.  Oracle and/or MS SQL systems might require a slight tweak, but your DBA will know what to do:

  SELECT * FROM [retain db name].t_message WHERE message_id IN
    (SELECT message_id FROM t_message_attachments WHERE document_id IN
        (SELECT document_id FROM t_document WHERE hash = '[hash]')) \G

  1. Look at the f_created, f_delivered, or f_stored field, depending on your "Delete messages where" setting in Deletion Management.  The value is the number of seconds that have elapsed since January 1, 1970 (UTC), known as "Unix time".  Convert it to a human-readable date and time.  Be sure to specify your timezone when converting, because the timezone can shift the result by several hours, which might put it on a different day.
  2. Does the date/time of that message fall outside the scope of your deletion job?  If so, that message is the reason the file remains on disk.
  3. If not, track down the users belonging to those messages to determine whether any of them is on Litigation Hold:

  SELECT * FROM t_abook WHERE f_uid IN
    (SELECT f_uuid FROM t_uuid_mapping WHERE uid_mapping_id IN
        (SELECT uuid_mapping_id FROM t_message WHERE message_id IN
            (SELECT message_id FROM t_message_attachments WHERE document_id IN
                (SELECT document_id FROM t_document WHERE hash = '[hash]')))) \G

This lists the address book entry for each user associated with that message/attachment file on disk.  Compare those users against the Litigation Hold tab in Deletion Management.
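For the Unix time conversion in the steps above, any converter that lets you set the timezone will do.  As a minimal sketch, the Python standard library can do it as follows; the function name and the sample timestamp are illustrative values, not data taken from a Retain system.

```python
from datetime import datetime, timedelta, timezone


def unix_to_local(seconds, utc_offset_hours=0):
    """Convert Unix time (seconds since 1970-01-01 UTC) to a date and time
    at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz)


stamp = 1425000000  # example value only
print(unix_to_local(stamp, 0).isoformat())   # 2015-02-27T01:20:00+00:00
print(unix_to_local(stamp, -7).isoformat())  # 2015-02-26T18:20:00-07:00
```

The same timestamp lands on February 27 in UTC but on February 26 at UTC-7, which is why the timezone matters when the message date is close to the edge of the deletion scope.  A fixed offset ignores daylight saving time; where that matters, a named zone from Python's zoneinfo module (Python 3.9 and later) is the safer choice.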


Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 2485.