How to Deal with a Corrupt Message Blocking Indexing

  • 7019106
  • 02-Dec-2015
  • 07-Aug-2017


Retain 3.x


Search does not return items past a certain date, but browsing the same items works fine.


If you look in the Indexer log, you will find an error along these lines:

04:43:57,283 TextExtractionService - IOException while converting Reader - Input document may be malformed. String will be truncated or incomplete
04:43:57,287 LuceneDocumentUtil - indexing: TEXT.htm, have sizelimit: -1, filesize: 545, handlerclass: com.gwava.extractor.TextExtractor hash: 5E439513536BAD9B56F071C7ED02B5104DE883A7E89821F140F4D78D33D34396
04:43:57,368 LuceneDocumentUtil - indexing: mail.txt, have sizelimit: -1, filesize: 243, handlerclass: com.gwava.extractor.HTMLExtractor hash: 579857260D64406F83ED5DB8309C8B225CE6D26FBAD1437D8955FECAFC013043
04:46:37,218 AbstractBackgroundIndexer - [BGINDEXER] close indexing writer this run...
04:46:37,218 LuceneIndexingLocker - [IMANAGER] indexWriter released from BGINDEXER1448185292111.1448185292112 -- [] -- removed=true
04:46:37,218 LuceneIndexingLocker - [IMANAGER] IndexModificationLock released from BGINDEXER1448185292111
04:46:37,219 AbstractBackgroundIndexer - reportError: IndexThreadProtection :: :: EXCEPTION : java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.prepareCommit(
    at org.apache.lucene.index.IndexWriter.commitInternal(
    at org.apache.lucene.index.IndexWriter.commit(
    at org.apache.lucene.index.IndexWriter.commit(
    at com.gwava.indexing.lucene.impl.LuceneBackgroundIndexer.flushIndexWriter(
    at com.gwava.indexing.lucene.impl.LuceneBackgroundIndexer.doThreadWork(
    at com.gwava.indexing.common.IndexingThread$

04:46:37,230 AbstractBackgroundIndexer - [BGINDEXER] Serious error! This catch should never be reached!
04:46:37,230 AbstractBackgroundIndexer - [BGINDEXER] Thread ended.
04:46:37,230 IndexingThread - Index slave has stopped
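
The hashes of the files that were being indexed just before the failure can be pulled out of the log with a short script. The following is a minimal sketch in Python, assuming the log lines look like the excerpt above. The function name, the regular expression, and the choice to keep the last two "indexing:" lines are illustrative assumptions, not part of Retain.

```python
import re
from collections import deque

# Matches the "indexing: <name>, ... hash: <HEX>" lines from LuceneDocumentUtil.
# The pattern is inferred from the log excerpt in this article (assumption).
INDEXING_RE = re.compile(
    r"LuceneDocumentUtil - indexing: (?P<name>[^,]+),.*?hash: (?P<hash>[0-9A-Fa-f]+)"
)
OOM_MARKER = "this writer hit an OutOfMemoryError"


def hashes_before_oom(lines, window=2):
    """Return (file name, hash) pairs for the last `window` indexing
    lines seen before the first OutOfMemoryError line, oldest first."""
    recent = deque(maxlen=window)
    for line in lines:
        match = INDEXING_RE.search(line)
        if match:
            recent.append((match.group("name"), match.group("hash")))
        elif OOM_MARKER in line:
            return list(recent)
    return []  # no OutOfMemoryError found in the input


# Three lines copied from the log excerpt in this article.
SAMPLE = [
    "04:43:57,287 LuceneDocumentUtil - indexing: TEXT.htm, have sizelimit: -1, "
    "filesize: 545, handlerclass: com.gwava.extractor.TextExtractor "
    "hash: 5E439513536BAD9B56F071C7ED02B5104DE883A7E89821F140F4D78D33D34396",
    "04:43:57,368 LuceneDocumentUtil - indexing: mail.txt, have sizelimit: -1, "
    "filesize: 243, handlerclass: com.gwava.extractor.HTMLExtractor "
    "hash: 579857260D64406F83ED5DB8309C8B225CE6D26FBAD1437D8955FECAFC013043",
    "04:46:37,219 AbstractBackgroundIndexer - reportError: IndexThreadProtection "
    ":: :: EXCEPTION : java.lang.IllegalStateException: "
    "this writer hit an OutOfMemoryError; cannot commit",
]

if __name__ == "__main__":
    for name, doc_hash in hashes_before_oom(SAMPLE):
        print(name, doc_hash)
```

To run it against a real log, pass an open file object instead of SAMPLE, for example hashes_before_oom(open(path)), where path is wherever your Indexer log is stored.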

We went into the database to find the document_id from one of the hashes in the error:
 select document_id from t_document where hash = '5E439513536BAD9B56F071C7ED02B5104DE883A7E89821F140F4D78D33D34396';
Then we found the message_id from the document_id:
 select message_id from t_message_attachments where document_id = '11887702';
Then we checked whether the message_id was connected to other messages:
 select count(*) from t_message where message_id = '8777964';
Both hashes from the error were connected to this one message.
We then checked whether the item had already been indexed:
 select f_indexed from t_message where message_id = '8777964';
It had not, so we set f_indexed for that message to an invalid value (-3) so that index operations could continue past it:
 update t_message set f_indexed = '-3' where message_id = '8777964';
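
The lookup chain above (hash to document_id to message_id to f_indexed) can also be expressed as a single join. The following is a sketch in Python that uses an in-memory SQLite database as a stand-in for the Retain database. The table and column names come from the queries above. Everything else is an assumption made for illustration: the integer column types, the starting f_indexed value of 0, and the helper names. Adapt the connection and parameter style to whatever database server your Retain system actually uses.

```python
import sqlite3

DOC_HASH = "5E439513536BAD9B56F071C7ED02B5104DE883A7E89821F140F4D78D33D34396"


def demo_connection():
    """Build a tiny stand-in schema holding only the columns named in
    the article. The real Retain tables have more columns (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t_document (document_id INTEGER, hash TEXT);
        CREATE TABLE t_message_attachments (message_id INTEGER, document_id INTEGER);
        CREATE TABLE t_message (message_id INTEGER, f_indexed INTEGER);
        """
    )
    # IDs are the ones from the article; f_indexed = 0 is a hypothetical
    # "not yet indexed" value chosen only for this demo.
    conn.execute("INSERT INTO t_document VALUES (11887702, ?)", (DOC_HASH,))
    conn.execute("INSERT INTO t_message_attachments VALUES (8777964, 11887702)")
    conn.execute("INSERT INTO t_message VALUES (8777964, 0)")
    conn.commit()
    return conn


def messages_for_hash(conn, doc_hash):
    """Follow hash -> document_id -> message_id and return
    (message_id, f_indexed) rows for every message using that document."""
    sql = """
        SELECT m.message_id, m.f_indexed
        FROM t_document d
        JOIN t_message_attachments a ON a.document_id = d.document_id
        JOIN t_message m ON m.message_id = a.message_id
        WHERE d.hash = ?
    """
    return conn.execute(sql, (doc_hash,)).fetchall()


def mark_message_skipped(conn, message_id, marker=-3):
    """Set f_indexed to the invalid marker used in the article (-3)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE t_message SET f_indexed = ? WHERE message_id = ?",
            (marker, message_id),
        )


if __name__ == "__main__":
    conn = demo_connection()
    for message_id, f_indexed in messages_for_hash(conn, DOC_HASH):
        print("before:", message_id, f_indexed)
        mark_message_skipped(conn, message_id)
    print("after:", messages_for_hash(conn, DOC_HASH))
```

Using bound parameters rather than pasting the hash into the SQL string avoids quoting mistakes with long hex values.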

We started the indexer and it began indexing normally.

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 2668.
