Corrupted translog in Elasticsearch

  • 7025165
  • 21-Jun-2021
  • 23-Jun-2021

Environment

Advanced Authentication 6.x
AAF

Situation

In some cases, after upgrading the Advanced Authentication server, you may encounter one of the following issues:
a) An error is shown on the Dashboard of the AA Administrative Portal:
TransportError(503, 'Search Guard not initialized (SG11). See https://github.com/floragunncom/search-guard-docs/blob/master/sgadmin.md') (Unknown Error).

b) The aucore container cannot start due to "Elasticsearch is not ready". The logs of the aucore container (docker logs aaf_aucore_1) contain the warning:
WARNI [aucore.scripts.wait_elastic] ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='127.0.0.1', port=9200): Read timed out. (read timeout=1))
The logs of the searchd container (docker logs aaf_searchd_1) contain the errors:

      Caused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/usr/share/elasticsearch/data/nodes/0/indices/Pw4b587bTiCz4IJ1QpUQ-Q/4/translog/translog-31694.ckp] is corrupted

      Caused by: org.apache.lucene.index.CorruptIndexException: misplaced codec footer (file truncated?): length=0 but footerLength==16 (resource=SimpleFSIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/Pw4b587bTiCz4IJ1QpUQ-Q/4/translog/translog-31694.ckp"))
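The path of the damaged checkpoint file can be pulled out of the searchd log rather than copied by hand. The following is a minimal sketch, not part of the original procedure: the helper name corrupted_ckp_paths is hypothetical, and it assumes the log lines look like the two errors shown above.

```shell
# Hypothetical helper: print every distinct *.ckp path found in log text on stdin.
# Assumes paths contain no spaces, double quotes, or closing square brackets.
corrupted_ckp_paths() {
  grep -o '/[^] "]*\.ckp' | sort -u
}
```

Run on the host, it could be used as: docker logs aaf_searchd_1 2>&1 | corrupted_ckp_paths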

Resolution

Enter the searchd container:
docker exec -it aaf_searchd_1 sh

Remove the corrupted .ckp file named in the error. Use the exact path from the log, including the shard directory (4 in the example above):
sh-4.4# rm /usr/share/elasticsearch/data/nodes/0/indices/<somevalue>/<shard>/translog/<somefile>.ckp
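If you prefer to keep a copy of the file before deleting it, the removal can be wrapped as below. This is only a sketch: the helper name backup_and_remove and the default backup directory /tmp are assumptions, not part of the documented procedure.

```shell
# Hypothetical helper: copy a checkpoint file aside, then remove the original.
backup_and_remove() {
  # $1: path to the .ckp file; $2: backup directory (assumed default: /tmp)
  if [ ! -f "$1" ]; then
    echo "not found: $1" >&2
    return 1
  fi
  # Remove the original only if the copy succeeded.
  cp -p "$1" "${2:-/tmp}/$(basename "$1").bak" && rm "$1"
}
```

Inside the container it would be called with the path reported in the error, for example: backup_and_remove /usr/share/elasticsearch/data/nodes/0/indices/<somevalue>/<shard>/translog/<somefile>.ckp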

Ask Elasticsearch to retry allocation of the failed shards:
sh-4.4# curl -XPOST -k -u "elastic:$(awk -F "=" '/es_password/ {print $2}' /opt/AuCore/data/es_data.json)" "https://localhost:9200/_cluster/reroute?retry_failed=true"
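Before leaving the container, it may help to confirm that the cluster is recovering. The sketch below queries the standard Elasticsearch _cluster/health endpoint with the same credentials lookup as the reroute command; the helper names es_health and health_status are hypothetical, and the availability of sed inside the searchd container is an assumption.

```shell
# Hypothetical helper: fetch cluster health as JSON (run inside the searchd container).
es_health() {
  curl -s -k -u "elastic:$(awk -F "=" '/es_password/ {print $2}' /opt/AuCore/data/es_data.json)" \
    "https://localhost:9200/_cluster/health"
}

# Hypothetical helper: print the value of the "status" field from health JSON on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}
```

Running es_health | health_status prints green, yellow, or red. A red status means that at least one primary shard is still unassigned.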

Exit from the container:
sh-4.4# exit

Restart the Advanced Authentication server (run on the host, not inside the container):
shutdown -r now

Cause

The Elasticsearch data (the translog checkpoint file reported in the error) was corrupted after a non-graceful shutdown or an OOM kill.

Additional Information