Restore LGE for Vertica DB

  • KM03793240
  • 31-Mar-2021
  • 31-Mar-2021

This document has not been formally reviewed for accuracy and is provided "as is" for your convenience.

Summary

Vertica DB on 3 nodes clusted was completely down

Fix

1. Run the last good epoch with the below command:

admintools -t restart_db -d <database name> –p <password> -e last

2. If it still fails, proceed with the following:

a. Start the database in UNSAFE mode:

admintools -t start_db -d <DBNAME> -U

b. Confirm the value of AHM (Ancient History Mark) with the below query:

select get_current_epoch() CE,get_last_good_epoch() LGE,get_ahm_epoch() AHM,(get_current_epoch() - get_last_good_epoch()) CeLGDiff,(get_last_good_epoch() - get_ahm_epoch()) LgeAHmDiff ,get_ahm_time(), get_expected_recovery_epoch();

c. Check which projection holds CPE smaller than AHM:

select distinct 'select do_tm_task('

'''abortrecovery'''

','

''''

projection_schema

'.'

projection_name

''');' as command

from v_catalog.projection_checkpoint_epochs where is_behind_ahm;

or

select distinct projection_schema, projection_name from projection_checkpoint_epochs where checkpoint_epoch < '<AHM>';

d. Disable all those projections from recover. Example:

SELECT do_tm_task('abortrecovery', 'T1_b1');

e. Confirm current LGE again after aborted recovery for problem projections:

select get_current_epoch() CE,get_last_good_epoch() LGE,get_ahm_epoch() AHM,(get_current_epoch() - get_last_good_epoch()) CeLGDiff,(get_last_good_epoch() - get_ahm_epoch()) LgeAHmDiff ,get_ahm_time(), get_expected_recovery_epoch();

f. Shutdown the database:

select shutdown('true');

g. Start the database with admintools in normal state

admintools -t start_db -d <DBNAME>