Vertica fails to start - Startup Failed, ASR Required

  • 7024790
  • 24-Aug-2020
  • 01-Feb-2021

Environment

ZENworks Configuration Management 2020
Vertica

Situation

When starting Vertica, the server console shows:

host (10.10.10.10) report: Startup Failed, ASR Required
Database zenworks did not start successfully: Something Failed



Using the 'force' option (-F, --force) does not resolve the issue:

/opt/vertica/bin/admintools -t start_db -d zenworks -F
   Starting nodes:
      v_zenworks_node0001 (10.10.10.10)
   Starting Vertica on all nodes. Please wait, database with a large catalog may take a while to initialize.
      Node Status: v_zenworks_node0001: (DOWN)
      ...
Found these errors in the startup.logs on hosts:
host (10.10.10.10) report: Startup Failed, ASR Required
Database zenworks did not start successfully: Something Failed



Error in: /var/opt/novell/log/zenworks/startup.log
"node" : "v_zenworks_node0001",
"stage" : "Startup Failed, ASR Required",
"text" : "Node Dependencies:\n1 - cnt: 95\n\n1 - name: v_zenworks_node0001\nNodes certainly in the cluster:\n\tNode 0(v_zenworks_node0001), epoch 408254\nFilling more nodes to satisfy node dependencies:\nData dependencies fulfilled, remaining nodes LGEs don't matter:\n--",
"timestamp" : "2020-08-20 10:40:28.214"



Messages in: /var/opt/novell/log/zenworks/vertica.log
2020-08-20 10:38:37.006 Spread Service InOrder Queue:0x7f2846599700 [VMPI] <INFO> Cluster is being formed; we are invited
...
2020-08-20 10:38:37.481 Init Session:0x7f2844d96700-a00000008d8501 [Txn] <INFO> Begin Txn: a00000008d8501 'runLoadBalancePolicy'
2020-08-20 10:38:37.481 Init Session:0x7f2845d98700 [Session] <INFO> Load balance request from client address 10.10.10.10 had decision: Classic load balancing considered, but either the policy was NONE or no target was available. Details: [NONE or invalid]
2020-08-20 10:38:37.481 Init Session:0x7f2845d98700 <LOG> @v_zenworks_node0001: 00000/5789: Connection load balance request refused by server
2020-08-20 10:38:37.481 Init Session:0x7f2840d92700 <LOG> @v_zenworks_node0001: 00000/2705: Connection received: host=10.10.10.10 port=56678 (connCnt 8)
2020-08-20 10:38:37.481 Init Session:0x7f2840d92700 <LOG> @v_zenworks_node0001: 00000/5998: Received connection load balance request
2020-08-20 10:38:37.482 Init Session:0x7f2845597700-a00000008d8502 [Txn] <INFO> Begin Txn: a00000008d8502 'runLoadBalancePolicy'
2020-08-20 10:38:37.845 Init Session:0x7f2840d92700 <FATAL> @v_zenworks_node0001: {SessionRun} 57V03/4149: Node startup/recovery in progress. Not yet ready to accept connections
 LOCATION:  initSession, /data/qb_workspaces/jenkins2/ReleaseBuilds/Grader/REL-9_2_1-x_grader/build/vertica/Session/ClientSession.cpp:556
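
To check quickly whether a startup log contains the failure stage shown above, a small helper along these lines may be useful. This is a minimal sketch: the function name `has_asr_failure` and the `STARTUP_LOG` variable are illustrative only and are not part of Vertica or ZENworks. The log path is the one quoted in this article.

```shell
#!/bin/sh
# Log path as quoted in this article; override STARTUP_LOG if yours differs.
STARTUP_LOG="${STARTUP_LOG:-/var/opt/novell/log/zenworks/startup.log}"

# has_asr_failure FILE
# Hypothetical helper: exits 0 if FILE contains the
# "Startup Failed, ASR Required" stage, non-zero otherwise
# (including when FILE is missing or unreadable).
has_asr_failure() {
    grep -q 'Startup Failed, ASR Required' "$1" 2>/dev/null
}

# Example use (left commented out so the helper can be sourced safely):
#   if has_asr_failure "$STARTUP_LOG"; then
#       echo "ASR required; see the Resolution section"
#   fi
```

The check only confirms that the message is present in the file; it does not tell you which startup attempt logged it.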

Resolution

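An abnormal shutdown may have prevented the LGE (Last Good Epoch) from being written. "ASR Required" (Admin Specified Recovery) indicates that Vertica cannot recover on its own and an administrator must choose the epoch from which to restart.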

To confirm the node(s) in the cluster:
vsql -c "SELECT node_name FROM nodes;"

To check the epoch:
vsql -c "SELECT current_epoch, ahm_epoch, last_good_epoch, refresh_epoch FROM system;"

Example output from checking epoch:
 current_epoch | ahm_epoch | last_good_epoch | refresh_epoch
---------------+-----------+-----------------+---------------
           148 |       146 |             146 |            -1
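
In the example output, current_epoch (148) is ahead of last_good_epoch (146). The gap can be computed from one row of that output, as in the sketch below. The function name `epoch_gap` is illustrative only. The commented usage line assumes vsql accepts `-A` (unaligned) and `-t` (tuples only) to print a bare pipe-separated row; verify this against your vsql version.

```shell
#!/bin/sh
# epoch_gap ROW
# Hypothetical helper. ROW is one result row in the column order
#   current_epoch | ahm_epoch | last_good_epoch | refresh_epoch
# Prints current_epoch minus last_good_epoch.
# Works with both aligned and unaligned rows, because awk ignores
# the padding spaces when converting each field to a number.
epoch_gap() {
    printf '%s\n' "$1" | awk -F'|' '{ print $1 - $3 }'
}

# Example use (assumes the vsql flags described above):
#   row=$(vsql -At -c "SELECT current_epoch, ahm_epoch, last_good_epoch, refresh_epoch FROM system;")
#   echo "Epochs after the last good epoch: $(epoch_gap "$row")"
```

For the row shown above, the helper prints 2.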


Another way to check the epoch:
/opt/vertica/bin/admintools -t return_epoch -d zenworks

Resolve the issue by restarting the database from the last good epoch. Note that data committed after that epoch is not recovered:
/opt/vertica/bin/admintools -t restart_db -d zenworks -p <vertica_passwd_zman_srvgc> -e last

Additional Information