Node_dependency issue - related to DFS files

  • KM03637496
  • 04-May-2020
  • 04-May-2020

This document has not been formally reviewed for accuracy and is provided "as is" for your convenience.

Summary

How to troubleshoot and resolve node dependency issues related to DFS files after adding nodes to a cluster and completing a rebalance.

Question

(I) Issue:

After adding node(s) to an existing cluster and completing a rebalance, get_node_dependencies shows output similar to the following:

 

=> select get_node_dependencies();
get_node_dependencies
Deps:
....
....
0011111111 - cnt: 6
1111111111 - cnt: 86

Passing the 0011111111 mask to get_node_dependencies lists the objects behind it, all of which are DFS files:

=> select get_node_dependencies('0011111111');
get_node_dependencies
0011111111 - DFS file logWordITokenizer_defaultConfig
0011111111 - DFS file logNgramTokenizer_defaultConfig
0011111111 - DFS file logICUTokenizer_defaultConfig
0011111111 - DFS file 49539595901076520
0011111111 - DFS file 49539595901076532
0011111111 - DFS file 49539595901076546

In this case, the DFS file information was not rebalanced to the 2 newly added nodes.
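To read the Deps summary programmatically, the sketch below parses lines of the form "mask - cnt: N" and reports every mask that is not all ones, together with the number of 0 positions in it. It assumes that each mask position stands for one node and that a 0 means the counted objects are missing on that node, which matches this example (two 0 positions, two new nodes). The mapping from position to node name is not covered here, so the sketch only counts positions. The function name incomplete_masks is hypothetical.

```python
import re

# Matches summary lines such as "0011111111 - cnt: 6" from get_node_dependencies().
DEPS_LINE = re.compile(r"^\s*([01]+)\s*-\s*cnt:\s*(\d+)\s*$")


def incomplete_masks(deps_output: str) -> list[tuple[str, int, int]]:
    """Return (mask, object_count, zero_positions) for every mask that is not all ones.

    Assumption (not from Vertica docs): each mask position stands for one node
    and a 0 means the objects counted under that mask are missing on that node.
    """
    results = []
    for line in deps_output.splitlines():
        match = DEPS_LINE.match(line)
        if not match:
            continue  # skip headers such as "Deps:" and elided "...." lines
        mask, count = match.group(1), int(match.group(2))
        zeros = mask.count("0")
        if zeros:
            results.append((mask, count, zeros))
    return results


# Summary lines copied from the output above.
sample = """Deps:
....
0011111111 - cnt: 6
1111111111 - cnt: 86
"""

for mask, count, zeros in incomplete_masks(sample):
    print(f"{mask}: {count} objects missing on {zeros} node(s)")
# prints: 0011111111: 6 objects missing on 2 node(s)
```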

 

(II) Troubleshoot:

 

(a) DFS rebalance completed, but node dependencies were not updated.

Query the dc_rebalanced_operations table for object_type 'DFSFile' to find out whether the rebalance of the DFS objects COMPLETED. For example:

 

select time, node_name, transaction_id, statement_id, object_type, object_name,
       path_name, table_name, table_schema, operation_name, operation_status,
       rebalance_method
from dc_rebalanced_operations
where object_type = 'DFSFile'
order by time desc;

If this query shows that the rebalance of the DFS files completed, the DFS file information was copied to the newly added nodes, but the node dependencies were not updated.

 

In this case, recomputing the node dependencies resolves the issue:

select recompute_node_dependencies();
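The choice between this case and case (b) below can be expressed as a small decision function. This is a hypothetical sketch: it takes the operation_status values of the DFSFile rows returned by the query above, and the status string "COMPLETED" is an assumption based on the wording of this article, so check it against your own output.

```python
# Hypothetical helper: picks the next troubleshooting step from the
# operation_status values of the DFSFile rows in dc_rebalanced_operations.

COMPLETED = "COMPLETED"  # assumed status value; verify against your own query output


def next_step(operation_statuses: list[str]) -> str:
    """Map the DFSFile rebalance statuses to the next action described in this article."""
    if not operation_statuses:
        # Case (b): no DFSFile rows, so the DFS rebalance failed.
        return "investigate: grep RebalanceDFSFileTask, then check dc_errors"
    if all(status.strip().upper() == COMPLETED for status in operation_statuses):
        # Case (a): DFS files were copied; only the node dependencies are stale.
        return "select recompute_node_dependencies();"
    # Not covered by this article: rows exist but some are not completed.
    return "investigate: rebalance of some DFS files did not complete"


print(next_step(["COMPLETED"]))
# prints: select recompute_node_dependencies();
```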

 

(b) DFS rebalance failed.

If the query in step (a) returns 0 rows, the DFS rebalance failed, and recomputing the node dependencies will not help. Instead, grep for RebalanceDFSFileTask in vertica.log on one of the new nodes, around the time of the rebalance. The matching entries show the transaction ID.

 

grep RebalanceDFSFileTask vertica.log
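Because the grep above can return matches from other days, the sketch below narrows them to a time window around the rebalance. It assumes that every vertica.log entry begins with a timestamp such as 2020-05-04 10:15:42.123; the function name grep_near and the 30-minute default window are hypothetical choices, not Vertica settings.

```python
from datetime import datetime, timedelta

# Assumption: each vertica.log entry starts with a timestamp such as
# "2020-05-04 10:15:42.123" (23 characters); adjust if your log differs.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TIMESTAMP_LENGTH = 23


def grep_near(lines, keyword, rebalance_time, window=timedelta(minutes=30)):
    """Yield lines containing keyword whose timestamp is within window of rebalance_time."""
    for line in lines:
        if keyword not in line:
            continue
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # line does not start with a parsable timestamp
        if abs(stamp - rebalance_time) <= window:
            yield line.rstrip("\n")
```

Usage on one of the new nodes: open vertica.log and iterate over grep_near(log_file, "RebalanceDFSFileTask", datetime(2020, 5, 4, 10, 0)), substituting the approximate time of your own rebalance.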

 

Using the transaction ID, query the dc_errors table to find the error that caused the DFS rebalance to fail. If dc_errors does not retain enough history, grep the vertica.log files instead. The source node responsible for transferring the DFS file information to the newly added nodes is one of the existing nodes, usually the initiator of the rebalance task; its vertica.log shows the error that caused the DFS rebalance to fail.

 

Query the vs_dfs_file table and review the oid, name, and parent columns to find the parent package of the DFS files that failed to rebalance, then reinstall that package to resolve the issue.

 

Here is an example:

=> select * from vs_dfs_file;
        oid        | tag |              name               |   distribution    |      parent       | size | create_epoch | isfile | isdirectory
-------------------+-----+---------------------------------+-------------------+-------------------+------+--------------+--------+-------------
 49539595901076466 |   0 | /                               |                 0 | 45035996273704976 |    0 |            0 | f      | f
 49539595901076468 |   0 | tokenizersConfigurations        |                 0 | 49539595901076466 |    0 |            0 | f      | t
 49539595901076500 |   0 | logWordITokenizer_defaultConfig | 81064793309289126 | 49539595901076468 |  145 |            6 | t      | f
 49539595901076510 |   0 | logNgramTokenizer_defaultConfig | 81064793309289126 | 49539595901076468 |   33 |            8 | t      | f
 49539595901076518 |   0 | logICUTokenizer_defaultConfig   | 81064793309289126 | 49539595901076468 |   40 |           10 | t      | f
 49539595901076530 |   0 | 49539595901076520               | 81064793309289126 | 49539595901076468 |  145 |           11 | t      | f
 49539595901076544 |   0 | 49539595901076532               | 81064793309289126 | 49539595901076468 |  133 |           13 | t      | f
 49539595901076562 |   0 | 49539595901076546               | 81064793309289126 | 49539595901076468 |  115 |           16 | t      | f

All 6 DFS files that failed to rebalance have the parent ID 49539595901076468, and OID 49539595901076468 is tokenizersConfigurations.
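The same lookup can be sketched in a few lines: map each OID to its name, then group the failed DFS file names by the name of their parent entry. The rows and names below are copied from the outputs shown above; the function parents_of is hypothetical, and on a live system the rows would come from the vs_dfs_file query instead.

```python
from collections import defaultdict

# (oid, name, parent) triples copied from the vs_dfs_file output above.
ROWS = [
    (49539595901076466, "/", 45035996273704976),
    (49539595901076468, "tokenizersConfigurations", 49539595901076466),
    (49539595901076500, "logWordITokenizer_defaultConfig", 49539595901076468),
    (49539595901076510, "logNgramTokenizer_defaultConfig", 49539595901076468),
    (49539595901076518, "logICUTokenizer_defaultConfig", 49539595901076468),
    (49539595901076530, "49539595901076520", 49539595901076468),
    (49539595901076544, "49539595901076532", 49539595901076468),
    (49539595901076562, "49539595901076546", 49539595901076468),
]

# DFS file names reported by get_node_dependencies('0011111111').
FAILED = {
    "logWordITokenizer_defaultConfig",
    "logNgramTokenizer_defaultConfig",
    "logICUTokenizer_defaultConfig",
    "49539595901076520",
    "49539595901076532",
    "49539595901076546",
}


def parents_of(failed_names, rows):
    """Group the failed DFS file names by the name of their parent entry."""
    name_by_oid = {oid: name for oid, name, _ in rows}
    groups = defaultdict(list)
    for _oid, name, parent in rows:
        if name in failed_names:
            # Fall back to the raw parent OID if the parent row is not present.
            groups[name_by_oid.get(parent, str(parent))].append(name)
    return dict(groups)


print(parents_of(FAILED, ROWS))
```

Run against these rows, it reports tokenizersConfigurations as the only parent, holding all 6 failed files.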

 

To resolve the issue, reinstall the package that owns tokenizersConfigurations (here, the logsearch package) by running:

cd /opt/vertica/packages/logsearch/ddl
vsql -f uninstall.sql
vsql -f install.sql