This document has not been formally reviewed for accuracy and is provided "as is" for your convenience.
Summary
Question
(I) Issue:
After adding one or more nodes to an existing cluster and completing a rebalance, get_node_dependencies shows output similar to the following:
get_node_dependencies
Deps:
....
....
0011111111 - cnt: 6
1111111111 - cnt: 86
=> select get_node_dependencies('0011111111');
get_node_dependencies
0011111111 - DFS file logWordITokenizer_defaultConfig
0011111111 - DFS file logNgramTokenizer_defaultConfig
0011111111 - DFS file logICUTokenizer_defaultConfig
0011111111 - DFS file 49539595901076520
0011111111 - DFS file 49539595901076532
0011111111 - DFS file 49539595901076546
In this case, the DFS file information was not rebalanced to the 2 newly added nodes.
(II) Troubleshoot:
(a) Rebalance of DFS completed, but node dependencies were not updated.
Query the dc_rebalanced_operations table for object_type 'DFSFile' to find out whether the DFS object rebalance completed. Below is an example query:
Select time,node_name,transaction_id,statement_id,object_type,object_name,path_name,table_name,table_schema, operation_name,operation_status,rebalance_method from dc_rebalanced_operations where object_type='DFSFile' order by time desc;
If this query shows that the rebalance of the DFS files completed, the DFS file information was copied to the newly added nodes, but the node dependencies were not updated.
In this case, recomputing the node dependencies addresses the issue:
select recompute_node_dependencies();
(b) Rebalance of DFS failed.
If the query in Step (a) returns 0 rows, the DFS rebalance failed, and recomputing node dependencies will not help. In that case, grep for RebalanceDFSFileTask in vertica.log on one of the new nodes around the time of the rebalance. The matching log entries show the transaction ID.
grep RebalanceDFSFileTask vertica.log
Using the transaction ID, query the dc_errors table to find the error that caused the DFS rebalance to fail. If dc_errors does not retain enough history, grep the vertica.log files instead. One of the existing nodes, usually the initiator of the rebalance task, is the source node responsible for transferring DFS file information to the newly added nodes; its log shows the error that caused the DFS rebalance to fail.
Query the vs_dfs_file table and review the oid, name, and parent columns to find the parent package of the DFS files that failed to rebalance, then reinstall that package to address the issue.
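The dc_errors lookup described above might look like the following minimal sketch. The transaction ID shown is a hypothetical placeholder, not a value from this case. The sketch assumes dc_errors has transaction_id and time columns, as dc_rebalanced_operations does in the query from Step (a). It selects all columns rather than naming specific ones, because the available columns can differ between versions.

```sql
-- Hypothetical transaction ID: replace it with the value found by
-- grepping vertica.log for RebalanceDFSFileTask.
-- Assumes dc_errors has transaction_id and time columns.
SELECT *
FROM dc_errors
WHERE transaction_id = 45035996273705000
ORDER BY time;
```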
Here is an example:
=> select * from vs_dfs_file;
oid | tag | name | distribution | parent | size | create_epoch | isfile | isdirectory
-------------------+-----+---------------------------------+-------------------+-------------------+------+--------------+--------+-------------
49539595901076466 | 0 | / | 0 | 45035996273704976 | 0 | 0 | f | f
49539595901076468 | 0 | tokenizersConfigurations | 0 | 49539595901076466 | 0 | 0 | f | t
49539595901076500 | 0 | logWordITokenizer_defaultConfig | 81064793309289126 | 49539595901076468 | 145 | 6 | t | f
49539595901076510 | 0 | logNgramTokenizer_defaultConfig | 81064793309289126 | 49539595901076468 | 33 | 8 | t | f
49539595901076518 | 0 | logICUTokenizer_defaultConfig | 81064793309289126 | 49539595901076468 | 40 | 10 | t | f
49539595901076530 | 0 | 49539595901076520 | 81064793309289126 | 49539595901076468 | 145 | 11 | t | f
49539595901076544 | 0 | 49539595901076532 | 81064793309289126 | 49539595901076468 | 133 | 13 | t | f
49539595901076562 | 0 | 49539595901076546 | 81064793309289126 | 49539595901076468 | 115 | 16 | t | f
All 6 DFS files that failed to rebalance have the parent ID 49539595901076468, which is the OID of tokenizersConfigurations.
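Instead of matching parent IDs to OIDs by eye, a self-join on vs_dfs_file can resolve each file's parent name directly. This is a sketch that uses only the columns shown in the example output above (oid, name, parent, isfile); the aliases f and p and the output column names are illustrative.

```sql
-- List each DFS file together with the name of its parent directory.
-- f = the file row, p = the row whose oid matches the file's parent.
SELECT f.oid    AS file_oid,
       f.name   AS file_name,
       f.parent AS parent_oid,
       p.name   AS parent_name
FROM vs_dfs_file f
JOIN vs_dfs_file p ON f.parent = p.oid
WHERE f.isfile
ORDER BY p.name, f.name;
```

For the data in this example, every file row should report tokenizersConfigurations as parent_name.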
To address the issue:
Reinstall the package that contains tokenizersConfigurations (the logsearch package, per the path below) by running:
cd /opt/vertica/packages/logsearch/ddl
vsql -f uninstall.sql
vsql -f install.sql