Should I re-index the Filr index server?

  • 7024040
  • 29-Jul-2019
  • 03-Apr-2020

Environment

Filr 4.0

Situation

The Filr administrator needs to know if they should re-index the Filr index server.

The Filr administrator needs to understand how Filr works with the database and index servers.

Resolution

Re-indexing may be needed in certain cases, but is not recommended as a first step for every Filr problem.

Additional Information

Filr maintains data in two places: the database server and the index server. The primary persistent copy is in the database and the secondary persistent copy is in the index. When a new entity (user, group, file, or folder) is created in Filr, the entity's metadata is created first in the database and then in the index. To update the index, Filr contacts each index node individually. In a clustered environment, index nodes do not know about each other, so it is the Filr server's responsibility to keep all the nodes in sync by writing to each node in turn. If any index node is unreachable, the writes are logged to the SS_IndexingJournals table in the database. When the index node becomes available again, these journals are applied to bring the node back in sync with the other nodes. These journals are called "Deferred logs" in Filr.
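The write path described above can be illustrated with a small model. This is only a sketch, not Filr's actual code: the class names, the in-memory dictionaries, and the `apply_deferred` method are hypothetical stand-ins for the database, the index nodes, and the SS_IndexingJournals table.

```python
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    """Hypothetical stand-in for one index node."""
    name: str
    reachable: bool = True
    docs: dict = field(default_factory=dict)


class FilrWriter:
    """Toy model of the write path: database first, then every index node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.database = {}  # primary persistent copy
        # Models the SS_IndexingJournals table, keyed by node name (assumption).
        self.journals = {node.name: [] for node in self.nodes}

    def create_entity(self, entity_id, metadata):
        # 1. Metadata is created in the database first.
        self.database[entity_id] = metadata
        # 2. Each index node is then written to individually.
        for node in self.nodes:
            if node.reachable:
                node.docs[entity_id] = metadata
            else:
                # Unreachable node: defer the write to its journal.
                self.journals[node.name].append((entity_id, metadata))

    def apply_deferred(self, node):
        """Replay journaled writes once the node is reachable again."""
        if not node.reachable:
            return 0
        pending = self.journals[node.name]
        for entity_id, metadata in pending:
            node.docs[entity_id] = metadata
        applied = len(pending)
        pending.clear()
        return applied
```

In this model, a node that was down during a write ends up with a non-empty journal, and it matches the other nodes again only after `apply_deferred` has replayed that journal.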

When a Net Folder sync is triggered on the Filr server, metadata for the files and folders in the Net Folder is read from the file server and updated in both the database and the indexer. Access Control List (ACL) information for folders is read from the file server and updated in the database only. ACL information for files is not stored anywhere; it is read from the file server in real time.

Filr uses the index server to list files, folders, and users (in the admin console and while sharing files). When listing files and folders, ACLs are also considered, so that each user sees only what they are supposed to see. Filr contacts the indexer node(s) for reads in round-robin fashion. Index nodes that have deferred logs (journals) are not contacted for reads, because a deferred log indicates that the node is not up to date.
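The read-side behavior (round-robin across nodes, skipping any node with deferred logs) can be sketched as follows. The `ReadRouter` class and its `deferred` counter are hypothetical names used for illustration; they do not correspond to a documented Filr API.

```python
class ReadRouter:
    """Toy round-robin selector that skips nodes with deferred logs."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        # Number of pending journal entries per node (assumed bookkeeping).
        self.deferred = {name: 0 for name in self.node_names}
        self._cursor = 0

    def next_read_node(self):
        """Return the next up-to-date node, or None if no node can serve reads."""
        total = len(self.node_names)
        for offset in range(total):
            idx = (self._cursor + offset) % total
            name = self.node_names[idx]
            if self.deferred[name] == 0:
                # Advance past the chosen node so the next call rotates onward.
                self._cursor = (idx + 1) % total
                return name
        return None
```

With three nodes "a", "b", and "c", and pending journals on "b", successive calls alternate between "a" and "c". If every node has pending journals, the selector returns None, which corresponds to having no index node available for reads.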

The memcached daemon runs on the indexer nodes, but it is not related to the indexer. Rather, it is a distributed second-level cache for the database. Filr is configured with the memcached IP addresses, and all database reads first hit the cache; if the object is not in the cache, it is read from the database itself.
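The cache-then-database read order can be modeled in a few lines. This sketch uses plain dictionaries in place of memcached and the database, and the class name is hypothetical. Storing the value in the cache after a miss is an assumption about typical second-level cache behavior, not something stated in this document.

```python
class CachedDatabaseReader:
    """Toy read-through lookup: check the cache first, then the database."""

    def __init__(self, database, cache=None):
        self.database = database                       # stand-in for the SQL database
        self.cache = {} if cache is None else cache    # stand-in for memcached
        self.db_reads = 0                              # counts reads that reached the database

    def get(self, key):
        # 1. Every read hits the cache first.
        if key in self.cache:
            return self.cache[key]
        # 2. Cache miss: read from the database itself.
        self.db_reads += 1
        value = self.database.get(key)
        # 3. Assumption: populate the cache so the next read is served from it.
        if value is not None:
            self.cache[key] = value
        return value
```

Because this cache fronts the database and not the index, re-indexing does not affect it.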

Keep in mind that it is Filr that sends read/write requests to the search nodes, and it is Filr that executes the sync operation.

When re-indexing all or a large part of the tree, it is strongly recommended to change the access mode of the node(s) to be re-indexed to "write-only" before starting, so that all read requests go to the other, unselected node(s) while the re-indexing is in progress. At least one node must allow read-write at any given time. Once re-indexing is complete, change the access mode of the node(s) back to read-write so that search requests can use them again.
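The access-mode rule above lends itself to a simple pre-flight check. The following is a minimal sketch with hypothetical function names; in practice the access mode is changed in the Filr administration console, and this code only models the rule that at least one node must remain read-write.

```python
READ_WRITE = "read-write"
WRITE_ONLY = "write-only"


def plan_reindex(access_modes, nodes_to_reindex):
    """Return the access modes to set before re-indexing the selected nodes.

    Raises ValueError if the plan would leave no read-write node.
    """
    unknown = set(nodes_to_reindex) - set(access_modes)
    if unknown:
        raise KeyError(f"unknown index node(s): {sorted(unknown)}")
    planned = dict(access_modes)
    for name in nodes_to_reindex:
        planned[name] = WRITE_ONLY
    if READ_WRITE not in planned.values():
        raise ValueError("at least one index node must stay read-write")
    return planned


def restore_after_reindex(access_modes, reindexed_nodes):
    """Return the access modes with the re-indexed nodes back to read-write."""
    restored = dict(access_modes)
    for name in reindexed_nodes:
        restored[name] = READ_WRITE
    return restored
```

For two read-write nodes, planning a re-index of one node succeeds and marks it write-only, while planning a re-index of both at once raises an error, because no node would be left to serve reads.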

Re-indexing (rebuilding) a Net Folder takes time that depends on the number of files and folders in the Net Folder, and that time can be considerable. Re-indexing makes the node unavailable for servicing reads. For example, where there are two index nodes, if index node one has deferred logs (journals), it cannot service reads until the logs are applied. If a re-index is then triggered on index node two, there are no index nodes left to serve reads. As a rule, it is not advisable to re-index unless the index files are corrupted or the index node has been unreachable for a long time.
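The two-node example can be worked through with a short helper. This is an illustrative sketch only: the function name and its inputs (a count of deferred log entries per node and a set of nodes being re-indexed) are assumptions made for the example, not Filr settings.

```python
def nodes_serving_reads(deferred_counts, reindexing):
    """Return the index nodes that can currently serve reads.

    A node is excluded if it has deferred logs or is being re-indexed.
    """
    return [
        name
        for name, pending in deferred_counts.items()
        if pending == 0 and name not in reindexing
    ]


# Index node one has deferred logs; index node two is healthy.
state = {"index1": 12, "index2": 0}
before = nodes_serving_reads(state, reindexing=set())
during = nodes_serving_reads(state, reindexing={"index2"})
```

Here `before` contains only index node two, and `during` is empty: starting a re-index on the one healthy node leaves nothing to answer searches until either the re-index finishes or the deferred logs on node one are applied.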