Retain 4.0 Indexer Glossary of Terms

  • 7019104
  • 18-Nov-2015
  • 07-Aug-2017

Environment


Retain 4
Solr Indexer

Situation


The Index page in the Retain administration tool uses terms like "shard", and "zookeeper" is referenced in the logs.  What do these and other terms mean?

Resolution


Here is a glossary of the new Solr indexer terminology:

Index Server
A single running High Performance Indexer (HPI) process, as installed with the HPI Installer.  Retain ships with the standard indexer.  For an additional cost, customers can purchase the HPI for high availability and performance.

Index Core
A running instance of an HPI index along with all the configuration required to use it.  A single Index Server process can contain 0 or more Index Cores, which are run largely in isolation but can communicate with each other if necessary.

Index Manager (a.k.a., Zookeeper)
A program that helps other programs keep a functional cluster running.  It handles leader elections.  Although HPI can be run with an embedded Index Manager, it is recommended that it be standalone, installed separately from the index servers.  It is also recommended that it be a redundant ensemble, requiring at least three hosts.
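
The three-host minimum follows from how Zookeeper stays available: an ensemble keeps working only while a strict majority of its hosts are up.  A minimal sketch of that arithmetic is below.  The function names are hypothetical and are not part of Retain, HPI, or Zookeeper.

```python
def quorum_size(ensemble_hosts: int) -> int:
    """Smallest number of hosts that forms a strict majority of the ensemble."""
    if ensemble_hosts < 1:
        raise ValueError("an ensemble needs at least one host")
    return ensemble_hosts // 2 + 1


def tolerated_failures(ensemble_hosts: int) -> int:
    """How many hosts can be down while a majority is still reachable."""
    return ensemble_hosts - quorum_size(ensemble_hosts)


if __name__ == "__main__":
    # Print host count, majority size, and failures tolerated for common sizes.
    for hosts in (1, 2, 3, 5):
        print(hosts, quorum_size(hosts), tolerated_failures(hosts))
```

With one or two hosts no failure can be tolerated, so three is the smallest ensemble that survives the loss of a host.  Four hosts tolerate no more failures than three, which is why odd ensemble sizes are the usual choice.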

Shard / Partition
A logical piece (or slice) of a collection.  Each shard is made up of one or more replicas.  An election is held to determine which replica is the leader.  The HPI concept of a shard is a logical division.

Replica
One copy of a shard.  An index created with numShards=1 and replicationFactor=2 will have exactly 2 replicas, so there will be 2 cores, each on a different machine (or HPI instance).  One will be named [collectionname]_shard1_replica1 and the other will be named [collectionname]_shard1_replica2.  One of them will be elected to be the leader.  Note:  Retain takes care of the collection naming and usage for the user.  The only people who will need to know about collections are those sharing the same Solr servers with other applications, in which case they will already know what collections are.
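
The naming pattern above can be sketched in a few lines of Python.  This is only an illustration: the function core_names and the collection name "example" are hypothetical, and applying the same pattern to more than one shard is an assumption rather than something this article confirms.

```python
from typing import List


def core_names(collection: str, num_shards: int, replication_factor: int) -> List[str]:
    """List core names using the [collectionname]_shardN_replicaM pattern.

    Assumption: shards and replicas are both numbered from 1, as in the
    numShards=1, replicationFactor=2 example in this article.
    """
    if num_shards < 1 or replication_factor < 1:
        raise ValueError("num_shards and replication_factor must be at least 1")
    return [
        f"{collection}_shard{shard}_replica{replica}"
        for shard in range(1, num_shards + 1)
        for replica in range(1, replication_factor + 1)
    ]


if __name__ == "__main__":
    # "example" is a placeholder; Retain chooses the real collection name.
    for name in core_names("example", num_shards=1, replication_factor=2):
        print(name)
```

For numShards=1 and replicationFactor=2 this yields example_shard1_replica1 and example_shard1_replica2, matching the two cores described above.  In general the number of cores is numShards multiplied by replicationFactor.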

Collection
A complete logical index in HPI.  It is made up of one or more shards/partitions.  If the number of shards/partitions is more than one, it is a distributed index.

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 2661.