Apache Solr General Terminology
The following general terms come up repeatedly when working with Apache Solr, so it is worth understanding each of them briefly.
Instance: A Solr instance, like a Tomcat or Jetty instance, is an application server running inside a Java Virtual Machine (JVM). Each Solr instance is identified by its home directory, and one or more cores can be configured to run inside it.
Core: A core is a single index together with its configuration. When an application needs multiple indexes, we can run multiple cores in one instance instead of running multiple instances with one core each.
Home: The term $SOLR_HOME refers to the home directory, which has all the information regarding the cores and their configuration, indexes, and dependencies.
Shard: In a distributed environment, the data is partitioned between different Solr instances, and each chunk of data is called a shard. A shard contains a subset of the whole index.
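To make the idea of a shard concrete, here is a minimal Python sketch that splits a handful of documents across a fixed number of shards by hashing each document's id. The names `shard_for` and `partition`, the shard count, and the sample ids are hypothetical, and the CRC32 hash is only a stand-in; it should not be read as the routing algorithm Solr itself uses.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # assumed shard count, for illustration only


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document id to a shard number with a stable hash.

    CRC32 is a stand-in here; it is not Solr's own routing hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(doc_ids, num_shards: int = NUM_SHARDS) -> dict:
    """Group document ids by shard; each group is a subset of the whole index."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return dict(shards)


if __name__ == "__main__":
    docs = [f"doc-{i}" for i in range(10)]  # made-up document ids
    for shard, members in sorted(partition(docs).items()):
        print(f"shard{shard}: {members}")
```

Every document lands in exactly one shard, so the union of all shards is the whole index, which matches the definition above.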
SolrCloud Terminology
We can install Solr in distributed mode, known as SolrCloud. In the older master-slave replication setup, the indexes are created on the master server and replicated to one or more slave servers. SolrCloud instead spreads each index across the cluster as shards and replicas, and elects a leader for each shard.
Given below are the key terms associated with SolrCloud:
Node: Each instance of Solr is regarded as a node in SolrCloud.
Cluster: All the nodes of the environment, taken together, make up a cluster.
Collection: A collection is a logical index that is spread across the nodes of a cluster.
Shard: A shard is a portion of a collection and has one or more replicas of its index.
Replica: A replica is a copy of a shard that runs as a Solr core on a node.
Leader: The leader is the replica of a shard that distributes SolrCloud requests to the remaining replicas.
ZooKeeper: An Apache project that SolrCloud uses to centralize configuration and coordination for cluster management and to elect leaders.
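The relationships between these terms can be sketched as a small data model. The Python classes below are purely illustrative: the class names, field names, and the sample node and replica names are assumptions made for this example and do not correspond to any Solr API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Replica:
    name: str             # hypothetical replica name
    node: str             # the node (Solr instance) hosting this copy
    leader: bool = False  # True for the replica acting as shard leader


@dataclass
class Shard:
    name: str
    replicas: List[Replica] = field(default_factory=list)

    def leader(self) -> Optional[Replica]:
        """Return the replica currently marked as leader, if any."""
        return next((r for r in self.replicas if r.leader), None)


@dataclass
class Collection:
    name: str
    shards: Dict[str, Shard] = field(default_factory=dict)

    def nodes(self) -> set:
        """All nodes that host at least one replica of this collection."""
        return {r.node for s in self.shards.values() for r in s.replicas}


if __name__ == "__main__":
    coll = Collection("collection1", {
        "shard1": Shard("shard1", [Replica("r1", "node-a", leader=True),
                                   Replica("r2", "node-b")]),
        "shard2": Shard("shard2", [Replica("r3", "node-b", leader=True),
                                   Replica("r4", "node-a")]),
    })
    print(sorted(coll.nodes()))
    print(coll.shards["shard1"].leader().node)
```

In this model a cluster is simply the set of nodes, a collection is divided into shards, and each shard has several replicas of which one is the leader.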
Configuration of Solr
Solr works out of the box without requiring any configuration changes. At some point, however, we need to tune Solr for the requirements of our specific search application. The following are the main configuration files in Apache Solr.
solr.xml: This file resides in the $SOLR_HOME directory and contains information related to SolrCloud. Solr also uses it to identify and load the cores.
solrconfig.xml: This file contains the core-specific configuration and definitions related to request handling and response formatting, along with indexing, memory management, and commits.
schema.xml: This file contains the whole schema, including the fields and field types.
core.properties: This file contains the configuration specific to a core. It is used for core discovery, as it contains the name of the core and the path of the data directory. It can be placed in any directory, which is then treated as the core directory.
For auto-discovered cores using core.properties, the following are the configuration properties.
Parameter | Description |
---|---|
name | The name of the core. |
config | The name of the configuration file; defaults to solrconfig.xml. |
dataDir | The path to the directory containing the index files and update log; defaults to data under the instance directory. |
ulogDir | The path to the directory containing the update log. |
schema | It sets the name of the schema document that defaults to schema.xml. |
shard | It sets the shard ID for the present core. |
collection | The name of the SolrCloud collection to which this core belongs. |
loadOnStartup | If this flag is set to true, the core is loaded during the Solr initialization process and a new searcher is opened for it. |
transient | It indicates that this core can be unloaded automatically if Solr’s transientCacheSize threshold is reached. |
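As a rough illustration of how these properties and their defaults fit together, the following Python sketch parses a small core.properties text and fills in the defaults listed in the table above. The function name `parse_core_properties` and the sample file contents are hypothetical, and the parser handles only simple key=value lines, not the full Java properties syntax.

```python
# Defaults taken from the table above.
DEFAULTS = {
    "config": "solrconfig.xml",
    "schema": "schema.xml",
    "dataDir": "data",
}

# Made-up core.properties contents, for illustration only.
SAMPLE = """\
# core.properties for a hypothetical core
name=collection1
loadOnStartup=true
transient=false
"""


def parse_core_properties(text: str) -> dict:
    """Parse simple key=value lines and apply the defaults above.

    Only a subset of the properties format is handled: comment lines
    starting with '#' or '!' and blank lines are skipped, and lines
    without '=' are ignored.
    """
    props = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        props[key.strip()] = value.strip()
    return props


if __name__ == "__main__":
    for key, value in sorted(parse_core_properties(SAMPLE).items()):
        print(f"{key} = {value}")
```

Because the sample sets only name, loadOnStartup, and transient, the resulting dictionary picks up config, schema, and dataDir from the defaults.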
The diagram shown below illustrates how Solr discovers the collection1 core using core.properties and configures it using solrconfig.xml during server initialization.
Solrconfig.xml overview
Start the server using the command below in the command line.
After the server has started, go to the Solr admin console at http://localhost:8983/solr. Click on the collection1 option on the left, then on the Files link. This takes you to the configuration files for the collection1 core, shown as a directory structure. To display the active configuration settings for the collection1 core, click on solrconfig.xml.
Common XML data-structure and type elements
Element | Description | Example |
---|---|---|
<arr> | Named, ordered array of objects | <arr name="last-components"> <str>spellcheck</str> </arr> |
<lst> | Named, ordered list of name/value pairs | <lst name="defaults"> <str name="omitHeader">true</str> <str name="wt">json</str> </lst> |
<bool> | Boolean value: true or false | <bool>true</bool> |
<str> | String value | <str>spellcheck</str> or <str name="wt">json</str> |
<int> | Integer value | <int>512</int> |
<long> | Long value | <long>1359936000000</long> |
<float> | Float value | <float>3.14</float> |
<double> | Double value | <double>3.14159265359</double> |
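One way to see what these elements mean is to map them onto ordinary data structures. The Python sketch below uses the standard `xml.etree.ElementTree` module to convert the elements from the table into Python values; the function name `to_python` and the chosen mappings (for example `<lst>` to a dict) are assumptions made for illustration, not how Solr itself reads the file.

```python
import xml.etree.ElementTree as ET

# Converters for the scalar element types listed in the table.
SCALARS = {
    "str": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "bool": lambda s: s.strip().lower() == "true",
}


def to_python(elem):
    """Recursively convert an <arr>, <lst>, or scalar element to a Python value."""
    if elem.tag == "lst":
        # Named list of name/value pairs -> dict (insertion order is kept).
        return {child.get("name"): to_python(child) for child in elem}
    if elem.tag == "arr":
        # Ordered array of objects -> list.
        return [to_python(child) for child in elem]
    return SCALARS[elem.tag](elem.text or "")


if __name__ == "__main__":
    defaults = ET.fromstring(
        '<lst name="defaults">'
        '<str name="omitHeader">true</str>'
        '<str name="wt">json</str>'
        '</lst>'
    )
    components = ET.fromstring(
        '<arr name="last-components"><str>spellcheck</str></arr>'
    )
    print(to_python(defaults))
    print(to_python(components))
```

Note that `omitHeader` stays the string "true" here because the example in the table declares it with `<str>`; only a `<bool>` element is converted to a Boolean.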