SolrCloud: Configuring Indexes

Note: This document describes index configuration for SolrCloud and the Sophora Indexing Service.

Index Configurations

You can create as many indexes in Solr as you need. To do that, create a new index configuration by selecting Administration view > Index Configurations > New: Index Configuration from the context menu. This will let you create a new system document for a new index.

Note that an index becomes active in Solr only once its index configuration document is published.

To delete an index from the Solr instance, simply set the index configuration document offline.

One index configuration leads to two Solr collections: one containing all documents in their working version, and one containing all included documents in their last live version. The latter uses the suffix -live.
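The naming rule for the two collections can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and it assumes that the working collection carries the index name unchanged, as the built-in "default" and "default-live" indexes suggest.

```java
// Illustrative sketch, not part of the Sophora API.
// Assumption: the working collection is named exactly like the index.
final class CollectionNames {
    static final String LIVE_SUFFIX = "-live";

    private CollectionNames() {}

    // Name of the collection holding the working versions.
    static String working(String indexName) {
        return indexName;
    }

    // Name of the collection holding the last live versions.
    static String live(String indexName) {
        return indexName + LIVE_SUFFIX;
    }

    // True if the given collection name denotes a live collection.
    static boolean isLive(String collectionName) {
        return collectionName.endsWith(LIVE_SUFFIX);
    }
}
```

For an index configuration named "news", this yields the collections "news" and "news-live".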

The following table lists the properties of the index configuration document:

Property	Description
Name	The index name. This will also be used by the Solr server to store the index in the file system.
Offline-Index	Defines whether it is an offline collection.
Deleted documents	Defines whether deleted documents are kept in the index.
Nodetypes	Only documents having the specified node types will be stored in the index.
Structurenodes	Only documents located in the specified structure nodes will be stored in the index.
Channels	Only documents valid for the selected channels will be stored in the index. Will store all documents if no channel is selected.
Filter Script	Only documents matching this Groovy filter script will be stored in the index. Will store all documents if no script is specified. The script must return true or false to indicate whether the current document should be stored in the index. For example, the following script filters out all documents that lack the "myProp" property: `return document.hasProperty("sophora-content:myProp");` For a full list of available variables, refer to "Variables available to scripts".
Mapping Document	One or more custom index mappings to use, see below.

Note: The properties Reindex Search Query and Remove after days are no longer supported.
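A slightly stricter filter than the one in the table might also require the property to be non-empty. The sketch below shows that logic in plain Java (which Groovy also accepts). ScriptDocument is a hypothetical stand-in that models only the two methods used in this document's script examples (hasProperty and getString), so the sketch can run without a Sophora server. In a real filter script, the body of accept would be the script itself and document would be the provided variable.

```java
import java.util.Map;

// Hypothetical stand-in for the "document" script variable; it models only
// the two methods that appear in this document's script examples.
interface ScriptDocument {
    boolean hasProperty(String name);
    String getString(String name);
}

final class FilterScriptSketch {
    private FilterScriptSketch() {}

    // Corresponds to the body of a filter script: true means "store in index".
    static boolean accept(ScriptDocument document) {
        if (!document.hasProperty("sophora-content:myProp")) {
            return false; // property missing: keep the document out of the index
        }
        String value = document.getString("sophora-content:myProp");
        return value != null && !value.trim().isEmpty();
    }

    // Helper for trying the sketch: a document backed by a simple map.
    static ScriptDocument fromMap(Map<String, String> properties) {
        return new ScriptDocument() {
            @Override public boolean hasProperty(String name) {
                return properties.containsKey(name);
            }
            @Override public String getString(String name) {
                return properties.get(name);
            }
        };
    }
}
```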

Index Mappings

When storing documents in the Solr index, the document's properties must be converted to be stored in index fields. This is done by a mapping. Sophora implements a default mapping for all property types and child nodes. Members of the administration team can configure new fields to be added to index documents.

This mapping is configurable by using mapping documents. The mapping documents are linked to an index by using the property "Mapping Document" of the index configuration.

To add a new custom mapping, create a new index mapping by selecting Administration view > Index Mappings > New: Index Mapping from the context menu. This will let you create a new system document for a new index mapping.

The following table lists the properties of the index mapping document:

Property	Description
Name	The index mapping name.
Base Mapping	Defines the base mapping for this index. Possible values: Standard, Only metadata. See the section below for more details.
Indexfields	One or more custom index fields that are active when this mapping is used in an index configuration, see below.

The Solr index may contain index documents not only for Sophora nodes but also for child nodes, if those child nodes are configured as rows of dynamic tables. To distinguish between these index document types, each Solr index document has a solr_document_type_s field, which is set to one of the following values:

node, in which case the index document represents a normal Sophora node, or
childNode, in which case it represents the child node of a Sophora node

Mapping for Sophora documents

There are two types of mappings: Standard and Only metadata.

Only metadata

The mapping "Only metadata" is a subset of the Standard mapping. It is the most appropriate one for offline collections. It consists of the following mappings:

Data Source	Solr Index Field
Document's UUID	Field "id"
Document's Sophora ID	Field "sophora_id_s"
Index document type	Field "solr_document_type_s"
Document's node type	Field "primaryType_s"
Document's structure node's path	Field "structureNode_path_s"
Document's structure node's alias	Field "structureNode_alias_s"
Document's structure node's default document's UUID	Field "structureNode_defaultDocumentUuid_s"
Document's structure node's hierarchy UUIDs	Field "structureNode_hierarchyUuids_ss"
Document's enabled state	Field "sophora_isEnabled_b"
Time/date of indexing the document	Field "indexedDate_dt"
Documents loaded from the server via getDocument*() calls	Field "included_uuid_ss" containing all UUIDs of documents that were loaded from the server via getDocument*() calls.

Standard mapping

The Standard mapping enhances the "Only metadata" mapping. It is also used if the field "Base Mapping" of a mapping document is left empty.

Data Source	Solr Index Field
String property	Field "NAME_s" for regular text fields, "NAME_t" for text fields having more than one row, or "NAME_txt" for multi-valued string properties.
Long property	Field "NAME_l" for regular long properties, or "NAME_ls" for multi-valued long properties.
Double property	Field "NAME_d" for regular double properties, or "NAME_ds" for multi-valued double properties.
Date property	Field "NAME_dt" for regular date properties, or "NAME_dts" for multi-valued date properties.
All other property types	Field "NAME_s" for regular properties, or "NAME_ss" for multi-valued properties. The property is converted to its string representation first.
Copytext child node	Field "copytext_t" containing complete plain text.
All components	Field "childnode_reference_uuid_ss" containing all UUIDs of referenced documents.
Properties of all components	Field "childnode_content_t" containing all the properties of all referenced documents. This does not include any properties in the "sophora"-Namespace, nor does it include any copytext. Relevant properties are converted to their string representations first.
Dynamic Tables	Separate Solr documents for each row of a dynamic table.
Yellow Data	Separate Solr documents for each Yellow Data entry of the document.
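The suffix rules for property fields in the table above can be condensed into a small lookup. The following is a sketch for illustration only: the class, the enum and the method are hypothetical, NAME is passed in as-is (how it is derived from the property name is not covered here), and for strings it assumes that the multi-valued suffix applies regardless of the number of rows.

```java
// Illustrative sketch of the Standard mapping's field suffixes.
final class DefaultFieldNames {
    // Hypothetical classification of property types, following the table rows.
    enum Kind { STRING, MULTI_ROW_TEXT, LONG, DOUBLE, DATE, OTHER }

    private DefaultFieldNames() {}

    static String fieldName(String name, Kind kind, boolean multiValued) {
        switch (kind) {
            case STRING:
                return name + (multiValued ? "_txt" : "_s");
            case MULTI_ROW_TEXT:
                // Assumption: multi-valued strings always use "_txt".
                return name + (multiValued ? "_txt" : "_t");
            case LONG:
                return name + (multiValued ? "_ls" : "_l");
            case DOUBLE:
                return name + (multiValued ? "_ds" : "_d");
            case DATE:
                return name + (multiValued ? "_dts" : "_dt");
            case OTHER:
                // Value is converted to its string representation first.
                return name + (multiValued ? "_ss" : "_s");
            default:
                throw new IllegalArgumentException("Unknown kind: " + kind);
        }
    }
}
```

Knowing the suffix is mainly useful when writing Solr queries against these fields, because the field name in the query must include it.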

Default mapping for dynamic table child nodes

The following table lists the default mappings between data sources and their respective Solr index fields for child nodes. The solr_document_type_s field will be set to childNode. Dynamic tables and child nodes are not indexed for indexes that use the "Only metadata" mapping.

Data Source	Solr Index Field
Unique index document id	Field "id"
Index document type	Field "solr_document_type_s" (the value is set to "childNode")
UUID of the parent document	Field "parentNode_uuid_s"
Primary type of the parent document	Field "parentNode_primaryType_s"
Structure hierarchy UUIDs of the parent document	Field "parentNode_structureNode_hierarchyUuids_ss"
Primary type of the child node	Field "childNode_primaryType_s"
Name of the child node in the parent node's configuration	Field "childNode_name_s"
Index of this child node in the list of child nodes with this child node name.	Field "childNode_index_i"

Furthermore, all properties of the child node are mapped in the same way as in the default mapping for Sophora documents.
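Using the fields listed above, a Solr query for the dynamic table rows of one parent document might be assembled as sketched below. The helper class is hypothetical, and the sketch assumes that UUIDs are indexed in their standard textual form. The resulting string uses ordinary Solr query syntax and could be passed to a SolrQuery.

```java
import java.util.UUID;

// Illustrative helper for building a query over child node index documents.
final class ChildNodeQueries {
    private ChildNodeQueries() {}

    // Wraps a value in double quotes, escaping backslashes and quotes,
    // so it is treated as a single phrase by the Solr query parser.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Matches all child node index documents with the given child node name
    // that belong to the given parent document.
    static String childRowsOf(UUID parentUuid, String childNodeName) {
        return "solr_document_type_s:childNode"
                + " AND parentNode_uuid_s:" + quote(parentUuid.toString())
                + " AND childNode_name_s:" + quote(childNodeName);
    }
}
```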

Default-index mapping

A built-in index named "default" contains data from all documents in the repository. It is used, for example, for DeskClient searches. A second built-in index is named "default-live". It contains all documents in their latest live version. These built-in indices cannot be configured using regular index configurations.

However, admins can add custom index fields to these indices using the "Default-Core-Mapping" index mapping document, which can be found at Administration view > Solr.

For instance, custom index fields within the default index can be used for custom search result orders (see Search Result Order Options in the Work Environment Configuration).

Index Fields

To create new fields (or override existing ones) in a Solr index document, there needs to be a specification of how data is converted to fill the new field. To create a new index field, select Administration view > Index Fields > New: Index Field from the context menu. This will let you create a new system document for a new index field.

The following table lists the properties of the index field document:

Property	Description
Label	The field's label.
Fieldname	The field's name. This name is used to store data into the Solr index.
Script	A Groovy script that returns the data to be stored in the index field. The script must return an Object that represents the data to store in the field, for example: `return document.getString("sophora-content:title").toLowerCase();`

Valid return types

null (the field will not be indexed)
Classes corresponding to primitive types, e.g. Integer, Double, Boolean
String
java.util.Date
java.util.Calendar
java.util.UUID
Collections of the above types (for multivalued fields)
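The list above can be read as a simple type check. The following sketch is not the indexing service's actual validation. It assumes that "classes corresponding to primitive types" means the eight Java wrapper classes, and that collections must contain only non-null scalar values from the list. The class and method names are hypothetical.

```java
import java.util.Calendar;
import java.util.Collection;
import java.util.Date;
import java.util.UUID;

// Illustrative check of a script's return value against the documented types.
final class ReturnTypeCheck {
    private ReturnTypeCheck() {}

    // Single values: wrapper classes, String, Date, Calendar, UUID.
    static boolean isScalar(Object value) {
        return value instanceof Boolean || value instanceof Character
                || value instanceof Byte || value instanceof Short
                || value instanceof Integer || value instanceof Long
                || value instanceof Float || value instanceof Double
                || value instanceof String || value instanceof Date
                || value instanceof Calendar || value instanceof UUID;
    }

    static boolean isValidReturnValue(Object value) {
        if (value == null) {
            return true; // null is allowed: the field will not be indexed
        }
        if (value instanceof Collection) {
            for (Object element : (Collection<?>) value) {
                if (!isScalar(element)) {
                    return false; // assumption: no nested collections or nulls
                }
            }
            return true;
        }
        return isScalar(value);
    }
}
```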

Default imports

java.io.*
java.lang.*
java.math.BigDecimal
java.math.BigInteger
java.net.*
java.util.*
groovy.lang.*
groovy.util.*
com.subshell.sophora.api.*
com.subshell.sophora.api.content.*
com.subshell.sophora.api.content.retrievalresult.*
com.subshell.sophora.api.content.value.*
com.subshell.sophora.api.nodetype.*
com.subshell.sophora.api.structure.*

Variables available to scripts

Name	Type	Description	Remarks
document	INode	the document in question	-
nodeType	NodeType	the document's node type	-
contentManager	IContentManager	a content manager that can be used to get more information from the Sophora server	The use of the IContentManager is deprecated since version 5 and kept only for backward compatibility reasons. The ISophoraClient should be used instead. Starting with version 6, this variable is no longer supported.
sessionToken	SessionToken	a session token to be used with content manager calls	Since the sessionToken is only used in conjunction with the IContentManager, its remarks apply here as well. Starting with version 6, this variable is no longer supported.
sophoraClient	ISophoraClient	used to access the Sophora server	-
collectionName	String	The name of the collection for which the document is mapped	-
liveCollection	Boolean	Defines whether it is a live or working collection	-

Distinguishing between Working and Live Content

When a script is executed to calculate the value of a Solr field, it must consider whether it is executed for a working or a live collection.

In the first case, the script should only be interested in the working versions of documents and in the second one it should access the last live versions of documents. To avoid having to make this decision multiple times in the scripts, it is handled transparently for the scripts by the Sophora Indexing Service.

The instances of the IContentManager and the ISophoraClient always return documents in the correct version. Take, for example, a script that is currently executed for a live collection and calls the method ISophoraClient#getDocumentByUuid: the ISophoraClient instance will transparently return the last live version of the document.

Methods available to scripts

Signature	Description
String stripRichText(String text)	Strips HTML/XML tags and some special characters from richtext fields.
IContent getNearestHierarchyDocument(IContent document)	Searches the parent structure nodes of the document for the first hierarchy document.
List derefChildNodes(IContent parent, String childNodeName)	Dereferences all child nodes with the given name. The method first reads a UUID from the property "sophora:reference" in each child node and then loads the referenced document. Child nodes without a "sophora:reference" property and external references are silently ignored. The returned documents don't contain binary data properties.
String getPropertyStringValueFromStructureNodeDocument(IContent document, String propertyName)	Searches all structure node documents in the structure node hierarchy of the document for the given string property. If no property is found, 'null' is returned.
List getSelectValueLabels(IContent document, String propertyName)	Returns the labels of the select value of the given property. If the property is single-valued, the returned list will contain at most one element. If the property is multi-valued, the list will contain at most one label for each property value. If no label is found for a property value, none is returned.
List getSelectValueLabels(IContent document, NodeType nodeType, String propertyName)	Returns the labels of the select value of the given property for the given node type. If the property is single-valued, the returned list will contain at most one element. If the property is multi-valued, the list will contain at most one label for each property value. If no label is found for a property value, none is returned.
String getDocumentUrl(UUID uuid)	Returns the delivery-side URL to a document.

These methods take into account whether they are called for a live or a working collection and use the appropriate content.

Synchronization of Modified Documents

Sophora documents are kept in sync with their counterparts in Solr collections. If any changes are made to a document, it will be re-indexed automatically to keep the index up to date. If a document is deleted, it will automatically be deleted from the corresponding collections (unless "Deleted documents" is checked in the index configuration).

When using custom index fields, the respective Groovy scripts may call getDocument*() methods on the ISophoraClient to retrieve data from other documents. Whenever one of these retrieved documents changes, the document using the custom index field will be re-indexed, too. In this way, the indexing service will automatically update information for a document that depends on other documents.

If there are documents that are often retrieved by Solr index scripts, saving, publishing, or setting offline those documents may cause a large number of reindexing operations.
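To get a feeling for this fan-out, the dependency rule can be modelled in memory. The sketch below is a conceptual model only, not the indexing service's implementation. It assumes that for each indexed document the set of UUIDs loaded via getDocument*() calls is known (the information the included_uuid_ss field holds), and it considers direct dependencies only. All names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;

// Conceptual model of "which documents are re-indexed when one changes".
final class ReindexFanOut {
    private ReindexFanOut() {}

    // includedByDocument: document UUID -> UUIDs its index scripts loaded.
    // Returns the documents whose scripts loaded the changed document.
    static Set<UUID> dependentsOf(UUID changed, Map<UUID, Set<UUID>> includedByDocument) {
        Set<UUID> dependents = new TreeSet<>();
        for (Map.Entry<UUID, Set<UUID>> entry : includedByDocument.entrySet()) {
            if (entry.getValue().contains(changed)) {
                dependents.add(entry.getKey());
            }
        }
        return dependents;
    }
}
```

In this model, a document that is loaded by the scripts of thousands of other documents produces thousands of dependents, which is why changes to frequently retrieved documents can be expensive.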

Whenever a structure node is changed in a significant way, that is, if its name or the value of an inherited property changes, all affected documents located in that structure node (and its children) will be re-indexed automatically.

Publishing an index configuration triggers an in-place re-index of the entire collection. While the re-index is running, the collection continues to receive updates and remains fully searchable, and search results reflect the changes as they are applied. Publishing an index mapping or index field document, by contrast, does not trigger a rebuild of the associated collections.

When changing and publishing a channel, index mapping or index field, the index configurations which have to be republished are marked red in the administration view. Furthermore, a sticky note is created. The colour of the sticky note is red by default. To change the colour, add an entry with the key sophora.configuration.republishIndexConfigurationColor and, for example, the value 255,255,0 to the configuration document.

Querying a custom collection

To search for documents in a custom collection, you have to specify the collection in the search parameters. To do so, use an instance of SolrSearchParameters. This class supports further parameters that are specific to searches in Solr. Use setCore(collectionName) to specify the collection for the search. Check the JavaDoc of that class for more information.

The term "core" was coined by former versions of Solr. In the context of Sophora, "core" and "collection" are used interchangeably for API compatibility reasons.

See the following code snippet:

// You can use any type of IQuery except for XPathQuery, which is not supported in Solr.
// With a SolrQuery you can query your own index fields.
IQuery query = new SolrQuery("species_reversed_s: \"goD\"");
// Use the special SolrSearchParameters instead of the common SearchParameters.
SolrSearchParameters parameters = new SolrSearchParameters();
parameters.setPageSize(10);
// Set the index/core to search in.
parameters.setCore("Animalcore");
// Execute the search.
UuidSearchResult searchResult = sophoraClient.findDocumentUuids(query, parameters);
// Do something with the result.
List<UUID> uuids = searchResult.getUUIDs();
...

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.