Configuring Indexes

Learn how to configure indexes in Sophora's DeskClient.

Index Configurations

You can create as many indexes in Solr as you need. To do that, create a new index configuration by selecting Administration view > Index Configurations > New: Index Configuration from the context menu. This will let you create a new system document for a new index.

Note that an index only becomes active in Solr if the index configuration document is published.

To delete an index from the Solr instance, simply set the index configuration document offline.

One index configuration leads to two Solr collections: one containing all documents in their working version, and one containing all included documents in their last live version. The latter uses the suffix -live.
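The naming rule can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and it assumes that the working collection carries the plain index name, which the text above does not state explicitly.

```java
import java.util.List;

/** Sketch: derives the two Solr collection names that belong to one index configuration. */
final class CollectionNames {

    /** Suffix of the live collection, as described above. */
    static final String LIVE_SUFFIX = "-live";

    /** Returns the working collection name followed by the live collection name. */
    static List<String> forIndex(String indexName) {
        // Assumption: the working collection is named exactly like the index.
        return List.of(indexName, indexName + LIVE_SUFFIX);
    }
}
```

For a hypothetical index called "news", CollectionNames.forIndex("news") yields the names "news" and "news-live".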

The following lists the properties of the index configuration document:

Name | The index name. This will also be used by the Solr server to store the index in the file system.
Offline-Index | Defines whether it is an offline collection.
Deleted documents | If checked, deleted documents will be stored in the index.
Nodetypes | Only documents having the specified node types will be stored in the index.
Structurenodes | Only documents located in the specified structure nodes will be stored in the index.
Channels | Only documents valid for the selected channels will be stored in the index. All documents are stored if no channel is selected.
Filter Script | Only documents matching this Groovy filter script will be stored in the index. All documents are stored if no script is specified.
The script must return true/false indicating whether the current document should be stored in the index, for example:

// this will filter out all documents not having the "myProp" property
return document.hasProperty("sophora-content:myProp");

For a full list of available variables, refer to "Variables available to scripts".
Mapping Document | One or more custom index mappings to use, see below.

Index Mappings

When storing documents in the Solr index, the document's properties must be converted to be stored in index fields. This is done by a mapping. Sophora implements a default mapping for all property types and child nodes. Members of the administration team can configure new fields to be added to index documents.

This mapping is configurable by using mapping documents. The mapping documents are linked to an index by using the property "Mapping Document" of the index configuration.

To add a new custom mapping, create a new index mapping by selecting Administration view > Index Mappings > New: Index Mapping from the context menu. This will let you create a new system document for a new index mapping.

The following table lists the properties of the index mapping document:

Name | The index mapping name.
Base Mapping | Defines the base mapping for this index. Possible values: Standard, Only metadata. See the section below for more details.
Indexfields | One or more custom index fields that are active when this mapping is used in an index configuration, see below.
Channel affiliations | If a channel is enabled here, the Solr fields channel_names_ss and channel_uuids_ss contain the channels that are enabled for the document. Unlike the channel settings or sophora_enabledChannels_ss, these fields contain information that is inherited from structure nodes.

Note: As of version 4.7.0, channel affiliations defined in any Solr mapping document are ignored. All channels are always taken into account, regardless of the Solr index and mapping document.

The Solr index may not only contain Solr index documents for Sophora nodes, but also for child nodes if those child nodes are configured as rows of dynamic tables. To distinguish between the different index document types, each Solr index document has a solr_document_type_s field, which is set to one of the following values:

  • node, in which case it is a normal Sophora node, or
  • childNode, in which case it is the child node of a Sophora node

Mapping for Sophora documents

There are two types of mappings: Standard and Only metadata.

Only metadata

The mapping "Only metadata" is a subset of the Standard mapping. It is the most appropriate one for offline collections. It consists of the following mappings:

Data Source | Solr Index Field
Document's UUID | Field "id"
Document's Sophora ID | Field "sophora_id_s"
Index document type | Field "solr_document_type_s"
Document's node type | Field "primaryType_s"
Document's structure node's path | Field "structureNode_path_s"
Document's structure node's alias | Field "structureNode_alias_s"
Document's structure node's default document's UUID | Field "structureNode_defaultDocumentUuid_s"
Document's structure node's hierarchy UUIDs | Field "structureNode_hierarchyUuids_ss"
Time/date of indexing the document | Field "indexedDate_dt"
Documents loaded from the server via getDocument*() calls | Field "included_uuid_ss" containing all UUIDs of documents that were loaded from the server via getDocument*() calls.

Standard mapping

The Standard mapping enhances the "Only metadata" mapping. It is also used if the field "Base Mapping" of a mapping document is left empty.

Data Source | Solr Index Field
String property | Field "NAME_s" for regular text fields, "NAME_t" for text fields having more than one row, or "NAME_txt" for multi-valued string properties.
Long property | Field "NAME_l" for regular long properties, or "NAME_ls" for multi-valued long properties.
Double property | Field "NAME_d" for regular double properties, or "NAME_ds" for multi-valued double properties.
Date property | Field "NAME_dt" for regular date properties, or "NAME_dts" for multi-valued date properties.
All other property types | Field "NAME_s" for regular properties, or "NAME_ss" for multi-valued properties. The property is converted to its string representation first.
Copytext child node | Field "copytext_t" containing complete plain text.
All components | Field "childnode_reference_uuid_ss" containing all UUIDs of referenced documents.
Properties of all components | Field "childnode_content_t" containing all the properties of all referenced documents. This does not include any properties in the "sophora" namespace, nor does it include any copytext. Relevant properties are converted to their string representations first.
Dynamic Tables | Separate Solr documents for each row of a dynamic table
Yellow Data | Separate Solr documents for each Yellow Data entry of the document
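
The suffix rules for property fields in the table above can be summarised as a small Java function. This is a sketch for illustration, not Sophora's implementation: the class, the enum and the parameters are hypothetical, NAME is taken as given, and the sketch assumes that a multi-valued string property gets "_txt" regardless of the number of rows.

```java
/** Sketch of the Standard mapping's field-name suffix rules for properties. */
final class StandardFieldNames {

    /** Simplified property kinds; OTHER stands for "all other property types". */
    enum Kind { STRING, LONG, DOUBLE, DATE, OTHER }

    /**
     * Builds the Solr field name for a property.
     *
     * @param name        the NAME part of the field, taken as given
     * @param kind        the kind of the property
     * @param multiValued whether the property holds several values
     * @param multiRow    whether a string property is a text field with more than one row
     */
    static String fieldName(String name, Kind kind, boolean multiValued, boolean multiRow) {
        switch (kind) {
            case STRING:
                // Assumption: multi-valued takes precedence over multi-row.
                if (multiValued) {
                    return name + "_txt";
                }
                return name + (multiRow ? "_t" : "_s");
            case LONG:
                return name + (multiValued ? "_ls" : "_l");
            case DOUBLE:
                return name + (multiValued ? "_ds" : "_d");
            case DATE:
                return name + (multiValued ? "_dts" : "_dt");
            default:
                // All other types are stored as their string representation.
                return name + (multiValued ? "_ss" : "_s");
        }
    }
}
```

A helper like this can be handy when writing Solr queries against custom collections, because the suffix determines which field name a property ends up in.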

Default mapping for dynamic table child nodes

The following lists the default mappings between data sources and their respective Solr index fields for child nodes. The solr_document_type_s field will be set to childNode. Dynamic tables and child nodes are not indexed for indexes that use the "Only metadata" mapping.

Data Source | Solr Index Field
Unique index document id | Field "id"
Index document type | Field "solr_document_type_s" (the value is set to "childNode")
UUID of the parent document | Field "parentNode_uuid_s"
Primary type of the parent document | Field "parentNode_primaryType_s"
Structure hierarchy UUIDs of the parent document | Field "parentNode_structureNode_hierarchyUuids_ss"
Primary type of the child node | Field "childNode_primaryType_s"
Name of the child node in the parent node's configuration | Field "childNode_name_s"
Index of this child node in the list of child nodes with this child node name | Field "childNode_index_i"

Furthermore, all properties of the child node are mapped in the same way as in the default mapping for Sophora documents.
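
To illustrate how the child node fields fit together, the following Java sketch builds a query string in standard Solr query syntax that selects the rows of one dynamic table of one parent document. The class and method names are hypothetical, and whether such a string is suitable for a particular search API in your setup is an assumption to verify.

```java
/** Sketch: builds a Solr query string for the child node documents of one parent. */
final class ChildNodeQueries {

    /** Wraps a value in double quotes, escaping backslashes and double quotes inside it. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Selects all index documents of type childNode that belong to the given
     * parent document and were created from child nodes with the given name.
     */
    static String rowsOf(String parentUuid, String childNodeName) {
        return "solr_document_type_s:childNode"
                + " AND parentNode_uuid_s:" + quote(parentUuid)
                + " AND childNode_name_s:" + quote(childNodeName);
    }
}
```

Quoting the values keeps characters such as the colon in a namespaced child node name from being interpreted as query syntax.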

Default-index mapping

A built-in index named "default" contains data from all documents in the repository. It is used, for example, for DeskClient searches. A second built-in index is named "default-live". It contains all documents in their latest live version. These built-in indices cannot be configured using regular index configurations.

However, admins can add custom index fields to these indices using the "Default-Core-Mapping" index mapping document, which can be found at Administration view > Solr.

For instance, custom index fields within the default index can be used for custom search result orders (see Search Result Order Options in the Work Environment Configuration).

Index Fields

To create new fields (or override existing ones) in a Solr index document, there needs to be a specification of how data is converted to fill the new field. To create a new index field, select Administration view > Index Fields > New: Index Field from the context menu. This will let you create a new system document for a new index field.

The following table lists the properties of the index field document:

Label | The field's label.
Fieldname | The field's name. This name is used to store data into the Solr index.
Script | A Groovy script that returns data to be stored into the index field.
The script must return an Object that represents the data to store into the field:

return document.getString("sophora-content:title").toLowerCase();

Valid return types

  • null (the field will not be indexed)
  • Classes corresponding to primitive types, e.g. Integer, Double, Boolean
  • String
  • java.util.Date
  • java.util.Calendar
  • java.util.UUID
  • Collections of the above types (for multivalued fields)
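
The list of valid return types can be expressed as a small check, for example to test the result of a script before deploying it. The following Java sketch is illustrative only: the class is hypothetical, and it assumes that collections must not be nested and must not contain null elements, which the list above does not specify.

```java
import java.util.Calendar;
import java.util.Collection;
import java.util.Date;
import java.util.UUID;

/** Sketch: checks a value against the valid return types of an index field script. */
final class ReturnTypeCheck {

    /** True if the value is a single value of one of the listed types. */
    static boolean isValidSingleValue(Object value) {
        return value instanceof Byte || value instanceof Short
                || value instanceof Integer || value instanceof Long
                || value instanceof Float || value instanceof Double
                || value instanceof Boolean || value instanceof Character
                || value instanceof String
                || value instanceof Date
                || value instanceof Calendar
                || value instanceof UUID;
    }

    /** True if the value is null, a valid single value, or a collection of valid single values. */
    static boolean isValid(Object value) {
        if (value == null) {
            return true; // null is allowed: the field will not be indexed
        }
        if (value instanceof Collection) {
            for (Object element : (Collection<?>) value) {
                // Assumption: null elements and nested collections are rejected.
                if (!isValidSingleValue(element)) {
                    return false;
                }
            }
            return true;
        }
        return isValidSingleValue(value);
    }
}
```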

Default imports

  • java.lang.*
  • java.math.BigDecimal
  • java.math.BigInteger
  • java.util.*
  • groovy.lang.*
  • groovy.util.*
  • com.subshell.sophora.api.*
  • com.subshell.sophora.api.content.*
  • com.subshell.sophora.api.content.retrievalresult.*
  • com.subshell.sophora.api.content.value.*
  • com.subshell.sophora.api.nodetype.*
  • com.subshell.sophora.api.structure.*

Variables available to scripts

document | INode | the document in question | -
nodeType | NodeType | the document's node type | -
contentManager | IContentManager | a content manager that can be used to get more information from the Sophora server | The use of the IContentManager is deprecated since version 5 and kept only for backward compatibility reasons. The ISophoraClient should be used instead.
sessionToken | SessionToken | a session token to be used with content manager calls | Since the sessionToken is only used in conjunction with the IContentManager, its remarks apply here as well.
sophoraClient | ISophoraClient | used to access the Sophora server | -
collectionName | String | the name of the collection for which the document is mapped | -
liveCollection | Boolean | defines whether it is a live or working collection | -

Distinguishing between Working and Live Content

When a script is executed to calculate the value of a Solr field, it must consider whether it is executed for a working or a live collection.

In the first case, the script should only be interested in the working versions of documents and in the second one it should access the last live versions of documents. To avoid having to make this decision multiple times in the scripts, it is handled transparently for the scripts by the Sophora Indexing Service.

The instances of the IContentManager and the ISophoraClient always return documents in the correct version. For example, consider a script that is currently executed for a live collection and calls the method ISophoraClient#getDocumentByUuid. The instance of the ISophoraClient will transparently return the last live version of the document.

Methods available to scripts

String stripRichText(String text) | Strips HTML/XML tags and some special characters from richtext fields.
IContent getNearestHierarchyDocument(IContent document) | Searches the parent structure nodes of the document for the first hierarchy document.
List derefChildNodes(IContent parent, String childNodeName) | Dereferences all child nodes with the given name. The method first reads a UUID from the property "sophora:reference" in each child node and then loads the referenced document. Child nodes without a "sophora:reference" property and external references are silently ignored. The returned documents don't contain binary data properties.
String getPropertyStringValueFromStructureNodeDocument(IContent document, String propertyName) | Searches all structure node documents in the structure node hierarchy of the document for the given string property. If no property is found, 'null' is returned.
List getSelectValueLabels(IContent document, String propertyName) | Returns the labels of the select value of the given property. If the property is single-valued, the returned list will contain at most one element. If the property is multi-valued, the list will contain at most one label for each property value. If no label is found for a property value, none is returned.
List getSelectValueLabels(IContent document, NodeType nodeType, String propertyName) | Returns the labels of the select value of the given property for the given node type. If the property is single-valued, the returned list will contain at most one element. If the property is multi-valued, the list will contain at most one label for each property value. If no label is found for a property value, none is returned.
String getDocumentUrl(UUID uuid) | Returns the delivery-side URL to a document.

Synchronization of Modified Documents

Sophora documents are kept in sync with their counterparts in Solr collections. If any changes are made to a document, it will be re-indexed automatically to keep the index up to date. If a document is deleted, it will automatically be deleted from the corresponding collections (unless "Deleted documents" is checked in the index configuration).

When using custom index fields, the respective Groovy scripts may call getDocument*() methods on the ISophoraClient to retrieve data from other documents. Whenever one of these retrieved documents changes, the document using the custom index field will be re-indexed, too. In this way, the indexing service will automatically update information for a document that depends on other documents.
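
The idea behind this dependency-driven re-indexing can be modelled with a reverse lookup from loaded documents to the documents that used them. The Java sketch below is a simplified model under stated assumptions, not the indexing service's actual implementation: the class is hypothetical, documents are identified by plain strings, and only direct dependencies are followed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: tracks which indexed documents depend on which loaded documents. */
final class DependencyTracker {

    // Key: UUID of a loaded document. Value: UUIDs of documents whose scripts loaded it.
    private final Map<String, Set<String>> dependents = new HashMap<>();

    /** Records the documents that scripts loaded while indexing the given document. */
    void recordIndexed(String indexedUuid, Set<String> loadedUuids) {
        // Remove entries from an earlier indexing run of the same document.
        for (Set<String> users : dependents.values()) {
            users.remove(indexedUuid);
        }
        for (String loaded : loadedUuids) {
            dependents.computeIfAbsent(loaded, key -> new HashSet<>()).add(indexedUuid);
        }
    }

    /** Returns the changed document plus all documents that loaded it during indexing. */
    Set<String> toReindex(String changedUuid) {
        Set<String> result = new TreeSet<>();
        result.add(changedUuid);
        result.addAll(dependents.getOrDefault(changedUuid, Collections.emptySet()));
        return result;
    }
}
```

In this model, if indexing a document "article" loaded a document "author", a later change to "author" selects both documents for re-indexing, while a change to "article" selects only "article".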

Whenever a structure node is changed in a significant way, that is, if its name or the value of an inherited property is changed, all affected documents located in that structure node (and its children) will be re-indexed automatically.

Publishing an index configuration triggers in-place re-indexing of the entire collection, ensuring continuous updates until completion. The enhanced collection supports unrestricted parallel search, reflecting real-time changes. Notably, publishing an index mapping or field document does not prompt a rebuild of associated collections.

When changing and publishing a channel, index mapping or index field, the index configurations which have to be republished are marked red in the administration view. Furthermore, a sticky note is created. The colour of the sticky note is red by default. To change the colour, you can add an entry with the key sophora.configuration.republishIndexConfigurationColor and, for example, the value 255,255,0 in the configuration document.

Querying a custom collection

To search for documents in a custom collection, you have to specify the collection to use in the search parameters. To achieve this, use an instance of SolrSearchParameters. This class allows further parameters specific to searches in Solr. Use setCore(collectionname) to specify the collection for the search. Check the JavaDoc of that class for more information.

See the following code snippet:

// You can use any type of IQuery except for XPathQuery, which is not supported in Solr.
// With a SolrQuery you can query your own index fields.
IQuery query = new SolrQuery("species_reversed_s: \"goD\"");
// Use the special SolrSearchParameters instead of the common SearchParameters.
SolrSearchParameters parameters = new SolrSearchParameters();
// Set the index/core to search in ("myCollection" is a placeholder for your collection name).
parameters.setCore("myCollection");
// Execute the search.
UuidSearchResult searchResult = sophoraClient.findDocumentUuids(query, parameters);
// Do something with the result.
List<UUID> uuids = searchResult.getUUIDs();

Last modified on 10/16/20

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.