
Solr: Configuring Indexes (Embedded Solr)

Index Configurations

You can create as many indexes in Solr as you need. To do that, create a new index configuration by selecting Administration view > Index Configurations > New: Index Configuration from the context menu. This will let you create a new system document for a new index.

Note that an index only becomes active in Solr if the index configuration document is published. To delete an index from the Solr instance, simply set the index configuration document offline.

The following lists the properties of the index configuration document:

Property - Description
Name - The index name. This will also be used by the Solr server to store the index in the filesystem.
Only published versions - Only published versions will be stored in the index.
Deleted documents - Deleted documents will be stored in the index.
Available on server - Which servers the index will be available on. Note that the index will always be available on the Sophora Primary Server.
Nodetypes - Only documents having the specified nodetypes will be stored in the index.
Structurenodes - Only documents located in the specified structure nodes will be stored in the index.
Channels - Only documents valid for the selected channels will be stored in the index. Will store all documents if no channel is selected.
Filter Script - Only documents matching this Groovy filter script will be stored in the index. Will store all documents if no script is specified.
The script must return true/false indicating whether the current document should be stored in the index, for example:

// this will filter out all documents not having the "myProp" property
return document.hasProperty("sophora-content:myProp");

The following predefined variables may be used in the script:

document (INode) - the document in question
contentManager (IContentManager) - a content manager that can be used to get more information from the server
sessionToken (SessionToken) - a session token to be used with content manager calls
Reindex Search Query - Only documents matched by this query will be reindexed. This prevents the index from being reindexed completely when the index configuration is published again.

The query is written in JCR format. For example, the following will only reindex a certain document type:
@jcr:primaryType = 'sophora-nt:myType'

Another use case would be only reindexing documents that have a certain value stored in a property. The following will only reindex documents having the "myProp" property with a value of "test":
@sophora-content:myProp='test'

Reindexing queries are commonly intended to be used just once by the next reindexing task. However, once set they will apply to all upcoming tasks if not removed. Since this might be unintentional, a warning will appear as a reminder that a reindexing query is still present and will affect upcoming reindexing tasks.

Note: This property does not work with the Sophora Indexing Service (there is a REST call available as replacement). It will be removed in Sophora 5.
Mapping Document - One or more custom index mappings to use, see below.
Remove after days - Documents older than this number of days will automatically be removed from the index.
Remove after days reference - Specifies the date properties that serve as reference for removing documents after a number of days. The order is important: The second property in the list is only evaluated when the first property is not set, and so on.
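
The interplay of "Remove after days" and "Remove after days reference" can be pictured with a small sketch. This is not Sophora's implementation: the class, the method names and the property names "example:firstDate" and "example:secondDate" are hypothetical, a document is modelled as a plain map of date properties, and the sketch assumes that a document without any reference date is kept.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not part of the Sophora API.
final class RemoveAfterDaysSketch {

    // Returns the first date, in the configured order, that is actually set on the document.
    static Optional<Instant> referenceDate(Map<String, Instant> dateProperties,
                                           List<String> referenceOrder) {
        for (String propertyName : referenceOrder) {
            Instant value = dateProperties.get(propertyName);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    // A document is due for removal when its reference date is more than removeAfterDays old.
    // Assumption: a document without any reference date is never removed.
    static boolean isDueForRemoval(Map<String, Instant> dateProperties,
                                   List<String> referenceOrder,
                                   int removeAfterDays,
                                   Instant now) {
        Instant threshold = now.minus(Duration.ofDays(removeAfterDays));
        return referenceDate(dateProperties, referenceOrder)
                .map(date -> date.isBefore(threshold))
                .orElse(false);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-10-16T00:00:00Z");
        List<String> order = List.of("example:firstDate", "example:secondDate");
        // Only the second property is set, so it serves as the reference date.
        Map<String, Instant> document =
                Map.of("example:secondDate", Instant.parse("2020-09-01T00:00:00Z"));
        System.out.println(isDueForRemoval(document, order, 30, now));
    }
}
```

The second property is only consulted when the first one is missing, which mirrors the ordering rule described for the reference list.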

Index Mappings

When documents are stored in the Solr index, their properties must be converted into index fields. This is done by a "mapping". Sophora implements a default mapping for all property types and child nodes. Administrators may configure additional fields to be added to index documents.

The Solr index may contain index documents not only for Sophora nodes but also for child nodes, provided those child nodes are configured as rows of dynamic tables. To distinguish between the index document types, each Solr index document has a solr_document_type_s field, which takes one of the following values:

  • empty or set to node, in which case it is a normal Sophora node
  • set to childNode, in which case it is the child node of a Sophora node
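
A client that reads raw Solr results could tell the two index document types apart along these lines. This is a minimal sketch, assuming a Solr document is available as a plain map of field names to values; the class, the enum and the UNKNOWN fallback are hypothetical and not part of the Sophora API.

```java
import java.util.Map;

// Hypothetical illustration only; not part of the Sophora API.
final class IndexDocumentTypeSketch {

    enum Kind { NODE, CHILD_NODE, UNKNOWN }

    // Classifies an index document by its solr_document_type_s field.
    static Kind classify(Map<String, ?> solrDocument) {
        Object raw = solrDocument.get("solr_document_type_s");
        String value = raw == null ? "" : raw.toString();
        if (value.isEmpty() || value.equals("node")) {
            return Kind.NODE;        // field missing, empty, or set to "node"
        }
        if (value.equals("childNode")) {
            return Kind.CHILD_NODE;  // row of a dynamic table
        }
        return Kind.UNKNOWN;         // assumption: any other value is treated as unknown
    }

    public static void main(String[] args) {
        System.out.println(classify(Map.of("solr_document_type_s", "childNode")));
        System.out.println(classify(Map.of("id", "some-id")));
    }
}
```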

Default Mapping for Sophora documents

The following lists the default mappings between data sources and their respective Solr index fields for Sophora documents. The solr_document_type_s field will be set to node or, in an upgrade scenario, not be present at all.

Data Source - Solr Index Field
Document's UUID - Field "id"
Index document type - Field "solr_document_type_s" (the field is either not present or the value is set to "node")
Document's node type - Field "primaryType_s"
Document's structure node's path - Field "structureNode_path_s"
Document's structure node's alias - Field "structureNode_alias_s"
Document's structure node's default document's UUID - Field "structureNode_defaultDocumentUuid_s"
Document's structure node's hierarchy UUIDs - Field "structureNode_hierarchyUuids_ss"
Time/date of indexing the document - Field "indexedDate_dt"
String property - Field "NAME_s" for regular text fields, "NAME_t" for text fields having more than one row, or "NAME_txt" for multi-valued string properties.
Long property - Field "NAME_l" for regular long properties, or "NAME_ls" for multi-valued long properties.
Double property - Field "NAME_d" for regular double properties, or "NAME_ds" for multi-valued double properties.
Date property - Field "NAME_dt" for regular date properties, or "NAME_dts" for multi-valued date properties.
All other property types - Field "NAME_s" for regular properties, or "NAME_ss" for multi-valued properties. The property is converted to its string representation first.
Copytext child node - Field "copytext_t" containing complete plain text.
All components - Field "childnode_reference_uuid_ss" containing all UUIDs of referenced documents.
Properties of all components - Field "childnode_content_t" containing all the properties of all referenced documents. This does not include any properties in the "sophora" namespace, nor does it include any copytext. Relevant properties are converted to their string representations first.
Documents loaded from the server via getDocument*() calls - Field "included_uuid_ss" containing all UUIDs of documents that were loaded from the server via getDocument*() calls.
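
The suffix rules for property fields can be summarised in a few lines of code, which may help when working out which field name to query. The following is a sketch, not Sophora's implementation: the class, enum and parameters are hypothetical, NAME is passed in exactly as it should appear in the field name, and the sketch assumes that a multi-valued string property gets "_txt" regardless of how many rows its input field has.

```java
// Hypothetical illustration only; not part of the Sophora API.
final class DefaultFieldNameSketch {

    enum PropertyType { STRING, LONG, DOUBLE, DATE, OTHER }

    // Builds the default Solr field name for a property.
    // name:         the NAME part from the table above, already in its index form
    // multiValued:  whether the property holds several values
    // multiRowText: whether a string property is edited in a field with more than one row
    static String fieldName(String name, PropertyType type,
                            boolean multiValued, boolean multiRowText) {
        switch (type) {
            case STRING:
                if (multiValued) {
                    return name + "_txt"; // assumption: multi-valued takes precedence
                }
                return name + (multiRowText ? "_t" : "_s");
            case LONG:
                return name + (multiValued ? "_ls" : "_l");
            case DOUBLE:
                return name + (multiValued ? "_ds" : "_d");
            case DATE:
                return name + (multiValued ? "_dts" : "_dt");
            default:
                // All other types are indexed via their string representation.
                return name + (multiValued ? "_ss" : "_s");
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldName("NAME", PropertyType.DATE, true, false));
        System.out.println(fieldName("NAME", PropertyType.STRING, false, true));
    }
}
```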

Default mapping for dynamic table child nodes

The following lists the default mappings between data sources and their respective Solr index fields for child nodes. The solr_document_type_s field will be set to childNode.

Data Source - Solr Index Field
Unique index document id - Field "id"
Index document type - Field "solr_document_type_s" (the value is set to "childNode")
UUID of the parent document - Field "parentNode_uuid_s"
Primary type of the parent document - Field "parentNode_primaryType_s"
Structure hierarchy UUIDs of the parent document - Field "parentNode_structureNode_hierarchyUuids_ss"
Primary type of the child node - Field "childNode_primaryType_s"
Name of the child node in the parent node's configuration - Field "childNode_name_s"
Index of this child node in the list of child nodes with this child node name - Field "childNode_index_i"

Furthermore, all properties of the child node are mapped in the same way as in the default mapping for Sophora documents.

Custom mapping for Sophora documents

In the event that Sophora's default mapping is not sufficient, administrators may opt to add customized mappings. These may either add new fields or override default fields.

To add a new custom mapping, create a new index mapping by selecting Administration view > Index Mappings > New: Index Mapping from the context menu. This will let you create a new system document for a new index mapping.

The following lists the properties of the index mapping document:

Property - Description
Name - The index mapping name.
Indexfields - One or more custom index fields that are active when this mapping is used in an index configuration, see below.
Channel affiliations - The Solr fields channel_names_ss and channel_uuids_ss contain the channels that are enabled for the document, provided the channel is enabled here. Unlike sophora_enabledChannels_ss, these fields include information that is inherited from structure nodes.

Default-index mapping

A built-in index named "default" contains data from all documents in the repository. It is used, for example, for DeskClient searches. A second built-in index is named "default-live". It contains all documents in their latest live version. These built-in indexes cannot be configured using regular index configurations. However, administrators can add custom index fields to them using the "Default-Core-Mapping" index mapping document, which can be found at Administration view > Solr. For instance, custom index fields within the default index can be used for custom search result orders (see Search Result Order Options in the Work Environment Configuration).

Indexing of dynamic tables

If a Sophora document includes one or more dynamic tables, each row of the dynamic table will be represented by a separate Solr index document.

Index Fields

To actually create new fields (or override existing ones) in a Solr index document, there needs to be a specification of how data is converted to fill the new field. To create a new index field, select Administration view > Index Fields > New: Index Field from the context menu. This will let you create a new system document for a new index field.

The following lists the properties of the index field document:

Property - Description
Label - The field's label.
Fieldname - The field's name. This name is used to store data into the Solr index.
Script - A Groovy script that returns data to be stored into the index field.
The script must return an Object that represents the data to store into the field:

return document.getString("sophora-content:title").toLowerCase();

Valid return types

  • null (the field will not be indexed)
  • Classes corresponding to primitive types, e.g. Integer, Double, Boolean
  • String
  • java.util.Date
  • java.util.Calendar
  • java.util.UUID
  • Collections of the above types (for multivalued fields)
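
When developing a script, it can help to check a candidate return value against the list above before publishing the index field. The following sketch does that in plain Java. It is not how Sophora validates values; the class and method names are hypothetical, and the sketch assumes that "classes corresponding to primitive types" means the standard wrapper classes and that collections must not be nested or contain null elements.

```java
import java.util.Calendar;
import java.util.Collection;
import java.util.Date;
import java.util.List;
import java.util.UUID;

// Hypothetical illustration only; not part of the Sophora API.
final class ReturnTypeCheckSketch {

    // True for a single value of one of the listed types.
    static boolean isSupportedScalar(Object value) {
        return value instanceof Integer || value instanceof Long
                || value instanceof Short || value instanceof Byte
                || value instanceof Float || value instanceof Double
                || value instanceof Boolean || value instanceof Character
                || value instanceof String || value instanceof Date
                || value instanceof Calendar || value instanceof UUID;
    }

    // True if the value could be returned from an index field script.
    static boolean isValidReturnValue(Object value) {
        if (value == null) {
            return true; // null is allowed: the field will not be indexed
        }
        if (value instanceof Collection) {
            for (Object element : (Collection<?>) value) {
                if (!isSupportedScalar(element)) {
                    return false; // assumption: no nested collections or null elements
                }
            }
            return true;
        }
        return isSupportedScalar(value);
    }

    public static void main(String[] args) {
        System.out.println(isValidReturnValue(List.of("a", "b")));
        System.out.println(isValidReturnValue(new Object()));
    }
}
```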

Default imports

  • java.io.*
  • java.lang.*
  • java.math.BigDecimal
  • java.math.BigInteger
  • java.net.*
  • java.util.*
  • groovy.lang.*
  • groovy.util.*
  • com.subshell.sophora.api.*
  • com.subshell.sophora.api.content.*
  • com.subshell.sophora.api.content.value.*
  • com.subshell.sophora.api.nodetype.*
  • com.subshell.sophora.api.structure.*

Variables available to scripts

Name (Type) - Description
document (INode) - the document in question
nodeType (NodeType) - the document's node type
contentManager (IContentManager) - a content manager that can be used to get more information from the server
sessionToken (SessionToken) - a session token to be used with content manager calls

Methods available to scripts

Signature - Description
String stripRichText(String text) - Strips HTML/XML tags and some special characters from richtext fields.
IContent getNearestHierarchyDocument(IContent document) - Searches the parent structure nodes of the document for the first hierarchy document.
List derefChildNodes(IContent parent, String childNodeName) - Dereferences all child nodes with the given name. The method first reads a UUID from the property "sophora:reference" in each child node and then loads the referenced document. Child nodes without a "sophora:reference" property and external references are silently ignored. The returned documents don't contain binary data properties.
String getPropertyStringValueFromStructureNodeDocument(IContent document, String propertyName) - Searches all structure node documents in the structure node hierarchy of the document for the given string property. If no property is found, 'null' is returned.
List getSelectValueLabels(IContent document, String propertyName) - Returns the labels of the select value of the given property. If the property is single-valued, the returned list will contain at most one element. If the property is multi-valued, the list will contain at most one label for each property value. If no label is found for a property value, none is returned.
List getSelectValueLabels(IContent document, NodeType nodeType, String propertyName) - Returns the labels of the select value of the given property for the given node type. If the property is single-valued, the returned list will contain at most one element. If the property is multi-valued, the list will contain at most one label for each property value. If no label is found for a property value, none is returned.
String getDocumentUrl(UUID uuid) - Returns the delivery-side URL to a document.

Synchronization of Modified Documents

Every document that should be stored in the index (according to the index configuration) is kept synchronized with the index. If any changes are made to the document, it will be reindexed automatically to keep the index up to date. If a document is deleted, it will automatically be deleted from the index unless "Deleted documents" is checked in the index configuration.

When using custom index fields, the respective Groovy scripts may call getDocument*() methods on the content manager. Whenever one of the documents returned by these methods changes, the document using the custom index field will be reindexed.

Whenever a structure node is changed, all documents located in that structure node (and its children) will be reindexed automatically.

Whenever an index configuration is published, the complete index will be rebuilt. In this case, rebuilding will take place in a temporary index so that the current index can still be used for searches. Once rebuilding is done, the temporary index becomes the new index, and the old index is deleted. Note that publishing an index mapping document or an index field document will not trigger a rebuild of the associated indexes.
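
The rebuild-then-swap behaviour is a common pattern and can be illustrated with an in-memory stand-in. The sketch below has nothing to do with how Solr cores are actually managed by Sophora; the class and its methods are hypothetical, and an "index" is reduced to a map from document id to indexed text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; an in-memory model of the rebuild-and-swap pattern.
final class RebuildSwapSketch {

    // The index that currently answers searches.
    private final AtomicReference<Map<String, String>> liveIndex =
            new AtomicReference<>(Map.of());

    // Searches always read from the live index, even while a rebuild is running.
    String search(String id) {
        return liveIndex.get().get(id);
    }

    // Rebuilds into a temporary index and only then makes it the live one.
    void rebuild(Map<String, String> allDocuments) {
        Map<String, String> temporaryIndex = new HashMap<>();
        for (Map.Entry<String, String> entry : allDocuments.entrySet()) {
            // While this loop runs, search() still sees the old index.
            temporaryIndex.put(entry.getKey(), entry.getValue());
        }
        // Swap: the temporary index becomes the new index, the old one is dropped.
        liveIndex.set(Map.copyOf(temporaryIndex));
    }

    public static void main(String[] args) {
        RebuildSwapSketch index = new RebuildSwapSketch();
        index.rebuild(Map.of("doc-1", "first version"));
        System.out.println(index.search("doc-1"));
    }
}
```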

When a channel, index mapping or index field is changed and published, the index configurations that have to be republished are marked red in the administration view. Furthermore, a sticky note is created. The color of the sticky note is red by default. To change the color, add an entry with the key sophora.configuration.republishIndexConfigurationColor and, for example, the value 255,255,0 to the configuration document.

Query a custom index

To search for documents in your own index (Solr core), you have to specify it in the search parameters. This is done with the class SolrSearchParameters, which lets you configure the Solr query. Use setCore(corename) to specify the index for the search. Check the JavaDoc of that class for more information.

See the following code snippet:

// You can create any query, like NodeTypeQuery or PropertyQuery (except for XPathQuery, which cannot query Solr).
// With a SolrQuery you can query your own index fields.
IQuery query = new SolrQuery("species_reversed_s: \"goD\"");
// Use the special SolrSearchParameters instead of the common SearchParameters.
SolrSearchParameters parameters = new SolrSearchParameters();
parameters.setPageSize(10);
// Set the index/core to search in.
parameters.setCore("Animalcore");
// Execute the search (obtaining the client is elided here).
ISophoraClient client = ...
UuidSearchResult searchResult = client.findDocumentUuids(query, parameters);
// Do something with the result.
List<UUID> uuids = searchResult.getUUIDs();
...

Last modified on 10/16/20

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.
