Index Configurations
You can create as many indexes in Solr as you need. To do that, create a new index configuration by selecting Administration view > Index Configurations > New: Index Configuration from the context menu. This will let you create a new system document for a new index.
Note that an index only becomes active in Solr once its index configuration document has been published.
To delete an index from the Solr instance, set its index configuration document offline.
Each index configuration results in two Solr collections: one containing all included documents in their working version, and one containing all included documents in their last live version. The latter uses the suffix "-live".
The following table lists the properties of the index configuration document:
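The naming rule above can be illustrated with a small Java sketch. The class IndexCollections and its methods are hypothetical helpers, not part of the Sophora API. The assumption that the working collection carries the plain index name is inferred from the built-in "default"/"default-live" pair described further down this page.

```java
import java.util.List;

/** Hypothetical helper: derives the two Solr collection names created for one index configuration. */
final class IndexCollections {
    // Suffix documented for the collection holding the last live versions.
    private static final String LIVE_SUFFIX = "-live";

    private IndexCollections() {}

    /** Collection with the working versions (assumption: it uses the plain index name). */
    static String workingCollection(String indexName) {
        return indexName;
    }

    /** Collection with the last live versions of the included documents. */
    static String liveCollection(String indexName) {
        return indexName + LIVE_SUFFIX;
    }

    /** Both collection names, working collection first. */
    static List<String> collectionsFor(String indexName) {
        return List.of(workingCollection(indexName), liveCollection(indexName));
    }
}
```

For an index configuration named "Animalcore", this sketch yields the collections "Animalcore" and "Animalcore-live".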
Property | Description |
---|---|
Name | The index name. This will also be used by the Solr server to store the index in the file system. |
Offline-Index | Defines whether it is an offline collection. |
Deleted documents | If checked, deleted documents are kept in the index. |
Nodetypes | Only documents having the specified node types will be stored in the index. |
Structurenodes | Only documents located in the specified structure nodes will be stored in the index. |
Channels | Only documents valid for the selected channels will be stored in the index. If no channel is selected, all documents are stored. |
Filter Script | Only documents matching this Groovy filter script will be stored in the index. If no script is specified, all documents are stored. The script must return true or false to indicate whether the current document should be stored in the index, for example: // this will filter out all documents not having the "myProp" property For a full list of available variables, refer to "Variables available to scripts". |
Mapping Document | One or more custom index mappings to use, see below. |
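To make the combined effect of these properties concrete, the following Java sketch models the inclusion decision in memory. Everything here is hypothetical: Doc is a stand-in for a Sophora document (not the real INode), and the filter script is modelled as a plain Predicate. Several rules are assumptions rather than documented behaviour and are marked as such in the comments: an empty node type or structure node selection means "no restriction", structure nodes are matched exactly (not including child structure nodes), and a document qualifies if it is valid for any one of the selected channels.

```java
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical in-memory model of the inclusion rules of an index configuration. */
final class IndexInclusion {

    /** Stand-in for the document data the filters look at; not the Sophora INode API. */
    record Doc(String nodeType, String structureNode, Set<String> channels, boolean deleted) {}

    /** Stand-in for the index configuration properties; filterScript may be null. */
    record Config(Set<String> nodeTypes, Set<String> structureNodes, Set<String> channels,
                  Predicate<Doc> filterScript, boolean keepDeleted) {}

    private IndexInclusion() {}

    /** Returns true if the document would be stored in the index described by config. */
    static boolean isIncluded(Config config, Doc doc) {
        // Documented: deleted documents are only kept if "Deleted documents" is checked.
        if (doc.deleted() && !config.keepDeleted()) {
            return false;
        }
        // Assumption: an empty node type selection means "no restriction".
        if (!config.nodeTypes().isEmpty() && !config.nodeTypes().contains(doc.nodeType())) {
            return false;
        }
        // Assumption: exact structure node match; child structure nodes are not considered here.
        if (!config.structureNodes().isEmpty() && !config.structureNodes().contains(doc.structureNode())) {
            return false;
        }
        // Documented: no selected channel stores all documents.
        // Assumption: being valid for any one selected channel is enough.
        if (!config.channels().isEmpty() && config.channels().stream().noneMatch(doc.channels()::contains)) {
            return false;
        }
        // Documented: no filter script stores all documents.
        return config.filterScript() == null || config.filterScript().test(doc);
    }
}
```

The checks are independent, so their order only matters for performance; a real implementation would presumably run the cheap property checks before evaluating a Groovy script.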
Index Mappings
When storing documents in the Solr index, the document's properties must be converted to be stored in index fields. This is done by a mapping. Sophora implements a default mapping for all property types and child nodes. Members of the administration team can configure new fields to be added to index documents.
This mapping is configurable by using mapping documents. The mapping documents are linked to an index by using the property "Mapping Document" of the index configuration.
To add a new custom mapping, create a new index mapping by selecting Administration view > Index Mappings > New: Index Mapping from the context menu. This will let you create a new system document for a new index mapping.
The following table lists the properties of the index mapping document:
Property | Description |
---|---|
Name | The index mapping name. |
Base Mapping | Defines the base mapping for this index. Possible values: Standard, Only metadata. See the sections below for more details. |
Indexfields | One or more custom index fields that are active when this mapping is used in an index configuration, see below. |
The Solr index may contain Solr index documents not only for Sophora nodes, but also for child nodes if those child nodes are configured as rows of dynamic tables. To distinguish between the different index document types, each Solr index document has a solr_document_type_s field, which is set to one of the following values:
- node: the index document represents a regular Sophora node.
- childNode: the index document represents a child node of a Sophora node.
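Because both document types can live in the same collection, queries usually need to be restricted to one of them. The following Java sketch builds such a restriction as a Solr query string. The class, the enum and the restrict method are hypothetical helpers; only the field name and its two values come from this page.

```java
/** Hypothetical helper building Solr query clauses on the solr_document_type_s field. */
final class DocumentTypeFilter {
    static final String FIELD = "solr_document_type_s";

    /** The two documented values of solr_document_type_s. */
    enum IndexDocumentType {
        NODE("node"), CHILD_NODE("childNode");

        final String solrValue;

        IndexDocumentType(String solrValue) {
            this.solrValue = solrValue;
        }
    }

    private DocumentTypeFilter() {}

    /** Clause matching only index documents of the given type. */
    static String clause(IndexDocumentType type) {
        return FIELD + ":" + type.solrValue;
    }

    /** Combines an arbitrary query with the type clause; the query is parenthesized to keep precedence. */
    static String restrict(String query, IndexDocumentType type) {
        return "(" + query + ") AND " + clause(type);
    }
}
```

For example, restrict("title_s:\"Dog\"", IndexDocumentType.NODE) would produce (title_s:"Dog") AND solr_document_type_s:node, where title_s is only an illustrative field name.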
Only metadata
The mapping "Only metadata" is a subset of the Standard mapping. It is the most appropriate one for offline collections. It consists of the following mappings:
Data Source | Solr Index Field |
---|---|
Document's UUID | Field "id" |
Document's Sophora ID | Field "sophora_id_s" |
Index document type | Field "solr_document_type_s" |
Document's node type | Field "primaryType_s" |
Document's structure node's path | Field "structureNode_path_s" |
Document's structure node's alias | Field "structureNode_alias_s" |
Document's structure node's default document's UUID | Field "structureNode_defaultDocumentUuid_s" |
Document's structure node's hierarchy UUIDs | Field "structureNode_hierarchyUuids_ss" |
Document's enabled state | Field "sophora_isEnabled_b" |
Time/date of indexing the document | Field "indexedDate_dt" |
Documents loaded from the server via getDocument*() calls | Field "included_uuid_ss" containing all UUIDs of documents that were loaded from the server via getDocument*() calls. |
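If a client queries a collection that uses the "Only metadata" mapping directly through Solr (an assumption about your setup), the field names in the table above can be passed in Solr's standard "fl" (field list) parameter to limit the returned fields. The following Java sketch, whose class name is hypothetical, simply collects those names.

```java
import java.util.List;

/** Field names produced by the "Only metadata" mapping, copied from the table above. */
final class MetadataFields {
    static final List<String> FIELDS = List.of(
        "id",
        "sophora_id_s",
        "solr_document_type_s",
        "primaryType_s",
        "structureNode_path_s",
        "structureNode_alias_s",
        "structureNode_defaultDocumentUuid_s",
        "structureNode_hierarchyUuids_ss",
        "sophora_isEnabled_b",
        "indexedDate_dt",
        "included_uuid_ss");

    private MetadataFields() {}

    /** Comma-separated value suitable for Solr's "fl" parameter. */
    static String fieldList() {
        return String.join(",", FIELDS);
    }
}
```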
Standard mapping
The Standard mapping enhances the "Only metadata" mapping. It is also used if the field "Base Mapping" of a mapping document is left empty.
Data Source | Solr Index Field |
---|---|
String property | Field "NAME_s" for regular text fields, "NAME_t" for text fields having more than one row, or "NAME_txt" for multi-valued string properties. |
Long property | Field "NAME_l" for regular long properties, or "NAME_ls" for multi-valued long properties. |
Double property | Field "NAME_d" for regular double properties, or "NAME_ds" for multi-valued double properties. |
Date property | Field "NAME_dt" for regular date properties, or "NAME_dts" for multi-valued date properties. |
All other property types | Field "NAME_s" for regular properties, or "NAME_ss" for multi-valued properties. The property is converted to its string representation first. |
Copytext child node | Field "copytext_t" containing complete plain text. |
All components | Field "childnode_reference_uuid_ss" containing all UUIDs of referenced documents. |
Properties of all components | Field "childnode_content_t" containing all the properties of all referenced documents. This does not include any properties in the "sophora" namespace, nor any copytext. Relevant properties are converted to their string representations first. |
Dynamic Tables | Separate Solr documents for each row of a dynamic table. |
Yellow Data | Separate Solr documents for each Yellow Data entry of the document. |
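The suffix rules for properties in the table above can be restated as a small function, which is handy when writing Solr queries against the Standard mapping. This Java sketch is hypothetical: it uses the given name unchanged as NAME (how Sophora derives NAME from a namespaced property name is not described here), and it assumes that for string properties "multi-valued" takes precedence over "more than one row".

```java
/** Hypothetical re-statement of the Standard mapping's field naming rules for properties. */
final class StandardFieldNames {

    /** Property categories distinguished by the Standard mapping table. */
    enum PropertyType { STRING, LONG, DOUBLE, DATE, OTHER }

    private StandardFieldNames() {}

    /**
     * @param name        property name used as NAME (used as-is in this sketch)
     * @param type        property category from the table
     * @param multiValued whether the property is multi-valued
     * @param multiRow    string properties only: whether the text field has more than one row
     */
    static String fieldName(String name, PropertyType type, boolean multiValued, boolean multiRow) {
        String suffix = switch (type) {
            // Assumption: multi-valued wins over multi-row for string properties.
            case STRING -> multiValued ? "_txt" : (multiRow ? "_t" : "_s");
            case LONG -> multiValued ? "_ls" : "_l";
            case DOUBLE -> multiValued ? "_ds" : "_d";
            case DATE -> multiValued ? "_dts" : "_dt";
            // All other types are stored as strings.
            case OTHER -> multiValued ? "_ss" : "_s";
        };
        return name + suffix;
    }
}
```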
Default mapping for dynamic table child nodes
The following table lists the default mappings between data sources and their respective Solr index fields for child nodes. For these index documents, the solr_document_type_s field is set to childNode. Dynamic tables and child nodes are not indexed in indexes that use the "Only metadata" mapping.
Data Source | Solr Index Field |
---|---|
Unique index document id | Field "id" |
Index document type | Field "solr_document_type_s" (the value is set to "childNode") |
UUID of the parent document | Field "parentNode_uuid_s" |
Primary type of the parent document | Field "parentNode_primaryType_s" |
Structure hierarchy UUIDs of the parent document | Field "parentNode_structureNode_hierarchyUuids_ss" |
Primary type of the child node | Field "childNode_primaryType_s" |
Name of the child node in the parent node's configuration | Field "childNode_name_s" |
Index of this child node in the list of child nodes with this child node name. | Field "childNode_index_i" |
Furthermore, all properties of the child node are mapped in the same way as in the default mapping for Sophora documents.
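Using the child node fields above, all child node index documents of one parent document (for example, the rows of its dynamic tables) can be selected with a plain Solr query string, which could then be passed to a SolrQuery as in the example under "Querying a custom collection". The class and method names in this Java sketch are hypothetical.

```java
import java.util.UUID;

/** Hypothetical helper: queries for the child node index documents of one parent document. */
final class ChildNodeQueries {
    private ChildNodeQueries() {}

    /** Matches all child node index documents whose parent has the given UUID. */
    static String childNodesOf(UUID parentUuid) {
        return "solr_document_type_s:childNode AND parentNode_uuid_s:\"" + parentUuid + "\"";
    }

    /** Additionally restricts the result to child nodes with the given child node name. */
    static String childNodesOf(UUID parentUuid, String childNodeName) {
        return childNodesOf(parentUuid) + " AND childNode_name_s:\"" + escapePhrase(childNodeName) + "\"";
    }

    /** Escapes backslashes and double quotes so the value can sit inside a quoted phrase. */
    private static String escapePhrase(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Sorting such a result by "childNode_index_i" should restore the original order of the rows.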
Default-index mapping
A built-in index named "default" contains data from all documents in the repository. It is used, for example, for DeskClient searches. A second built-in index, named "default-live", contains all documents in their latest live version. These built-in indices cannot be configured using regular index configurations.
However, admins can add custom index fields to these indices using the "Default-Core-Mapping" index mapping document, which can be found at Administration view > Solr.
For instance, custom index fields within the default index can be used for custom search result orders (see Search Result Order Options in the Work Environment Configuration).
Index Fields
To create new fields in a Solr index document (or override existing ones), you need to specify how the data for the field is computed. To create a new index field, select Administration view > Index Fields > New: Index Field from the context menu. This creates a new system document for the index field.
The following table lists the properties of the index field document:
Property | Description |
---|---|
Label | The field's label. |
Fieldname | The field's name. This name is used to store data into the Solr index. |
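Script | A Groovy script that returns the data to be stored in the index field. The script must return an object of one of the valid return types listed below, for example: return document.getString("sophora-content:title")?.toLowerCase() (the safe-navigation operator ?. makes the script return null, so that the field is not indexed, if getString returns null). |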
Valid return types
- null (the field will not be indexed)
- Classes corresponding to primitive types, e.g. Integer, Double, Boolean
- String
- java.util.Date
- java.util.Calendar
- java.util.UUID
- Collections of the above types (for multivalued fields)
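When testing script logic outside Sophora, it can help to check return values against the list above. The following Java sketch is hypothetical and not part of Sophora. It interprets "classes corresponding to primitive types" as the eight wrapper classes and, as a further assumption, rejects null elements inside collections.

```java
import java.util.Calendar;
import java.util.Collection;
import java.util.Date;
import java.util.UUID;

/** Hypothetical check mirroring the list of valid index field script return types. */
final class IndexFieldValues {
    private IndexFieldValues() {}

    /** True if value is a single value of an allowed type (null is not a valid element). */
    static boolean isValidSingleValue(Object value) {
        return value instanceof Integer || value instanceof Long || value instanceof Short
            || value instanceof Byte || value instanceof Float || value instanceof Double
            || value instanceof Boolean || value instanceof Character
            || value instanceof String || value instanceof Date
            || value instanceof Calendar || value instanceof UUID;
    }

    /** True if value may be returned from an index field script. */
    static boolean isValidReturnValue(Object value) {
        if (value == null) {
            return true; // documented: the field is simply not indexed
        }
        if (value instanceof Collection<?> values) {
            // multivalued field: every element must be a valid single value
            return values.stream().allMatch(IndexFieldValues::isValidSingleValue);
        }
        return isValidSingleValue(value);
    }
}
```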
Default imports
java.io.*
java.lang.*
java.math.BigDecimal
java.math.BigInteger
java.net.*
java.util.*
groovy.lang.*
groovy.util.*
com.subshell.sophora.api.*
com.subshell.sophora.api.content.*
com.subshell.sophora.api.content.retrievalresult.*
com.subshell.sophora.api.content.value.*
com.subshell.sophora.api.nodetype.*
com.subshell.sophora.api.structure.*
Variables available to scripts
Name | Type | Description | Remarks |
---|---|---|---|
document | INode | the document in question | - |
nodeType | NodeType | the document's node type | - |
contentManager | IContentManager | a content manager that can be used to get more information from the Sophora server | The use of the IContentManager is deprecated since version 5 and kept only for backward compatibility reasons. The ISophoraClient should be used instead. |
sessionToken | SessionToken | a session token to be used with content manager calls | Since the sessionToken is only used in conjunction with the IContentManager, its remarks apply here as well. |
sophoraClient | ISophoraClient | used to access the Sophora server | - |
collectionName | String | The name of the collection for which the document is mapped | - |
liveCollection | Boolean | Defines whether it is a live or working collection | - |
Distinguishing between Working and Live Content
When a script is executed to calculate the value of a Solr field, it must consider whether it is executed for a working or a live collection.
In the first case, the script should only be interested in the working versions of documents and in the second one it should access the last live versions of documents. To avoid having to make this decision multiple times in the scripts, it is handled transparently for the scripts by the Sophora Indexing Service.
The IContentManager and ISophoraClient instances available to scripts always return documents in the correct version. For example, if a script executed for a live collection calls ISophoraClient#getDocumentByUuid, the ISophoraClient transparently returns the last live version of the document.
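The following Java sketch models this transparent version selection in memory, to show the idea rather than the actual implementation. VersionAwareLookup, StoredDocument and lookup are hypothetical names, and documents are reduced to strings. As a further assumption, a document that has no live version yields an empty result when the lookup runs for a live collection.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical model of selecting the working or live version depending on the collection. */
final class VersionAwareLookup {

    /** Stand-in for a stored document; liveVersion is null if it was never published. */
    record StoredDocument(String workingVersion, String liveVersion) {}

    private final Map<UUID, StoredDocument> repository;
    private final boolean liveCollection;

    VersionAwareLookup(Map<UUID, StoredDocument> repository, boolean liveCollection) {
        this.repository = repository;
        this.liveCollection = liveCollection;
    }

    /** Returns the version matching the collection the script runs for, if any. */
    Optional<String> lookup(UUID uuid) {
        StoredDocument doc = repository.get(uuid);
        if (doc == null) {
            return Optional.empty();
        }
        // The caller never chooses the version; the lookup object was created for one collection.
        return Optional.ofNullable(liveCollection ? doc.liveVersion() : doc.workingVersion());
    }
}
```

Binding the decision to the lookup object instead of each call mirrors the idea described above: script code stays identical for working and live collections.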
Methods available to scripts
Signature | Description |
---|---|
String stripRichText(String text) | Strips HTML/XML tags and some special characters from richtext fields. |
IContent getNearestHierarchyDocument(IContent document) | Searches the parent structure nodes of the document for the first hierarchy document. |
List derefChildNodes(IContent parent, String childNodeName) | Dereferences all child nodes with the given name. The method first reads a UUID from the property "sophora:reference" in each child node and then loads the referenced document. Child nodes without a "sophora:reference" property and external references are silently ignored. The returned documents don't contain binary data properties. |
String getPropertyStringValueFromStructureNodeDocument(IContent document, String propertyName) | Searches all structure node documents in the structure node hierarchy of the document for the given string property. If no property is found, 'null' is returned. |
List getSelectValueLabels(IContent document, String propertyName) | Returns the labels of the select value of the given property. If the property is single-valued, the returned list will contain at most one element. If the property is multi-valued, the list will contain at most one label for each property value. If no label is found for a property value, none is returned. |
List getSelectValueLabels(IContent document, NodeType nodeType, String propertyName) | Returns the labels of the select value of the given property for the given node type. If the property is single-valued, the returned list will contain at most one element. If the property is multi-valued, the list will contain at most one label for each property value. If no label is found for a property value, none is returned. |
String getDocumentUrl(UUID uuid) | Returns the delivery-side URL to a document. |
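To get a feel for what stripRichText produces, here is a rough Java approximation. It is not Sophora's implementation: which special characters the real method removes is not specified here, so this sketch only removes tags, decodes a handful of common entities and collapses whitespace. Tags are replaced by a space so that text from adjacent block elements does not run together; the trade-off is that a tag in the middle of a word splits that word.

```java
import java.util.regex.Pattern;

/** Rough, hypothetical approximation of stripRichText for local experiments. */
final class RichTextStripper {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private RichTextStripper() {}

    static String stripRichText(String text) {
        if (text == null) {
            return null;
        }
        String withoutTags = TAG.matcher(text).replaceAll(" ");
        // Decode &amp; last so that "&amp;lt;" becomes "&lt;" and not "<".
        String decoded = withoutTags
            .replace("&nbsp;", " ")
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&amp;", "&");
        return WHITESPACE.matcher(decoded).replaceAll(" ").trim();
    }
}
```

For example, "<p>Hello <b>world</b></p>" becomes "Hello world".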
Synchronization of Modified Documents
Sophora documents are kept in sync with their counterparts in Solr collections. If any changes are made to a document, it will be re-indexed automatically to keep the index up to date. If a document is deleted, it will automatically be deleted from the corresponding collections (unless "Deleted documents" is checked in the index configuration).
When using custom index fields, the respective Groovy scripts may call getDocument*() methods on the ISophoraClient to retrieve data from other documents. Whenever one of these retrieved documents changes, the document using the custom index field is re-indexed as well. This way, the indexing service automatically keeps information up to date for documents that depend on other documents. The UUIDs of the retrieved documents are stored in the field "included_uuid_ss".
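The dependency tracking described above can be sketched as a reverse index from loaded documents to the documents that loaded them. This Java sketch is a hypothetical model, not the indexing service's code. It keeps the loaded UUIDs per document, much like the included_uuid_ss field, and replaces them on every re-indexing so that stale dependencies disappear. Re-indexing is deliberately not transitive: a dependent document's own content does not change just because it was re-indexed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical model of dependency-driven re-indexing. */
final class ReindexTracker {
    // loaded document UUID -> documents whose index entry used it
    private final Map<UUID, Set<UUID>> dependents = new HashMap<>();
    // indexed document UUID -> UUIDs it loaded during its last indexing run
    private final Map<UUID, Set<UUID>> included = new HashMap<>();

    /** Records the documents loaded while indexing doc, replacing the previous set. */
    void recordIndexing(UUID doc, Set<UUID> loadedUuids) {
        Set<UUID> previous = included.put(doc, Set.copyOf(loadedUuids));
        if (previous != null) {
            for (UUID old : previous) {
                Set<UUID> users = dependents.get(old);
                if (users != null) {
                    users.remove(doc);
                    if (users.isEmpty()) {
                        dependents.remove(old);
                    }
                }
            }
        }
        for (UUID loaded : loadedUuids) {
            dependents.computeIfAbsent(loaded, k -> new HashSet<>()).add(doc);
        }
    }

    /** Documents to re-index when changed is modified: the document itself plus its dependents. */
    Set<UUID> documentsToReindex(UUID changed) {
        Set<UUID> result = new HashSet<>(dependents.getOrDefault(changed, Set.of()));
        result.add(changed);
        return result;
    }
}
```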
Whenever a structure node changes in a significant way, i.e. its name or the value of an inherited property changes, all affected documents located in that structure node (and its child structure nodes) are re-indexed automatically.
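Determining which structure nodes are affected amounts to collecting the changed node and all of its descendants. The following Java sketch shows this with an iterative depth-first traversal. The class name and the map-based representation of the structure tree are hypothetical; the documents located in the returned structure nodes are the ones that would need re-indexing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: the changed structure node plus all of its descendants. */
final class StructureSubtree {
    private StructureSubtree() {}

    /** childrenByNode maps a structure node ID to the IDs of its direct child structure nodes. */
    static List<String> affectedStructureNodes(String changed, Map<String, List<String>> childrenByNode) {
        List<String> result = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(changed);
        // Iterative depth-first traversal avoids deep recursion on large structure trees.
        while (!pending.isEmpty()) {
            String node = pending.pop();
            result.add(node);
            for (String child : childrenByNode.getOrDefault(node, List.of())) {
                pending.push(child);
            }
        }
        return result;
    }
}
```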
Publishing an index configuration triggers an in-place re-indexing of the entire collection, which is updated continuously until the re-indexing is complete. In the meantime, the collection can still be searched without restriction, and search results reflect the changes as they are applied. Note that publishing an index mapping or index field document does not trigger a rebuild of the associated collections.
When a channel, index mapping or index field is changed and published, the index configurations that have to be republished are marked red in the Administration view. In addition, a sticky note is created. The colour of the sticky note is red by default. To change it, add an entry with the key sophora.configuration.republishIndexConfigurationColor to the configuration document, with an RGB value such as 255,255,0 (yellow).
Querying a custom collection
To search for documents in a custom collection, you have to specify the collection in the search parameters. To do so, use an instance of SolrSearchParameters, which supports additional parameters specific to Solr searches, and call setCore(collectionname) to select the collection to search. See the JavaDoc of that class for more information.
See the following code snippet:
```groovy
// You can use any type of IQuery except for XPathQuery, which is not supported in Solr.
// With a SolrQuery you can query your own index fields
IQuery query = new SolrQuery("species_reversed_s: \"goD\"")
// use the special SolrSearchParameters instead of the common SearchParameters
SolrSearchParameters parameters = new SolrSearchParameters()
parameters.setPageSize(10)
// set the index/core to search in
parameters.setCore("Animalcore")
// execute the search
UuidSearchResult searchResult = sophoraClient.findDocumentUuids(query, parameters)
// do something with the result
List<UUID> uuids = searchResult.getUUIDs()
...
```
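When the value in a query like "species_reversed_s: \"goD\"" comes from user input, quotes and backslashes inside it should be escaped before the string is handed to SolrQuery. The following Java sketch shows one way to build such a field:"phrase" clause. SolrPhraseQuery and fieldEquals are hypothetical names, not part of the Sophora API.

```java
/** Hypothetical helper for building field:"phrase" Solr query strings. */
final class SolrPhraseQuery {
    private SolrPhraseQuery() {}

    /** Returns field:"value", escaping backslashes first and then double quotes inside the phrase. */
    static String fieldEquals(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }
}
```

For example, fieldEquals("species_reversed_s", "goD") returns species_reversed_s:"goD", which could be passed to new SolrQuery(...) in the snippet above.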