The Sophora Indexer is connected to the Sophora Primary Server's ContentManager and receives notifications when structure nodes or documents are changed. How these changes are applied to the connected search engine is defined by an indexer plugin.
The Indexer maintains a priority update queue that handles state change events and schedules the resulting document operations with the following priorities:
Event | Priority | Explanation |
---|---|---|
Publishing a structure node | low | All subordinate documents of the published structure node are updated; i.e. re-indexed. |
Publishing a single document | high | Individual documents are favoured and re-indexed. |
Enabling a structure node | medium | All subordinate documents of the enabled structure node are updated; i.e. re-indexed. |
Disabling a structure node | – | No need for an update. All subordinate documents will just be removed from the index. The same applies when a structure node is set offline. |
Removing a document | – | No need for an update. The document will just be removed from the index. The same applies when a document is set offline. |
Removing a structure node | – | Has no impact on the document index since a structure node may only be deleted if no document is located at this node anymore. |
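The ordering of this queue can be pictured with a small Java sketch based on java.util.PriorityQueue. It is only an in-memory illustration with hypothetical names (UpdatePriority, UpdateQueue), not the Indexer's implementation, whose queue is kept in the update-queue DB (see sophora.indexer.db.directory in the configuration table). Keeping operations of equal priority in arrival order is an assumption of the sketch; removals, which need no update, are not modeled.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Priorities from the table above; declaration order defines the ordering (HIGH first).
enum UpdatePriority { HIGH, MEDIUM, LOW }

final class UpdateQueue {
    /** A pending re-index operation (hypothetical type, not part of the Indexer API). */
    private static final class Task {
        final String id;
        final UpdatePriority priority;
        final long sequence; // arrival order, keeps equal priorities FIFO

        Task(String id, UpdatePriority priority, long sequence) {
            this.id = id;
            this.priority = priority;
            this.sequence = sequence;
        }
    }

    // Sort by priority first, then by arrival order.
    private final PriorityQueue<Task> queue = new PriorityQueue<>(
            Comparator.comparing((Task t) -> t.priority).thenComparingLong(t -> t.sequence));
    private long nextSequence = 0;

    void add(String id, UpdatePriority priority) {
        queue.add(new Task(id, priority, nextSequence++));
    }

    /** Returns the id of the next operation to process, or null if the queue is empty. */
    String poll() {
        Task next = queue.poll();
        return next == null ? null : next.id;
    }
}
```

With this ordering, a single published document overtakes the bulk re-indexing triggered by publishing a structure node, which matches the intent of the table.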
The indexer is split into the projects com.subshell.sophora.indexer and com.subshell.sophora.indexer.api.
It is started by the class com.subshell.sophora.indexer.Indexer.
Directory Structure
As described in the Sophora Server Documentation, it is recommended to use a specific directory structure for an installation of the Sophora Indexer.
The arrangement of files and directories inside the tar.gz archive in which the indexer application is delivered is designed to support this structure.
----cms-directory
--------apps
------------...
------------com.subshell.sophora.indexer-2.3.0
------------...
------------sophora-indexer -> Symbolic link to com.subshell.sophora.indexer-2.3.0
------------...
--------indexer
------------config
----------------indexer.properties
------------groovy
------------logs
------------plugins
------------sophora-indexer.sh -> Symbolic link to ../apps/sophora-indexer/indexer.sh
apps
– The directory apps contains the software components used in your Sophora environment. Here, the Sophora Indexer application is located in the subdirectory com.subshell.sophora.indexer-2.3.0. A symbolic link points to this directory so that you can easily switch between versions of the indexer, e.g. during an update.
indexer
- This folder is the workspace of the indexer installation. It contains the indexer's configuration file (in the subdirectory config), the log files (in the subdirectory logs), the plugin to use (in the subdirectory plugins), Groovy scripts providing field values (in the subdirectory groovy), and a symbolic link to the start and stop script. Except for the logs directory, which is created automatically on the first start, this directory, its subdirectories and the symbolic link must be created manually.
plugins
- Place an indexer plugin jar of your choice in this directory. The plugin will then be made available for the indexer application at runtime. See the plugin mechanism section for more information.
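Because the workspace must be created by hand, the following Java sketch (java.nio.file, assuming a file system with symbolic-link support) sets up the layout shown above. The class name IndexerWorkspace is hypothetical, and creating the versioned apps directory here is only for illustration, since it normally results from unpacking the tar.gz archive. The logs directory is skipped because the Indexer creates it on its first start.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

final class IndexerWorkspace {
    /**
     * Creates apps/sophora-indexer and the indexer workspace below cmsDir.
     * The logs directory is left out because the Indexer creates it itself.
     */
    static Path create(Path cmsDir, String version) throws IOException {
        Path apps = cmsDir.resolve("apps");
        String appDirName = "com.subshell.sophora.indexer-" + version;
        Files.createDirectories(apps.resolve(appDirName)); // normally the unpacked archive

        Path workspace = cmsDir.resolve("indexer");
        for (String sub : new String[] {"config", "groovy", "plugins"}) {
            Files.createDirectories(workspace.resolve(sub));
        }

        // apps/sophora-indexer -> com.subshell.sophora.indexer-<version> (relative link)
        link(apps.resolve("sophora-indexer"), Path.of(appDirName));
        // indexer/sophora-indexer.sh -> ../apps/sophora-indexer/indexer.sh
        link(workspace.resolve("sophora-indexer.sh"),
                Path.of("..", "apps", "sophora-indexer", "indexer.sh"));
        return workspace;
    }

    private static void link(Path link, Path target) throws IOException {
        // NOFOLLOW_LINKS: an existing (possibly dangling) link counts as present.
        if (!Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
            Files.createSymbolicLink(link, target);
        }
    }
}
```

Existing links are left untouched, so running the method twice is harmless; switching to a new indexer version would mean replacing the apps/sophora-indexer link, which this sketch does not do.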
Configuration file (sophora.properties)
The Indexer's behaviour is defined by a configuration file. This file is mandatory. At startup, its path is passed to the Indexer as a VM argument. The syntax to specify the configuration file is as follows:
-Dsophora.properties=<path to properties file>
Properties from the configuration file override those from the default configuration. Changes to this file only take effect after the Indexer has been restarted.
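The override behaviour can be illustrated with java.util.Properties, whose Properties(Properties defaults) constructor makes getProperty fall back to a default set. This is a sketch of the general mechanism under that assumption, not the Indexer's own loader; the class name IndexerConfig is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

final class IndexerConfig {
    /** Loads properties from the reader; keys missing there fall back to the defaults. */
    static Properties load(Properties defaults, Reader source) throws IOException {
        Properties config = new Properties(defaults); // getProperty() consults defaults last
        config.load(source);
        return config;
    }

    /** Loads the file passed as -Dsophora.properties=<path to properties file>. */
    static Properties loadFromVmArgument(Properties defaults) throws IOException {
        String path = System.getProperty("sophora.properties");
        if (path == null) {
            throw new IllegalStateException(
                    "Missing VM argument -Dsophora.properties=<path to properties file>");
        }
        try (Reader reader = Files.newBufferedReader(Path.of(path))) {
            return load(defaults, reader);
        }
    }
}
```

For example, if the defaults contain sophora.indexer.queue.bulkSize=100 and the file sets it to 500, getProperty returns 500, while keys absent from the file (such as sophora.indexer.db.directory with its default ./db) still return the default value.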
Property | Mandatory | Description / sample value |
---|---|---|
sophora.contentmanager.serviceUrl | yes | URL of the content manager; protocol: RMI or HTTP. Example: rmi://localhost:1199/ContentManager |
sophora.contentmanager.username | yes | Username for the content manager |
sophora.contentmanager.password | yes | Password for the content manager |
sophora.contentmanager.proxyHost | no | URL of the proxy |
sophora.contentmanager.proxyPort | no | Port of the proxy (between 1024 and 65535) |
sophora.contentmanager.proxyUsername | no | Username for the proxy |
sophora.contentmanager.proxyPassword | no | Password for the proxy |
sophora.contentmanager.connectRetries | no | Number of attempts to log in to the Sophora server if the first login attempt fails. |
sophora.contentmanager.connectRetryInterval | no | The time in seconds to wait between connection attempts. |
sophora.searchEngine.connection | yes | Defines the search engine implementation to use. There must be a corresponding Spring bean which implements the ISearchEngineFactory interface (e.g. subsearch, solr, forum, facebook). |
sophora.indexer.jolokia.port | no | Port of the Jolokia JMX adapter service. |
sophora.indexer.db.directory | no | Directory for the update-queue DB (default: ./db) |
sophora.indexer.searchMixinName | yes | Only documents with this mixin are indexed. If this property is not set, no documents are indexed. If the search mixin changes, an existing search index must be reset manually. |
sophora.indexer.unsearchableFieldName | no | |
sophora.indexer.removeBeforeUpdate | no | If set to true, a remove request for all index keys is sent to the search engine before updating. Default value is true. |
sophora.indexer.removeAfterUpdate | no | Defines whether, after an update, the document is removed from all index keys to which it was not added. Default is true. Applies only if the property sophora.indexer.removeBeforeUpdate is set to false. |
sophora.indexer.urlService.url | no | This is the URL of a web service that generates URLs for sophora documents. See Generating URLs for Documents for details. If this is not set, URLs are generated using a built-in algorithm. |
sophora.indexer.urlService.onError.maxDelay | no | If an error occurs while requesting the URL from the configured web service, the Indexer retries the request. This property sets the maximum time, in seconds, for which the Indexer keeps trying to get the URL of a document. The default is 600 (10 minutes). The indexing of all subsequent documents is also delayed by this duration. |
sophora.replication.restartDate | no | Starting date of the synchronisation process after a restart. All documents modified after this date are re-indexed. The date must have the format yyyy.mm.dd hh:mm (e.g. 2015.03.27 12:00). |
sophora.replication.restartQuery | no | Query that is executed at a restart. Only documents that match this query are indexed. If this property is set, sophora.replication.restartDate is ignored. The query must be an XPath statement like the following: element(*, sophora-mix:document) [@sophora:id = 'test100'] |
sophora.startDatePropertyName | no | Name of the property which contains the "online from" information of a document (e.g. sophora:startdate). |
sophora.searchEngine.fullupdate | no | Defines whether all available documents are indexed. Default is true. |
sophora.indexer.alive.logfile | no | Path of the log file that stores the last indexing date. It behaves like sophora.replication.restartDate, but reads its date from and writes it to the log file automatically. If a specific restart date is set with sophora.replication.restartDate or sophora.searchEngine. is set to true, this log file is ignored. (default: logs/indexerLastAlive.log) |
sophora.indexer.jmx.registry.port | no | Port for JMX connections (between 1024 and 65535) |
sophora.indexer.rmi.registry.port | no | Port for the RMI registry (between 1024 and 65535) |
sophora.indexer.jmx.registry.username | no | Login for JMX connections |
sophora.indexer.jmx.registry.password | no | Password for JMX connections |
sophora.indexer.directory.xsl | no | Directory containing XSL files for transformation of string properties having values in XML format |
sophora.indexer.useExternalId | no | When set to true, the External-ID is used to identify a document instead of the UUID. This ID is set as the value of the key documentKey in the document data map given to the indexer plugin. Most plugins use this key to identify each index record; e.g. the solr plugin writes this value into the search index field id. This setting should not be changed in a running system: the search index must be cleared before changing it. Default is false. |
sophora.indexer.queue.mechanism | no | Defines how the indexer processes elements from the queue. Possible values are: singleProcessing (default): One document at a time is processed. bulkProcessing: The indexer takes a given number of documents at once from the queue. This can speed up processing with plug-ins that profit from bulk processing (currently only the GSA plug-in). The number of documents has to be set with sophora.indexer.queue.bulkSize. The indexer then tries to get up to the defined number of documents. If the queue does not hold that many documents, all available documents are processed. If documents are processed faster than the queue is filled, this mechanism behaves like singleProcessing. delayedBulkProcessing: Like bulkProcessing, a number of documents is processed at a time, and the entire description of bulkProcessing applies. Additionally, to maximize the number of processed documents, the indexer waits up to a defined time to collect documents. This maximum waiting time is defined with sophora.indexer.queue.maxDelay. The delay is counted from the moment the first document is available in the queue, so it is the maximum delay for indexing a document. Processing is delayed until the defined bulkSize is reached or maxDelay has elapsed, whichever comes first. If the queue always holds more elements than configured, this mechanism behaves exactly like bulkProcessing. |
sophora.indexer.queue.bulkSize | no | Defines the maximum number of documents to process at once (default is 100). See the description of sophora.indexer.queue.mechanism for more information when to use this. |
sophora.indexer.queue.maxDelay | no | Defines the maximum delay in milliseconds for processing a document (default is 1000 ms). See the description of sophora.indexer.queue.mechanism for more information when to use this. |
sophora.indexer.numberOfRepeatAttempts | no | Defines the number of repeat attempts the indexer should perform if a search engine throws a RetryException (default is 6). |
sophora.indexer.repeatAttemptDelay | no | Defines the delay in milliseconds between repeat attempts after the occurrence of a RetryException within a search engine plugin (default is 10000). |
sophora.indexer.name | no | The Indexer's name to be used for JMX. |
sophora.client.dataDir | no | Defines a directory which may be used by the Sophora Client API for persisting information like the available nodes in a cluster. The directory must be specified as an absolute path. |
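The three values of sophora.indexer.queue.mechanism differ only in how many documents are taken per round and how long the Indexer waits for more. The following Java sketch of the delayedBulkProcessing rule uses java.util.concurrent.BlockingQueue; it is an illustration, not the Indexer's code, and the class name is hypothetical. It starts the maxDelay clock when the first document is taken from the queue, which approximates "since the first document is available in the queue".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class DelayedBulkCollector {
    /**
     * Takes the next batch: waits for the first element, grabs everything already queued
     * (up to bulkSize), then keeps waiting until bulkSize is reached or maxDelayMillis
     * have passed since the first element was taken, whichever comes first.
     */
    static <T> List<T> takeBatch(BlockingQueue<T> queue, int bulkSize, long maxDelayMillis)
            throws InterruptedException {
        List<T> batch = new ArrayList<>(bulkSize);
        batch.add(queue.take()); // blocks until the first document is available
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxDelayMillis);
        queue.drainTo(batch, bulkSize - batch.size()); // bulkProcessing part: no waiting
        while (batch.size() < bulkSize) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                break; // maxDelay elapsed
            }
            T next = queue.poll(remaining, TimeUnit.NANOSECONDS);
            if (next == null) {
                break; // maxDelay elapsed while waiting
            }
            batch.add(next);
        }
        return batch;
    }
}
```

With maxDelayMillis = 0 the method degenerates to bulkProcessing (take what is already queued, up to bulkSize), and with bulkSize = 1 to singleProcessing. The documented defaults would correspond to takeBatch(queue, 100, 1000).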
Example configuration
# Connection to the ContentManager
sophora.contentmanager.serviceUrl=rmi://localhost:1199/ContentManager
sophora.contentmanager.username=admin
sophora.contentmanager.password=admin
# JMX settings
sophora.indexer.jmx.registry.port=50
sophora.indexer.rmi.registry.port=5031
sophora.indexer.jmx.registry.username=admin
sophora.indexer.jmx.registry.password=secret
# Query for the synchronisation after restarting (inactive)
#sophora.replication.restartQuery=element(*, sophora-mix:document)[@sophora:id = 'test100']
# Starting date for the synchronisation
sophora.replication.restartDate=2015.03.27 12:00
sophora.subsearch.fullUpdate=false
# Selected connection
sophora.searchEngine.connection=dummySubsearch
# Search mixins and fields
sophora.indexer.searchMixinName=sophora-content-mix:searchable
sophora.indexer.unsearchableFieldName=sophora-content:unsearchable
# Directory for XSL files to transform property values
sophora.indexer.directory.xsl=c:/temp
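Before starting the Indexer, a configuration like the one above can be checked against the property table: the mandatory keys must be present, and the port properties documented with a range must lie between 1024 and 65535. The following Java sketch (hypothetical class ConfigCheck) performs that check; it is not a feature of the Indexer, and its key lists are copied from the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class ConfigCheck {
    // Properties marked as mandatory in the configuration table.
    static final List<String> MANDATORY = List.of(
            "sophora.contentmanager.serviceUrl",
            "sophora.contentmanager.username",
            "sophora.contentmanager.password",
            "sophora.searchEngine.connection",
            "sophora.indexer.searchMixinName");

    // Port properties documented with the range 1024 - 65535.
    static final List<String> PORTS = List.of(
            "sophora.contentmanager.proxyPort",
            "sophora.indexer.jmx.registry.port",
            "sophora.indexer.rmi.registry.port");

    /** Returns a human-readable list of problems; an empty list means the check passed. */
    static List<String> problems(Properties config) {
        List<String> problems = new ArrayList<>();
        for (String key : MANDATORY) {
            String value = config.getProperty(key);
            if (value == null || value.isBlank()) {
                problems.add("missing " + key);
            }
        }
        for (String key : PORTS) {
            String value = config.getProperty(key);
            if (value == null) {
                continue; // optional property not set
            }
            try {
                int port = Integer.parseInt(value.trim());
                if (port < 1024 || port > 65535) {
                    problems.add(key + " out of range: " + port);
                }
            } catch (NumberFormatException e) {
                problems.add(key + " is not a number: " + value);
            }
        }
        return problems;
    }
}
```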
Mapping Document Properties to Index Fields of the Search Engine
The mapping of Sophora properties to index fields of the search engine is defined in the file siteAndMappingConfiguration.xml. You have to create this file and place it in the same directory as the sophora.properties file. Changes to this configuration file only take effect after the Indexer has been restarted.
The following example demonstrates all supported use cases and possible configurations. The XML schema file is available for download: indexer-configuration-1.0.0.xsd
Example of siteAndMappingConfiguration.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns="http://www.sophoracms.com/indexer-configuration/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sophoracms.com/indexer-configuration/1.0 http://www.sophoracms.com/indexer-configuration/1.0/indexer-configuration-1.0.0.xsd">
<!-- Assign index keys to sites and filters -->
<indexes>
<index indexKey="indexKey1" isDefault="true">
<sites>
<!-- name is optional, id is required. Use the uuid of the structure node or the externalid of the structure node document to identify a structure node. -->
<site name="sitename1" id="5c34195a-5574-4948-9b72-bc1df857fb8a" />
</sites>
<filter>
<allowedNodeTypes>
<allowedNodeType>sophora-content-nt:story</allowedNodeType>
</allowedNodeTypes>
<requiredChannel>c0970f7e-85e6-412b-9e52-27073ca84e58</requiredChannel>
<requiredProperty>sophora-content:topline</requiredProperty>
</filter>
</index>
<index indexKey="indexKey2">
<sites>
<site name="sitename2" id="32dc2576-93b4-407d-a971-c2c6a437d7fc" />
<site name="sitename3" id="b6d3cf76-b114-4e6c-95da-2033a413c08e" />
<site name="aStructureNodeOfSite3" id="c7ad6486-b35b-4e16-9cec-c3b26332580d" />
<site name="anotherStructureNodeOfSite3" id="05621d7b-585d-471d-8604-538c0b315880" />
</sites>
<filter>
<allowedNodeTypes>
<allowedNodeType>sophora-content-nt:audio</allowedNodeType>
<allowedNodeType>sophora-content-nt:story</allowedNodeType>
</allowedNodeTypes>
</filter>
</index>
</indexes>
<!-- Assign index key fields to Sophora document properties -->
<mappings>
<!-- The simplest mapping is to assign a Sophora property to a search engine's index field -->
<mapping key="sophoraid">
<property>sophora:id</property>
</mapping>
<!-- You can set the format of date properties as they will appear in the search engine's index.
For instance, if you want a property called "dateToSearch" to appear in the format "yyyy.MM.dd.HH.mm.ss",
add the following lines to the configuration.
In most cases the format "yyyy.MM.dd.HH.mm.ss" is used as default value. Exceptions are the properties
"dateToSearch" and "publicationDate", which occur in "yyyy.MM.dd" by default. -->
<mapping key="dateToSearch" format="yyyy.MM.dd.HH.mm.ss">
<property>sophora:publicationDate</property>
</mapping>
<!-- Select values: Without special configuration the selected key of a drop-down list is indexed.
If the mapped property is configured as a select value, it is possible to write the label of the
selected key into the index field. This is achieved by appending ".value" to the property name: -->
<mapping key="selectedValue">
<property>sophora:dropdownField.value</property>
</mapping>
<!-- raw property value: To map the property value without replacing all html/xml tags, you can append a ".rawValue" to the property name: -->
<mapping key="rawPropertyValue">
<property>sophora:property.rawValue</property>
</mapping>
<!-- For properties containing XML data, an XSL transformation can be performed.
The result of the transformation is then written into the index field.
The XSL file name is given by the attribute "xsl".
The file name is relative to the directory set in the property "sophora.indexer.directory.xsl" of the sophora.properties file. -->
<mapping key="longitude" xsl="longitude.xsl">
<property>sophora-content:map</property>
</mapping>
<!-- In order to write the content of a childnode into an index field use path expressions like these: -->
<mapping key="teaserImageOverwrittenAlttext">
<property>sophora-content:image/sophora-extension:alttext</property>
</mapping>
<mapping key="teaserImageUuid">
<property>sophora-content:image/sophora:reference</property>
</mapping>
<mapping key="teaserCopytextImageUuid">
<property>sophora-content:copytext/sophora-extension:paragraph/sophora-extension:paragraphimage[0]/sophora-extension:image[0]/sophora:reference</property>
</mapping>
<!-- If you want to insert multiple property values into a single index field, you can define these properties as a list.
As you can see, each part of the property list can also be a path expression which refers to childnode values.
Such a configuration will result in a single index field value, where the assigned values are separated by a white space character. -->
<mapping key="sequence">
<property>sophora-content:topline</property>
<property>sophora-content:headline</property>
<property>sophora-content:teasertext</property>
<property>sophora-content:image/sophora-extension:alttext</property>
</mapping>
<!-- It is also possible to configure alternative properties in case the intended property of a document is empty. You can also define multiple alternatives.
Example: If the property 'sophora-content:date' is not set, the value of 'sophora:publicationDate' is taken instead, and so on: -->
<mapping key="date" format="yyyy.MM.dd.HH.mm">
<alternative>
<property>sophora-content:date</property>
<property>sophora:publicationDate</property>
<property>sophora:modificationDate</property>
</alternative>
</mapping>
<!-- Map multiple properties, including alternatives, into a single index field: -->
<mapping key="sequenceWithAlternatives">
<alternative>
<property>sophora-content:topline</property>
<property>sophora-content:headline</property>
</alternative>
<property>sophora-content:teasertext</property>
<alternative>
<property>sophora-content:date</property>
<property>sophora:publicationDate</property>
<property>sophora:modificationDate</property>
</alternative>
</mapping>
<!-- If a property is not available for a document, it may be retrieved from the structure hierarchy. To do so, you can use the following expression.
The example would be interpreted as follows: If the property "sophora:property" does not exist in the document that should be indexed,
the indexer tries to retrieve it from the structure node documents of the parent structure nodes. -->
<mapping key="field">
<alternative>
<property>sophora:property</property>
<operation>sophora.indexer.getPropertyValueFromStructureHierarchy</operation>
</alternative>
</mapping>
<!-- To write the UUIDs of the active channels of a document into a single index field,
you need to configure the mapping property accordingly. The index field's content will be a space separated list of UUIDs like:
e91c87d4-8e16-4e40-9d69-689548efe5ab f833f3f6-b894-4064-9d10-eadb244a52cf. The list includes the default structure hierarchy information;
e.g. if no channels are defined for this document, it might inherit some from the parent nodes. -->
<mapping key="channels">
<operation>sophora.indexer.activeChannels</operation>
</mapping>
<!-- Another possibility to use information from a document's structure is to determine the structure nodes' UUIDs by defining the subsequent mapping: -->
<mapping key="structureNodes">
<operation>sophora.indexer.generateStructurePathUuids</operation>
</mapping>
<!-- If you want to generate the URL of the document that is indexed, configure the following: -->
<mapping key="generatedUrl">
<operation>sophora.indexer.generateUrl</operation>
</mapping>
<!-- Or alternatively, only generate a URL if the document doesn't provide one: -->
<mapping key="url">
<alternative>
<property>sophora-content:url</property>
<operation>sophora.indexer.generateUrl</operation>
</alternative>
</mapping>
<!-- It is possible to provide values using classes defined in Groovy or Java. The content
of the <operation>-element is the fully qualified classname of a class implementing
the interface IFieldValueSource. -->
<mapping key="groovyFoo">
<operation>Foo</operation>
</mapping>
<mapping key="groovyAlternative">
<alternative>
<operation>Foo</operation>
<operation>mycompany.Bar</operation>
<operation>sophora.indexer.generateUrl</operation>
</alternative>
</mapping>
</mappings>
</configuration>
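Path expressions such as sophora-content:image/sophora:reference walk from the document into its child nodes, one step per "/", optionally selecting a child by an index like [0]. The Java sketch below illustrates that idea on a hypothetical in-memory model (child nodes as lists of maps, property values as strings). It assumes that a step without an index selects the first child, which the example above does not state explicitly, and it is not the Indexer's implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PathResolver {
    // A step is a node or property name with an optional index, e.g. "sophora-extension:image[0]".
    private static final Pattern STEP = Pattern.compile("([^\\[\\]/]+)(?:\\[(\\d+)])?");

    /**
     * Resolves a path against a node: child nodes are stored as List<Map<String, Object>>,
     * property values as String. Returns null if any step cannot be resolved.
     */
    @SuppressWarnings("unchecked")
    static String resolve(Map<String, Object> node, String path) {
        Object current = node;
        for (String step : path.split("/")) {
            Matcher m = STEP.matcher(step);
            if (!m.matches() || !(current instanceof Map)) {
                return null;
            }
            Object value = ((Map<String, Object>) current).get(m.group(1));
            if (value instanceof List) {
                List<Object> children = (List<Object>) value;
                int index = m.group(2) == null ? 0 : Integer.parseInt(m.group(2)); // assumption: default 0
                current = index < children.size() ? children.get(index) : null;
            } else {
                current = value;
            }
            if (current == null) {
                return null;
            }
        }
        return current instanceof String ? (String) current : null;
    }
}
```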
Built-in field operations
Name | Description |
---|---|
sophora.indexer.activeChannels | Writes the UUIDs of the active channels of a document into the index field. The field's content will be a space separated list of UUIDs like: "e91c87d4-8e16-4e40-9d69-689548efe5ab f833f3f6-b894-4064-9d10-eadb244a52cf". The list includes the default structure hierarchy information; e.g. if no channels are defined for this document, it might inherit some from the parent nodes. |
sophora.indexer.getPropertyValueFromStructureHierarchy | Retrieves the content of a property from the structure hierarchy. This operation searches the structure nodes in the path of the document for the first structure node document that contains the property. This operation may only be used as the second entry in an <alternative> block. The first entry must be a <property> element, which defines the property to search for. |
sophora.indexer.generateStructurePathUuids | Writes the space-separated UUIDs of the structure path of the document. |
sophora.indexer.generateUrl | Generates a URL for the document using an internal algorithm or, if the property sophora.indexer.urlService.url is set, by querying the URL web service. |
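The lookup performed by sophora.indexer.getPropertyValueFromStructureHierarchy, used as the second entry of an <alternative> block, can be sketched as follows. The model is hypothetical: each structure node document is a map of property values, and the list is assumed to be ordered from the document's own structure node up to the root, so the nearest node that has the property wins. This is an illustration, not the Indexer's code.

```java
import java.util.List;
import java.util.Map;

final class StructureHierarchy {
    // Returns the first non-empty value of the property along the structure path, or null.
    // structurePath: property maps of the structure node documents, nearest node first (assumption).
    static String findInHierarchy(List<Map<String, String>> structurePath, String property) {
        for (Map<String, String> nodeDocument : structurePath) {
            String value = nodeDocument.get(property);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return null;
    }

    // Mirrors an alternative of a <property> followed by the hierarchy operation:
    // the document's own value wins, otherwise the structure hierarchy is searched.
    static String propertyOrInherited(Map<String, String> document,
                                      List<Map<String, String>> structurePath, String property) {
        String own = document.get(property);
        return (own != null && !own.isEmpty()) ? own : findInHierarchy(structurePath, property);
    }
}
```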
Using Groovy scripts to implement custom operations
Operations providing field values can be implemented using Groovy scripts. The scripts must be located in the groovy directory next to the config directory. Each script must define a class that implements the interface com.subshell.sophora.indexer.source.IFieldValueSource. A custom operation implemented by a script is referenced in the mapping using the fully qualified name of the class defined by the script.
The following example defines an operation which sets the value of the field "structureNodeName" to the name of the structure node in which the document to be indexed is located.
siteAndMappingConfiguration.xml
<mapping key="structureNodeName"> <operation>GetStructureNodeName</operation></mapping>
groovy/GetStructureNodeName.groovy
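import com.subshell.sophora.api.content.INode
import com.subshell.sophora.api.structure.StructureInfo
import com.subshell.sophora.client.ISophoraClient
import com.subshell.sophora.indexer.api.IFieldValueSource

// Provides the name of the structure node in which the indexed document is located.
class GetStructureNodeName implements IFieldValueSource {

    private ISophoraClient client

    // Receives the Sophora client used to query the Sophora server.
    @Override
    public void setClient(ISophoraClient client) {
        this.client = client
    }

    // Returns the value for the index field "fieldName" of the given document.
    @Override
    public String getValue(INode document, String fieldName) {
        // The document references its structure node via the property "sophora:structureNode".
        def structure = client.getStructureInfo(document.getString("sophora:structureNode"))
        // "?." returns null instead of failing if no structure info is found.
        return structure?.getStructureNodeName()
    }
}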
Deprecated: Mapping Document Properties to Index Fields of the Search Engine in the sophora.properties File
If the siteAndMappingConfiguration.xml file does not exist, the mapping of Sophora properties to index fields of the search engine is read from the sophora.properties file. This mechanism is deprecated; use the siteAndMappingConfiguration.xml file, which is explained above, instead. The following properties can be set in the sophora.properties file to configure the mapping:
Property | Mandatory | Description + exemplary value |
---|---|---|
sophora.indexer.sites | yes | Comma-separated list of sites. For each site, the properties sophora.indexer.site.<sitename>.id and sophora.indexer.site.<sitename>.indexkey are required. |
sophora.indexer.site.<sitename>.id | yes | UUID of a site or structure node or the ExternalId of a structure node document. The placeholder "sitename" refers to the value of the property sophora.indexer.sites. |
sophora.indexer.site.<sitename>.indexkey | yes | Index key assigned to the site, for example tagesschauKey |
sophora.indexer.site.default.indexkey | no | Index key used if the site's UUID is empty |
mapping.<propertyname> | no | Maps a document's property to an index field of the search engine; e.g. sophora:id |
sophora.indexer.filter.<indexkey>.allowedTypes | no | Comma-separated list of document types that should be included in the index. If this property is empty, all document types are indexed. Example: sophora-nt:audio, sophora-nt:video |
sophora.indexer.filter.<indexkey>.requiredChannel | no | UUID of a delivery channel. If a document is explicitly excluded from this channel, it is not indexed. If this property is empty, delivery channels are not considered during indexing. Can be set for each index separately. |
dateFormat.<propertyname> | no | Defines the date format for the mapping of the property with the given name, e.g. yyyy.MM.dd.HH.mm.ss |
Example mapping configuration in the sophora.properties file (same configuration as in the siteAndMappingConfiguration.xml file explained above):
Mapping in sophora.properties
# -----------------------------
# Configure index keys to sites
# -----------------------------
sophora.indexer.sites=sitename1,sitename2,sitename3,aStructureNodeOfSite3,anotherStructureNodeOfSite3
sophora.indexer.site.sitename1.id=5c34195a-5574-4948-9b72-bc1df857fb8a
sophora.indexer.site.sitename1.indexkey=indexKey1
sophora.indexer.site.sitename2.id=32dc2576-93b4-407d-a971-c2c6a437d7fc
sophora.indexer.site.sitename2.indexkey=indexKey2
sophora.indexer.site.sitename3.id=b6d3cf76-b114-4e6c-95da-2033a413c08e
sophora.indexer.site.sitename3.indexkey=indexKey2
sophora.indexer.site.aStructureNodeOfSite3.id=c7ad6486-b35b-4e16-9cec-c3b26332580d
sophora.indexer.site.aStructureNodeOfSite3.indexkey=indexKey2
sophora.indexer.site.anotherStructureNodeOfSite3.id=05621d7b-585d-471d-8604-538c0b315880
sophora.indexer.site.anotherStructureNodeOfSite3.indexkey=indexKey2
# If the site UUID is empty
sophora.indexer.site.default.indexkey=indexKey1
# -----------------
# Configure filters
# -----------------
sophora.indexer.filter.indexKey1.allowedTypes=sophora-content-nt:story
sophora.indexer.filter.indexKey1.requiredChannel=c0970f7e-85e6-412b-9e52-27073ca84e58
sophora.indexer.filter.indexKey1.requiredProperty=sophora-content:topline
sophora.indexer.filter.indexKey2.allowedTypes=sophora-content-nt:audio,sophora-content-nt:story
# ---------------------------------------------------------
# Configure index key fields to Sophora document properties
# ---------------------------------------------------------
# Each mapping of Sophora properties to index fields of the search engine is done in the following way:
# mapping.SEARCH_ENGINE_INDEXFIELD=SOPHORA_PROPERTY_EXPRESSION
# The simplest mapping is to assign a Sophora property to a search engine's index field:
mapping.sophoraid=sophora:id
# You can set the format of date properties as they will appear in the search engine's index.
# For instance, if you want a property called "dateToSearch" to appear in the format "yyyy.MM.dd.HH.mm.ss",
# add the following in the configuration.
# In most cases the format "yyyy.MM.dd.HH.mm.ss" is used as default value. Exceptions are the properties
# "dateToSearch" and "publicationDate", which occur in "yyyy.MM.dd" by default.
mapping.dateToSearch=sophora:publicationDate
dateFormat.dateToSearch=yyyy.MM.dd.HH.mm.ss
# Select values: Without special configuration the selected key of a drop-down list is indexed.
# If the mapped property is configured as a select value, it is possible to write the label of the
# selected key into the index field. This is achieved by appending ".value" to the property name:
mapping.selectedValue=sophora:dropdownField.value
# raw property value: To map the property value without replacing all html/xml tags, you can append a ".rawValue" to the property name:
mapping.rawPropertyValue=sophora:property.rawValue
# For properties containing XML data, an XSL transformation can be performed.
# The result of the transformation is then written to the index field.
# The XSL file name is given in a property with the name "mapping.<field>.xsl".
# The file name is relative to the directory set in the property "sophora.indexer.directory.xsl".
sophora.indexer.directory.xsl=c:/temp
mapping.longitude=sophora-content:map
mapping.longitude.xsl=longitude.xsl
# In order to write the content of a childnode into an index field use a path expression like:
mapping.teaserImageOverwrittenAlttext=sophora-content:image/sophora-extension:alttext
mapping.teaserImageUuid=sophora-content:image/sophora:reference
mapping.teaserCopytextImageUuid=sophora-content:copytext/sophora-extension:paragraph/sophora-extension:paragraphimage[0]/sophora-extension:image[0]/sophora:reference
# If you want to insert multiple property values into a single index field, you can define these properties as a list.
# As you can see each part of the property list can also be a path expression to refer to childnode values.
# Such a configuration will result in a single index field value where the assigned values are separated by a white space character.
mapping.sequence=sophora-content:topline,sophora-content:headline,sophora-content:teasertext,sophora-content:image/sophora-extension:alttext
# It is also possible to configure alternative properties in case the intended property of a document is empty.
# This is achieved using the delimiter "|". You can also define multiple alternatives.
# Example: If the sophora-content:date property is not set, the value of sophora:publicationDate is taken alternatively and so on:
mapping.date=sophora-content:date|sophora:publicationDate|sophora:modificationDate
dateFormat.date=yyyy.MM.dd.HH.mm
# NOTE: The combination of "," and "|" within one mapping expression is allowed. A combination of "/" and "|" on the contrary is prohibited.
# Map multiple properties, including alternatives, into a single index field:
mapping.sequenceWithAlternatives=sophora-content:topline|sophora-content:headline,sophora-content:teasertext,sophora-content:date|sophora:publicationDate|sophora:modificationDate
# If a property is not available for a document, it may be retrieved from the structure hierarchy. To do so add the following expression.
# The example would be interpreted as follows: If the property "sophora:property" does not exist in the document that should be indexed,
# the indexer tries to retrieve it from the structure node documents of the superior structure nodes.
mapping.field=sophora:property|sophora.indexer.getPropertyValueFromStructureHierarchy
# To write the UUIDs of the active channels of a document (including default structure hierarchy information;
# e.g. if no channels are defined for this document, it might inherit some from the superior nodes) into a single index field,
# you need to configure the mapping property accordingly. The index field's content will be a space separated list of UUIDs like:
# e91c87d4-8e16-4e40-9d69-689548efe5ab f833f3f6-b894-4064-9d10-eadb244a52cf
mapping.channels=sophora.indexer.activeChannels
# Another possibility to use information from the structure is to determine the structure nodes' UUIDs by defining the subsequent mapping:
mapping.structureNodes=sophora.indexer.generateStructurePathUuids
# If you want to generate the URL of the document that is indexed, configure the following:
mapping.generatedUrl=sophora.indexer.generateUrl
# Or alternatively, only generate a URL if the document doesn't provide one:
mapping.url=sophora-content:url|sophora.indexer.generateUrl
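The deprecated expression syntax combines two delimiters: "," separates the parts of a sequence and "|" separates alternatives within one part. A small Java sketch of that evaluation is shown below. The lookup function is a stand-in for reading a property or running an operation, the class name is hypothetical, and skipping a part whose alternatives are all empty is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class MappingExpression {
    /**
     * Evaluates an expression such as "a|b,c": for each comma-separated part, the first
     * alternative with a non-empty value wins; the resulting values are joined by a space.
     */
    static String evaluate(String expression, Function<String, String> lookup) {
        List<String> values = new ArrayList<>();
        for (String part : expression.split(",")) {
            for (String alternative : part.split("\\|")) {
                String value = lookup.apply(alternative.trim());
                if (value != null && !value.isEmpty()) {
                    values.add(value);
                    break; // first non-empty alternative wins
                }
            }
        }
        return String.join(" ", values);
    }
}
```

For the sequenceWithAlternatives example, a document with a headline, a teaser text and a publication date (but no topline and no sophora-content:date) would yield the headline, the teaser text and the publication date, separated by spaces.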
Generating URLs for Documents
When the sophora.indexer.generateUrl operation is used to generate the URL of an indexed document, the URL is built from the following parts by default:
- the URL configured for the site (e.g. http://www.sophoracms.com),
- the name of the structure node from the structure path (e.g. "home"),
- the Sophora ID (e.g. "sophoraid100") and
- the extension ".html".
Example: http://www.sophoracms.com/home/sophoraid100.html
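The default scheme amounts to simple string concatenation. A Java sketch (hypothetical class DefaultUrl) that reproduces the example is shown below; it does not escape special characters in the node name or ID, and whether the Indexer does so is not stated here.

```java
final class DefaultUrl {
    /**
     * Builds "<site URL>/<structure node name>/<Sophora ID>.html" following the default scheme.
     * No escaping is applied to the node name or the ID.
     */
    static String build(String siteUrl, String structureNodeName, String sophoraId) {
        // Avoid a double slash if the site URL is configured with a trailing "/".
        String base = siteUrl.endsWith("/") ? siteUrl.substring(0, siteUrl.length() - 1) : siteUrl;
        return base + "/" + structureNodeName + "/" + sophoraId + ".html";
    }
}
```

DefaultUrl.build("http://www.sophoracms.com", "home", "sophoraid100") returns "http://www.sophoracms.com/home/sophoraid100.html", the URL from the example.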
Using a Web Service
Instead of using the built-in algorithm, URLs for indexed documents can also be queried from a web service. To use this feature, set the sophora.indexer.urlService.url property in the configuration to the URL of the web service. The web service receives the UUID of the indexed document as the HTTP-GET parameter "uuid" and must return the URL of the document as plain text.
The indexer sets the following HTTP-GET parameters:
- uuid: The UUID of the document for which the web service should return the URL.
- modificationDate: The modification date of the document as milliseconds since the epoch (UTC). This is used for checking that the indexer and the web service have the same version of the document.
The following example shows the interaction between the indexer and the URL service for one document:
sophora.properties of indexer
sophora.indexer.urlService.url=http://mydomain.de/system/servlet/urlService.servlet
sophora.properties of web application
sophora.delivery.site.demosite.domain=http://mydomain.de
UUID of the indexed document: a2acc8f7-e2c3-4180-ada1-4fc4794453c9
Request by the indexer: http://mydomain.de/system/servlet/urlService.servlet?uuid=a2acc8f7-e2c3-4180-ada1-4fc4794453c9&modificationDate=1319720134811
Response by the web service: http://mydomain.de/demosite/news/news104.html
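To make the request/response contract concrete, here is a minimal client-side sketch using the JDK's `java.net.http` client (Java 11+). Only the parameter names `uuid` and `modificationDate` and the plain-text response come from this page; the class and method names are hypothetical, and treating any non-200 status as an error is an assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the indexer-to-URL-service exchange; not the indexer's own code.
final class UrlServiceClientSketch {

    /** Builds the GET URL: the configured service URL plus uuid and modificationDate. */
    static String requestUrl(String serviceUrl, String uuid, long modificationDateMillis) {
        return serviceUrl
                + "?uuid=" + URLEncoder.encode(uuid, StandardCharsets.UTF_8)
                + "&modificationDate=" + modificationDateMillis;
    }

    /** Sends the request and returns the plain-text body, i.e. the document's URL. */
    static String fetchDocumentUrl(HttpClient client, String serviceUrl, String uuid,
                                   long modificationDateMillis) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(requestUrl(serviceUrl, uuid, modificationDateMillis)))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            // Assumption: errors (e.g. a modification date mismatch) arrive as non-200 responses.
            throw new IOException("URL service returned HTTP " + response.statusCode());
        }
        return response.body().trim();
    }
}
```

If the service is reached through a JSP that redirects to the servlet, note that `HttpClient.newHttpClient()` does not follow redirects; a client built with `HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build()` does.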
The service must be configured in your web application accordingly: make sure the servlet is set up in the web.xml as follows.
web.xml
[...]
<servlet>
<servlet-name>urlServlet</servlet-name>
<servlet-class>com.subshell.sophora.delivery.servlet.UrlForIdServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>urlServlet</servlet-name>
<url-pattern>/system/servlet/urlService.servlet</url-pattern>
</servlet-mapping>
[...]
Besides the UUID, the servlet may take additional parameters for URL creation:
- type: defines the template type to be included within the URL. Default: 'default'
- suffix: file suffix to use for the URL creation. Default: 'html'.
- modificationDate: The modification date of the document as milliseconds since the epoch (UTC). This is used for checking that the indexer and the servlet have the same version of the document. If the modification date differs, an error is returned.
- channel: The URL is created for the given channel. Default is the default channel. Do not use together with domainProperty.
- domainProperty: The domain may be optionally determined via a property, which is passed through this parameter to the servlet. The value of the property is read from the siteproperties. Do not use together with channel.
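The parameter rules above (uuid required, everything else optional, channel and domainProperty mutually exclusive) can be captured in a small helper before the query is built. This is a hypothetical sketch; the servlet's own validation is not documented here. Unset parameters are omitted so that the servlet defaults listed above apply.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper assembling the servlet parameters listed above; names of the helper are illustrative.
final class UrlServletParamsSketch {

    static Map<String, String> params(String uuid, String type, String suffix,
                                      Long modificationDate, String channel, String domainProperty) {
        if (uuid == null || uuid.isEmpty()) {
            throw new IllegalArgumentException("uuid is required");
        }
        if (channel != null && domainProperty != null) {
            // The documentation says not to combine these two parameters.
            throw new IllegalArgumentException("channel and domainProperty must not be combined");
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("uuid", uuid);
        putIfSet(params, "type", type);       // servlet default: 'default'
        putIfSet(params, "suffix", suffix);   // servlet default: 'html'
        if (modificationDate != null) {
            params.put("modificationDate", Long.toString(modificationDate));
        }
        putIfSet(params, "channel", channel); // servlet default: the default channel
        putIfSet(params, "domainProperty", domainProperty);
        return params;
    }

    private static void putIfSet(Map<String, String> params, String name, String value) {
        if (value != null && !value.isEmpty()) {
            params.put(name, value);
        }
    }
}
```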
Whereas the uuid parameter is set by the indexer, all other parameters can be set for project-specific adjustments. In this case, a JSP template has to be called instead of invoking the servlet directly. This JSP sets the parameters and then redirects the call to the servlet. The following example shows a JSP template which automatically sets the file suffix based on the MIME type of the document's binary data.
indexer.jsp
<%@ page session="false" pageEncoding="utf-8" contentType="text/html; charset=UTF-8"%>
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
<%@ taglib tagdir="/WEB-INF/tags/sophora-commons" prefix="sc"%>
<%@ taglib uri="http://www.subshell.com/sophora/jsp" prefix="sophora" %>
<c:if test="${not empty param.uuid}">
<sophora:getDocument var="document" uuid="${param.uuid}" />
<%-- Determine and set the suffix --%>
<c:choose>
<c:when test="${not empty document.binarydata}">
<c:set var="mimetype" value="${document.binarydata.mimeType}" />
<c:if test="${not empty mimetype}">
<sophora:getSuffixByMimeType var="suffix" mimeType="${mimetype}" />
</c:if>
</c:when>
<c:when test="${document['jcr:primaryType'] eq 'sophora-extension-nt:image'}">
<c:set var="suffix">jpeg</c:set>
</c:when>
</c:choose>
</c:if>
<c:set var="redirectUrl" >/system/servlet/urlService.servlet</c:set>
<c:redirect url="${redirectUrl}">
<c:param name="uuid" value="${param.uuid}" />
<c:param name="suffix" value="${suffix}" />
<c:param name="type" value="${param.type}" />
</c:redirect>
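For readers more at home in Java than in JSTL, the suffix decision made by indexer.jsp can be restated as follows. This is a hypothetical restatement, not code shipped with Sophora: the MIME type table is an illustrative stand-in for the `sophora:getSuffixByMimeType` tag, whose real mapping is not documented here. As in the JSP, binary data takes precedence over the image check, and returning `null` means no suffix is passed, so the servlet default ("html") applies.

```java
import java.util.Map;

// Hypothetical Java restatement of the <c:choose> block in indexer.jsp.
final class SuffixChoiceSketch {

    // Illustrative stand-in for sophora:getSuffixByMimeType; the real mapping may differ.
    private static final Map<String, String> SUFFIX_BY_MIME_TYPE = Map.of(
            "application/pdf", "pdf",
            "audio/mpeg", "mp3",
            "video/mp4", "mp4");

    /** Returns the suffix to pass on, or null to let the servlet use its default. */
    static String chooseSuffix(boolean hasBinaryData, String mimeType, String primaryType) {
        if (hasBinaryData) {
            // First <c:when>: with binary data, only the MIME type decides.
            return (mimeType == null || mimeType.isEmpty()) ? null : SUFFIX_BY_MIME_TYPE.get(mimeType);
        }
        if ("sophora-extension-nt:image".equals(primaryType)) {
            return "jpeg"; // second <c:when>: image documents
        }
        return null;
    }
}
```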
If no URL is configured, an error is returned. This is necessary because the generated URL must always point to the live version (with the live domain) of the passed document.
Update Queue Database
The update queue database is a prioritized queue (like the update queue in the delivery). The queue contains the UUIDs of documents which have been sent to the index.
Actions like structure node changes are inserted into the queue with a lower priority than single operations, for instance changes to an individual document. This is because structure node changes affect all documents located at the node in question and thus might take longer. Document-set-offline and document-deleted events are not added to the update queue at all, because these actions are sent directly to the index.
In general, the queue does not contain large numbers of UUIDs, except after structure node changes or while the indexer is synchronizing.
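The ordering described above can be pictured as an in-memory priority queue. This is a minimal sketch under stated assumptions, not the indexer's persistent queue database: the class name is hypothetical, the priority levels follow the table at the top of this page, and a sequence number is added (an assumption) to keep first-in-first-out order within one priority. Offline and delete events are deliberately absent because they bypass the queue.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical in-memory sketch of a prioritized update queue (Java 16+ for the record).
final class UpdateQueueSketch {

    /** Declaration order is processing order: single documents first, structure node changes last. */
    enum Priority { HIGH, MEDIUM, LOW }

    private record Entry(String uuid, Priority priority, long sequence) {}

    // Order by priority, then by insertion sequence (FIFO within one priority).
    private static final Comparator<Entry> ORDER =
            Comparator.comparing(Entry::priority).thenComparingLong(Entry::sequence);

    private final PriorityQueue<Entry> queue = new PriorityQueue<>(ORDER);
    private long nextSequence;

    void enqueue(String uuid, Priority priority) {
        queue.add(new Entry(uuid, priority, nextSequence++));
    }

    /** Returns the UUID of the next document to (re)index, or null if the queue is empty. */
    String poll() {
        Entry next = queue.poll();
        return next == null ? null : next.uuid();
    }

    int size() {
        return queue.size();
    }
}
```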
The update queue database does not need to be backed up. When you (re)start the indexer, set the sophora.replication.restartDate property to a date that ensures all changes made since the indexer was stopped are synchronized.
Indexer Plugin Mechanism
The Sophora Indexer uses a plugin mechanism to be able to connect to different search engines or to other external systems. Since version 1.33.1 the indexer itself only contains a dummy implementation which creates and writes to a log file. So in order to use the indexer to work with a specific search engine, a plugin has to be added and configured.
Add and configure an Indexer Plugin
First, the library of the corresponding plugin has to be added to the indexer's plugins folder. This plugins folder must be created in the directory where the config folder is located. Usually the plugin consists of one jar file, which contains the Java code, the necessary configuration files and all dependencies. Second, the added plugin must be configured in the indexer's properties file. To achieve this, add the plugin-specific configuration properties to the file. These properties are listed in the plugin's documentation. In addition, you have to set the property sophora.searchEngine.connection to the name of the plugin's bean, which is defined in the plugin's indexerExtension.xml file. When you restart the indexer, the plugin will be used.
Please note that only one plugin can be used at a time.
Create your own Plugin
To integrate another search engine or external system, set up a corresponding Java project and add the library com.subshell.sophora.indexer.api to its build path.
Two classes are required, implementing the interfaces com.subshell.sophora.indexer.api.ISearchEngine and com.subshell.sophora.indexer.api.ISearchEngineFactory respectively. The implementation of ISearchEngineFactory provides the method getSearchEngine, which returns an ISearchEngine object. All necessary instantiations can be done there as well.
Spring Configuration
Within the build path of the newly created project there must be a directory called "spring". This folder must contain an XML file "indexerExtension.xml" (the names of the folder and the file must not be altered). This configuration file contains a Spring bean definition for an instance of com.subshell.sophora.indexer.api.ISearchEngineFactory, as in the following example:
indexerExtension.xml
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xsi:schemaLocation="
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd">
<!-- enable annotation driven dependency injection -->
<context:component-scan base-package="com.subshell.sophora.indexer.myindexerplugin"/>
<bean id="myConnection" lazy-init="true" class="com.subshell.sophora.indexer.myindexerplugin.MyConnection">
<property name="searchEngine" ref="mySearchEngine" />
</bean>
</beans>
In this example, only the bean defining an instance of ISearchEngineFactory is included. All other beans (like mySearchEngine) and their dependencies are configured using annotation-driven dependency injection. The following code snippets show the MyConnection class, which makes use of Spring's dependency injection, and an excerpt of the MySearchEngine class, which is instantiated automatically using annotations.
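MyConnection.java
package com.subshell.sophora.indexer.myindexerplugin;
import com.subshell.sophora.indexer.api.ISearchEngine;
import com.subshell.sophora.indexer.api.ISearchEngineFactory;
// ISophoraClient is provided by the Sophora Client library; add its import as well.
// Factory bean "myConnection": this name is the value of sophora.searchEngine.connection.
public class MyConnection implements ISearchEngineFactory {
    private MySearchEngine mySearchEngine;
    @Override
    public ISearchEngine getSearchEngine(ISophoraClient client) {
        return mySearchEngine;
    }
    // Called by Spring for <property name="searchEngine" ref="mySearchEngine" />
    public void setSearchEngine(MySearchEngine mySearchEngine) {
        this.mySearchEngine = mySearchEngine;
    }
}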
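MySearchEngine.java
package com.subshell.sophora.indexer.myindexerplugin;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;
import com.subshell.sophora.indexer.api.ISearchEngine;
// Found by <context:component-scan>; its default bean name "mySearchEngine" matches ref="mySearchEngine".
@Component
@Qualifier("mySearchEngine")
public class MySearchEngine implements ISearchEngine {
    [...]
}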
In addition to the interfaces ISearchEngine and ISearchEngineFactory, the API provides an interface called ISiteIndexKeyProvider. A class implementing this interface is instantiated on startup and can be used within all plugins to retrieve the configured index keys for specified sites. You can obtain an instance of this class via Spring's dependency injection. The name of the bean to inject is siteIndexKeyProvider.
Build your Plugin
Plugins should be built with and managed by Maven. Therefore, the Indexer API must be added as a dependency to the Maven project. The API itself brings some dependent libraries into the project, like Spring, Apache Commons, Sophora Client etc. On the one hand, this makes it easy to use those libraries in your own plugin; on the other hand, you have to be careful not to create conflicts when adding new dependencies.
Since a plugin should consist of a single jar file, it is recommended to build it with the Maven assembly plugin:
mvn package assembly:single
JMX Connection
To set up a JMX connection use the following pattern:
service:jmx:rmi://<host>:<sophora.indexer.jmx.registry.port>/jndi/rmi://<host>:<sophora.indexer.rmi.registry.port>/server
An example:
service:jmx:rmi://localhost:5030/jndi/rmi://localhost:5031/server
Username and password are read from the sophora.properties file (sophora.indexer.jmx.registry.username and sophora.indexer.jmx.registry.password), if configured.
The indexer provides the following operations:
MBean | Operation | Description |
---|---|---|
Indexer | updateDocument(uuid) | Indexes the document with the given UUID |
UpdateQueue | getSize() | Returns the number of enqueued documents |
UpdateQueue | getSizeByPriority(priority) | Returns the number of documents enqueued with the given priority |
UpdateQueue | getSizePriorityMap() | Returns a map from each priority to the documents enqueued with it |
UpdateQueue | removeAllByPriority(priority) | Removes all documents with the given priority from the queue |
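The connection pattern and the operations above can also be used from a small Java client instead of JConsole. The following is a sketch using the standard `javax.management` remote API. The URL pattern, the optional credentials and the `updateDocument(uuid)` operation come from this page; the class and method names are hypothetical. The MBean's `ObjectName` is not documented here, so it is passed in (look it up in JConsole). The code also assumes that `updateDocument` takes a single `String` argument.

```java
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical JMX client sketch for the indexer; the ObjectName must be supplied by the caller.
final class IndexerJmxSketch {

    /** Fills in the documented pattern service:jmx:rmi://host:jmxPort/jndi/rmi://host:rmiPort/server. */
    static String serviceUrl(String host, int jmxRegistryPort, int rmiRegistryPort) {
        return "service:jmx:rmi://" + host + ":" + jmxRegistryPort
                + "/jndi/rmi://" + host + ":" + rmiRegistryPort + "/server";
    }

    /** Connects (with credentials if configured) and triggers re-indexing of one document. */
    static void updateDocument(String url, String username, String password,
                               ObjectName indexerMBean, String uuid) throws Exception {
        Map<String, Object> env = new HashMap<>();
        if (username != null) {
            // Matches sophora.indexer.jmx.registry.username / .password when those are set.
            env.put(JMXConnector.CREDENTIALS, new String[] {username, password});
        }
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url), env)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Assumption: the operation signature is updateDocument(String).
            connection.invoke(indexerMBean, "updateDocument",
                    new Object[] {uuid}, new String[] {String.class.getName()});
        }
    }
}
```

For the example ports above, `IndexerJmxSketch.serviceUrl("localhost", 5030, 5031)` yields the URL shown in the example.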