Offline Document Indexer Guide

The offline document indexer provides a separate database that enables smart redirects when clients request documents that are no longer online.

The offline document indexer is a server plugin that, if enabled, fills a separate Solr core with information about all existing documents that are offline. Based on this Solr core's information, the delivery component can provide redirects for links to documents that were previously online.

Sophora Primary Server configuration

The offline document indexer plugin relies on some configuration properties which must be set on the Sophora Primary Server. These properties are listed below.

Property name	Explanation
offlineDocumentIndexer.active	Enables the feature. The default is false
offlineDocumentIndexer.propertyNames	May contain a comma-separated list of additional document properties to make available on the Solr documents. See the Solr fields section below for details.
offlineDocumentIndexer.nodetypeNames	May contain a comma-separated list of node types. If set, only documents of the given types are considered in the offline indexing process.
offlineDocumentIndexer.coreName	Names the Solr core to use for the offline documents. The default is offline
sophora.solr.username	The username for accessing Solr. This parameter is optional
sophora.solr.password	The password for the Solr access. This parameter is optional
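The defaults in the table above can be illustrated with a small sketch. It parses an example configuration in Java properties syntax and applies the documented defaults (false for offlineDocumentIndexer.active, offline for offlineDocumentIndexer.coreName). The class OfflineIndexerSettings, its methods, and the node type names example-nt:story and example-nt:image are hypothetical and not part of Sophora; this is not how the server itself reads its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class OfflineIndexerSettings {

    // Example configuration. The node type names are made-up placeholders.
    static final String EXAMPLE =
            "offlineDocumentIndexer.active=true\n"
          + "offlineDocumentIndexer.propertyNames=sophora-content:title\n"
          + "offlineDocumentIndexer.nodetypeNames=example-nt:story, example-nt:image\n";

    // Parses text in Java properties syntax.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // The feature is off unless explicitly enabled (documented default: false).
    static boolean isActive(Properties p) {
        return Boolean.parseBoolean(p.getProperty("offlineDocumentIndexer.active", "false"));
    }

    // Documented default core name: offline.
    static String coreName(Properties p) {
        return p.getProperty("offlineDocumentIndexer.coreName", "offline");
    }

    // Splits a comma-separated property into trimmed, non-empty entries.
    static List<String> list(Properties p, String key) {
        return Arrays.stream(p.getProperty(key, "").split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }
}
```

In this example offlineDocumentIndexer.coreName is not set, so coreName falls back to offline.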

Solr fields

By default, the Solr documents representing offline documents have the following fields:

Solr field	Content
channel_names_ss	The names of the channels this document was enabled for.
channel_uuids_ss	The UUIDs of the channels this document was enabled for.
id	The UUID of the document
primaryType_s	The document's primary type
sophora_structureNode_s	The UUID of the document's structure node. If there was a recent live version, that version's structure node is used; otherwise the current version's structure node is used.
sophora_modificationDate_dt	The millisecond timestamp of the document's property sophora:modificationDate
sophora_id_s	The document's readable ID from the document property sophora:id
sophora_idHistory_ss	Contains all readable IDs the document ever had. This property is multivalued
sophora_cronNextOnDate_dt	If present, the date when the document will be published again. See "Cron Server Feature" for details.

An additional list of properties can be passed through the offlineDocumentIndexer.propertyNames property. For each of these properties, a Solr field matching the field naming conventions is added. The Solr field name consists of the property name with ":" replaced by "_", followed by a suffix indicating the type and whether the field is multivalued.
For example, the Sophora property sophora-content:title results in the Solr field sophora-content_title_t.
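The naming rule can be sketched as a small helper. The class SolrFieldNames and its method are hypothetical. The suffix is passed in explicitly because this guide does not list the full mapping from property types to suffixes; the suffixes that appear in the table above are s, ss, dt and t.

```java
// Hypothetical helper illustrating the naming convention; not part of Sophora.
final class SolrFieldNames {

    // Replaces ":" with "_" and appends "_" plus the given type suffix.
    // The caller chooses the suffix (for example "t", "s", "ss" or "dt").
    static String toSolrFieldName(String propertyName, String typeSuffix) {
        return propertyName.replace(':', '_') + "_" + typeSuffix;
    }

    public static void main(String[] args) {
        System.out.println(toSolrFieldName("sophora-content:title", "t"));
        System.out.println(toSolrFieldName("sophora:idHistory", "ss"));
    }
}
```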

Sophora Replica Server configuration

The offline document indexer does not require any specific configuration on the Sophora Replica Servers to be enabled. However, each Sophora Replica Server needs to propagate its hostname correctly to the Sophora Primary Server so that the Sophora Primary Server can access the Sophora Replica Server's Solr. The Sophora Replica Server tries to determine its own hostname and propagates it as part of its ServerInfo object. This determined hostname is often not fully qualified, in which case the Sophora Primary Server might have trouble reaching the Sophora Replica Server. To correct this, you can explicitly set a fully qualified hostname for a Sophora Replica Server.

Property name	Explanation
sophora.replication.slaveHostname	Sets the fully qualified hostname for this Sophora server. This property should also be set on Sophora Primary Servers in order to be prepared for switching the Sophora Primary Server.
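To see whether a host is likely to need this property, the following sketch prints the hostname Java detects locally and checks whether it looks fully qualified. How the Sophora server actually determines its hostname is not described here, so the use of InetAddress is an assumption, and the class HostnameCheck is hypothetical. The check is a simple heuristic (it only looks for a dot inside the name).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper; not part of Sophora.
final class HostnameCheck {

    // Heuristic: a fully qualified name contains a dot that is neither the
    // first nor the last character. IP addresses also pass this check.
    static boolean looksFullyQualified(String hostname) {
        if (hostname == null) {
            return false;
        }
        String h = hostname.trim();
        int dot = h.indexOf('.');
        return dot > 0 && dot < h.length() - 1;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Assumption: the locally detected name resembles what the server would propagate.
        String detected = InetAddress.getLocalHost().getHostName();
        System.out.println("Detected hostname: " + detected);
        if (!looksFullyQualified(detected)) {
            System.out.println("Consider setting sophora.replication.slaveHostname explicitly.");
        }
    }
}
```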

Delivery example

One way to use the offline core is for redirects. For this purpose, you have to create a class that implements the interface IRedirectBuilder.
The class might look like this:

public class ProjectRedirectBuilder implements IRedirectBuilder {

	private static final String OFFLINE_CORE_NAME = "offline";
	private static final String REDIRECT_DOCUMENT_S = "redirectDocument_s";
	private static final String SOPHORA_STRUCTURE_NODE_S = "sophora_structureNode_s";

	private static final Logger log = LoggerFactory.getLogger(ProjectRedirectBuilder.class);

	private final String[] fields = { "*" };

	@Override
	public String createRedirectUrl(SophoraUrl url, IContentMapContext context) {
		if (url == null || StringUtils.isBlank(url.getSophoraId())) {
			return null;
		}

		String sophoraId = url.getSophoraId();
			
		SolrQuery solrQuery = new SolrQuery();
		solrQuery.setQuery("sophora_id_s:" + sophoraId);
		solrQuery.setFields(fields);
		solrQuery.setRows(1);
		solrQuery.setStart(0);
		solrQuery.addSort("score", ORDER.desc);

		SolrResult solrResult = SolrClient.query(OFFLINE_CORE_NAME, solrQuery);

		String redirectDocumentUuid = null;

		if (solrResult != null && !solrResult.getEntries().isEmpty()) {
			SolrDocument solrDocument = solrResult.getEntries().get(0);

			if (solrDocument.containsKey(REDIRECT_DOCUMENT_S)) {
				// A redirect document exists
				try {
					String redirectDocUuid = (String) solrDocument.getFieldValue(REDIRECT_DOCUMENT_S);
					context.getDocumentByUuid(redirectDocUuid);
					return createRedirectUrlForUuid(context, redirectDocUuid);
				} catch (ItemNotFoundException e) {
					log.debug(e.getMessage(), e);
				}
			}

			// Walk up the structure node hierarchy looking for a default document
			String structureNodeUuid = (String) solrDocument.getFieldValue(SOPHORA_STRUCTURE_NODE_S);
			StructureInfo origStructureInfo = context.getStructureInfo(UUID.fromString(structureNodeUuid));
			List<UUID> structureNodeHierarchy = origStructureInfo.getStructureNodeHierarchy();
			List<UUID> reverseStructureNodeHierarchy = Lists.reverse(structureNodeHierarchy);
			for (UUID snUuid : reverseStructureNodeHierarchy) {
				StructureInfo structureInfo = context.getStructureInfo(snUuid);
				UUID defaultDocumentUUID = structureInfo.getDefaultDocumentUUID();
				if (defaultDocumentUUID != null) {
					redirectDocumentUuid = defaultDocumentUUID.toString();
					break;
				}
			}

			return createRedirectUrlForUuid(context, redirectDocumentUuid);
		}

		return null;
	}

	private String createRedirectUrlForUuid(IContentMapContext context, String redirectDocumentUuid) {
		String redirectUrl = null;
		if (StringUtils.isNotBlank(redirectDocumentUuid)) {
			redirectUrl = context.createUrl(false, false, true, (String) null, redirectDocumentUuid, (String) null, (String) null, new HashMap<String, Object>(), new HashMap<String, Object>(), null, null);
			redirectUrl = "/" + StringUtils.substringAfter(redirectUrl.replaceFirst("/", ""), "/");
			log.debug("Creating redirect url {} for document uuid {}", redirectUrl, redirectDocumentUuid);
		}
		return redirectUrl;
	}
}
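The example above places the Sophora ID into the query string unescaped and only matches the current ID. The following sketch shows one possible variation: it escapes characters that are special in Solr's query syntax and also matches former IDs through sophora_idHistory_ss. The class OfflineQueries and its methods are hypothetical. The escaping is hand-written so that the sketch has no dependencies; in a project that already uses SolrJ, ClientUtils.escapeQueryChars serves the same purpose.

```java
// Hypothetical helper for building offline core queries; not part of Sophora.
final class OfflineQueries {

    // Characters with a special meaning in Solr's standard query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Prefixes every special character and every whitespace character with a backslash.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Matches documents whose current or any former readable ID equals the given ID.
    static String byCurrentOrFormerId(String sophoraId) {
        String escaped = escape(sophoraId);
        return "sophora_id_s:" + escaped + " OR sophora_idHistory_ss:" + escaped;
    }
}
```

The resulting string could be passed to solrQuery.setQuery(...) in place of the concatenation used in createRedirectUrl.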

Technical details

Although the offline documents core is available on the Solr instances of the Sophora Primary Server and of the Sophora Replica Servers (both replication and staging), all indexing is done by the Sophora Primary Server.
Updating a Sophora Replica Server is not part of the regular synchronization process. When a new Sophora Replica Server connects, the Sophora Primary Server updates it by comparing that server's current offline documents core with the overall list of offline documents.
The offline documents index does not contain any deleted documents.
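The comparison described above can be pictured as a set difference between the IDs that should be in the core and the IDs that currently are. The following is only a conceptual sketch under that assumption; the actual algorithm used by the Sophora Primary Server is not documented here, and the class OfflineCoreDiff is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Conceptual sketch only; not the server's actual implementation.
final class OfflineCoreDiff {

    final Set<String> toAdd;    // offline documents missing from the replica's core
    final Set<String> toRemove; // core entries that are no longer offline documents

    private OfflineCoreDiff(Set<String> toAdd, Set<String> toRemove) {
        this.toAdd = toAdd;
        this.toRemove = toRemove;
    }

    // Compares the overall list of offline document IDs with the IDs found in a core.
    static OfflineCoreDiff compare(Set<String> offlineDocumentIds, Set<String> idsInCore) {
        Set<String> add = new TreeSet<>(offlineDocumentIds);
        add.removeAll(idsInCore);
        Set<String> remove = new TreeSet<>(idsInCore);
        remove.removeAll(offlineDocumentIds);
        return new OfflineCoreDiff(add, remove);
    }
}
```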

Controlling the OfflineIndexer through JMX

The Sophora Primary Server provides a specific JMX bean if the OfflineIndexer is activated. This bean is com.subshell.sophora.server.plugins/OfflineDocumentIndexer and provides a list of indexers, one for each server including the Sophora Primary Server. Each indexer has these properties:

Indexer JMX Properties
Name	Description
host	Holds the Solr-Base URL for this indexer, e.g. http://stage01.mycompany.com:1196/solr
id	The ID of the Indexer which is also the ID of its corresponding Sophora server.
state	One of: inactive (the indexer has been explicitly switched off or cannot reach the host); running (the indexer is currently indexing elements from a queue); ready (the indexer has been started and is listening for events).
currentWorkToDo	The number of documents in the queue (in case the state is running).

This bean also offers operations on these indexers. They all take the ID of the indexer as input.

deactiveIndexer: Explicitly deactivates a single indexer
activateIndexer: Activates a deactivated indexer
fullRebuild: Rebuilds the index from scratch. You might want to use this if you have configured new properties for your offline index and want them to be present on all indexed documents. We do not recommend triggering a rebuild of all indexers at the same time.
rebuildSince: Rebuilds the index for all documents modified since a given timestamp. You might want to use this if the last full rebuild was interrupted for some reason. The format of this timestamp is dd.MM.yyyy HH:mm

The JMX operation 'rebuildSince' requires a Sophora Server in version 4.1.4 or newer.
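The timestamp format dd.MM.yyyy HH:mm expected by rebuildSince can be produced with java.time, as in the following sketch. The class RebuildSinceTimestamp is hypothetical. Which time zone the server applies to the value is not stated in this guide, so the sketch works with a plain local date and time.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for preparing the rebuildSince argument; not part of Sophora.
final class RebuildSinceTimestamp {

    // Pattern taken from the description of the rebuildSince operation.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm");

    // Formats a date and time as expected by rebuildSince.
    static String format(LocalDateTime since) {
        return since.format(FORMAT);
    }

    // Parses a value in the same format, for example to validate user input.
    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMAT);
    }

    public static void main(String[] args) {
        // Example: rebuild everything modified during the last 24 hours.
        System.out.println(format(LocalDateTime.now().minusDays(1)));
    }
}
```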

If an indexer is activated (either through the JMX operation or because its corresponding Sophora server reconnects to the Sophora Primary Server after having been disconnected for some time), it automatically fills its queue with the IDs of all documents that have been set offline since the latest modification to the offline index of this server.

A full rebuild therefore is only performed if the offline core on the server is completely empty or the JMX operation "fullRebuild" has been triggered.

Setting an indexer to inactive is not persistent. If you restart the Sophora Primary Server or switch the primary role to a former Sophora Replica Server, the inactive indexer will switch back to running.

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.