Replication and Synchronization

Introduction

This document explains how documents are shared between different servers in a single cluster. This is done using two different mechanics.

Replication is the ongoing transfer of changes from the Sophora Primary Server to all connected Sophora Replica Server and Sophora Staging Servers as long as all affected systems are running.

Synchronization is a process that is used when a Replica or Staging Server has lost its connection to the primary or has been shut down for some time. The synchronization will then transfer all the document changes from the Primary to the Replica or Staging that have been missed in replication while the connection has been down. Once a synchronization is finished, the Replica/Staging will switch to regular replication.

Both replication and synchronization are based on the JMS implementation ActiveMQ.

Please refer to the Sophora Server documentation before you read this documentation. Particularly the sections about the configuration and the different server modes are important. The documentation about Backup and Recovery might also be useful.

The Sophora Replica and Staging Servers are connected to the Primary. Any number of Replicas and Staging Servers can be used.

Replication

The replication between the Sophora Primary Server and a Sophora Replica Server is done via ActiveMQ topics. The Primary manages two topics (one for replicas and one for stagings). Every connected Replica or Staging will subscribe onto the topic corresponding to its server type. Every editorial change therefore adds a message to the replication topic and every change to published content furthermore adds a message to the staging topic.

Such a replication message is deleted when all consumers subscribed to its topic have read this message. Since every change on the Primary is performed on all Sophora Replicas as well, Replicas are exact copies of the Primary.

The replication basically covers everything you can edit in the DeskClient, e.g:

Documents
Document versions
Structure nodes
Users
Configuration documents
Locks
Complete deletions

Replication between Primary and Replicas

Configuring replication between Primary and Replica

Let's assume a scenario with two Replica Servers (called Replica A and Replica B). The following configuration is necessary to configure the replication.

Excerpt of the configuration of the Primary

sophora.replication.mode=master
sophora.local.jmsbroker.host=primary.hostname
sophora.local.jmsbroker.port=1197

Excerpt of the configuration of Replica A

sophora.replication.mode=slave
sophora.remote.jmsbroker.host=primary.hostname
sophora.remote.jmsbroker.port=1197

Excerpt of the configuration of Replica B

sophora.replication.mode=slave
sophora.remote.jmsbroker.host=primary.hostname
sophora.remote.jmsbroker.port=1197

Replication between Primary and Staging Servers

Configuring replication between Primary and Staging Server

Excerpt of the Configuration of the Sophora Staging Server

sophora.replication.mode=stagingslave
sophora.remote.jmsbroker.host=primary.hostname
sophora.remote.jmsbroker.port=1197

Monitoring topics

Most information on the state of topics and queues is held by the JMS Broker. The Primary is its own broker and provides the JMX-Bean org.apache.activemq/Broker/embedded which offers a lot of detailed information.
The most important indicator for the replication is the size of the replication topic. The amount of not-yet consumed JMS-Messages (including Topic and Replication Queue) per server also is provided by the JMX-Bean com.subshell.sophora.server/Server/TYPE/NAME/QueueSize. This metric is also tracked as "Replication Queue" by the Sophora Admin Dashboard.

Synchronization

Whenever a Replica or Stage has been offline or down it may have missed messages from its corresponding replication topic. In order to close the data gap it therefore sends a synchronization request with some key data (mostly different timestamps) to the Primary.

The Primary then creates a temporary synchronization queue, determines all changes the Replica/Stage is missing and sends them via this queue.

This process consists of various steps (mostly sending system documents, structure nodes and regular documents) and is tracked with progress logs in the log files of both involved servers.

The duration highly depends on the amount of data the Replica or Stage is behind. It only takes a few minutes, if the recipient has just been restarted, but might take several days if you synchronize a Replica or Stage from scratch.

Switching from Synchronization to Replication

In most cases Replicas and Stages will subscribe to their corresponding topic during the sync. They'll start consuming messages from this topic once their synchronization is done. This might cause the topic to have a lot of pending messages which must be held in memory by the primary.

To prevent this heap flooding, the Replica/Stage will not register for the topic straight away if it expects the synchronization to last longer than a couple of minutes (by checking its own most recent document modification date). Upon sync completion the Replica/Stage will then trigger a delta-synchronization to cover up the document changes made during the first synchronization. Such a delta synchronization works the same as a regular synchronization.

Enforcing synchronization since a specific date

A server sending a synchronization request to its primary will try to fill in the required time stamps to the best of its knowledge. After an incident however the Ops-Team might know better. You can enforce synchronizing all documents modified since a given time stamp by

Configuring sophora.replication.restartDate in the sophora.properties of the Replica/Stage OR
Editing the to-be-sent synchronization request (a JSON-File) in the Replicas/Stages file system under SOPHORA_INSTALLATION/repository/syncRequest

Both modifications have to be done before starting the Sophora Server.

Version limit on synchronizing Replicas

A synchronization will cover all documents a Replica requests but not always all versions of all these documents. A synchronization will send at most 10000 versions per document per sync. Only document versions newer then the document time stamp from the sync request are considered for synchronization (since the Replica will already have older ones). Therefore it is unlikely that versions will be ignored if a Replica just has been down for maintenance for a brief amount of time.

However hitting the 10000 versions limit is possible when synchronizing a Replica from scratch. If you really need more than 10000 versions per document (or if you are fine with having less) you can adjust this limit by setting the property sophora.replication.maxVersionsSyncCnt in the sophora.properties of the Primary.

Note: Staging servers generally don't have any document versions except for the last published one. Therefore they are not affected by this version limit.

ServerInfo Ping

An additionaly permanent JMS queue is kept by the broker (the JVM of the Primary). In this queue the Primary is the consumer and all Replicas and Stages are producers. Replicas and Stages use this to send a regular ping containing their ServerInfo-Object. Furthermore this is used to send sync requests.

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.