Document Store

Architecture Overview

A technical overview over the most important parts of the Document Store.

This section focuses on the server-side implementation, particularly the database architecture. A look behind the scenes, if you will. Unlike in Sophora 5, Sophora documents are now read directly from the database without an intermediate layer (JCR). However, the data should not be altered by external systems and is only to be read and written by the Sophora Server. 

The Document Store is implemented close to the database layer. In simple terms, it is responsible only for reading and writing documents. It does not modify the documents itself but stores them exactly as they are passed to it. The layers above handle business logic and event handling.

From a user perspective, these changes were made with API compatibility in mind. The interaction with the Sophora Server via the Sophora Client mostly remains unchanged. For more information, see here.

PostgreSQL Tables

Most parts of a Sophora document are stored as JSON. Each root and child node is represented by its own database record. To ensure high performance for reading and writing, only selected system properties (essential for search) are defined as separate, indexed columns. Efficient searching across project-specific properties is intentionally avoided in favor of faster write performance. For specific use cases, we also introduced multiple other tables such as "proposals" or "channel_settings". They are used to improve internal search operations and overall management of the data.

In JCR the UUID was the primary key that identified a Sophora document. In PostgreSQL, the External ID is always the leading identifier and serves as the primary key. This also applies to references.

We store Sophora documents in two main workspaces: working and live. These workspaces are also represented by separate tables. You can recognize the tables for live versions by the prefix "live_".

Transaction Model

A Sophora document is represented by multiple tables in PostgreSQL. When saving a Sophora document, the Sophora Server writes to these tables within the same transaction. The only exception is the creation of versions in the Version Store, which is handled asynchronously. The Sophora server request does not wait for this process to complete. However, reading a version will always return the latest data, even if the version has not been persisted yet. 

During storage, the document is split into parts:

  • Child nodes (and the root node) are stored in one table.
  • Related information such as references and channel settings is stored in additional tables to enable efficient searching.

When saving a document, the old state is deleted first, and then the new state is written, all within the same transaction (including writing to both the Live and Working workspace).

Last modified on 5/23/23

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.

Icon