Delivery Framework | Version 3

Delivery Architecture

The Sophora Delivery Framework coordinates the delivery of the managed content and provides access to it.

Archived documentation for Sophora 3. End-of-support date for this version: 7/25/21


The Sophora Delivery coordinates the delivery of the managed content and provides access to it. The following figure shows the processes relevant to the delivery:

[Figure: processes relevant to the delivery. Image: subshell/CC BY]

The Apache HTTP Server receives the HTTP requests for Sophora content. It first checks the cache for pages that have already been generated. If no matching fragments are found, the request is forwarded to the Tomcat that contains the delivery web application. This is usually done using mod_proxy. The generated content is returned to the browser and stored in the cache store, so that Apache can serve it directly on the next request.

Content Generation

There are two ways to generate content in the delivery. Within a website these content generation techniques can be combined:

  1. Pro-active Pre-Generation of cached content in the docroot of the Apache HTTP Server
  2. On-demand generation of cached content in the docroot of the Apache HTTP Server

Pro-active Pre-Generation

By pre-generating HTML and image files, the delivery's performance can be enhanced significantly. The cache entries are generated in the background, so that a dynamic website can be delivered with performance comparable to that of a static site.

On-Demand Generation

For content that isn't requested very often, such as a site's archive, on-demand generation might be sufficient because not all content needs to be kept in the cache. With on-demand generation, the corresponding delivery formats are created as soon as a user requests them. Once created, those files are written to the web server's docroot so that later requests don't require another on-demand generation.
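The on-demand behaviour described above can be sketched as follows. This is only an illustration, not the delivery's implementation: the function serve, the render callback and the example path are hypothetical, and a temporary directory stands in for the web server's docroot.

```python
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def serve(docroot: Path, url_path: str, render: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (content, was_cached); generate and store the file on a miss."""
    # Note: a real system must also guard against ".." in url_path.
    target = docroot / url_path.lstrip("/")
    if target.is_file():
        # Already generated: the web server could deliver this file directly.
        return target.read_text(encoding="utf-8"), True
    # Not generated yet: render once and write the result to the docroot.
    content = render(url_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return content, False


calls = []  # records how often the (hypothetical) template was rendered


def render(path: str) -> str:
    calls.append(path)
    return f"<html>{path}</html>"


with tempfile.TemporaryDirectory() as tmp:
    docroot = Path(tmp)
    first = serve(docroot, "/archive/story100.html", render)
    second = serve(docroot, "/archive/story100.html", render)
```

The first call renders and stores the page; the second call is answered from the stored file, so the render callback runs only once.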

The Sophora lifecycle

The delivery of content starts with a request, usually an HTTP request, and ends in a rendered resource such as an HTML page, an image, a CSS stylesheet, a script or a JSON data structure. The following figure shows the components involved in the Sophora lifecycle and describes their tasks in this process.

[Figure: components involved in the Sophora lifecycle. Image: subshell/CC BY]

The HTTP request is sent to a web server. In a live environment, an Apache HTTP Server is usually used for performance reasons (a Tomcat servlet engine provides the same functionality and is sufficient in a development environment). The HTTP server checks the cache store for an already generated version of the requested content. If a matching fragment is found, this fragment is returned. Otherwise, the request is forwarded to the servlet engine that contains the portal application (the delivery). This is usually done via mod_proxy over HTTP.

Before any template is called, the request passes through the Sophora filter chain. The filters interpret the request, initialize and provide business objects such as the cache facade or the contentProvider, load the corresponding Sophora content and find the template that should be used to render the document.

The delivery distinguishes between a request for a resource and a request for Sophora content. A resource is a file that is present in the web application, such as a static .html file, an image, a .css file or a script. If such a resource is found in the app base of the web application, the resource is returned. If a JSP with the corresponding name is found, the result of this JSP is returned.

If no matching resource is found, the request URL is interpreted according to the Sophora URL pattern. Depending on the node type, the request is forwarded to the template that should be used to render the current content. The mapping between the node type and the template is defined in the templates.xml. The result of the template, e.g. a rendered HTML page, is stored as a fragment in the cache store using the cache facade. The dependencies between the request URI, the mapped template and the resources required to render the fragment are stored in the cache DB.

The generated fragment may include SSI commands, which lead to subrequests in which the SSI commands are resolved. The fragments are assembled into a page and sent back in the HTTP response. In a live environment this is done by the Apache HTTP Server using mod_include. In a development environment this is usually done by the servlet engine via the Sophora SSI filter.
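How SSI commands turn several fragments into one page can be illustrated with a small sketch. It only handles the include directive with a virtual attribute in the syntax used by mod_include; the function resolve_ssi, the fragment paths and the in-memory dictionary that stands in for subrequests are assumptions made for this example.

```python
import re

# Matches directives such as <!--#include virtual="/nav.html" -->
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def resolve_ssi(fragment, fetch, depth=0, max_depth=10):
    """Replace every include directive with the (recursively resolved) fragment."""
    if depth > max_depth:
        raise RecursionError("SSI includes nested too deeply")

    def replace(match):
        # Each directive causes a subrequest; here 'fetch' stands in for it.
        included = fetch(match.group(1))
        return resolve_ssi(included, fetch, depth + 1, max_depth)

    return INCLUDE.sub(replace, fragment)


# Hypothetical fragments, keyed by the path used in the include directive.
fragments = {
    "/page.html": '<body><!--#include virtual="/nav.html" --><p>Story</p></body>',
    "/nav.html": '<nav><!--#include virtual="/logo.html" --></nav>',
    "/logo.html": "<img src='logo.png'>",
}

page = resolve_ssi(fragments["/page.html"], fragments.__getitem__)
```

Because every fragment is fetched separately, a fragment such as the navigation can be cached and invalidated independently of the pages that include it.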

Descriptive URLs

The website's content is delivered using descriptive URLs.

Example: www.tagesschau.de/inland/gesundheit1234.html

A document's URL consists of its path (derived from the location in the structure tree) and the Sophora ID (the ID stem together with a counter). Keep in mind that in some cases additional parameters are required.

Sophora URL pattern

Sophora URLs are used to request Sophora content through the delivery application, usually over HTTP. The URLs contain several pieces of information that are read by the Sophora filter chain and translated into implicit objects that can be used in the templates. For example, the URL http://demo.sophoracms.com/trendcities/copenhagen/cphvision/cphimpressions100~small_v-teaser.jpg is split into the following parts:

  • demo.sophoracms.com: the site where the requested document should be rendered
  • /trendcities/copenhagen/cphvision/: the complete structure path under which the template should be rendered (this path may differ from the path under which the document is located in the repository)
  • cphimpressions100: the Sophora id of the requested document
  • ~small: the template type that will be used. The template-type can be defined in the templates.xml.
  • _v-teaser: additional parameters that are needed to render the template. In this case a parameter with the name 'v' and the value 'teaser' will be accessible in the templates.
  • .jpg: the suffix (needed to determine the mime-type of the requested content)
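The split described in the list above can be sketched in a few lines. This is a simplified illustration and not the codec shipped with Sophora: the names SophoraUrl and parse_mounted_url are hypothetical, and the sketch assumes that several parameters are separated by "_" and that parameter names contain no "-".

```python
from typing import Dict, NamedTuple
from urllib.parse import urlsplit


class SophoraUrl(NamedTuple):
    site_host: str
    structure_path: str
    sophora_id: str
    template_type: str
    params: Dict[str, str]
    suffix: str


def parse_mounted_url(url: str) -> SophoraUrl:
    parts = urlsplit(url)
    # Everything before the last "/" is the structure path.
    directory, _, filename = parts.path.rpartition("/")
    stem, dot, suffix = filename.rpartition(".")
    if not dot:  # no suffix present
        stem, suffix = filename, ""
    # "~" separates the Sophora ID from template type and parameters.
    sophora_id, _, rest = stem.partition("~")
    template_type, *raw_params = rest.split("_") if rest else [""]
    # Each parameter has the form name-value.
    params = dict(p.split("-", 1) for p in raw_params if p)
    return SophoraUrl(parts.netloc, directory + "/", sophora_id,
                      template_type, params, dot + suffix)


parsed = parse_mounted_url(
    "http://demo.sophoracms.com/trendcities/copenhagen/cphvision/"
    "cphimpressions100~small_v-teaser.jpg"
)
```

For the example URL, parsed.params is {"v": "teaser"}, which corresponds to the parameter that becomes accessible in the templates.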

The URL described above is a URL of a mounted system. This is the typical setup in a live environment. The request addresses an Apache HTTP Server, which forwards it to the Tomcat where the web application is deployed. In the process, the URL is rewritten: the domain or hostname is replaced with the hostname of the machine on which Tomcat is installed, and the web application's context path and the site name are added to the URL. The rewritten URL might then look like:

http://tomcat.demo.sophoracms.com:8080/sophora-demosite-live/demosite/trendcities/copenhagen/cphvision/cphimpressions100~small_v-teaser.jpg

where:

  • tomcat.demo.sophoracms.com:8080 is the hostname and port under which tomcat is accessible
  • sophora-demosite-live is the name of the web application context
  • demosite is the name of the requested site, matching the name of the site in the repository
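The rewrite itself is normally configured in the Apache HTTP Server, but its effect can be shown with a short sketch. The function rewrite_for_tomcat is hypothetical and only reproduces the transformation of the example above.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_for_tomcat(url, tomcat_host, context, site):
    """Swap the host and prepend the context path and the site name."""
    parts = urlsplit(url)
    new_path = f"/{context}/{site}{parts.path}"
    return urlunsplit((parts.scheme, tomcat_host, new_path,
                       parts.query, parts.fragment))


rewritten = rewrite_for_tomcat(
    "http://demo.sophoracms.com/trendcities/copenhagen/cphvision/"
    "cphimpressions100~small_v-teaser.jpg",
    tomcat_host="tomcat.demo.sophoracms.com:8080",
    context="sophora-demosite-live",
    site="demosite",
)
```

The value of rewritten equals the rewritten URL shown above; the structure path, Sophora ID, template type, parameters and suffix are passed through unchanged.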

The URL codec

Sophora provides one codec to create and interpret URLs: The DefaultSophoraCodec. If this codec does not match your needs, you can create your own URL codec. See section templates.xml for details.

DefaultSophoraCodec

The DefaultSophoraCodec allows digits in the ID stem. The Sophora ID and the template type are separated by ~. URLs that are generated with the defaultUrlCodec have the following structure:

  • mounted:
http://[HOSTNAME]/[STRUCTURE_PATH]/[SOPHORA_ID]~[TEMPLATE_TYPE]_[PARAMETER_NAME]-[PARAMETER_VALUE]....[SUFFIX]?[QUERY_PARAMETER]
  • not mounted:
http://[HOSTNAME][:PORT_NUMBER]/[CONTEXT]/[SITE]~[CHANNEL]/[STRUCTURE_PATH]/[SOPHORA_ID]~[TEMPLATE_TYPE]_[PARAMETER_NAME]-[PARAMETER_VALUE]....[SUFFIX]?[QUERY_PARAMETER]
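As an illustration of the mounted pattern, the following sketch assembles a URL from its parts. The function build_mounted_url and its parameter names are hypothetical; the sketch assumes that the "~" is only written when a template type or parameters are present and that values need no escaping.

```python
def build_mounted_url(host, structure_path, sophora_id,
                      template_type="", params=None, suffix="html", query=""):
    """Assemble http://[HOSTNAME]/[STRUCTURE_PATH]/[SOPHORA_ID]~... from its parts."""
    name = sophora_id
    if template_type or params:
        name += "~" + template_type
    # Each parameter is appended as _name-value.
    for key, value in (params or {}).items():
        name += f"_{key}-{value}"
    segments = [s for s in structure_path.split("/") if s]
    url = "http://" + "/".join([host, *segments, f"{name}.{suffix}"])
    return f"{url}?{query}" if query else url


teaser_url = build_mounted_url(
    "demo.sophoracms.com", "/trendcities/copenhagen/cphvision/",
    "cphimpressions100", template_type="small",
    params={"v": "teaser"}, suffix="jpg",
)
```

Here teaser_url is the example URL from the section on the Sophora URL pattern.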

Validation of URL Parameters via Checksum

To add a validation mechanism for URL parameters, the property sophora.delivery.urlCodec.validateUrlParams can be set to true. If this property is set, the Sophora Taglib checks whether there are any parameters when creating a URL. If so, the delivery creates the checksum of all URL parameters and appends it to the URL. For example, a URL like /node/file100~parameter-1.html results in /node/file100~_parameter-1-9f727126373e44a2fd0ed552d62d7f6d296430f3.html.

If a request is sent to the Tomcat server and the property sophora.delivery.urlCodec.validateUrlParams is true, the delivery checks whether the checksum matches the specified URL parameters. If the check fails, a 404 error is returned. If the created URL has no URL parameters, no checksum is created.

This validation mechanism provides some additional security, because without knowing the correct checksum, the parameters of a URL can neither be changed manually nor can new parameters be added.
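The principle can be sketched as follows. The text above does not specify how the checksum is calculated, so everything algorithmic here is an assumption chosen for illustration: the sketch uses an HMAC-SHA1 over the parameter string with a hypothetical secret, and it does not reproduce the checksum from the example.

```python
import hashlib
import hmac

SECRET = b"hypothetical-secret"  # assumption: a key known only to the delivery


def checksum(params: str) -> str:
    """Checksum over the parameter part of the URL (algorithm is an assumption)."""
    return hmac.new(SECRET, params.encode("utf-8"), hashlib.sha1).hexdigest()


def sign(params: str) -> str:
    """Append the checksum to the parameter part, as done when creating a URL."""
    return f"{params}-{checksum(params)}"


def validate(signed: str) -> bool:
    """Check an incoming parameter part; a failure would lead to a 404."""
    params, _, received = signed.rpartition("-")
    if not params:
        return False
    return hmac.compare_digest(checksum(params).encode(), received.encode())


signed = sign("_parameter-1")
tampered = signed.replace("parameter-1", "parameter-2")
status_ok = 200 if validate(signed) else 404
status_tampered = 200 if validate(tampered) else 404
```

A manually changed parameter no longer matches its checksum, so status_tampered is 404 while status_ok is 200.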

The Sophora ID Ignore List

Calling a URL that contains a Sophora ID may result in a "404 - Document not Found" error if no document can be found for the requested ID. In this case the delivery saves this ID in order to avoid further requests to the Sophora server for this particular ID. Up to 1000 Sophora IDs that caused this error are saved in this way, unless a different limit is configured.

If this happens, you can see the following entry in the delivery.log file:

Adding sophoraId 'test102' to ignoreList (current size: 1)

Or if the ID is removed again:

Removing sophoraId 'test102' from ignoreList (current size: 0)

Document changes update the ignore list, so that, for example, the Sophora IDs of newly created documents are removed from it.

JMX can be used to check the size of the ignore list and the Sophora IDs it contains at a specific point in time (see MBean SophoraIdIgnoreList). In addition, it is possible to remove individual or all Sophora IDs and to set the cacheSize via JMX.
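A bounded ignore list of this kind can be sketched as follows. The class IgnoreListSketch is hypothetical and is not the delivery's implementation. In particular, the text does not say what happens when the limit is reached, so evicting the oldest entry is an assumption. The log messages follow the format shown above.

```python
from collections import OrderedDict


class IgnoreListSketch:
    """Remembers Sophora IDs that led to a 404, up to cache_size entries."""

    def __init__(self, cache_size=1000):
        self.cache_size = cache_size
        self._ids = OrderedDict()
        self.log = []  # stands in for delivery.log

    def add(self, sophora_id):
        if self.cache_size <= 0 or sophora_id in self._ids:
            return
        if len(self._ids) >= self.cache_size:
            self._ids.popitem(last=False)  # assumption: drop the oldest ID
        self._ids[sophora_id] = True
        self.log.append(f"Adding sophoraId '{sophora_id}' to ignoreList "
                        f"(current size: {len(self._ids)})")

    def remove(self, sophora_id):
        # Called, for example, when a document with this ID is created.
        if self._ids.pop(sophora_id, None):
            self.log.append(f"Removing sophoraId '{sophora_id}' from ignoreList "
                            f"(current size: {len(self._ids)})")

    def __contains__(self, sophora_id):
        return sophora_id in self._ids

    def __len__(self):
        return len(self._ids)


ignore_list = IgnoreListSketch()
ignore_list.add("test102")        # a request for 'test102' returned 404
known_missing = "test102" in ignore_list
ignore_list.remove("test102")     # a document with this ID now exists
```

While an ID is on the list, the delivery can answer with 404 right away instead of asking the Sophora server again.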

The Sophora Filter Chain

Each request is filtered by several servlet filters within the web application. Depending on the configuration, filters may be omitted or disabled. The Sophora-Delivery provides a set of filters that is required in order to render Sophora content. Additionally some optional filters can be used. You can find a description of these filters in the section Additional (optional) Filters.

More information about the configuration of the filters can be found in the section Filters.

Standard Filters

SiteEnvironmentSetupFilter

Aggregates the configuration of the site which is currently requested.

SSIFilter

This filter is active if Tomcat is not running behind a web server. Tomcat itself needs to be configured accordingly (see Configuration Parameters). This filter triggers SSI statements using an instance of the HttpClient. By default the filter only reacts to responses whose content type starts with "text" or "application/xml". This behaviour can be altered by changing the property contentTypePrefixes of this filter in the web.xml file. Here you can enter several content type prefixes, separated by commas.
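The prefix check described for contentTypePrefixes amounts to a simple string comparison, sketched below. The function names are hypothetical, and whether the actual filter trims whitespace around the configured prefixes is an assumption.

```python
DEFAULT_PREFIXES = "text,application/xml"


def parse_prefixes(value):
    """Split a comma-separated contentTypePrefixes value into a list."""
    return [prefix.strip() for prefix in value.split(",") if prefix.strip()]


def should_process(content_type, prefixes):
    """True if SSI statements in a response of this content type are resolved."""
    return content_type is not None and content_type.startswith(tuple(prefixes))


prefixes = parse_prefixes(DEFAULT_PREFIXES)
html_processed = should_process("text/html; charset=UTF-8", prefixes)
image_processed = should_process("image/jpeg", prefixes)
```

With the default value, an HTML response is processed while an image response passes through untouched.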

CacheFilter

If caching is active, this filter checks whether the requested content has already been generated and is available in the cache store. In this case, the corresponding content is returned (and the remaining filters are omitted). If the content has not been generated, the remaining filters are applied. The parameter encoding defines the encoding of the response. If this parameter hasn't been set, Tomcat's default encoding is used. The encoding of the delivered HTML file is not changed by the cache filter, because the parameter only affects the header.

SophoraSessionFilter

Generates the content provider and puts it into the request scope. The content provider can be accessed using the DeliveryUtils and can be used to access the server via the Client-API.

UrlDecoderFilter

Parses the URL and executes the shortcut mechanism. This filter also sets the values of the defaultContentMap and currentStructureNodeUuid of the request. By default every URL that is handed to this filter is parsed. It is possible to exclude URLs with certain suffixes from being interpreted. This is usually done for logical URL mappings, e.g. for servlets that have neither a corresponding resource in the context root of the web application nor a corresponding document in the repository.

Additional (optional) Filters

SetUtf8EncodingFilter

Automatically sets the character encoding to UTF-8 for every request.

BenchmarkFilter

Writes the generation time of each template into the logfile.

MultipartFilter

Handles the upload of multipart form data, usually a file upload. Enables definition of a maximum file size for uploads. When using this filter, uploaded files are available through the request variable com.subshell.sophora.formFileItems.

ExceptionLoggingFilter

Writes all exceptions that occur during processing of requests to the logfile.

TrimFilter

This filter removes all empty lines from the output and trims all other lines. It is applied to all servlet responses with the mime type text.
Additionally, it is applied to all application mime types with the subtype xml or with subtypes beginning with xml+, xhtml+, rss+ or rdf+. You can alter this default behaviour by setting the filter params applicationSubtypes and applicationSubtypePrefixes. The specific types must be separated by commas, for example xml,json.
By default the filter does not trim lines between <pre ...> and </pre> or between <textarea ...> and </textarea>. This behaviour can be overridden by setting the param ignoredTags and passing all tags to be ignored, separated by commas (default: pre,textarea).
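The trimming rules can be sketched in a simplified, line-based form. The function trim_output is hypothetical and not the filter's implementation. As a simplification, it leaves every line from the one that opens an ignored tag up to the one that closes it untouched, apart from leading whitespace before the opening tag.

```python
import re


def trim_output(text, ignored_tags=("pre", "textarea")):
    """Drop empty lines and trim all others, except inside ignored tags."""
    names = "|".join(re.escape(tag) for tag in ignored_tags)
    open_re = re.compile(r"<(%s)\b" % names, re.IGNORECASE) if names else None
    result = []
    closing = None  # closing tag we are waiting for, e.g. "</pre>"
    for line in text.splitlines():
        if closing is not None:
            # Inside an ignored tag: keep the line exactly as it is.
            result.append(line)
            if closing in line.lower():
                closing = None
            continue
        if not line.strip():
            continue  # empty line outside ignored tags
        match = open_re.search(line) if open_re else None
        if match:
            end_tag = "</%s>" % match.group(1).lower()
            if end_tag not in line[match.end():].lower():
                # The tag stays open beyond this line.
                closing = end_tag
                result.append(line.lstrip())
                continue
        result.append(line.strip())
    return "\n".join(result)


raw = "  <div>  \n\n   <p>Hi</p>\n<pre>\n  keep   \n\n</pre>\n   </div>  "
trimmed = trim_output(raw)
```

In the result, the lines around the pre element are trimmed and the empty line outside of it is removed, while the indented line and the empty line inside the pre element are preserved.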

For detailed information about configuration of the filters, please refer to the configuration section.

System Requirements

The Sophora Delivery performs many I/O operations on the hard disk. For example, the cache queue and the HTML fragments are stored in the file system, and the dependencies between Sophora documents and HTML fragments are stored in the cache database and kept up to date. Therefore the underlying system requires fast I/O components, e.g. hard disks or a RAID with an enabled write cache, to prevent I/O from slowing down or even blocking the delivery.

Last modified on 3/5/20

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.
