With Sophora 4, we have renovated the Sophora Importer. It is now based on Spring Boot, which we use for all our tools developed in recent years. This brings a few new features, e.g. metrics export in Prometheus format (standard Spring Boot metrics so far), and makes future enhancements easier.
With this change, the configuration format of the Importer changes. Once you have converted your configuration to the new format, existing imports using the watch folder or the web service should work without modification.
Configuration Files
The new Importer uses the following configuration files:
- application.yml: This is the main configuration file, which replaces sophora-importer.properties and sophora-importer_instance-X.properties.
- sophora-importer-<version>.conf (optional): Used to set the JVM properties (JAVA_OPTS) such as heap size.
- loader.properties (optional): Used for adding the contents of a folder to the classpath of the Importer.
- logback-spring.xml: Logging configuration.
Deployment
Spring Boot expects to find the configuration files next to the jar file, so we recommend putting the Importer application jar and the configuration files into the same directory. The name of the sophora-importer.conf file must exactly match the name of the jar file, i.e., if the name of the jar file includes the version number, the name of the conf file must include the version number as well.
Recommended directory structure:
/cms
  /sophora-importer
    /additionalLibs
    /groovy
    /logs
    application.yml
    sophora-importer-4.0.0.conf
    sophora-importer-4.0.0.jar
    loader.properties
    logback-spring.xml
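The naming rule for the conf file is easy to get wrong after a version upgrade. The following Python sketch is not part of the Importer; it is a hypothetical helper (the function name `confs_without_matching_jar` is an assumption) that lists every .conf file in a directory that has no jar file with the same base name.

```python
from pathlib import Path


def confs_without_matching_jar(directory):
    """Return the names of .conf files that have no .jar with the same base name."""
    folder = Path(directory)
    # "sophora-importer-4.0.0.jar" has the stem "sophora-importer-4.0.0".
    jar_stems = {path.stem for path in folder.glob("*.jar")}
    return sorted(
        path.name for path in folder.glob("*.conf") if path.stem not in jar_stems
    )
```

Calling `confs_without_matching_jar("/cms/sophora-importer")` on the directory layout shown above should return an empty list. A file named sophora-importer.conf next to sophora-importer-4.0.0.jar would be reported.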
Our Maven repository contains two files suitable for deploying the Importer:
com.subshell.sophora.importer-<VERSION>-executable.jar
com.subshell.sophora.importer-4.0.0-SNAPSHOT-bin.tar.gz
The executable jar is basically the Importer application without any configuration files. It is suitable when deploying the Importer using Ansible, Puppet or similar configuration management tools. The bin.tar.gz archive contains the executable jar as well as sample configuration files. Use this for manual deployments to get started quickly.
Starting and Stopping
The Importer jar file is a Spring Boot executable jar. On Linux and macOS, the Importer can be started by running the executable jar as follows:
./sophora-importer-<version>.jar run
For running the Importer as a background daemon, use the options start and stop.
More options are documented in the Spring Boot documentation.
sophora.importer.additionalClasspath
With the Sophora Importer 3, it was possible to specify a directory which should be added to the classpath of the Importer using the configuration property sophora.importer.additionalClasspath. This must now be done with the following entry in the file loader.properties:
# Loads resources (.class files etc.) from nested jar files in directories.
# Should contain comma-separated list of directories, archives, or directories within archives
# (e.g. lib,${HOME}/app/lib, earlier entries take precedence).
loader.path=additionalLibs
JAVA_OPTS
Options for the Java VM, such as heap size, can be set using the environment variable JAVA_OPTS or using an entry in the sophora-importer.conf file. For example:
JAVA_OPTS="-Xmx1G"
Management-Endpoints / Actuators
The Importer exposes a few HTTP endpoints for management and metrics. These are available on the same HTTP port as the SOAP web service. Access to the management endpoints uses the same authentication settings as the web service. Notable endpoints are:
- /actuator/health
- /actuator/jolokia
- /actuator/prometheus
- /actuator/sophora-server
Note that the jolokia and health endpoints have moved from /health and /jolokia to /actuator/health and /actuator/jolokia respectively.
The Importer will now open the HTTP port even if the web service is disabled.
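As an illustration of how a monitoring script might call these endpoints, here is a minimal Python sketch using only the standard library. It assumes that the Importer listens on port 8081 (the `server.port` value from the example configuration), that basic authentication is enabled, and that the health endpoint returns JSON as standard Spring Boot actuators do. The function names `actuator_request` and `fetch_health` are hypothetical.

```python
import base64
import json
import urllib.request


def actuator_request(base_url, endpoint, username=None, password=None):
    """Build a GET request for /actuator/<endpoint>, optionally with basic auth."""
    url = f"{base_url.rstrip('/')}/actuator/{endpoint}"
    request = urllib.request.Request(url)
    if username is not None:
        credentials = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_health(base_url, username=None, password=None, timeout=5):
    """Fetch /actuator/health and return the parsed JSON body.

    urllib raises HTTPError for 4xx/5xx responses, e.g. when the
    credentials are wrong or the service reports an unhealthy state.
    """
    request = actuator_request(base_url, "health", username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_health("http://localhost:8081", "admin", "xxx")` would query the health endpoint with one of the logins configured for the web service.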
Configuration Format
The configuration files sophora-importer.properties and sophora-importer_instance-X.properties have been replaced with the single configuration file application.yml. See below for an example file and the mapping of old configuration options to the new format.
Instance Keys
The importer instances now each have a key (string) instead of an index number. In previous versions, SOAP imports used the instance index to select the instance to import into. In the new version, the instance is selected using the key. The instanceIndex in the SOAP XML can now be a string and refers to the instance key.
When converting an older configuration file to the new format, we recommend using the old instance index as the key, so that existing SOAP imports continue to work. For new instances configured in the future, a descriptive key is recommended.
Binary Properties
With Sophora 3, configuration of binary property names was done using the file binary-property-names.xml. With Sophora 4, this configuration has moved into the application.yml:
# Map of binary properties to mimetype properties.
# Every binary property entry maps a binary property to the name of a corresponding mimetype. While importing, those
# properties whose name matches one of the "binaryProperty" entries, are interpreted as binary data. At the same time,
# there must be another property on the same level of the Sophora-XML that matches the value of the corresponding
# "binaryProperty" entry.
binaryProperties:
  'sophora-content:binarydata': 'sophora:mimetype'
Webservice Users
With Sophora 3, usernames and passwords for access to the web service were configured using the webservice_users.json file. With Sophora 4, this configuration has moved into the application.yml:
importer:
  webService:
    # Enables or disables the SOAP webservice interface.
    enabled: true
    # Enables basic authentication for the SOAP webservice interface.
    authenticationRequired: true
    # List of users for authentication.
    logins:
      admin: xxx
Mapping of Configuration Options
Importer 3 (sophora-importer.properties) | Importer 4 (application.yml) |
---|---|
sophora.client.dataDir | sophora.client.misc.data-dir |
sophora.contentmanager.connectRetries | sophora.client.server-connection.retries |
sophora.contentmanager.connectRetryInterval | sophora.client.server-connection.retry-interval |
sophora.contentmanager.documentCacheSize | sophora.client.cache.document-cache-elements-in-memory |
sophora.contentmanager.migrationMode | sophora.client.server-connection.use-migration-mode |
sophora.contentmanager.password | sophora.client.server-connection.password |
sophora.contentmanager.proxyHost | sophora.client.proxy.host |
sophora.contentmanager.proxyPassword | sophora.client.proxy.password |
sophora.contentmanager.proxyPort | sophora.client.proxy.port |
sophora.contentmanager.proxyUsername | sophora.client.proxy.username |
sophora.contentmanager.publishedDocumentCacheSize | sophora.client.cache.published-document-cache-elements-in-memory |
sophora.contentmanager.serviceUrl | sophora.client.server-connection.url |
sophora.contentmanager.username | sophora.client.server-connection.username |
sophora.importer.additionalClasspath | see Deployment |
sophora.importer.cleanupFolders.cron | importer.cleanupFoldersCron |
sophora.importer.cleanupFolders.failure.maxAge | importer.cleanupFoldersFailureMaxAge |
sophora.importer.cleanupFolders.successful.maxAge | importer.cleanupFoldersSuccessfulMaxAge |
sophora.importer.directory.failure | importer.folders.failure |
sophora.importer.directory.feedpolling.data | importer.folders.feedPollingData |
sophora.importer.directory.successful | importer.folders.success |
sophora.importer.directory.xsl | importer.folders.xsl |
sophora.importer.disableImport | importer.disabled |
sophora.importer.feedpolling.active | importer.feedPollingEnabled |
sophora.importer.fileaccess.basedir | importer.folders.fileAccessBase |
sophora.importer.filenames.addTimestamp | importer.filenamesAddTimestamp |
sophora.importer.httpSoTimeout | importer.httpSoTimeout |
sophora.importer.jolokia.port | Not available anymore. |
sophora.importer.keepTempfiles | importer.keepTempFiles |
sophora.importer.maximumImportsToKeep | importer.maximumImportsToKeep |
sophora.importer.minimumFailedImportsToKeep | importer.minimumFailedImportsToKeep |
sophora.importer.name | importer.name |
sophora.importer.preProcessing.className | importer.preprocessing.className |
sophora.importer.preProcessing.scriptfolder | importer.preprocessing.scriptFolder |
sophora.importer.proxy.host | importer.httpProxyHost |
sophora.importer.proxy.password | importer.httpProxyPassword |
sophora.importer.proxy.port | importer.httpProxyPort |
sophora.importer.proxy.user | importer.httpProxyLogin |
sophora.importer.spring.additionalBasePackages | importer.springAdditionalBasePackages |
sophora.importer.transformationMode | importer.transform |
sophora.importer.validate.documents | importer.validateDocuments |
sophora.importer.watchfolder.checkInterval | importer.folders.watchCheckInterval |
sophora.importer.watchfolder.includeSubfolder | importer.folders.watchRecursive |
sophora.importer.watchfolder.regex.filesToImport | importer.folders.watchFilesRegex |
sophora.importer.webservice.active | importer.webService.enabled |
sophora.importer.webservice.authentication.active | importer.webService.authenticationRequired |
sophora.importer.webservice.baseAddress | server.port and server.address |
sophora.importer.webservice.defaultInstance | importer.webService.defaultInstance |
sophora.importer.xslTransformerFactory | importer.xslTransformerFactory |
sophora.jmx.password | importer.jmxPassword |
sophora.jmx.username | importer.jmxLogin |
sophora.rmi.registryPort | importer.rmiRegistryPort |
sophora.rmi.servicePort | importer.rmiServicePort |
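
Applying the table by hand is error-prone for larger configurations. The following Python sketch shows one way to automate the renaming. It is not an official migration tool: `KEY_MAPPING` contains only a few rows copied from the table above and needs to be extended, the function names are hypothetical, and the simple parser ignores properties-file features such as line continuations. All values are written as quoted strings, on the assumption that Spring Boot converts them to the target types when binding.

```python
import json

# A few rows from the mapping table above; extend as needed.
KEY_MAPPING = {
    "sophora.contentmanager.serviceUrl": "sophora.client.server-connection.url",
    "sophora.contentmanager.username": "sophora.client.server-connection.username",
    "sophora.contentmanager.password": "sophora.client.server-connection.password",
    "sophora.importer.name": "importer.name",
    "sophora.importer.disableImport": "importer.disabled",
    "sophora.importer.webservice.active": "importer.webService.enabled",
}


def parse_properties(text):
    """Parse simple key=value lines; comments and blank lines are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def convert(props, mapping=KEY_MAPPING):
    """Return (nested dict for application.yml, old keys without a mapping)."""
    result, unmapped = {}, []
    for old_key, value in props.items():
        new_key = mapping.get(old_key)
        if new_key is None:
            unmapped.append(old_key)
            continue
        *parents, leaf = new_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result, unmapped


def to_yaml(tree, indent=0):
    """Render the nested dict as YAML lines with double-quoted string values."""
    lines = []
    for key, value in tree.items():
        if isinstance(value, dict):
            lines.append(" " * indent + f"{key}:")
            lines.extend(to_yaml(value, indent + 2))
        else:
            lines.append(" " * indent + f"{key}: {json.dumps(value)}")
    return lines
```

Keys returned in the second element of `convert` have no entry in the mapping, either because the row has not been added yet or because the option is not available anymore. They need to be reviewed manually.
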
Importer 3 (sophora-importer_instance-X.properties) | Importer 4 (application.yml) |
---|---|
sophora.importer.cleanupFolders.failure.maxAge | cleanupFoldersFailureMaxAge |
sophora.importer.cleanupFolders.successful.maxAge | cleanupFoldersSuccessfulMaxAge |
sophora.importer.defaultSite | defaultSite |
sophora.importer.defaultStructureNode | defaultStructureNode |
sophora.importer.directory.failure | folders.failure |
sophora.importer.directory.feedpolling.data | folders.feedPollingData |
sophora.importer.directory.successful | folders.success |
sophora.importer.directory.temp | folders.temp |
sophora.importer.directory.watchfolder | folders.watch |
sophora.importer.directory.xsl | folders.xsl |
sophora.importer.disableImport | disabled |
sophora.importer.fileaccess.basedir | folders.fileAccessBase |
sophora.importer.filenames.addTimestamp | filenamesAddTimestamp |
sophora.importer.instance.name | name |
sophora.importer.instance.webservice.enabled | webServiceEnabled |
sophora.importer.keepTempfiles | keepTempFiles |
sophora.importer.maximumImportsToKeep | maximumImportsToKeep |
sophora.importer.minimumFailedImportsToKeep | minimumFailedImportsToKeep |
sophora.importer.preProcessing.className | preprocessing.className |
sophora.importer.preProcessing.scriptfolder | preprocessing.scriptFolder |
sophora.importer.transformation.repairXml | Not available anymore. |
sophora.importer.transformationMode | transform |
sophora.importer.validate.documents | validateDocuments |
sophora.importer.watchfolder.checkInterval | folders.watchCheckInterval |
sophora.importer.watchfolder.includeSubfolder | folders.watchRecursive |
sophora.importer.watchfolder.regex.filesToImport | folders.watchFilesRegex |
sophora.importer.xslTransformerFactory | xslTransformerFactory |
Example application.yml
# Connection to the Sophora server.
# Note: There is one connection which is shared among all importer instances of the importer process.
sophora:
  client:
    server-connection:
      # The URL of the Sophora server (e.g. http://sophora.example.com:1196)
      url: http://localhost:1196
      # Username to access the Sophora server.
      username: alice
      # Password to access the Sophora server.
      password: secret
      # If a connection to the Sophora server is not possible, try again a few times. Default is 3.
      retries: 100
      # The time in seconds to wait between connection attempts.
      retry-interval: 10
    # cache:
    #   The size of the document cache.
    #   If you apply a transformation or a preprocessor that frequently accesses different existing documents from the
    #   Sophora server, you may want to increase this cache. Consider the increased memory footprint and assign more
    #   memory to the importer if necessary.
    #   document-cache-elements-in-memory: 1000
    #   The size of the published document cache.
    #   Similar to document-cache-elements-in-memory, except that this value only considers the published versions of
    #   documents. If you retrieve the published version of documents in a transformation or a preprocessor, this is
    #   the value you may want to adjust.
    #   published-document-cache-elements-in-memory: 100
    # Optional proxy configuration for HTTP connections to the Sophora server.
    # proxy:
    #   host: proxy.example.com
    #   port: 8080
    #   username: alice
    #   password: secret
importer:
  # The Importer's name to be used for JMX and logging.
  name: Demo-Importer
  # Disable import for test purposes (e.g. if you want to check the XSL transformation): The
  # importer won't try to send XML files to the content manager, if the value of this property
  # is set to true. Values: "true" or "false"; default value is "false".
  disabled: false
  # Proxy for accessing external content during the import.
  # A proxy configuration is needed if the importer operates behind a
  # proxy and the Import XML is passed to the webservice as a remote URL
  # or if the Import XML refers to binary files via http or https.
  # httpProxyHost:
  # httpProxyPort:
  # httpProxyLogin:
  # httpProxyPassword:
  # JMX
  rmiServicePort: 5000
  rmiRegistryPort: 5001
  # Username and password for the JMX interface [Optional]
  jmxLogin: importerjmx
  jmxPassword: password
  folders:
    # Interval (in milliseconds) to check the import directory (watch folder); e.g. 10000.
    watchCheckInterval: 1000
    # If set to true all subfolders (and their subfolders etc.) of the watch folder are included when watching for
    # incoming Sophora-XML files. Make sure that no system folders (success or failure) are configured as subfolders of
    # the watch folder if this parameter is set to true. Default value is false. The importer instance executes the
    # individual document imports in lexicographical order based on the relative paths of the documents; i.e. an
    # incoming file subfolder-A/import.xml is handled before a file subfolder-B/import.xml.
    watchRecursive: true
  webService:
    # Enables or disables the SOAP webservice interface.
    enabled: true
    # Enables basic authentication for the SOAP webservice interface.
    authenticationRequired: true
    # Key of the instance to use for SOAP requests which don't specify an instance.
    defaultInstance: common
    # List of users for authentication.
    logins:
      admin: xxx
  # Set to true for polling feeds configured in the DeskClient. Default is false.
  feedPollingEnabled: true
  # Determines whether to keep the temporary files after the Importer finishes. If the value is true, these files are
  # moved to the success or failure directory together with the XML files.
  keepTempFiles: true
  # Determines whether a timestamp is attached to the names of the files that are imported and to the names of the
  # temporary files.
  filenamesAddTimestamp: false
  # A cron expression that specifies when the "success" and "failure" folders will be cleaned up. The expression uses
  # the format of the Quartz CronTrigger.
  cleanupFoldersCron: "0 0 9 ? * * *"
  # When cleaning up the "success" folder of the instance, files in the folder must be at least this many days old to be
  # deleted. Set to 0 to disable deletion for this instance / folder.
  cleanupFoldersSuccessfulMaxAge: 90
  # When cleaning up the "failure" folder of the instance, files in the folder must be at least this many days old to be
  # deleted. Set to 0 to disable deletion for this instance / folder.
  cleanupFoldersFailureMaxAge: 90
  # Configuration of the importer instances
  instances:
    - name: Common Imports
      # The key is used to reference this instance in SOAP and feed imports.
      key: common
      # Defines the XSL transformation mode. The following values are valid:
      # 'transformIfNotSophoraXml' (default), 'forceTransform', 'skipTransform'
      transform: skipTransform
      folders:
        # The import directory that the importer instance monitors.
        watch: /cms/data/import/incoming
        # This regular expression determines which files in the watch folder are processed by the importer.
        watchFilesRegex: "(?i).+[.](xml)"
        # Directory to save temporary files in.
        temp: /cms/data/import/temp
        # Target directory to move the XML files to, if the import process finished successfully.
        # This property allows patterns within the given path in the form of ${pattern}. Supported patterns:
        # ${date;<DateFormat>} - a date of the import in the defined form. Supported date formats see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html
        # ${<xslParameter>} - XSL parameter keys defined in the XML-Feeds
        success: /cms/data/import/success
        # Target directory to move the XML files to, if the import process failed.
        # This property allows patterns within the given path. Supported patterns see above.
        failure: /cms/data/import/failure
        # The directory where the XSL files are located.
        # This property may be omitted if the 'transform' property has the value 'skipTransform'
        xsl: /cms/sophora-importer/xsl/
      # The classname of the XSL transformer factory, which is used for XSL transformations.
      # The default value is 'org.apache.xalan.xsltc.trax.TransformerFactoryImpl'.
      # xslTransformerFactory: org.apache.xalan.xsltc.trax.TransformerFactoryImpl
      # Enables or disables the web service for this instance.
      webServiceEnabled: true
      # The site to import the documents to. This parameter is considered, if
      # - the XML neither contains an empty <site> nor empty <structureNode> tag
      # - the import operation is not an update of an existing document
      # defaultSite: demosite
      # The structure node to import the documents to. This parameter is considered, if
      # - the <structureNode> element in the XML is empty
      # - the import operation is not an update of an existing document
      defaultStructureNode: /import
      # Optional: Preprocess files before the import using a Java or Groovy class.
      # The result of the preprocessor must be either valid Sophora XML or XML to be transformed using XSL.
      # preprocessing:
      #   The class which implements the IPreProcession interface.
      #   className:
      #   Folder containing groovy preprocessing scripts. Can be left undefined if the preprocessor class is on the
      #   classpath.
      #   scriptFolder: /cms/sophora-importer/groovy
server:
  # HTTP port of the web server for the SOAP web service and management endpoints (e.g. health).
  port: 8081