Importer 4

Importer: Installation & Configuration

The Importer is an independent Java program that creates or updates documents in a Sophora repository based on Sophora XML.

Configuration Files

The Importer uses the following configuration files:

  • application.yml: This is the main configuration file.
  • sophora-importer-<version>.conf (optional): Used to set the JVM properties (JAVA_OPTS) such as heap size.
  • loader.properties (optional): Used for adding the contents of a folder to the classpath of the Importer.
  • logback-spring.xml: Logging configuration.

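The Importer needs to be restarted for changes in any of these configuration files to take effect. The one exception is logback-spring.xml, which can be reloaded at runtime if logback's scan feature is enabled (see the section on logback-spring.xml).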

Installation

The Importer is a Spring Boot app and expects to find the configuration files next to the jar file, so we recommend putting the Importer application jar and the configuration files into the same directory. The name of the sophora-importer.conf file must exactly match the name of the jar file; for example, if the name of the jar file includes the version number, the name of the conf file must include the version number as well.

Recommended directory structure:

/cms
    /sophora-importer
        /additionalLibs
        /groovy
        /logs
        application.yml
        sophora-importer-4.0.0.conf
        sophora-importer-4.0.0.jar
        loader.properties
        logback-spring.xml

Our Maven repository contains two files suitable for deploying the Importer:

  • com.subshell.sophora.importer-<VERSION>-executable.jar
  • com.subshell.sophora.importer-<VERSION>-bin.tar.gz

The executable jar is basically the Importer application without any configuration files. It is suitable when deploying the Importer using Ansible, Puppet or similar configuration management tools. The bin.tar.gz contains the executable jar as well as sample configuration files. Use this for manual deployments to get started quickly.

The "xsl" folder in the bin.tar.gz contains example XSL files. They show how to transform XML files into Sophora XML using XSL (see section XSL Transformation Before Importing).

Starting and Stopping

The Importer jar file is a Spring Boot executable jar. On Linux and macOS, the Importer can be started by running the executable jar as follows:

./sophora-importer-<version>.jar run

For running the Importer as a background daemon, use the options start and stop.

More options are documented in the Spring Boot documentation.

If a configuration file is missing required configuration options, the Importer cannot be started. The log file then contains information on why the startup was aborted.

Adding jar files to the classpath

Jar files with additional classes for preprocessing etc. can be added to the classpath with the following entry in the file loader.properties:

# Loads resources (.class files etc.) from nested jar files in directories.
# Should contain comma-separated list of directories, archives, or directories within archives
# (e.g. lib,${HOME}/app/lib, earlier entries take precedence).
loader.path=additionalLibs

JAVA_OPTS

Options for the Java VM, such as heap size, can be set using the environment variable JAVA_OPTS or using an entry in the sophora-importer.conf file. For example:

JAVA_OPTS="-Xmx1G"

Management-Endpoints / Actuators

The Importer exposes a few HTTP endpoints for management and metrics. These are available on the same HTTP port as the SOAP web service. Access to the management endpoints also uses the authentication settings of the web service. Notable endpoints are:

  • /actuator/health
  • /actuator/jolokia
  • /actuator/prometheus
  • /actuator/sophora-server

Configuration using the application.yml

One or more importer instances run within a single importer process. Some configuration options are specific to each instance; for example, each instance has its own watch folder. This lets you run one importer process with, say, one instance responsible for video imports, one instance responsible for image imports from an image database, and another instance responsible for live ticker imports.

Options for the connection to the Sophora Server

  • sophora.client.server-connection.url: Deprecated. As of Importer 4.1.2, use sophora.client.server-connection.urls instead. The single address (RMI or HTTP) to connect to (e.g. http://demo.de:1196). Note: There is one connection, which is shared among all importer instances of the importer process.
  • sophora.client.server-connection.urls: A list of addresses (RMI or HTTP) to connect to, e.g.:

        urls:
          - https://demo.de:1196
          - https://demo2.de:1196

    Note: There is one connection, which is shared among all importer instances of the importer process.
  • sophora.client.server-connection.username: Username to access Sophora's content manager.
  • sophora.client.server-connection.password: Password to access Sophora's content manager.
  • sophora.client.cache.document-cache-elements-in-memory: The size of the document cache. If you apply a transformation or a preprocessor that frequently accesses different existing documents from the Sophora server, you may want to increase this cache. Consider the increased memory footprint and assign more memory to the importer if necessary. Default: 1000
  • sophora.client.cache.published-document-cache-elements-in-memory: The size of the published document cache. Similar to sophora.client.cache.document-cache-elements-in-memory, except that this value only considers the published versions of documents. If you retrieve the published version of documents in a transformation or a preprocessor, this is the value you may want to adjust. Default: 100
  • sophora.client.misc.data-dir: Defines a directory which may be used by the Sophora Client API for persisting information such as the available nodes in a cluster. The directory must be specified as an absolute path. Default: the working directory of the importer
  • sophora.client.proxy.host: URL of the proxy host, e.g. http://www.proxy.org.
  • sophora.client.proxy.password: Password to access the proxy.
  • sophora.client.proxy.port: Port of the proxy host, e.g. 8181.
  • sophora.client.proxy.username: Username to access the proxy.
  • sophora.client.server-connection.retries: Number of times to retry if a connection to the Sophora server cannot be established. Default: 3
  • sophora.client.server-connection.retry-interval: The time in seconds to wait between connection attempts. Default: 10
  • sophora.client.server-connection.use-migration-mode: Should only be used in rare circumstances. Use this only if you know what you are doing! Enables the migration mode when accessing the repository. When migration mode is switched on, it is possible to set these system properties that normally cannot be set:
      • sophora:modificationDate
      • sophora:modifiedBy
      • sophora:firstPublicationDate
      • sophora:id
      • sophora:creationDate (only when the document is created, since 4.2.1)
      • sophora:createdBy (only when the document is created, since 4.2.1)
      • sophora:externalId (only when the document is created)
      • jcr:uuid (only when the document is created)
    Additionally, the properties sophora:modificationDate, sophora:modifiedBy, sophora:publicationDate, sophora:firstPublicationDate and sophora:publishedBy can be controlled when a new version of the document is made (e.g. the document is published via a "publish" instruction).
    Default: false
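
As a rough sketch, the connection options above could be combined in application.yml as follows. The nesting mirrors the dotted property names, as is usual for Spring Boot YAML configuration. The user name, password, and cache sizes are placeholder values chosen for illustration, not recommendations.

```yaml
sophora:
  client:
    server-connection:
      # Preferred over the deprecated single "url" option (as of Importer 4.1.2).
      urls:
        - https://demo.de:1196
        - https://demo2.de:1196
      username: importer-user   # placeholder
      password: changeme        # placeholder
      retries: 3                # documented default
      retry-interval: 10        # seconds, documented default
    cache:
      # Raised above the defaults (1000 / 100), e.g. for a preprocessor
      # that reads many different existing documents. Illustrative values.
      document-cache-elements-in-memory: 5000
      published-document-cache-elements-in-memory: 500
    proxy:
      host: http://www.proxy.org
      port: 8181
```

Larger cache sizes increase the memory footprint, so the heap size set via JAVA_OPTS may need to be raised accordingly.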

Options for the embedded HTTP server

  • server.port: HTTP port of the web server for the SOAP web service and management endpoints (e.g. health). Default: 8081
  • server.address: Interface address to bind to. Default: 0.0.0.0

Global options

Some of these options can be overridden for each instance.

  • importer.cleanupFoldersCron: A cron expression that specifies when the "successful" and "failure" folders are cleaned up. The expression uses the format of the Quartz CronTrigger. See also Automatically deleting old files. Leave empty to disable cleanup.
  • importer.cleanupFoldersFailureMaxAge: When cleaning up the failure folder of an instance, files in the folder must be at least this many days old to be deleted. Set to 0 to disable deletion for this folder. Default: 0
  • importer.cleanupFoldersSuccessfulMaxAge: When cleaning up the success folder of an instance, files in the folder must be at least this many days old to be deleted. Set to 0 to disable deletion for this folder. Default: 0
  • importer.disabled: This is for test purposes only (e.g. if you want to check the XSL transformation). If set to true, import transformations will run but no documents will be created or modified in the Sophora server. Default: false
  • importer.feedPollingEnabled: Set to true for polling configured feeds. Default: false
  • importer.filenamesAddTimestamp: Determines whether a timestamp is attached to the names of the files that are imported and to the names of the temporary files. Default: true
  • importer.folders.failure: Target directory to move the XML files to if the import process failed. This property supports patterns within the given path. For supported patterns, see the success folder option.
  • importer.folders.feedPollingData: Directory for saving data regarding the polling of feeds (e.g. last processed feed item per feed). If feed polling is active and no directory is given, a folder named feedpolling is created in the working directory of the importer.
  • importer.folders.fileAccessBase: This optional property determines a directory which can additionally be accessed (recursively) during the import process. On the one hand, this means that you can use references to binary files in the Sophora XML document which point to files within this folder (or its subfolders etc.). On the other hand, it allows the web service to read files in the specified directory (or its subfolders etc.). When the property is not configured, the web service is not allowed to access any local files. It affects the possible URIs in the importXmlByReference* methods.
  • importer.folders.success: Target directory to move the XML files to if the import process finished successfully. This property supports patterns within the given path in the form of ${pattern}. Supported patterns:
      • ${date;<DateFormat>}: the date of the import in the given format. For supported date formats see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
      • ${<xslParameter>}: XSL parameter keys given in feed imports or imports by web service.
  • importer.folders.watchCheckInterval: Interval (in milliseconds) to check the import directory (watch folder). Default: 10000
  • importer.folders.watchFilesRegex: This regular expression determines which files in the watch folder are processed by the importer. The default value is a regular expression matching all file names which end with '.xml', but do not end with '.config.xml' or '.bin.xml'. If you change this regular expression, make sure that files ending with '.config.xml' and '.bin.xml' are still ignored, because these file endings are produced when Sophora documents including XML binary data or node type configurations are exported. Hint: The regular expression used for a watch folder instance is printed to the log file when the importer is started. Default: (?i).+(?<!\\.config)(?<!\\.bin)\\.xml
  • importer.folders.watchRecursive: If set to true, all subfolders (and their subfolders etc.) of the watch folder are included when watching for incoming Sophora XML files. If this parameter is set to true, make sure that other folders such as the success and failure folders are not configured as subfolders of the watch folder. The importer imports documents in lexicographical order based on the relative paths of the documents; i.e. an incoming file subfolder-A/import.xml is handled before a file subfolder-B/import.xml. Default: false
  • importer.folders.xsl: The directory where the XSL files are located for the importer (see section XSL Transformation Before Importing). This property may only be omitted if XSL transformations are disabled.
  • importer.httpProxyHost: A proxy configuration is needed if the importer operates behind a proxy and one of the following applies: the import XML is passed to the web service as a remote URL, the import XML refers to binary files via http or https, or feeds are imported. Example: http://www.proxy.org.
  • importer.httpProxyLogin: Optional username to access the proxy.
  • importer.httpProxyPassword: Password to access the proxy. If a username is set, a password has to be set as well.
  • importer.httpProxyPort: Port of the proxy host, e.g. 8181.
  • importer.httpSoTimeout: Timeout in milliseconds when accessing feeds or making other HTTP requests (for example downloading images). Default: 30000
  • importer.jmxLogin: Username for the JMX connection.
  • importer.jmxPassword: Password for the JMX connection.
  • importer.keepTempFiles: Determines whether to keep the temporary files after the Importer finishes. If the value is true, these files are moved to the success or failure directory together with the XML files.
  • importer.maximumImportsToKeep: Number of import results to keep in memory for JMX.
  • importer.minimumFailedImportsToKeep: The minimum number of failed import results to keep in memory for JMX.
  • importer.name: The Importer's name to be used for JMX, logging, and matching feed configurations.
  • importer.preprocessing.className: Defines the class which implements the IPreProcession interface.
  • importer.preprocessing.scriptFolder: Folder containing Groovy preprocessing scripts. Can be left blank if the preprocessor class is on the classpath or preprocessing is not used. Using precompiled Groovy class files from the "additionalLibs" folder is recommended for best performance, as otherwise scripts are recompiled for each import.
  • importer.rmiRegistryPort: RMI port for external MBean requests. Default: 6001
  • importer.rmiServicePort: Internal RMI port for the JMX communication. Default: 6000
  • importer.springAdditionalBasePackages: Additional Java packages which the Importer should scan for Spring component classes. By using this property and putting client-specific jars on the classpath of the importer, you can use Spring functionality in your project-specific code. To specify more than one package, separate the packages with commas, e.g. "com.subshell.sophora.sport.imports,com.subshell.sophora.other.imports".
  • importer.transform: Defines the XSL transformation mode for this importer instance. The following values are valid:
      • transformIfNotSophoraXml: An XSL transformation is performed if the input XML file does not contain valid Sophora XML.
      • forceTransform: Always apply an XSL transformation before importing (regardless of the validity of the source XML file).
      • skipTransform: Never execute an XSL transformation.
    Default: transformIfNotSophoraXml
  • importer.validateDocuments: Defines whether documents are validated or not. By default, documents are validated. Validation should only be disabled in very special situations (and should be re-enabled afterwards!). This might, for example, be a migration scenario from Sophora to Sophora where you have to migrate documents which lack a recently added mandatory property. When validation is disabled (value false), invalid documents can be saved: you can save documents with missing mandatory properties and with property values that don't match the corresponding validation expression. Furthermore, results of validation scripts are ignored. Default: true
  • importer.webService.authenticationRequired: Enables basic authentication for the SOAP web service interface. Possible values are true and false. Default: false
  • importer.webService.defaultInstance: Sets the default instance for the SOAP web service. This property is used by all web service methods which do not contain 'ToInstance' in their method name. Default: first instance
  • importer.webService.enabled: Enables or disables the SOAP web service interface. Default: false
  • importer.webService.logins: Map of usernames to passwords for authentication.
  • importer.xslTransformerFactory: The class name of the XSL transformer factory which is used for XSL transformations (see section Using a Custom XSL Transformer). Default: org.apache.xalan.xsltc.trax.TransformerFactoryImpl
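
To illustrate the folder patterns and the cleanup options described above, here is a minimal sketch of a global configuration fragment. The paths, the schedule, and the maximum ages are illustrative assumptions. The sketch also assumes that the ${date;...} pattern can be written unquoted and unescaped in YAML, which this page does not state explicitly.

```yaml
importer:
  # Quartz format: seconds minutes hours day-of-month month day-of-week.
  # This expression fires every day at 03:00.
  cleanupFoldersCron: "0 0 3 * * ?"
  cleanupFoldersSuccessfulMaxAge: 30   # days
  cleanupFoldersFailureMaxAge: 90      # days

  folders:
    # One subfolder per import day; yyyy-MM-dd is a SimpleDateFormat pattern.
    success: /cms/data/import/success/${date;yyyy-MM-dd}
    failure: /cms/data/import/failure/${date;yyyy-MM-dd}
```

Sorting processed files into dated subfolders keeps the folders browsable. The cleanup only runs when a cron expression is set and the respective maximum age is greater than 0.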

Options for each instance

Each importer instance has a set of configuration options. For some options, it is possible to set defaults in the global options, which can then be overridden for a particular instance.

Instances are configured like this in the application.yml:

importer:
  instances:
    - name: Common Imports
      key: common
      folders:
        watch: /cms/data/import/incoming
      ...
    - name: Image importer
      key: images
      ...

  • cleanupFoldersFailureMaxAge: When cleaning up the failure folder of the instance, files in the folder must be at least this many days old to be deleted. Set to 0 to disable deletion for this instance / folder.
  • cleanupFoldersSuccessfulMaxAge: When cleaning up the success folder of the instance, files in the folder must be at least this many days old to be deleted. Set to 0 to disable deletion for this instance / folder.
  • defaultSite: The site to import the documents to. This parameter is only considered if the XML neither contains an empty <site> nor empty <structureNode> tag and the import operation is not an update of an existing document.
  • defaultStructureNode: The structure node to import the documents to. This parameter is only considered if the <structureNode> element in the XML is empty and the import operation is not an update of an existing document.
  • disabled: This is for test purposes only (e.g. if you want to check the XSL transformation). If set to true, import transformations will run but no documents will be created or modified in the Sophora server. Default: false
  • filenamesAddTimestamp: Determines whether a timestamp is attached to the names of the files that are imported and to the names of the temporary files. Default: true
  • folders.failure: Target directory to move the XML files to if the import process failed. This property supports patterns within the given path. For supported patterns, see the success folder option.
  • folders.fileAccessBase: This optional property determines a directory which can additionally be accessed (recursively) during the import process. On the one hand, this means that you can use references to binary files in the Sophora XML document which point to files within this folder (or its subfolders etc.). On the other hand, it allows the web service to read files in the specified directory (or its subfolders etc.). When the property is not configured, the web service is not allowed to access any local files. It affects the possible URIs in the importXmlByReference* methods.
  • folders.success: Target directory to move the XML files to if the import process finished successfully. This property supports patterns within the given path in the form of ${pattern}. Supported patterns:
      • ${date;<DateFormat>}: the date of the import in the given format. For supported date formats see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
      • ${<xslParameter>}: XSL parameter keys given in feed imports or imports by web service.
  • folders.temp: Directory in which the temporary files produced by the importer instance are saved.
  • folders.watch: The import directory that the importer instance monitors. Only files matching the watchFilesRegex will be processed.
  • folders.watchCheckInterval: Interval (in milliseconds) to check the import directory (watch folder). Default: 10000
  • folders.watchFilesRegex: This regular expression determines which files in the watch folder are processed by the importer. The default value is a regular expression matching all file names which end with '.xml', but do not end with '.config.xml' or '.bin.xml'. If you change this regular expression, make sure that files ending with '.config.xml' and '.bin.xml' are still ignored, because these file endings are produced when Sophora documents including XML binary data or node type configurations are exported. Hint: The regular expression used for a watch folder instance is printed to the log file when the importer is started. Default: (?i).+(?<!\\.config)(?<!\\.bin)\\.xml
  • folders.watchRecursive: If set to true, all subfolders (and their subfolders etc.) of the watch folder are included when watching for incoming Sophora XML files. If this parameter is set to true, make sure that other folders such as the success and failure folders are not configured as subfolders of the watch folder. The importer imports documents in lexicographical order based on the relative paths of the documents; i.e. an incoming file subfolder-A/import.xml is handled before a file subfolder-B/import.xml.
  • folders.xsl: The directory where the XSL files are located for the importer (see section XSL Transformation Before Importing). This property may only be omitted if XSL transformations are disabled.
  • key: The key or ID of this instance. The key is used by SOAP import requests to select an importer instance. It is also used for mapping a feed import configuration to an instance.
  • maximumImportsToKeep: Number of import results to keep in memory for JMX.
  • minimumFailedImportsToKeep: The minimum number of failed import results to keep in memory for JMX.
  • name: The name of the particular importer instance to be used for JMX and logging purposes.
  • preprocessing.className: Defines the class which implements the IPreProcession interface.
  • preprocessing.scriptFolder: Folder containing Groovy preprocessing scripts. Can be left blank if the preprocessor class is on the classpath or preprocessing is not used. Using precompiled Groovy class files from the "additionalLibs" folder is recommended for best performance, as otherwise scripts are recompiled for each import.
  • transform: Defines the XSL transformation mode for this importer instance. The following values are valid:
      • transformIfNotSophoraXml: An XSL transformation is performed if the input XML file does not contain valid Sophora XML.
      • forceTransform: Always apply an XSL transformation before importing (regardless of the validity of the source XML file).
      • skipTransform: Never execute an XSL transformation.
    Default: transformIfNotSophoraXml
  • validateDocuments: Defines whether documents are validated or not. By default, documents are validated. Validation should only be disabled in very special situations (and should be re-enabled afterwards!). This might, for example, be a migration scenario from Sophora to Sophora where you have to migrate documents which lack a recently added mandatory property. When validation is disabled (value false), invalid documents can be saved: you can save documents with missing mandatory properties and with property values that don't match the corresponding validation expression. Furthermore, results of validation scripts are ignored. Default: true
  • webServiceEnabled: Enables or disables the SOAP web service interface for this instance.
  • xslTransformerFactory: The class name of the XSL transformer factory which is used for XSL transformations (see section Using a Custom XSL Transformer). Default: org.apache.xalan.xsltc.trax.TransformerFactoryImpl
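
To illustrate how global defaults and per-instance options might interact, the following fragment sets defaults under importer and overrides two of them in a second instance. This is a sketch, not a complete configuration: the "Agency Imports" instance, its paths, and the preprocessor class name are hypothetical, and it assumes that transform and folders.watchCheckInterval are among the options that can be overridden per instance, since both appear in the global and the instance option lists.

```yaml
importer:
  # Global defaults, applied to every instance unless overridden.
  transform: skipTransform
  folders:
    watchCheckInterval: 10000
    xsl: /cms/sophora-importer/xsl

  instances:
    - name: Common Imports
      key: common
      folders:
        watch: /cms/data/import/incoming
        temp: /cms/data/import/temp

    - name: Agency Imports          # hypothetical instance
      key: agency
      transform: forceTransform     # overrides the global skipTransform
      folders:
        watch: /cms/data/import/agency
        temp: /cms/data/import/agency-temp
        watchCheckInterval: 2000    # overrides the global 10000 ms
      preprocessing:
        # Hypothetical class; it would have to be on the classpath,
        # e.g. via a jar in the additionalLibs folder.
        className: com.example.imports.AgencyPreProcessor
```

Keeping shared settings at the global level means each instance only has to list what differs, such as its key, its watch folder, and any overrides.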

Example application.yml:

sophora:
  client:
    server-connection:
      urls:
        - http://localhost:1196
      username: alice
      password: secret
      retries: 100
      retry-interval: 10

importer:
  name: Demo-Importer

  # JMX
  rmiServicePort: 5000
  rmiRegistryPort: 5001
  jmxLogin: importerjmx
  jmxPassword: password

  # Defaults for all instances.
  folders:
    watchCheckInterval: 1000
    watchRecursive: true

  webService:
    enabled: true
    authenticationRequired: true
    defaultInstance: common
    logins:
      admin: xxx

  filenamesAddTimestamp: false

  cleanupFoldersCron: "0 0 9 ? * * *"
  cleanupFoldersSuccessfulMaxAge: 90
  cleanupFoldersFailureMaxAge: 90

  # Configuration of the importer instances
  instances:
    - name: Common Imports
      key: common
      transform: skipTransform
      folders:
        watch: /cms/data/import/incoming
        temp: /cms/data/import/temp
        success: /cms/data/import/success
        failure: /cms/data/import/failure
        xsl: /cms/sophora-importer/xsl/
      defaultStructureNode: /import

server:
  port: 8081
  address: 0.0.0.0

logback-spring.xml (optional)

This optional configuration file defines the Importer's logging behaviour. The Importer does not need to be restarted for changes in the file to take effect if scanning is enabled on the root configuration element of the logging configuration file (attribute scan="true", optionally with a scanPeriod).

Introductory information about logback and its configuration can be found here: http://logback.qos.ch/

If you would like to enable separate logging for each importer instance, you can use the example configuration file below and remove the comment markers around the "INSTANCES" appender reference. If you do so, the importer will create the following log files:

  • sophora-importer.log: the default log file with all information. This can be disabled by removing the "LOGFILE" appender.
  • sophora-importer-instance-main.log: the log file which contains all information that can't be assigned to one specific instance.
  • sophora-importer-instance-<key>.log: one log file for each configured instance, only showing instance-specific information.

Example logback-spring.xml file:

<?xml version="1.0" encoding="UTF-8"?>

<!-- For more information on logback logging see: http://logback.qos.ch/manual/index.html -->
<configuration scan="true" scanPeriod="10 seconds">
	<jmxConfigurator/>

	<!-- Name to be shown in the subject of email notifications. -->
	<property name="IMPORTER_NAME" value="Test-Importer" />
	<!-- Logging-Event-Class ("ERROR", "INFO" etc.) for email subjects. -->
	<property name="LOGGING_EVENT_CLASS" value="%-5p" />
	<!-- Logging pattern: 'importerInstanceName' and 'sourceFileName' are references to MDC properties in the importer code. -->
	<property name="APPENDER_PATTERN"
			  value="%d{dd.MM.yyyy HH:mm:ss} %5level [%12.12thread] [%X{importerInstanceName}: %X{feedName} %X{sourceFileName}] %.40(%logger{0}:%L) --- %msg%n%ex"/>

	<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
		<filter class="ch.qos.logback.core.filter.EvaluatorFilter">
			<evaluator>
				<!-- No log messages marked as 'SPECIAL_EMAIL_NOTIFICATION' should be shown. -->
				<expression>marker != null &amp;&amp; marker.getName().equals("SPECIAL_EMAIL_NOTIFICATION")</expression>
			</evaluator>
			<OnMismatch>NEUTRAL</OnMismatch>
			<OnMatch>DENY</OnMatch>
		</filter>
		<encoder>
			<pattern>${APPENDER_PATTERN}</pattern>
		</encoder>
	</appender>

	<!-- Separate log files for each instance -->
	<appender name="INSTANCES" class="ch.qos.logback.classic.sift.SiftingAppender">
		<discriminator>
			<key>importerInstanceKey</key>
			<defaultValue>main</defaultValue>
		</discriminator>
		<sift>
			<appender name="INSTANCE-${importerInstanceKey}" class="ch.qos.logback.core.rolling.RollingFileAppender">
				<filter class="ch.qos.logback.core.filter.EvaluatorFilter">
					<evaluator>
						<expression>marker != null &amp;&amp; marker.getName().equals("SPECIAL_EMAIL_NOTIFICATION")</expression>
					</evaluator>
					<OnMismatch>NEUTRAL</OnMismatch>
					<OnMatch>DENY</OnMatch>
				</filter>

				<File>logs/sophora-importer-instance-${importerInstanceKey}.log</File>

				<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
					<fileNamePattern>logs/sophora-importer-instance-${importerInstanceKey}.%d{yyyy-MM-dd}.log</fileNamePattern>
					<maxHistory>7</maxHistory>
				</rollingPolicy>
				<encoder>
					<pattern>${APPENDER_PATTERN}</pattern>
				</encoder>
			</appender>
		</sift>
	</appender>

	<appender name="LOGFILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
		<filter class="ch.qos.logback.core.filter.EvaluatorFilter">
			<evaluator>
				<!-- No log messages marked as 'SPECIAL_EMAIL_NOTIFICATION' should be shown. -->
				<expression>marker != null &amp;&amp; marker.getName().equals("SPECIAL_EMAIL_NOTIFICATION")</expression>
			</evaluator>
			<OnMismatch>NEUTRAL</OnMismatch>
			<OnMatch>DENY</OnMatch>
		</filter>

		<File>logs/sophora-importer.log</File>
		<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
			<fileNamePattern>logs/sophora-importer.%d{yyyy-MM-dd}.log</fileNamePattern>
			<maxHistory>7</maxHistory>
		</rollingPolicy>
		<encoder>
			<pattern>${APPENDER_PATTERN}</pattern>
		</encoder>
	</appender>

	<!-- For more information on logback email logging see: http://logback.qos.ch/manual/appenders.html#SMTPAppender -->
	<appender name="EMAIL" class="ch.qos.logback.classic.net.SMTPAppender">
		<evaluator class="ch.qos.logback.classic.boolex.OnMarkerEvaluator">
			<marker>EMAIL_NOTIFICATION</marker>
			<marker>SPECIAL_EMAIL_NOTIFICATION</marker>
		</evaluator>
		<SMTPHost>smtp.host.de</SMTPHost>
		<Username>xxx@yourmail.com</Username>
		<Password>your_password</Password>
		<To>importererror@yourcompany.com</To>
		<From>importer@yourcompany.com</From>
		<Subject>${LOGGING_EVENT_CLASS}: Importer '${IMPORTER_NAME}', instance '%X{importerInstanceName}'</Subject>
		<layout class="ch.qos.logback.classic.PatternLayout">
			<Pattern>${APPENDER_PATTERN}</Pattern>
		</layout>
		<CyclicBufferTracker class="ch.qos.logback.core.spi.CyclicBufferTracker">
			<!-- Send just one log entry per email. -->
			<BufferSize>1</BufferSize>
		</CyclicBufferTracker>
		<!-- Encoding of the email. -->
		<CharsetEncoding>ISO-8859-1</CharsetEncoding>
	</appender>

	<logger name="com.subshell.sophora" level="INFO"/>
	<logger name="org.springframework.boot" level="INFO"/>

	<root level="WARN">
		<appender-ref ref="LOGFILE" />
		<!-- Remove comment if you want to log to the console.
		<appender-ref ref="STDOUT" />
		-->
		<!-- Remove comment if you want to have separate log files for each importer instance.
		<appender-ref ref="INSTANCES" />
		-->
		<!-- Remove comment if you want to have email notifications on particular importer errors.
		<appender-ref ref="EMAIL" />
		-->
	</root>
</configuration>

Binary property names for old versions of Sophora XML (optional)

Binary content within Sophora is modeled as a special property which needs to be treated separately in Sophora XML. First, you have to declare properties as binary explicitly. Second, each binary property name must be mapped to the property that holds the corresponding MIME type. This mapping is configured in the application.yml file, as shown in the following example:

# Provides a default mapping between binary property name and mimetype name.
# This is needed by the importer to successfully import documents with Sophora-XML <= 1.6 if custom binary properties are used.
importer:
  binaryProperties:
    'sophora:binarydata': 'sophora:mimetype'
    'sophora-extension:binarydata': 'sophora:mimetype'
    'core:binarydata': 'core:mimetype'

Each element in the map assigns a binary property to a mimetype property.

Properties in the Sophora XML that match one of the keys in this map are interpreted as binary. There must be another property on the same level of the Sophora XML that matches the value of the binary properties map. For example:

<childNodes>
  <childNode nodeType="sophora-extension-nt:imagedata" name="sophora-extension:imagedata">
    <properties>
      <property name="sophora-extension:binarydata">
        <value>olympiapeking_crowd.jpg</value>
      </property>
      <property name="sophora:mimetype">
        <value>image/jpeg</value>
      </property>
    </properties>
    <childNodes/>
    <resourceList/>
  </childNode>
</childNodes>

If the binary properties options are not configured, a default configuration will be applied. If a binary property is not covered by the default configuration, it will be imported nonetheless. Its MIME type is then determined from the file extension. In that case, the Importer will write a warning to the log saying that a binary property may not have been imported correctly.
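
As a sketch of how a project-specific binary property could be added to the mapping: the property names myproject:pdfdata and myproject:pdfmimetype below are hypothetical and only serve as an illustration. This page does not state whether a configured map is merged with the default configuration or replaces it, so the sketch repeats the standard entries to be on the safe side.

```yaml
importer:
  binaryProperties:
    # Entries taken from the example mapping above.
    'sophora:binarydata': 'sophora:mimetype'
    'sophora-extension:binarydata': 'sophora:mimetype'
    'core:binarydata': 'core:mimetype'
    # Hypothetical custom binary property and the property holding its MIME type.
    'myproject:pdfdata': 'myproject:pdfmimetype'
```

With such an entry, a property named myproject:pdfdata in the Sophora XML would be treated as binary, and a sibling property named myproject:pdfmimetype would be expected on the same level.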

Last modified on 10/16/20

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.
