Importer | Version 3

Importer: Import Via Webservice

The import process: How to import documents with the importer webservice.

Archived documentation for Sophora 3. End-of-support date for this version: 7/25/21

The main difference between an import via webservice and an import via watchfolder is how the files are passed to the Importer. Once the files (the document description in XML and possibly binary files) have been handed over to the Importer, the process is mostly identical. The webservice offers the same functionality as the watchfolder mechanism except for site mappings. Unlike the watchfolder mechanism, the webservice returns an XML description of the import status instead of just writing messages to the log.

First of all, the webservice has to be activated and configured in the base configuration file sophora-importer.properties (see Properties in the base configuration file 'sophora-importer.properties') by setting the properties sophora.importer.webservice.active, sophora.importer.webservice.baseAddress and optionally sophora.importer.webservice.defaultInstance. It is not recommended to switch the webservice on if you don't need it. If the Importer operates behind a proxy server, the properties sophora.importer.webservice.proxyHost and sophora.importer.webservice.proxyPort should be set in the base properties file as well. This is necessary if the import XML is passed to the webservice as a URL which refers to a remote file via http.

Within the instance configuration files (see Properties in the instance configuration file(s) 'sophora-importer_instance-NNN.properties') the properties sophora.importer.instance.webservice.enabled and sophora.importer.fileaccess.basedir should be configured. If sophora.importer.instance.webservice.enabled is set to false for an importer instance, there will be an error if someone tries to import documents via webservice using the corresponding instance. Setting this property makes it possible to enable or disable the webservice per instance. Please note that this property only has an effect if the webservice is enabled in general by setting sophora.importer.webservice.active to 'true'.
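The interplay of the global switch and the per-instance switch can be sketched in Java as follows. This is only an illustration of the rule described above, not the Importer's actual implementation: the class and method names are hypothetical, and treating a missing property as 'false' is an assumption. Only the two property keys are taken from this documentation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only (not part of the Sophora Importer).
class WebserviceSwitchSketch {

    // Parses properties-file content from a string.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // The webservice is usable for an instance only if the global switch AND
    // the instance switch are both "true".
    // Assumption: a missing property counts as "false" in this sketch.
    static boolean isWebserviceUsable(Properties base, Properties instance) {
        boolean globallyActive = Boolean.parseBoolean(
                base.getProperty("sophora.importer.webservice.active", "false"));
        boolean instanceEnabled = Boolean.parseBoolean(
                instance.getProperty("sophora.importer.instance.webservice.enabled", "false"));
        return globallyActive && instanceEnabled;
    }

    public static void main(String[] args) {
        Properties base = parse("sophora.importer.webservice.active=true\n");
        Properties instance = parse("sophora.importer.instance.webservice.enabled=false\n");
        // The instance switch is off, so the webservice is not usable for this instance.
        System.out.println(isWebserviceUsable(base, instance));
    }
}
```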

To secure the webservice, you can enable basic authentication with the optional property sophora.importer.webservice.authentication.active. The users for the authentication can be configured in the file webservice_users.json (see Installation & Configuration).
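When basic authentication is enabled, a client has to send credentials with every request. The following Java sketch builds the value of the standard HTTP Authorization header for basic authentication. The class name and the credentials are placeholders chosen for this example; how the header is attached to the request depends on the SOAP client in use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only.
class BasicAuthHeaderSketch {

    // Builds "Basic <base64(user:password)>" as used by HTTP basic authentication.
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder credentials; real users are configured in webservice_users.json.
        System.out.println("Authorization: " + basicAuthHeader("user", "pass"));
    }
}
```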

Once the webservice is configured and running you should be able to call the WSDL description. This description defines eight methods that can be invoked remotely to import documents:

  1. String importXml(String xml, Set<XslParam> xslParams)
  2. String importXmlWithBinaries(String xml, List<BinaryFileBean> binaryFilesList, Set<XslParam> xslParams)
  3. String importXmlByReference(URI uri, String encoding, Set<XslParam> xslParams)
  4. String importXmlByReferenceWithBinaries(URI uri, String encoding, List<BinaryFileBean> binaryFilesList, Set<XslParam> xslParams)
  5. String importXmlToInstance(String xml, int instanceIndex, Set<XslParam> xslParams)
  6. String importXmlWithBinariesToInstance(String xml, List<BinaryFileBean> binaryFilesList, int instanceIndex, Set<XslParam> xslParams)
  7. String importXmlByReferenceToInstance(URI uri, String encoding, int instanceIndex, Set<XslParam> xslParams)
  8. String importXmlByReferenceWithBinariesToInstance(URI uri, String encoding, List<BinaryFileBean> binaryFilesList, int instanceIndex, Set<XslParam> xslParams)

All methods import documents described in Sophora XML, or in any other XML if an adequate XSL transformation is defined for the associated importer installation. As a result, an XML description of the import is returned (see the section "Webservice Response"). Binary files which are referenced in the Sophora XML can be passed as a list of filenames and base64-encoded binary data. If no binary files have to be added, use one of the methods without 'WithBinaries' in their name, e.g. importXml or importXmlByReference.

The parameter encoding is optional for the methods which work with references to the import XML. It defines the character encoding of the referenced XML data. If this parameter is left blank, UTF-8 will be used. The methods which receive the XML directly, such as importXml and importXmlWithBinaries, try to determine the character encoding from the passed XML by parsing the XML header. If this is not possible, UTF-8 will be used in this case as well.
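The fallback rule described above (use the encoding declared in the XML header, otherwise UTF-8) can be sketched in Java as follows. This is a simplified illustration under the assumption that the XML is available as a string; it is not the Importer's actual detection code, and the class name and the regular expression are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class XmlEncodingSketch {

    // Matches the encoding pseudo-attribute of an XML declaration at the start of the text.
    private static final Pattern ENCODING = Pattern.compile(
            "^\\s*<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the declared charset, or UTF-8 if none is declared or the name is unknown.
    static Charset detect(String xml) {
        Matcher m = ENCODING.matcher(xml);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall back to UTF-8 below.
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(detect("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><document/>"));
        System.out.println(detect("<document/>"));
    }
}
```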

The parameter xslParams is optional in all webservice methods. With this parameter you can define one or more key-value pairs, which are passed as XSL parameters to the XSL transformation process. If no XSL transformation is performed, the parameter xslParams is ignored.

Every webservice method exists in two versions. The first one uses the default instance (which can be set by the property sophora.importer.webservice.defaultInstance). The second one (whose name ends with 'ToInstance') provides a parameter for choosing the concrete instance to use. The instance is defined by its index, starting with '1' for the first instance.

The XML describing the documents to import can be passed directly inside the SOAP message using methods 1, 2, 5 or 6. Alternatively, a URI referencing the XML file can be passed using methods 3, 4, 7 or 8 (all methods with 'ByReference' in their name). Remote and local files can be referenced using the protocols 'http:' and 'file:'. Furthermore, a local file can be referenced via a relative path, which is resolved against the base directory for file access. For security reasons, it is not allowed to reference files above the base directory for file access.
With the protocol 'file:', it is likewise only possible to access files which are located (at any depth) under the base directory for file access. This directory is configured for each importer instance by the property sophora.importer.fileaccess.basedir.

Examples

  • Referencing a local file: file:C:/importxml/sophoradocument100_document.xml
  • Referencing a local file via a relative path: importdata/sophoradocument100_document.xml
  • Referencing a remote file: http://www.example.com/importxml/sophoradocument100_document.xml
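The restriction to the base directory for file access can be pictured with the following Java sketch, which resolves a relative reference against a base directory and rejects references that leave it. It only illustrates the idea and is not the Importer's implementation; the class and method names are hypothetical, and symbolic links are not taken into account here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
class BaseDirCheckSketch {

    // Resolves the reference against the base directory and ensures the
    // normalized result still lies inside the base directory.
    static Path resolveInsideBaseDir(Path baseDir, String reference) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(reference).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException(
                    "Reference leaves the base directory: " + reference);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Placeholder directory standing in for sophora.importer.fileaccess.basedir.
        Path base = Paths.get("basedir");
        System.out.println(resolveInsideBaseDir(base, "importdata/sophoradocument100_document.xml"));
        try {
            resolveInsideBaseDir(base, "../outside.xml");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```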

The following listing shows a possible SOAP message for importing a document with binary data and two additional XSL parameters. (Note: The XSL parameters are only relevant if an XSL transformation is performed during the import process. In the example, Sophora XML is given in the element documentXml, so an XSL transformation is only performed if the importer property sophora.importer.transformationMode has the value forceTransform.)

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ws="http://ws.importer.sophora.subshell.com/">
  <soapenv:Header />
  <soapenv:Body>
    <ws:importXmlWithBinaries>
      <!-- Xml to be imported (wrapped in a cdata block). -->
      <documentXml><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
        <document xmlns="http://www.sophoracms.com/import/2.0" nodeType="sophora-core:story" externalID="story_00004711">
          [...]
        </document>]]>
      </documentXml>
      <binaryFile>
        <!-- The filename of a binary file (as it appears in the binary property in the sophora xml). -->
        <filename>trendcityparis100_binary_1.jpg</filename>
        <!-- Base64 encoded binary data of this file. -->
        <binaryData>/9j/4...8n//2Q==</binaryData>
      </binaryFile>
      <!-- Zero or more XSL parameters possible. -->
      <xslParam>
        <!-- The key of the first parameter. -->
        <key>structureNode</key>
        <!-- The value of the first parameter. -->
        <value>/demosite/sports/handball</value>
      </xslParam>
      <xslParam>
        <!-- The key of the second parameter. -->
        <key>idStem</key>
        <!-- The value of the second parameter. -->
        <value>handball</value>
      </xslParam>
    </ws:importXmlWithBinaries>
  </soapenv:Body>
</soapenv:Envelope>
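A message like the one above could be assembled programmatically, for instance as in the following Java sketch. The element names and namespaces are taken from the listing; the class and method names are hypothetical, and the sketch supports only a single binary file. It merely builds the message text. In practice, a client generated from the WSDL would usually create the envelope for you.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
class SoapEnvelopeSketch {

    // Escapes the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String buildImportXmlWithBinaries(String documentXml, String filename,
                                             byte[] binaryData, Map<String, String> xslParams) {
        // A CDATA section cannot contain its own end marker.
        if (documentXml.contains("]]>")) {
            throw new IllegalArgumentException("documentXml must not contain ']]>'");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\"")
          .append(" xmlns:ws=\"http://ws.importer.sophora.subshell.com/\">\n")
          .append("  <soapenv:Header />\n  <soapenv:Body>\n    <ws:importXmlWithBinaries>\n")
          .append("      <documentXml><![CDATA[").append(documentXml).append("]]></documentXml>\n")
          .append("      <binaryFile>\n")
          .append("        <filename>").append(escape(filename)).append("</filename>\n")
          // Binary data is transported base64 encoded.
          .append("        <binaryData>").append(Base64.getEncoder().encodeToString(binaryData))
          .append("</binaryData>\n      </binaryFile>\n");
        for (Map.Entry<String, String> p : xslParams.entrySet()) {
            sb.append("      <xslParam>\n        <key>").append(escape(p.getKey())).append("</key>\n")
              .append("        <value>").append(escape(p.getValue()))
              .append("</value>\n      </xslParam>\n");
        }
        sb.append("    </ws:importXmlWithBinaries>\n  </soapenv:Body>\n</soapenv:Envelope>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("structureNode", "/demosite/sports/handball");
        params.put("idStem", "handball");
        // Placeholder document XML and placeholder binary content.
        String xml = "<document xmlns=\"http://www.sophoracms.com/import/2.0\" />";
        System.out.println(buildImportXmlWithBinaries(
                xml, "trendcityparis100_binary_1.jpg", new byte[] {1, 2, 3}, params));
    }
}
```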

Webservice Response

The webservice will return a UTF-8 encoded xml description of the import's status. A webservice response might look like the following example:

<?xml version="1.0" encoding="UTF-8"?>
<importInformation xmlns="http://www.sophoracms.com/importinformation"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.sophoracms.com/importinformation http://www.sophoracms.com/importinformation/sophora-importinformation-1.2.0.xsd"
                   successful="true" importDate="2010-07-29T15:11:54.568+02:00" duration="0.742">
  <originalFileName>ws_1280409113825.xml</originalFileName>
  <importFile>/cms/import/broadcasts/successful/ws_1280409113825_2010-07-29_15-11-54-566.xml</importFile>
  <cleanedImportFile />
  <transformedImportFile />
  <modifiedSophoraDocumentFile />
  <errorText />
  <processedBinaryFiles>
    <file>/cms/import/broadcasts/successful/filmseinemutterundich102_binary_1.jpg</file>
  </processedBinaryFiles>
  <documents>
    <documentInformation newlyCreated="false" successfullySaved="true">
      <sophoraId>filmseinemutterundich102</sophoraId>
      <externalId>broadcast-5567523</externalId>
      <uuid>e8031441-240e-420b-bb5c-b5f817be6a51</uuid>
      <resourceListDocuments>
        <documentInformation newlyCreated="false" successfullySaved="true">
          <sophoraId>seinemutterundich102</sophoraId>
          <externalId>image-4324543</externalId>
          <uuid>f01ac61d-ad5a-4cbe-b3fc-41d4005434be</uuid>
          <resourceListDocuments />
        </documentInformation>
      </resourceListDocuments>
    </documentInformation>
  </documents>
</importInformation>

The direct children and the attributes of the element <importInformation> describe the overall import process: Was it successful (attribute "successful")? When did it happen (attribute "importDate")? How long did it take (attribute "duration")? Which files were transformed and created (elements "originalFileName", "importFile", "cleanedImportFile" and "transformedImportFile")? Which warnings and errors occurred (element "errorText")? Which binary files were processed (element "processedBinaryFiles")?

In the above example, a sophora xml with two <document> elements was imported: an image document was wrapped inside a broadcast document (in the element <resourceList> of the broadcast document). Therefore, the above response xml contains two <documentInformation> elements, with the outer element describing the broadcast import and the inner element describing the image import. For every handled document you get the following information: Was the document newly created or was an existing document updated (attribute "newlyCreated")? Was the document successfully saved (attribute "successfullySaved")? What are the different ids of the processed document (elements "sophoraId", "externalId" and "uuid")?
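A client could evaluate such a response with the JDK's DOM parser, for example as in the following Java sketch. The element names, attribute names and the namespace are taken from the example response; the class and method names are hypothetical and the sketch is not an official client API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class ImportResponseSketch {

    static final String NS = "http://www.sophoracms.com/importinformation";

    static Document parse(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse webservice response", e);
        }
    }

    // Reads the attribute "successful" of the root element <importInformation>.
    static boolean wasSuccessful(Document doc) {
        return Boolean.parseBoolean(doc.getDocumentElement().getAttribute("successful"));
    }

    // Returns the text of the first direct child element with the given local name.
    static String childText(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && localName.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }

    // Collects the sophoraIds of all successfully saved documents,
    // including documents nested in <resourceListDocuments>.
    static List<String> savedSophoraIds(Document doc) {
        List<String> ids = new ArrayList<>();
        NodeList infos = doc.getElementsByTagNameNS(NS, "documentInformation");
        for (int i = 0; i < infos.getLength(); i++) {
            Element info = (Element) infos.item(i);
            if (Boolean.parseBoolean(info.getAttribute("successfullySaved"))) {
                ids.add(childText(info, "sophoraId"));
            }
        }
        return ids;
    }
}
```

Applied to the example response above, savedSophoraIds would collect the ids of the broadcast document and of the nested image document.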

Information about the whole import process (element <importInformation>):

Element | Description
successful (attribute) | Indicates whether the import process was successful as a whole. This means that all content that should be imported has been imported. Note: Even then, warnings may have occurred during the import process; these are collected in the element <errorText>.
importDate (attribute) | The time of the import in ISO 8601 format, for instance "2010-07-29T15:11:54.568+02:00".
duration (attribute) | The duration of the import process in seconds, for instance "0.242".
importDeferred (attribute) | If the import XML contained the forceLock element with a timeout, the Importer might defer the import if the respective document is locked. In this case, the SOAP call returns before the import finishes and the response contains this attribute set to true. See the XML reference (section 'Asking the user to release a document lock') for more information.
originalFileName (element) | The original name of the xml file which was imported. Note: This element is empty if a low-level error has occurred.
importFile (element) | The complete path of the moved and renamed xml file. This file is located either in the successful or in the failure folder of the importer instance. Note: This element is empty if a low-level error has occurred.
cleanedImportFile (element) | The complete path of the cleaned xml file (if 'sophora.importer.transformation.repairXml' is configured). This file is located either in the successful or in the failure folder of the importer instance.
transformedImportFile (element) | The complete path of the transformed xml file (if an xsl transformation was performed). This file is located either in the successful or in the failure folder of the importer instance.
modifiedSophoraDocumentFile (element) | The complete path of the modified sophora xml file (if XPath identifier expressions were used in the sophora xml). This file is located either in the successful or in the failure folder of the importer instance.
errorText (element) | A collection of the errors and warnings which occurred during the import process of the xml document.
processedBinaryFiles (element) | The binary files which were handled during the import process of the xml document. Every file is wrapped in a <file> element.
documents (element) | Information about the sophora documents which were handled during the import process of the xml document (see next table).

Information about a particular document import (element <documentInformation>):

Element | Description
newlyCreated (attribute) | Shows whether the sophora document was newly created (value "true") or an existing document was updated (value "false"). Note: If the import of the document was not successful (i.e. the attribute "successfullySaved" is "false"), this attribute has no informative value.
successfullySaved (attribute) | Indicates whether the document was successfully saved in the repository.
sophoraId (element) | The sophora id of the processed <document> element. Note: If the import of the <document> element was not successful (i.e. the attribute "successfullySaved" is "false"), this element may be empty or may contain only the provided id stem of the document to be imported.
externalId (element) | The external id of the processed <document> element. Note: If the import of the <document> element was not successful (i.e. the attribute "successfullySaved" is "false"), this element may be empty.
uuid (element) | The uuid of the processed <document> element. Note: If the import of the <document> element was not successful (i.e. the attribute "successfullySaved" is "false"), this element may be empty.
resourceListDocuments (element) | Information about the <document> elements which are placed in the element <resourceList> of the current <document> element (in the import xml).

Triggering the import

As mentioned before, there are hardly any differences between importing via webservice and via watchfolder. After the webservice has received the XML description and optionally the binary data, it creates one XML file and the binary files in the importer's temporary folder (see property sophora.importer.directory.temp). These files are then passed on to the Importer and processed as usual. From this point on, it makes no difference whether the files were put into a watchfolder or transferred via webservice.

Examples for Referencing Binaries via the Webservice

There are multiple ways to reference binary data when importing documents via the webservice of the Sophora Importer. This page supplements the section about the webservice in the Sophora Importer documentation. It requires basic knowledge of webservices and of the Sophora Importer. Please refer to the Sophora Importer documentation, especially the part about the webservice, before you continue reading.

Referencing binary files via the webservice

In general there are two different ways to reference binary files when importing via the webservice interface.

  1. Binary files may be referenced within the Sophora XML.
  2. Binary files may be referenced within the SOAP body as a binary file list, specified by a parameter.

These two possibilities are described in the following sections.

Referencing binary files within the Sophora XML

By using one of the following methods, binary files are referenced within the Sophora XML:

  • importXml
  • importXmlToInstance
  • importXmlByReference
  • importXmlByReferenceToInstance

Generally, binary files may be referenced in all ways described in the documentation about binary properties in the Sophora Importer documentation. One slight difference is that, when referencing local files, relative paths must be relative to the temporary directory of the importer instance in use.

In addition to referencing local files, it is also possible to reference binary data via the http:// or the file:// protocol. When using the file:// protocol, the referenced file must be available on the remote host on which the webservice is running. The file must be located in a subdirectory of the configured base directory. This directory can be defined by the configuration key sophora.importer.fileaccess.basedir in the properties file of the importer and may be overridden in an instance configuration. The base directory must be configured to prevent arbitrary access to the file system by the webservice.

Examples

In the following you can find different examples of referencing binary files. The first example also includes excerpts of the basic xml skeleton. The other examples are reduced to the binary property.

Referencing a binary file via its relative path

Please note that the path is relative to the temporary directory of the corresponding importer instance.

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:ws="http://ws.importer.sophora.subshell.com/">
    <soapenv:Header/>
    <soapenv:Body>
        <ws:importXml>
            <documentXml>
                <![CDATA[
<documents xmlns="http://www.sophoracms.com/import/2.4">
  <document nodeType="sophora-extension-nt:image" externalID="d506ac91-3cdd-41df-b70d-82be7edc6b0d">
    <properties>
      ...
    </properties>
    <childNodes>
      <childNode nodeType="sophora-extension-nt:imagedata" name="sophora-extension:imagedata">
        <properties>
          <property name="sophora-extension:binarydata" mimetype="image/jpeg">
            <value>../images/image.jpeg</value>
          </property>
          ...
        </properties>
        <childNodes />
        <resourceList />
      </childNode>
      ...
    </childNodes>
    <resourceList />
    <fields>
      ...
    </fields>
    <instructions>
      ...
    </instructions>
  </document>
</documents>
...
 ]]>
            </documentXml>
        </ws:importXml>
    </soapenv:Body>
</soapenv:Envelope>

Referencing a binary file via its absolute path

<property name="sophora-extension:binarydata" mimetype="image/jpeg">
    <value>file:///cms/project/data/images/image.jpeg</value>
</property>

Including the binary values base64 encoded

<property name="sophora-extension:binarydata" mimetype="image/jpeg">
    <value>data:;base64,/9j/4AAQSkZJRgABAQEASABIAAD/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgk...</value>
</property>
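A value in this form can be produced from the raw bytes of a file with the JDK's base64 encoder, as in the following Java sketch. The prefix "data:;base64," is taken from the example above; the class and method names are hypothetical.

```java
import java.util.Base64;

// Hypothetical helper for illustration only.
class DataUriSketch {

    // Prefixes the base64-encoded bytes with "data:;base64," as in the example above.
    static String toDataUri(byte[] data) {
        return "data:;base64," + Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // The first three bytes of a JPEG file (0xFF 0xD8 0xFF) as stand-in content;
        // real code would read the whole file, e.g. with java.nio.file.Files.readAllBytes.
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
        System.out.println("<value>" + toDataUri(jpegStart) + "</value>");
    }
}
```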

Referencing a binary file via the http protocol

<property name="sophora-extension:binarydata" mimetype="image/jpeg">
    <value>http://www.example.com/image.jpg</value>
</property>

It is also possible to use the secure protocol https rather than http.

Referencing binary files as a binary file list

By using one of the following methods binary files are referenced as a list of binary files:

  • importXmlWithBinaries
  • importXmlWithBinariesToInstance
  • importXmlByReferenceWithBinaries
  • importXmlByReferenceWithBinariesToInstance

These methods define additional parameters to include base64-encoded binary values. These parameters are defined within the SOAP body next to the sophora xml. The binary files are referenced by their name (image.jpeg in the following example).

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:ws="http://ws.importer.sophora.subshell.com/">
    <soapenv:Header/>
    <soapenv:Body>
        <ws:importXmlWithBinaries>
            <documentXml>
                <![CDATA[
<documents xmlns="http://www.sophoracms.com/import/2.4">
  <document nodeType="sophora-extension-nt:image" externalID="d506ac91-3cdd-41df-b70d-82be7edc6b0d">
    <properties>
      ...
    </properties>
    <childNodes>
      <childNode nodeType="sophora-extension-nt:imagedata" name="sophora-extension:imagedata">
        <properties>
          <property name="sophora-extension:binarydata" mimetype="image/jpeg">
            <value>image.jpeg</value>
          </property>
          ...
        </properties>
        <childNodes />
        <resourceList />
      </childNode>
      ...
    </childNodes>
    <resourceList />
    <fields>
      ...
    </fields>
    <instructions>
      ...
    </instructions>
  </document>
</documents>
...
 ]]>
            </documentXml>
            <binaryFile>
                <binaryData>iVBORw0KGgoAAAANSUhEUgAAAyAAAA ...</binaryData>
                <filename>image.jpeg</filename>
            </binaryFile>
        </ws:importXmlWithBinaries>
    </soapenv:Body>
</soapenv:Envelope>

Please note that the element binaryData in the SOAP body expects base64-encoded data. It is not possible to reference binary data via a URL or a path using this webservice parameter. However, it is still possible to additionally reference binary data within the sophora xml, as described in the section "Referencing binary files within the Sophora XML".

Details for using a Java-Client

In case you want to use a Java client (e.g. based on java-ws or Apache Axis 2) to connect to the webservice interface, the web service description (WSDL) is available here. Two typical interface methods look like these:

String importXml(String documentXml);
String importXmlWithBinaries(String documentXml, List<BinaryFileBean> binaryFile);

When using the first method the binary values are referenced in the Sophora XML as described above.

When using the second method (or any other interface method with the suffix "WithBinaries") the binary values are referenced as filenames in the sophora xml and passed as a list of BinaryFileBean objects. Make sure that you specify as many BinaryFileBeans in the list of binaryFiles as you have defined in the sophora xml.

Consider the following example with two filenames (image1.jpg and image2.jpg) in the sophora xml.

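import java.util.ArrayList;
import java.util.List;

String sophoraXml = ...

// First binary file; the filename must match the one used in the sophora xml.
BinaryFileBean image = new BinaryFileBean();
image.setFilename("image1.jpg");
image.setBinaryData(binaryData);

// Second binary file; note that the setters are called on image2, not on image.
BinaryFileBean image2 = new BinaryFileBean();
image2.setFilename("image2.jpg");
image2.setBinaryData(binaryData);

// One BinaryFileBean per binary file referenced in the sophora xml.
List<BinaryFileBean> binaryFile = new ArrayList<BinaryFileBean>();
binaryFile.add(image);
binaryFile.add(image2);

importerServer.importXmlWithBinaries(sophoraXml, binaryFile);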

Last modified on 12/5/19

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.
