Import XML

Properties

How to import properties, the primary characteristics of a document, using Sophora's import XML, and how to remove and protect them.

Properties represent the primary characteristics of a document and are given in individual <property> elements. Which properties are available for which kind of document is defined in the node type configuration within Sophora (see the node type configuration documentation for details).

The following example shows the node type configuration of an image object:

<'sophora-content-nt'='http://www.subshell.com/sophora-content-nt/1.0'>
<'sophora-extension-nt'='http://www.subshell.com/sophora-extension-nt/1.0'>
<'sophora-content'='http://www.subshell.com/sophora-content/1.0'>
 
['sophora-content-nt:imageobject'] > 'sophora-extension-nt:image'
  orderable
  - 'sophora-content:tags' (string)
  - 'sophora-content:title' (string)
  - 'sophora-content:chargeable' (boolean)
  - 'sophora-content:credit' (string)
  - 'sophora-content:source' (string)
  - 'sophora-content:url' (string)
  - 'sophora-content:displayStyle' (string)

The import XML for a corresponding document may look like this:

<?xml version="1.0" encoding="UTF-8"?>
<document nodeType="sophora-content-nt:imageobject"
          xmlns="http://www.sophoracms.com/import/2.8"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <properties>
    <property name="sophora-content:title">
      <value>Die Menschen in Peking strömen in den Olympiapark.</value>
    </property>
    <property name="sophora-extension:alttext">
      <value>Menschenmenge in Peking</value>
    </property>
    <property name="sophora-content:chargeable">
      <value>true</value>
    </property>
  </properties>
  <childNodes>
    <childNode nodeType="sophora-extension-nt:imagedata" name="sophora-extension:imagedata">
      [...]
    </childNode>
  </childNodes>
  <resourceList>
  </resourceList>
  <fields>
    [...]
  </fields>
  <instructions>
    [...]
  </instructions>
</document>

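The property sophora-extension:alttext and the child node sophora-extension:imagedata used in this example are not declared in sophora-content-nt:imageobject itself. They are inherited from the super node type sophora-extension-nt:image, which is configured as follows: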

<'sophora-extension-nt'='http://www.subshell.com/sophora-extension-nt/1.0'>
<'sophora-extension'='http://www.subshell.com/sophora-extension/1.0'>
 
['sophora-extension-nt:image']
  orderable
  - 'sophora-extension:caption' (string)
  - 'sophora-extension:alttext' (string)
  - 'sophora-extension:iptc' (string)
  + 'sophora-extension:imagedata' ('sophora-extension-nt:imagedata') multiple

XHTML Tags in Property Values

Certain properties in Sophora (e.g. copytext or teaser) may contain XHTML tags for text formatting. The permitted elements are <ul>, <li>, <strong>, <em> and <br/>, so the <value> elements of such properties may contain these tags. For example:

[...]
  <properties>
    <property name="sophora-content:shorttext">
      <value>Die Menschen in Peking <strong>strömen</strong> in den Olympiapark.<br />In großen Mengen!</value>
    </property>
    [...]
  </properties>
[...]
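The permitted list elements can be used in the same way. The following fragment is a sketch rather than a reference example: it reuses sophora-content:shorttext from the snippet above and assumes that this property accepts XHTML lists in your configuration.

```xml
[...]
  <properties>
    <!-- Sketch: a bulleted list inside a property value, built only from the
         permitted elements <ul>, <li>, <strong> and <em>. Assumes that
         sophora-content:shorttext accepts XHTML lists in your configuration. -->
    <property name="sophora-content:shorttext">
      <value><ul><li><strong>Peking:</strong> Die Menschen strömen in den Olympiapark.</li><li><em>In großen Mengen!</em></li></ul></value>
    </property>
    [...]
  </properties>
[...]
```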

Date Values

By default, dates are specified in the ISO 8601 format:

[...]
  <properties>
    <property name="sophora-content:date">
      <value>2008-08-05T09:00:00+02:00</value>
    </property>
    [...]
  </properties>
[...]

If the importer cannot parse a date string as ISO 8601, it tries a "pseudo" ISO 8601 format in which a space is used instead of the character 'T'. If this also fails, the importer tries to parse the date string using the format "dd.MM.yyyy HH:mm", where the "HH:mm" part is optional. This allows dates to be provided in the following way as well:

[...]
  <properties>
    <property name="sophora-content:date">
      <value>05.08.2008 09:00</value>
    </property>
    [...]
  </properties>
[...]
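Following the fallback rules described above, the remaining accepted forms would look as shown below. This is a sketch derived from those rules; the two property elements are alternatives for a single property, not values to be imported together. Which time zone the importer applies to values without an offset is not specified here.

```xml
[...]
  <properties>
    <!-- Alternative 1: "pseudo" ISO 8601, with a space instead of the character 'T' -->
    <property name="sophora-content:date">
      <value>2008-08-05 09:00:00+02:00</value>
    </property>
    <!-- Alternative 2: "dd.MM.yyyy" without the optional "HH:mm" part -->
    <property name="sophora-content:date">
      <value>05.08.2008</value>
    </property>
    [...]
  </properties>
[...]
```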

Binary Properties

Binary properties must be provided with the additional attribute mimetype, which tells the importer to import the referenced binary file with the given MIME type.

The following example shows a typical image import. In the imagedata child node, you can see the binary property sophora-extension:binarydata with its attribute mimetype (set to "image/jpeg"):

<?xml version="1.0" encoding="UTF-8"?>
<document nodeType="sophora-content-nt:imageobject" xmlns="http://www.sophoracms.com/import/2.8">
  <properties>
     [...]
  </properties>
  <childNodes>
    <childNode nodeType="sophora-extension-nt:imagedata" name="sophora-extension:imagedata">
      <properties>
        <property name="sophora-extension:imagetype">
          <value>original</value>
        </property>
        <!-- The binary property with its attribute "mimetype" relates to a binary file in the filesystem. -->
        <property name="sophora-extension:binarydata" mimetype="image/jpeg">
          <value>image_4711_binary_1.jpeg</value>
        </property>
      </properties>
      <childNodes />
      <resourceList />
    </childNode>
  </childNodes>
  <resourceList />
  <fields>
     [...]
  </fields>
  <instructions>
    [...]
  </instructions>
</document>

The referenced image file image_4711_binary_1.jpeg must be located in the same directory as the import XML file. Alternatively, you can specify a relative path such as 'images/import/image_4711_binary_1.jpeg'. For security reasons, a relative path cannot reference a file in a parent directory, unless you have explicitly made that folder accessible via the property 'sophora.importer.fileaccess.basedir' (see Properties in the instance configuration file(s) 'sophora-importer_instance-NNN.properties').
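A sketch of both cases, assuming the import XML file lies in a directory that contains the subfolder images/import. The second path, which uses ../ to point to the parent directory, is a hypothetical illustration of a path that is rejected unless the target folder is configured via 'sophora.importer.fileaccess.basedir'.

```xml
<!-- Accepted: relative path into a subfolder of the import XML file's directory -->
<property name="sophora-extension:binarydata" mimetype="image/jpeg">
  <value>images/import/image_4711_binary_1.jpeg</value>
</property>

<!-- Rejected (hypothetical example): the path leaves the import directory, which is
     only allowed if the target folder is configured via sophora.importer.fileaccess.basedir -->
<property name="sophora-extension:binarydata" mimetype="image/jpeg">
  <value>../image_4711_binary_1.jpeg</value>
</property>
```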

Alternatively, you can reference binary data via URLs or provide it inline:

<!-- HTTP URL -->
<property name="sophora-extension:binarydata" mimetype="image/jpeg">
  <value>http://www.example.com/my-picture.jpg</value>
</property>
 
<!-- HTTPS URL -->
<property name="sophora-extension:binarydata" mimetype="image/jpeg">
  <value>https://www.example.com/my-picture.jpg</value>
</property>
 
<!-- File URL -->
<property name="sophora-extension:binarydata" mimetype="image/jpeg">
  <value>file:C:/temp/image_4711_binary_1.jpeg</value>
</property>
 
<!-- inline data -->
<property name="sophora-extension:binarydata" mimetype="image/gif">
  <value>data:;base64,R0lGODlhEAAQAMQAAP797/332f732f322f322vztufzkmfvjmfvkmf3tufvdg/zjmbuCFb2EFrl/
FLJ4E7V7FLl/FbuBFruCFqtwEapwEahuEa5zEqtwErJ3E6pxEqZrEKhtEf///wAAAAAAACH5BAEA
AB0ALAAAAAAQABAAAAVVYCeOZGmeKNk0adkAbCu+sDndzC0B/ERGvKCQ5xhBBoMAIRlACgQDiChT
KCQSVqy1+hBdDoeFYXFAGMaGCwlDoWAqGopijpFx5hZZZ6PY6Pd+f4ItIQA7</value>
</property>

If you use a file URL, for example 'file:C:/temp/image_4711_binary_1.jpeg' on a Windows operating system, it may only point to binary files located in the directory of the import XML file (or recursively in its subfolders), or in an additional folder (or recursively in its subfolders) that has been made accessible via the property 'sophora.importer.fileaccess.basedir' (see Properties in the instance configuration file(s) 'sophora-importer_instance-NNN.properties').

Inline binary data uses the "data" URI scheme. Note that, contrary to the specification of the scheme, the importer only supports the following form:

data:;base64,...your data here...

Automatic Downscaling of Oversized Images on Import

To have oversized images scaled down automatically on import, set the attribute autoScale="true" on the binary property of the image. Any image that exceeds the dimensions configured for the original image variant is then scaled down. The proportions of the image are preserved.

Example:

[...]
        <property name="sophora-extension:binarydata" mimetype="image/jpeg" autoScale="true">
          <value>image_4711_binary_1.jpeg</value>
        </property>
    [...]

Additional Information in Reference Property Values

When you export documents with references to other documents as Sophora XML, either via the Sophora Deskclient or programmatically via the Sophora Client, you may notice the following attributes on the reference values:

[...]
        <property name="sophora:reference">
          <value site="demosite" structureNode="/multimedia/images" sophoraId="image142" uuid="dd74be59-c921-4d88-aa2c-7094b6dd1384">image_4711</value>
        </property>
    [...]

The optional attributes "site", "structureNode", "sophoraId" and "uuid" contain metadata about the referenced document. This metadata is exported if Sophora version 2.1 or later is selected for the export. The attributes are only set on reference properties.

The attributes site and structureNode are only used for references to structure node documents. In this case, the importer first tries the structure node specified by the site and structureNode attributes. Only if this structure node does not exist is the specified external id used for the reference. The attributes sophoraId and uuid have no effect on the import at all. Note that the text content of the value element contains the external id of the referenced document.
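In an import XML, a reference to a structure node document could therefore be sketched as follows. The property name sophora:reference is taken from the export snippet above and stands in for whatever reference property your node type defines; the external id structurenode_multimedia_images is hypothetical. The attributes sophoraId and uuid are omitted because they have no effect on the import.

```xml
[...]
        <!-- The importer first looks up the structure node "/multimedia/images" on the site
             "demosite". Only if it does not exist is the external id in the text node used.
             Property name and external id are hypothetical placeholders. -->
        <property name="sophora:reference">
          <value site="demosite" structureNode="/multimedia/images">structurenode_multimedia_images</value>
        </property>
    [...]
```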

Multi Values

Multi-value properties are provided by adding multiple <value> elements to the <property> element:

[...]
  <properties>
    <property name="MULTIVALUE_PROPERTY_NAME">
      <value>Value 1</value>
      <value>Value 2</value>
      <value>Value 3</value>
    </property>
    [...]
  </properties>
[...]

Removing Properties

When updating an existing document via the importer, you can remove properties from this document. To do so, set the attribute "remove" of the property to true. If the import creates a new document in the repository, such a property is simply ignored.

The example below shows the removal of the property sophora-content:name, assuming that the underlying document already exists:

[...]
  <properties>
    <property name="sophora-content:name" remove="true" />
    [...]
  </properties>
[...]

If a property marked with remove="true" also contains a value, that value is ignored.

Protecting and Unprotecting Properties

You can protect and unprotect properties via the optional attribute protectionInstruction (see also the user manual and the administration manual). The possible values for protectionInstruction are:

none: If the corresponding property in the repository is protected, its value is not changed to the value in the XML. Omitting the attribute protectionInstruction has the same effect as using the value none!
unprotect: If the corresponding property in the repository is protected, the protection is removed and then the value from the XML is set. The protection is not re-established afterwards.
protect: If the corresponding property in the repository is not protected (or if the document is newly created), the value from the XML is set and the property is then protected. If the corresponding property in the repository is already protected, the value from the XML is ignored!
forceProtect: If the corresponding property in the repository is protected, the protection is forcibly removed and then the value from the XML is set. Afterwards, the property is protected, regardless of whether it was protected before.
reprotect: If the corresponding property in the repository is protected, the protection is removed and then the value from the XML is set. Afterwards, the property is protected again only if it was protected before. (Consequently, the property is never protected if the document is newly created.)

You can use the attribute protectionInstruction without providing a value for the property. This way, you can change the protection of a property without changing its value.
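For example, the following sketch only removes an existing protection and leaves the stored value untouched. The property name is a placeholder; the behavior follows from the unprotect description above combined with the omitted value.

```xml
[...]
  <properties>
    <!-- Unprotect the property (if it is protected). No <value> is given,
         so the current value in the repository is not changed. -->
    <property name="sophora-content:headline1" protectionInstruction="unprotect" />
    [...]
  </properties>
[...]
```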

The following XML snippet shows different protection use cases:

[...]
  <properties>
    <!-- Unprotect the property (if it is protected) and set its value to "Hallo Welt!". -->
    <property name="sophora-content:headline1" protectionInstruction="unprotect">
      <value>Hallo Welt!</value>
    </property>
    <!-- Set the value to "Hallo Welt!" and then protect the property. (If the property is
         already protected in the repository, its value will not be changed!) -->
    <property name="sophora-content:headline2" protectionInstruction="protect">
      <value>Hallo Welt!</value>
    </property>
    <!-- Unprotect the property (if it is protected), set its value to "Hallo Welt!" and
         then protect the property. -->
    <property name="sophora-content:headline3" protectionInstruction="forceprotect">
      <value>Hallo Welt!</value>
    </property>
    <!-- Unprotect the property (if it is protected), set its value to "Hallo Welt!" and
         then protect it again only if it was protected before. -->
    <property name="sophora-content:headline4" protectionInstruction="reprotect">
      <value>Hallo Welt!</value>
    </property>
    <!-- Protect the property if it is not already protected. Its value is not modified. -->
    <property name="sophora-content:headline5" protectionInstruction="protect" />
    <!-- Remove the property value and then protect the property. (The property is only
         removed if it is not already protected!) -->
    <property name="sophora-content:headline6" protectionInstruction="protect" remove="true" />
    [...]
  </properties>
[...]

Last modified on 10/2/20

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.
