
Importer: Custom Preprocessing Before Importing

With custom Java or Groovy code the imported data can be preprocessed.

Archived documentation for Sophora 3. End-of-support date for this version: 7/25/21


The import process can be extended by custom Java or Groovy code. This extension is executed as the first step in the importer process chain. Unlike the XSL transformation (which may be executed afterwards), this preprocessing step does not require an XML file as input. When a new file is found in the watchfolder, the configured extension is called and must create the XML for further processing.

Implementation

Any extension must implement the interface com.subshell.sophora.importer.preprocessing.IPreProcessing which contains the following methods:

/**
 * Pre-processing step of the importer. The implementation reads from the input and writes to the output.
 *
 * @throws PreProcessingException in case some unexpected exception occurs.
 */
void execute(InputStream input, OutputStream output, IErrorTracker errorTracker, Map<String, String> params) throws PreProcessingException;

void setSophoraClient(ISophoraClient sophoraClient);

For convenience, an abstract base implementation (com.subshell.sophora.importer.preprocessing.AbstractPreProcessing) is provided which wraps all kinds of exceptions into PreProcessingException and offers the Sophora client. The method to implement is:

void executeInternally(InputStream input, OutputStream output, IErrorTracker errorTracker, Map<String, String> params) throws Exception;
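To illustrate the contract, here is a minimal sketch of a preprocessing step that reads a non-XML text file and writes XML to the output. Several things in it are assumptions made for illustration only: the class name LinesToXmlPreProcessing, the placeholder elements lines and line (the importer expects Sophora import XML, not this structure), and the local IErrorTracker stand-in, which exists only so that the sketch compiles on its own. In a real extension the class would extend AbstractPreProcessing and import the real IErrorTracker.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

// Stand-in so the sketch compiles on its own (assumption). A real extension
// would import com.subshell.sophora.importer.core.utils.IErrorTracker instead.
interface IErrorTracker {
}

// Hypothetical example class. In a real extension it would extend
// com.subshell.sophora.importer.preprocessing.AbstractPreProcessing.
class LinesToXmlPreProcessing {

    // Same parameter list as the executeInternally method described above.
    public void executeInternally(InputStream input, OutputStream output,
            IErrorTracker errorTracker, Map<String, String> params) throws Exception {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(input, StandardCharsets.UTF_8));
        Writer writer = new OutputStreamWriter(output, StandardCharsets.UTF_8);

        // Placeholder XML structure; the real target is Sophora import XML.
        writer.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<lines>\n");
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                writer.write("  <line>" + escape(trimmed) + "</line>\n");
            }
        }
        writer.write("</lines>\n");
        writer.flush();
    }

    // Escapes the characters that are not allowed in XML text content.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Small local check that runs the step on an in-memory input.
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InputStream in = new ByteArrayInputStream(
                "first line\nsecond & third\n".getBytes(StandardCharsets.UTF_8));
        Map<String, String> noParams = Collections.emptyMap();
        new LinesToXmlPreProcessing().executeInternally(in, out, null, noParams);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The sketch only flushes the writer and does not close the streams, on the assumption that the importer manages their lifecycle.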

Configuration

A different preprocessing step can be configured for every importer instance, so the configuration is done in the instance configuration. One parameter (sophora.importer.preProcessing.scriptfolder) configures the path for Groovy scripts; the second parameter (sophora.importer.preProcessing.className) defines the class that implements the IPreProcessing interface. For a pure Java implementation, the script path is optional. In this case the code must be put in the folder that is configured by the property sophora.importer.additionalClasspath.

Groovy

Groovy scripts are compiled and reloaded automatically without a restart of the importer. Additional jar files can be put into the sophora.importer.additionalClasspath folder.

A minimal implementation, which only copies the input to the output, looks like this:

import org.apache.commons.io.IOUtils
import com.subshell.sophora.importer.core.utils.IErrorTracker
import com.subshell.sophora.importer.preprocessing.AbstractPreProcessing

class Dummy extends AbstractPreProcessing {

	// Copies the input stream unchanged to the output stream.
	public void executeInternally(InputStream input, OutputStream output, IErrorTracker errorTracker, Map<String, String> params) {
		IOUtils.copy(input, output)
	}
}

Java

If the extension is written in Java, the compiled class can be put into the configured script path. Alternatively, a jar file containing the class can be put into the sophora.importer.additionalClasspath folder.

Retry-Mechanism

It is possible that a file to be imported is not completely written into the watchfolder by the time the import starts. To handle this situation, there is a retry-mechanism that tries to detect incomplete XML and causes another attempt to read the file after a short delay. Learn more about this mechanism here.

When using a preprocessing script, the decision whether a file should be regarded as incomplete lies within the responsibility of the script. To trigger the retry-mechanism the script must call errorTracker.setParseError().

This mechanism is only enabled for imports via the watchfolder. When using the web service, this call just lets the import fail and does not trigger retries. Please also note that throwing an exception always causes the import to fail, regardless of whether errorTracker.setParseError() has been called.
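The following sketch shows one way a preprocessing step might decide that a file is incomplete and request a retry. It rests on several assumptions: the class name CompletenessCheckingPreProcessing is hypothetical, the check for a closing </documents> tag is only an example criterion, and IErrorTracker is a local stand-in that models nothing but the setParseError() call named above. What the importer expects on the output after setParseError() has been called is not specified here, so the sketch writes nothing in that case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

// Stand-in so the sketch compiles on its own (assumption). Only the call
// named in the text is modelled; the real interface may look different.
interface IErrorTracker {
    void setParseError();
}

// Hypothetical example class. In a real extension it would extend
// com.subshell.sophora.importer.preprocessing.AbstractPreProcessing.
class CompletenessCheckingPreProcessing {

    // Example criterion (assumption): a complete file ends with this tag.
    private static final String END_MARKER = "</documents>";

    public void executeInternally(InputStream input, OutputStream output,
            IErrorTracker errorTracker, Map<String, String> params) throws Exception {
        byte[] content = readAll(input);
        String text = new String(content, StandardCharsets.UTF_8).trim();

        if (!text.endsWith(END_MARKER)) {
            // Flag the file as incomplete and return normally. Throwing an
            // exception here would make the import fail instead of retrying.
            errorTracker.setParseError();
            return;
        }

        // The file looks complete: pass it on unchanged.
        output.write(content);
        output.flush();
    }

    // Reads the whole stream into memory (fine for a sketch, not for huge files).
    private static byte[] readAll(InputStream input) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = input.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return buffer.toByteArray();
    }

    // Small local check with a truncated in-memory input.
    public static void main(String[] args) throws Exception {
        IErrorTracker tracker = () -> System.out.println("setParseError() called");
        InputStream truncated = new ByteArrayInputStream(
                "<documents><document>".getBytes(StandardCharsets.UTF_8));
        Map<String, String> noParams = Collections.emptyMap();
        new CompletenessCheckingPreProcessing().executeInternally(
                truncated, new ByteArrayOutputStream(), tracker, noParams);
    }
}
```

Returning normally after the call, rather than throwing, follows from the note above that an exception always fails the import.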

Last modified on 7/26/19

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.
