Configuration
To enable the Linkchecker, set the following property in the sophora.properties
file of the Sophora server:
sophora.linkchecker.enabled=true
Furthermore, the following configuration parameters can be set in the sophora.properties
file:
Parameter | Description | Default Value |
---|---|---|
sophora.linkchecker.username | User name for Linkchecker job | admin |
sophora.linkchecker.urlChecker.testUrl | URL used for checking internet connectivity | http://www.web.de |
sophora.linkchecker.checkerJob.cronExpression.unavailable | Cron expression that schedules the job which checks all currently broken links | 0 0 5 * * ? |
sophora.linkchecker.checkerJob.cronExpression.available | Cron expression that schedules the job which checks all currently working links | 0 0 3 * * ? |
sophora.linkchecker.documentProposal.sectionNames.<Proposal section name> | The value is the name of the proposal section to which proposals for broken links are added. Sections can be mapped to different structure paths by appending the path to the key, with the names of the structure nodes separated by periods ("."). A mapping also covers all sub structure nodes; e.g. sophora.linkchecker.documentProposal.sectionNames.demosite.trendcities, as used in the example configuration, applies to everything below /demosite/trendcities. If a link becomes unavailable, it is added to the closest proposal section. The default proposal section is set with sophora.linkchecker.documentProposal.sectionNames.default. This parameter can also be set via the DeskClient in the configuration document. Changes made in the configuration document take effect on the next run of the Linkchecker and do not require a restart of the Sophora server. | |
sophora.linkchecker.documentProposal.expireTime | Expiration time in days for created document proposals | 0 |
sophora.linkchecker.urlChecker.setOffline | Set to true to set broken links offline | true |
sophora.linkchecker.urlChecker.httpStatusCodeWhitelist | All links with a status code greater than or equal to 400 are marked as unavailable unless the code is declared in this list. Multiple codes are separated by commas. | 500,400,503 |
sophora.linkchecker.urlChecker.connectionTimeout | Connection timeout in milliseconds. | 20000 |
sophora.linkchecker.urlChecker.treatTimeoutsAsUnavailable | Defines whether links that have exceeded the configured connection timeout should be treated as unreachable (true or false). | true |
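The status code whitelist and the timeout setting from the table can be read as a small decision rule. The following Python sketch only illustrates that reading; it is not the Linkchecker's actual code, and the function names `parse_whitelist` and `is_unavailable` are hypothetical.

```python
def parse_whitelist(value):
    """Parse a comma-separated property value such as '500,400,503' into a set of ints."""
    return {int(code) for code in value.split(",") if code.strip()}


def is_unavailable(status_code, timed_out, whitelist, treat_timeouts_as_unavailable=True):
    """Hypothetical helper: decide whether a checked link counts as unavailable.

    Assumption: a timed-out request has no status code, so the timeout
    setting alone decides the result in that case.
    """
    if timed_out:
        return treat_timeouts_as_unavailable
    if status_code in whitelist:
        # Whitelisted codes never mark a link as unavailable.
        return False
    return status_code >= 400


whitelist = parse_whitelist("500,400,503")  # default value from the table
print(is_unavailable(404, False, whitelist))  # 404 is not whitelisted
print(is_unavailable(503, False, whitelist))  # 503 is whitelisted
```

With the default whitelist, a 404 response marks the link as unavailable, while a 503 response does not.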
Adding to a Proposal Section
If sophora.linkchecker.documentProposal.sectionNames is set, broken links are added to the proposal section that is mapped to the closest structure path according to the link document's structure node.
The name must be unique and point to an existing proposal section (the Linkchecker will not create it). If a broken link is found to be working again, it is removed from the proposal section.
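One way to picture the "closest structure path" lookup is to walk from the link document's structure node up towards the root until a mapping is found. The sketch below assumes that "closest" means the longest mapped ancestor path and that the default section is used when nothing matches; the function `closest_section` and the dictionary layout are hypothetical and not part of Sophora.

```python
def closest_section(structure_path, mappings):
    """Return the proposal section mapped to the closest ancestor of structure_path.

    mappings uses the key suffix that follows 'sectionNames.' in
    sophora.properties, e.g. 'demosite.trendcities'.
    """
    nodes = [node for node in structure_path.split("/") if node]
    while nodes:
        key = ".".join(nodes)
        if key in mappings:
            return mappings[key]
        nodes.pop()  # move one level up in the structure
    # Assumption: fall back to the default section if no path matches.
    return mappings.get("default")


# Values taken from the example configuration in this document.
mappings = {
    "default": "Broken Links",
    "demosite.trendcities": "Broken Links (Trendcities)",
}

print(closest_section("/demosite/trendcities/berlin", mappings))
print(closest_section("/demosite/home", mappings))
```

The path /demosite/trendcities/berlin is hypothetical; it resolves to the Trendcities section because its parent node is mapped, while /demosite/home falls back to the default section.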
Setting Broken Links Offline
If the parameter sophora.linkchecker.urlChecker.setOffline is set to true, broken links are set offline. This is done in addition to adding them to a proposal section. When a broken link is found to be working again, the proposal is removed, but the document is left offline.
Using a URL for Testing Internet Connectivity
When a broken link is identified, it won't be marked as "unavailable" immediately. Instead, the configured test URL is checked first to verify whether there is an internet connection at all.
If the test URL is available, the previously checked link is marked as "unavailable" and, if sophora.linkchecker.urlChecker.setOffline is true, its document is set offline.
If no connection exists, the Linkchecker skips the current link and proceeds with the remaining links to be checked. As long as no connection can be established, links are skipped.
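The flow described above can be sketched as follows. This is a simplified illustration, not the Linkchecker's implementation: `check_links` and the injected `is_reachable` callable are hypothetical, and real reachability checks (HTTP requests, timeouts, status codes) are deliberately left out.

```python
def check_links(links, is_reachable, test_url):
    """Split failing links into 'unavailable' and 'skipped'.

    is_reachable is a hypothetical callable (url -> bool) standing in
    for the actual HTTP check.
    """
    unavailable, skipped = [], []
    for url in links:
        if is_reachable(url):
            continue  # link works, nothing to do
        if is_reachable(test_url):
            # The internet connection is fine, so the link itself is broken.
            unavailable.append(url)
        else:
            # No connectivity: do not blame the link, just skip it.
            skipped.append(url)
    return unavailable, skipped


# Simulated network in which only these URLs respond.
online = {"http://www.web.de", "http://ok.example"}
result = check_links(
    ["http://ok.example", "http://dead.example"],
    lambda url: url in online,
    "http://www.web.de",
)
print(result)
```

In this simulation the test URL responds, so the failing link ends up in the unavailable list; if the test URL failed as well, every failing link would be skipped instead.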
Time Scheduling with Cron Configurations
With the parameters sophora.linkchecker.checkerJob.cronExpression.unavailable and sophora.linkchecker.checkerJob.cronExpression.available you can define when and how often the Linkchecker inspects unavailable and available links, respectively.
These parameters take a cron expression as their value. A Quartz cron expression consists of six required fields for seconds, minutes, hours, day of month, month and day of week, optionally followed by a seventh field for the year. This lets you configure when and at which intervals the Linkchecker runs.
An example: The cron expression "0 0/20 22-05 * * ?" would start the Linkchecker every 20 minutes during the hours from 10 p.m. to 5 a.m. every day, with the last run at 5:40 a.m.
The cron expression given in the sophora.properties file is passed to the CronTrigger class from the org.quartz package without any further check. Make sure that it matches the notation described in the class's documentation at http://quartznet.sourceforge.net/apidoc/topic285.html.
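To see which times of day an expression covers, the first three fields (seconds, minutes, hours) can be expanded by hand. The Python sketch below is a simplified illustration and not Quartz's parser: it handles only `*`, single values, ranges and `start/step` in the time-of-day fields, ignores the date fields, and assumes that a range such as 22-05 wraps around midnight. The function names are hypothetical.

```python
def expand_field(field, low, high):
    """Expand one numeric cron field into a sorted list of values."""
    if field == "*":
        return list(range(low, high + 1))
    if "/" in field:
        start, step = field.split("/")
        first = low if start == "*" else int(start)
        return list(range(first, high + 1, int(step)))
    if "-" in field:
        a, b = (int(part) for part in field.split("-"))
        if a <= b:
            return list(range(a, b + 1))
        # Wrapping range, e.g. 22-05 covers 22, 23, 0, ..., 5.
        return sorted(list(range(a, high + 1)) + list(range(low, b + 1)))
    return [int(field)]


def fire_times(expression):
    """Return all (hour, minute, second) tuples of one day, ignoring the date fields."""
    seconds, minutes, hours = expression.split()[:3]
    return [
        (h, m, s)
        for h in expand_field(hours, 0, 23)
        for m in expand_field(minutes, 0, 59)
        for s in expand_field(seconds, 0, 59)
    ]


times = fire_times("0 0/20 22-05 * * ?")
print(len(times))           # 24 runs per day
print((5, 40, 0) in times)  # True: 5:40 a.m. is the last run of the night
```

Expanding an expression this way before deploying it can help to spot schedules that run more often, or later, than intended.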
Example configuration
sophora.linkchecker.enabled=true
sophora.linkchecker.username=admin
# Check internet connectivity
sophora.linkchecker.urlChecker.testUrl=http://www.web.de
# Cron expression (seconds minutes hours day-of-month month day-of-week) for checking links that are no longer reachable.
# More information about the expression is available at http://quartz.sourceforge.net/javadoc/org/quartz/CronTrigger.html
# Example: 0 0/2 10-18 * * ? runs every two minutes between 10:00 and 18:58 every day.
sophora.linkchecker.checkerJob.cronExpression.unavailable=0 0 0-8 * * ?
# Cron expression (seconds minutes hours day-of-month month day-of-week) for checking reachable links.
sophora.linkchecker.checkerJob.cronExpression.available=0 0 0-8 * * ?
# Expiration time of the created document proposals in days
sophora.linkchecker.documentProposal.expireTime=0
# Set unavailable links offline.
sophora.linkchecker.urlChecker.setOffline=true
# HTTP status code allow list. All links with a status code of 400 or higher are marked as unreachable unless
# the code is declared here. The values are separated by commas.
sophora.linkchecker.urlChecker.httpStatusCodeWhitelist=503
# Connection timeout in milliseconds.
sophora.linkchecker.urlChecker.connectionTimeout=10000
# Determines whether links that exceed the configured timeout when responding should be treated as unreachable.
sophora.linkchecker.urlChecker.treatTimeoutsAsUnavailable=true
# The default proposal section to which unreachable link documents are added
sophora.linkchecker.documentProposal.sectionNames.default=Broken Links
# The proposal section to which unreachable link documents located below the structure path /demosite/trendcities are added
sophora.linkchecker.documentProposal.sectionNames.demosite.trendcities=Broken Links (Trendcities)
Additional configuration parameters
The following parameters for the Linkchecker are set within the configuration part in the administrator section of the DeskClient.
Using a Custom Link Node Type
If the node type of the link document or the URL property differs from the default values ("sophora-extension-nt:link" and "sophora-extension:url"), this needs to be set in the configuration using the following entries:
Key | Description | Example |
---|---|---|
textlinkNodeType | Definition of a link document's node type. | sophora-extension-nt:link |
textlinkUrlProperty | Property that contains the actual URL of the textlinkNodeType. | sophora-extension:url |
Custom Search for Broken Links
To enable a convenient search for broken link documents within the DeskClient, add the following entry:
Key | Value |
---|---|
search.modifiers | Broken Links;§[@jcr:primaryType='' and @sophora-extension:available='false'] |
If the property search.modifiers already exists, append the value to its list.
Triggering the Linkchecker jobs via JMX
The CronManager JMX interface com.subshell.sophora.cron/CronManager provides two operations to trigger the Linkchecker: triggerUnavailableLinkchecker and triggerAvailableLinkchecker.
These operations start additional runs and do not affect the regular schedule defined by the cron expressions.