Sitemaps 4

module Sitemaps: Documentation

Learn how to administer the Sophora Sitemaps module to generate sitemaps-protocol-compliant XML for search engine optimization.

Sitemaps protocol

The open sitemaps protocol (https://sitemaps.org/protocol.html) is a human- and machine-readable XML format for describing the structure of a website. The sitemaps standard enables search engines to read and understand a website.

Providing a sitemap can improve your SEO, as all modern search engines understand the sitemaps protocol. The Sophora Sitemaps module supports version 0.90 of the protocol. It automatically generates the XML from your website structure and uses customizable Java mapping classes to provide metadata.

Google extensions

In addition to the open sitemaps standard, this addon supports the Google extensions for news (version 0.9), images (version 1.1) and videos (version 1.1). These sitemap extensions are explained, with follow-up links, on the Google support pages (https://support.google.com/webmasters/answer/183668?hl=en&ref_topic=4581190#extensions).

Sitemap XML that declares these extensions looks like this:

<urlset
    xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
    xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
    <url>
        <loc>
            http://my-site.subshell.com:8080/live/demosite/chronicle/2010/index.html
        </loc>
        <lastmod>2016-06-06T13:22:34.653+02:00</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.9</priority>
    </url>
    ...
</urlset>
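
To check generated output, it can help to read the <loc> entries back out of a sitemap. The following is a minimal sketch that uses only the JDK's XML parser; the class name SitemapReader and its method are illustrative and not part of the Sophora Sitemaps module.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Illustrative helper, not part of the Sophora Sitemaps module.
public class SitemapReader {

    // Namespace of the sitemaps protocol, as declared on <urlset>.
    public static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    /** Returns the trimmed text of every <loc> element in the sitemaps namespace. */
    public static List<String> readLocations(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookups
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList locs = doc.getElementsByTagNameNS(SITEMAP_NS, "loc");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < locs.getLength(); i++) {
                // <loc> content may be surrounded by whitespace, as in the example above.
                result.add(locs.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sitemap XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<urlset xmlns=\"" + SITEMAP_NS + "\">"
                + "<url><loc>\n  http://my-site.subshell.com:8080/live/demosite/chronicle/2010/index.html\n</loc>"
                + "<lastmod>2016-06-06T13:22:34.653+02:00</lastmod></url></urlset>";
        System.out.println(readLocations(xml));
    }
}
```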

Project Setup

The Sophora Sitemaps module is a separate Maven dependency for your delivery. Installing the module provides the Sitemaps servlet, which then has to be added to your web.xml and templates.xml.

Maven dependency: pom.xml

<dependency>
	<groupId>com.subshell.sophora</groupId>
	<artifactId>sophora-sitemaps</artifactId>
	<version>4.0.0</version>
</dependency>

Servlet mapping: web.xml

<servlet>
	<servlet-name>SitemapServlet</servlet-name>
	<servlet-class>com.subshell.sophora.delivery.sitemap.SitemapServlet</servlet-class>
</servlet>
<servlet-mapping>
	<servlet-name>SitemapServlet</servlet-name>
	<url-pattern>/system/servlet/sitemap.servlet</url-pattern>
</servlet-mapping>

Sophora template mapping: templates.xml

<nodetype name="sophora-nt:structureNode">
	<templateset>
		...
		<template type="sitemap">/system/servlet/sitemap.servlet</template> 
	</templateset>
</nodetype>

Preparing a Solr Core

This addon generates the entries for your sitemap by reading all relevant documents from a specific Solr core. Converters (the mapper classes described below) create those entries from the Solr documents. Technically, any Solr core will do for this purpose, but we recommend using a dedicated Solr core that contains only the documents you want to have in your sitemap.

Custom Sophora Sitemap mapping classes

To use custom Sophora Sitemap mapping classes you need to implement the com.subshell.sophora.delivery.sitemap.api.IMapperFactory interface. The class should be located in the package specified by the property sophora.delivery.sitemap.basePackage.

The mapper factory has two main purposes:

  1. It defines the Solr core from which documents are read and converted to entries in your sitemap.
  2. It provides your mapper implementations. There are four mappers to provide: three for the Google extensions (news, video and image) and one for all other documents.

Each mapper then provides properties based on a Solr document, and the XML is generated from this mapping. For convenience, you can extend the com.subshell.sophora.delivery.sitemap.impl.AbstractMapper class:

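import java.math.BigDecimal;
import java.util.Map;
import java.util.Objects;

import com.subshell.sophora.delivery.sitemap.impl.AbstractMapper;
// Imports for IUrlMapper, ChangeFreq and DateTime are omitted here;
// their packages are not stated on this page.

public class CustomUrlMapper extends AbstractMapper implements IUrlMapper {

	public static final String SOLR_FIELD_URL = "url_s";
	public static final String SOLR_FIELD_LAST_MOD = "sophora_modificationDate_dt";

	// The constructor name must match the class name.
	public CustomUrlMapper(Map<String, Object> solrDocument) {
		super(solrDocument);
	}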

	@Override
	public boolean isApplicable() {
		return true;
	}

	@Override
	public String getLocation() {
		return Objects.toString(getSolrDocument().get(SOLR_FIELD_URL));
	}

	@Override
	public DateTime getLastMod() {
		return parseDate(getSolrDocument(), SOLR_FIELD_LAST_MOD);
	}

	@Override
	public ChangeFreq getChangefreq() {
		return ChangeFreq.DAILY;
	}

	@Override
	public BigDecimal getPriority() {
		return null;
	}
}

A DefaultMapperFactory is also available; it provides default mapper implementations.

Properly writing custom mappers

For every Solr document, only one mapper implementation should be applicable. The easiest way to achieve this is to make a clear distinction based on the Solr document's node type. The default implementation, for example, always creates default URL entries and never any Google extension entries.
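
The "exactly one applicable mapper" rule can be sketched without the module's classes. In the example below, plain predicates stand in for the isApplicable checks of the four mappers. The Solr field name nodeType_s and the example-nt:* node type names are hypothetical and depend on your own schema; this is an illustration of the rule, not the module's implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative stand-in for the isApplicable logic of four mappers.
public class MapperApplicabilityCheck {

    // Hypothetical Solr field holding the node type; adjust to your schema.
    public static final String NODE_TYPE_FIELD = "nodeType_s";

    /** Builds an applicability rule that matches exactly one node type. */
    public static Predicate<Map<String, Object>> forNodeType(String nodeType) {
        return doc -> nodeType.equals(doc.get(NODE_TYPE_FIELD));
    }

    /** Returns the rules for news, video, image and the catch-all URL mapper. */
    public static List<Predicate<Map<String, Object>>> rules() {
        Predicate<Map<String, Object>> news = forNodeType("example-nt:story");
        Predicate<Map<String, Object>> video = forNodeType("example-nt:video");
        Predicate<Map<String, Object>> image = forNodeType("example-nt:image");
        // The URL mapper applies only when none of the extension mappers does.
        Predicate<Map<String, Object>> url = news.or(video).or(image).negate();
        return List.of(news, video, image, url);
    }

    /** Counts how many rules consider the given document applicable. */
    public static long countApplicable(Map<String, Object> doc) {
        return rules().stream().filter(rule -> rule.test(doc)).count();
    }

    public static void main(String[] args) {
        Map<String, Object> story = Map.of(NODE_TYPE_FIELD, "example-nt:story");
        Map<String, Object> other = Map.of(NODE_TYPE_FIELD, "example-nt:page");
        System.out.println(countApplicable(story));
        System.out.println(countApplicable(other));
    }
}
```

Defining the catch-all rule as the negation of the other three keeps the rules mutually exclusive by construction, so adding a new node type to one extension mapper cannot accidentally produce two entries for the same document.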

Generating the sitemap xml

After configuring the module as described above, requesting an index document with the template type "sitemap" creates the sitemap index and returns the link to the generated sitemap XML.

Caching

If caching is enabled via the Sophora property sophora.delivery.cache.enabled, the sitemap XML cache is updated periodically. The update interval is configurable via sophora.delivery.sitemap.cacheUpdateInterval.

Paging

This module supports paging using the URL parameter p. Page indexes are zero-based. A typical URL with paging looks like this (here, the fourth page, at index 3, is requested):
http://my-site.subshell.com/live/demosite/trendcities/copenhagen/index~sitemap_p-3.xml
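
If you need to build such URLs yourself, for example in a test, a small helper is enough. The sketch below assumes the pattern seen in the example URL above (the suffix ~sitemap_p-<index>.xml appended to the document URL); the class and method names are illustrative and not part of the module.

```java
// Illustrative helper, not part of the Sophora Sitemaps module.
public class SitemapPaging {

    /**
     * Appends the sitemap template type and the zero-based page index to a
     * document URL that has no suffix yet (for example ".../copenhagen/index").
     * The "~sitemap_p-<index>.xml" pattern is inferred from the example URL.
     */
    public static String pagedSitemapUrl(String documentUrl, int pageIndex) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must not be negative: " + pageIndex);
        }
        return documentUrl + "~sitemap_p-" + pageIndex + ".xml";
    }

    /** Converts a one-based page number (fourth page) to its index (3). */
    public static int pageIndexOf(int pageNumber) {
        return pageNumber - 1;
    }

    public static void main(String[] args) {
        String base = "http://my-site.subshell.com/live/demosite/trendcities/copenhagen/index";
        System.out.println(pagedSitemapUrl(base, pageIndexOf(4)));
    }
}
```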

Properties

Property
  Description

sophora.delivery.sitemap.basePackage
  Base Java package in which to search for the IMapperFactory implementation. (Default: com.subshell.sophora.delivery.sitemap)

sophora.delivery.sitemap.cacheUpdateInterval
  Interval in minutes after which the XML is invalidated and regenerated. (Default: 30)

sophora.delivery.sitemap.formatXML
  If set to true, the XML output is formatted. Otherwise, the XML is written on a single line. (Default: true)

sophora.delivery.sitemap.writeNamespacesFor
  Controls which namespace declarations are written at the start of the generated XML file. You can use it to filter out the namespace declarations for Google's sitemap extensions. Possible values are:
    • news (http://www.google.com/schemas/sitemap-news/0.9)
    • video (http://www.google.com/schemas/sitemap-video/1.1)
    • image (http://www.google.com/schemas/sitemap-image/1.1)
  You can use several values separated by commas. The default is news,image,video.
  Only leave out a value if your custom mapper class for that type never generates entries (that is, its isApplicable method always returns false).
  The namespace declaration for the sitemaps standard itself is always written and is not affected by this property.
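
To make the effect of sophora.delivery.sitemap.writeNamespacesFor concrete, the sketch below resolves a comma-separated property value to the namespace declarations listed above. It is not the module's parser: trimming whitespace and ignoring unknown values are assumptions made for this example, and the class name is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only; the module's own handling of this property may differ.
public class NamespaceFilter {

    // Prefix-to-URI pairs as listed for writeNamespacesFor.
    private static final Map<String, String> KNOWN = Map.of(
            "news", "http://www.google.com/schemas/sitemap-news/0.9",
            "video", "http://www.google.com/schemas/sitemap-video/1.1",
            "image", "http://www.google.com/schemas/sitemap-image/1.1");

    /**
     * Resolves a value such as "news,image" to the extension namespaces to
     * declare, keeping the order given in the property value.
     * Assumption: surrounding whitespace is trimmed and unknown values are skipped.
     */
    public static Map<String, String> resolve(String propertyValue) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String token : propertyValue.split(",")) {
            String prefix = token.trim();
            String uri = KNOWN.get(prefix);
            if (uri != null) {
                result.put(prefix, uri);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Renders the declarations as they would appear on <urlset>.
        resolve("news,image").forEach((prefix, uri) ->
                System.out.println("xmlns:" + prefix + "=\"" + uri + "\""));
    }
}
```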

Last modified on 10/20/20

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.
