Sitemaps protocol
The open sitemaps protocol (https://sitemaps.org/protocol.html) is a human- and machine-readable XML format that describes the structure of a website. It helps search engines discover and crawl the pages of a website.
This can improve your SEO, as all major search engines support the sitemaps protocol. The Sophora Sitemaps module supports version 0.90 of the protocol. It generates the XML automatically from your website structure and uses customizable Java mapping classes to provide the metadata.
Google extensions
In addition to the open sitemaps standard, this addon supports the Google extensions for news (version 0.9), images (version 1.1) and videos (version 1.1). These extensions are explained, with links to further information, on the Google support pages (https://support.google.com/webmasters/answer/183668?hl=en&ref_topic=4581190#extensions).
A sitemap XML file that declares these extension namespaces looks like this:
<urlset
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
    xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>http://my-site.subshell.com:8080/live/demosite/chronicle/2010/index.html</loc>
    <lastmod>2016-06-06T13:22:34.653+02:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.9</priority>
  </url>
  ...
</urlset>
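To illustrate the structure of a single entry in the example above, the following is a minimal sketch that writes a `<urlset>` with one `<url>` element using the JDK's StAX API (`javax.xml.stream.XMLStreamWriter`). The class and method names are hypothetical, and this is not how the Sophora module produces its XML; it only shows which elements an entry consists of, that `lastmod` uses the W3C Datetime format, and that text content must be XML-escaped.

```java
import java.io.StringWriter;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper for illustration: writes a sitemap containing a single <url> entry.
class SitemapSketch {
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    static String writeSingleEntry(String loc, OffsetDateTime lastMod,
                                   String changeFreq, String priority) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument("1.0");
            xml.writeStartElement("urlset");
            xml.writeDefaultNamespace(SITEMAP_NS); // the sitemaps 0.9 namespace
            xml.writeStartElement("url");
            writeElement(xml, "loc", loc); // the only required child of <url>
            // W3C Datetime, e.g. 2016-06-06T13:22:34.653+02:00
            writeElement(xml, "lastmod", lastMod.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME));
            writeElement(xml, "changefreq", changeFreq); // e.g. "daily"
            writeElement(xml, "priority", priority);     // 0.0 to 1.0
            xml.writeEndElement(); // </url>
            xml.writeEndElement(); // </urlset>
            xml.writeEndDocument();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write sitemap XML", e);
        }
        return out.toString();
    }

    private static void writeElement(XMLStreamWriter xml, String name, String text)
            throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(text); // escapes characters such as & and <
        xml.writeEndElement();
    }
}
```

For example, `SitemapSketch.writeSingleEntry("http://example.com/a?x=1&y=2", OffsetDateTime.parse("2016-06-06T13:22:34.653+02:00"), "daily", "0.9")` writes the `&` in the URL as `&amp;`, as the protocol requires.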
Project Setup
The Sophora Sitemaps module is a separate Maven dependency for your delivery. Installing the module provides the SitemapServlet, which you then need to register in your web.xml and map in your templates.xml.
Maven dependency: pom.xml
<dependency>
    <groupId>com.subshell.sophora</groupId>
    <artifactId>sophora-sitemaps</artifactId>
    <version>4.0.0</version>
</dependency>
Servlet mapping: web.xml
<servlet>
    <servlet-name>SitemapServlet</servlet-name>
    <servlet-class>com.subshell.sophora.delivery.sitemap.SitemapServlet</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>SitemapServlet</servlet-name>
    <url-pattern>/system/servlet/sitemap.servlet</url-pattern>
</servlet-mapping>
Sophora template mapping: templates.xml
<nodetype name="sophora-nt:structureNode">
    <templateset>
        ...
        <template type="sitemap">/system/servlet/sitemap.servlet</template>
    </templateset>
</nodetype>
Preparing a Solr Core
This addon generates the entries for your sitemap by reading all relevant documents from a specific Solr core. Mapper classes then convert these Solr documents into sitemap entries. Technically, any Solr core will do, but we recommend a dedicated Solr core that contains only the documents you want to have in your sitemap.
Custom Sophora Sitemap mapping classes
To use custom Sophora Sitemap mapping classes, implement the com.subshell.sophora.delivery.sitemap.api.IMapperFactory interface. The implementing class should be located in the package specified by the property sophora.delivery.sitemap.basePackage.
The mapper factory has two main purposes:
- It defines the Solr core from which documents are read and converted into entries in your sitemap.
- It provides your mapper implementations. There are four mappers to provide: three for the Google extensions (news, video and image) and one for all other documents.
Each mapper provides the properties of a sitemap entry based on a Solr document, and the XML is generated from these properties. For convenience, you can extend the com.subshell.sophora.delivery.sitemap.impl.AbstractMapper class, as in this mapper for regular URL entries:
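import java.math.BigDecimal;
import java.util.Map;
import java.util.Objects;

import com.subshell.sophora.delivery.sitemap.impl.AbstractMapper;
// IUrlMapper, ChangeFreq and DateTime are provided by the Sitemaps module and its
// dependencies; import them from the packages used by your module version.

public class CustomUrlMapper extends AbstractMapper implements IUrlMapper {
    // Solr fields holding the document URL and its last modification date
    public static final String SOLR_FIELD_URL = "url_s";
    public static final String SOLR_FIELD_LAST_MOD = "sophora_modificationDate_dt";

    public CustomUrlMapper(Map<String, Object> solrDocument) {
        super(solrDocument);
    }

    // Accepts every document. If you also provide news, image or video mappers,
    // restrict this check so that only one mapper applies to each document.
    @Override
    public boolean isApplicable() {
        return true;
    }

    @Override
    public String getLocation() {
        // Return null rather than the literal string "null" if the field is missing
        return Objects.toString(getSolrDocument().get(SOLR_FIELD_URL), null);
    }

    @Override
    public DateTime getLastMod() {
        return parseDate(getSolrDocument(), SOLR_FIELD_LAST_MOD);
    }

    @Override
    public ChangeFreq getChangefreq() {
        return ChangeFreq.DAILY;
    }

    @Override
    public BigDecimal getPriority() {
        // No priority is set for these entries
        return null;
    }
}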
The module also includes a DefaultMapperFactory, which provides default mapper implementations.
Properly writing custom mappers
For every Solr document, only one mapper implementation should be applicable. The easiest way to achieve this is to distinguish clearly by the node type of the Solr document. The default implementation, for example, always creates default URL entries and never any Google extension entries.
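The node-type distinction can be sketched as follows. Each predicate stands in for one mapper's `isApplicable()` check; the Solr field name `nodetype_s`, the node type names and the `MapperSelection` class are hypothetical and not part of the module's API. Because the URL mapper only accepts node types that no extension mapper handles, the four checks never overlap, and the sketch fails loudly if that rule is ever broken.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch: one predicate per mapper, each standing in for that mapper's
// isApplicable() check. Field and node type names are hypothetical.
class MapperSelection {
    static final String SOLR_FIELD_NODETYPE = "nodetype_s";
    static final String NEWS = "demo-nt:news";
    static final String VIDEO = "demo-nt:video";
    static final String IMAGE = "demo-nt:image";
    static final Set<String> EXTENSION_TYPES = Set.of(NEWS, VIDEO, IMAGE);

    static final Predicate<String> NEWS_MAPPER = NEWS::equals;
    static final Predicate<String> VIDEO_MAPPER = VIDEO::equals;
    static final Predicate<String> IMAGE_MAPPER = IMAGE::equals;
    // The URL mapper takes every node type the extension mappers do not handle.
    static final Predicate<String> URL_MAPPER = type -> !EXTENSION_TYPES.contains(type);

    static final Map<String, Predicate<String>> MAPPERS = Map.of(
            "news", NEWS_MAPPER, "video", VIDEO_MAPPER,
            "image", IMAGE_MAPPER, "url", URL_MAPPER);

    /** Returns the name of the single mapper applicable to the given Solr document. */
    static String select(Map<String, Object> solrDocument) {
        String nodeType = Objects.toString(solrDocument.get(SOLR_FIELD_NODETYPE), "");
        List<String> applicable = MAPPERS.entrySet().stream()
                .filter(entry -> entry.getValue().test(nodeType))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        if (applicable.size() != 1) {
            throw new IllegalStateException("Expected exactly one applicable mapper for node type '"
                    + nodeType + "', found " + applicable);
        }
        return applicable.get(0);
    }
}
```

A document of node type `demo-nt:news` is handled only by the news mapper, while any other node type (or a document without that field) falls through to the URL mapper.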
Generating the sitemap xml
After configuring the module as described above, requesting an index document with the template type "sitemap" creates the sitemap index and returns the link to the generated sitemap XML.
Caching
If caching is enabled via the Sophora property sophora.delivery.cache.enabled, the cached sitemap XML is updated periodically. The update interval, in minutes, is configured via sophora.delivery.sitemap.cacheUpdateInterval.
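To illustrate what interval-based regeneration means, the following minimal sketch caches a generated value and regenerates it once it is older than the configured interval. It is not the module's implementation: the `IntervalCache` class, the injected time source and the lazy, on-access regeneration are assumptions made for this example (the module may, for instance, regenerate in the background instead).

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical illustration: caches generated sitemap XML and regenerates it
// on access once the update interval has elapsed.
class IntervalCache {
    private final Supplier<String> generator;
    private final long intervalMillis;
    private final LongSupplier clockMillis; // injected so the behaviour is testable
    private String cachedXml;
    private long generatedAt;

    IntervalCache(Supplier<String> generator, Duration updateInterval, LongSupplier clockMillis) {
        this.generator = generator;
        this.intervalMillis = updateInterval.toMillis();
        this.clockMillis = clockMillis;
    }

    synchronized String get() {
        long now = clockMillis.getAsLong();
        // Regenerate on first access or when the cached XML has expired
        if (cachedXml == null || now - generatedAt >= intervalMillis) {
            cachedXml = generator.get();
            generatedAt = now;
        }
        return cachedXml;
    }
}
```

In production code the time source would simply be `System::currentTimeMillis`, and `Duration.ofMinutes(30)` corresponds to the documented default interval.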
Paging
This module supports paging via the URL parameter p, where the page index starts at 0. A typical URL with paging looks like this (here, the fourth page, at index 3):
http://my-site.subshell.com/live/demosite/trendcities/copenhagen/index~sitemap_p-3.xml
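Based on the example URL, the page index appears as a `_p-<index>` suffix after the template name. The following sketch builds such URLs; the `SitemapPageUrls` helper is hypothetical, and it assumes that every page, including the first one at index 0, may carry the suffix, which the description does not confirm.

```java
// Hypothetical helper: builds the URL of a sitemap page following the pattern
// <document>~sitemap_p-<index>.xml shown in the example (page index counted from 0).
class SitemapPageUrls {
    static String pageUrl(String documentBaseUrl, int pageIndex) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0, was " + pageIndex);
        }
        return documentBaseUrl + "~sitemap_p-" + pageIndex + ".xml";
    }
}
```

With the base URL `http://my-site.subshell.com/live/demosite/trendcities/copenhagen/index` and page index 3, this reproduces the example URL for the fourth page.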
Properties
Property | Description |
---|---|
sophora.delivery.sitemap.basePackage | Base Java package in which the implementation of IMapperFactory is searched. (Default: com.subshell.sophora.delivery.sitemap) |
sophora.delivery.sitemap.cacheUpdateInterval | Interval in minutes after which the XML is invalidated and regenerated. (Default: 30) |
sophora.delivery.sitemap.formatXML | If set to true, the XML output is formatted; otherwise it is written on a single line. (Default: true) |
sophora.delivery.sitemap.writeNamespacesFor | Controls which namespace declarations are written at the start of the generated XML file. Use it to filter out the namespace declarations for Google's sitemap extensions. Possible values: news, image, video. Only omit an extension's namespace if your custom mapper class for that extension never generates entries (that is, its isApplicable method always returns false). The namespace declaration for the sitemaps standard is always written and is not affected by this property. |
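As an illustration of how the documented defaults apply when a property is not set, here is a sketch that reads these settings with `java.util.Properties`. The `SitemapSettings` class is hypothetical, and how the module itself reads its configuration is not covered here. The comma-separated format for writeNamespacesFor and the "all extensions" fallback when it is unset are assumptions, since the table gives no default for that property.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical settings holder: reads the sitemap properties and falls back
// to the defaults listed in the table when a property is not set.
class SitemapSettings {
    static final Set<String> EXTENSIONS = Set.of("news", "image", "video");

    final String basePackage;
    final int cacheUpdateIntervalMinutes;
    final boolean formatXml;
    final Set<String> writeNamespacesFor;

    SitemapSettings(Properties props) {
        basePackage = props.getProperty("sophora.delivery.sitemap.basePackage",
                "com.subshell.sophora.delivery.sitemap");
        cacheUpdateIntervalMinutes = Integer.parseInt(props.getProperty(
                "sophora.delivery.sitemap.cacheUpdateInterval", "30").trim());
        formatXml = Boolean.parseBoolean(props.getProperty(
                "sophora.delivery.sitemap.formatXML", "true").trim());
        // Assumption: a comma-separated list; if unset, all extension namespaces are written.
        String namespaces = props.getProperty(
                "sophora.delivery.sitemap.writeNamespacesFor", "news,image,video");
        writeNamespacesFor = Arrays.stream(namespaces.split(","))
                .map(String::trim)
                .filter(value -> !value.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
        for (String value : writeNamespacesFor) {
            if (!EXTENSIONS.contains(value)) {
                throw new IllegalArgumentException("Unknown extension in writeNamespacesFor: " + value);
            }
        }
    }
}
```

Reading the values once into a small immutable holder keeps parsing and validation in one place, so an unknown extension name is reported at startup rather than while the XML is generated.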