Google Sitemaps

Kentico allows you to automatically generate sitemaps for your websites according to the Google Sitemap Protocol. Sitemaps help search engines correctly index the content of websites and can have a significant effect on the resulting search ranking.

A sitemap is an XML file that lists the URLs of a website’s pages along with additional metadata. Search engine crawlers (robots) use the sitemap data to determine which pages to index and how often to re-index pages. Sitemaps only serve as a recommendation and do not guarantee that all crawlers will index your website strictly according to the specified data.

For detailed information about the Sitemap protocol, see http://www.sitemaps.org/.

Setting the sitemap URL

To adjust the URL of your website’s sitemap:

  1. Open the Settings application.
  2. Select the URLs and SEO category.
  3. Enter the required URL into the Google sitemap URL setting.
  4. Click Save.

For example, the default value googlesitemap.xml means that web crawlers can access the sitemap through the following URL:

<website domain>/googlesitemap.xml

Using the .xml extension

If you want to have your sitemap available under a URL with the .xml extension, you need to configure your application to handle all types of request extensions:

  1. Edit your application’s web.config file.

  2. Find the system.webServer section directly under the web.config root (i.e. not under a specific <location> element).

  3. Add the following attribute to the <modules> element:

    
    
    
     <modules runAllManagedModulesForAllRequests="true">
    
    
     

Defining the sitemap content

The system generates sitemaps for websites based on the documents stored in the content tree.

By default the sitemap:

  • Only contains pages (documents of the CMS.MenuItem type)
  • Automatically excludes all pages whose parent document is not in the sitemap (such as pages stored under folders or custom document types)

You can modify the content of your website’s sitemap by creating a dedicated sitemap document:

  1. Open the Pages application.
  2. Create a new Page (menu item) document in your website’s content tree.
    • You can use the predefined SEO -> Google Sitemap page template to quickly create sitemap documents. This template contains the required web part by default. You may need to assign the template to your site in the Page templates application.
  3. Place the Google Sitemap (XML Sitemap) web part onto the page.
    • Adding this web part stops the page from displaying standard content. Instead, the page returns an XML response with the sitemap data.
    • The web part only generates output when the page is accessed on the live site.
  4. Configure the content of the sitemap through the web part’s properties.
    • You can limit which documents are included in the sitemap by entering an appropriate Path expression.
  5. Open the Settings application and select the URLs and SEO category.
  6. Enter the path of your sitemap document into the Google sitemap path setting.
  7. Click Save.

The sitemap generated by the web part replaces the default sitemap. Search crawlers can access the sitemap either under the main URL specified in the Google sitemap URL setting, or directly through the URL of the document containing the Google Sitemap web part.

Customizing the default sitemap directly

If you do not wish to use portal engine pages and web parts, you can instead edit the markup of the ~/CMSPages/googlesitemap.aspx system page. This page generates the default sitemap for websites that have an empty Google sitemap path setting.

The GoogleSitemap control on the page provides the same configuration options as the Google Sitemap web part.

Troubleshooting:

If you encounter problems with pages missing in your sitemap, try checking for the following:

  • Manually excluded documents - specific documents may be excluded through their sitemap properties (Show in sitemap or Exclude from search).
  • Incorrect content filtering - review the content filtering properties of your Google Sitemap web part.
    • If the Document types property is empty, the sitemap only loads pages (CMS.MenuItem documents). Add the document types that you wish to have in the sitemap. You can use the asterisk (*) wildcard to specify all document types.
  • Broken document hierarchy - sections of the website may be excluded due to parent documents missing in the sitemap. To load all documents regardless of the parent‑child hierarchy in the content tree, disable the Hide children for hidden parent property of the Google Sitemap web part.

Configuring sitemap settings for specific pages

By filling in the sitemap properties of documents, you can exclude specific pages from sitemaps or give search crawlers additional details describing how to index pages:

  1. Open the Pages application.

  2. Select the document in the content tree.

  3. Open the document’s Properties -> Navigation tab.

  4. Set up the following properties:

    Basic properties

    Show in sitemap

    Sitemaps only list documents that have this property enabled.

    Search & SEO

    Exclude from search

    Marks the document to be ignored by all forms of search, including search engines.

    Enabling this checkbox excludes the document from sitemaps by default. However, individual Google Sitemap web parts can override this setting and generate sitemaps including documents that are excluded from search.

    Sitemap change frequency

    Determines the value of the document’s <changefreq> tag in the sitemap. This metadata provides a suggestion to search engines about how often they should re-index the page.

    Choose a value that reflects how frequently the page’s content changes.

    Sitemap priority

    Allows you to inform web crawlers which pages you consider to be the most important.

    The system converts the selected priority to a decimal number between 0 and 1 and adds the number as the value of the document’s <priority> tag in the sitemap. Web crawlers only measure the priority in relation to other pages on the website.

If you enter the URL of the sitemap into your browser, you can review the generated XML output. The system automatically creates the required XML structure.

XML output of the Google sitemap generated for a Kentico website

  • The <url> elements represent individual pages.
  • The sitemap loads the values of the <loc> and <lastmod> tags from the data of the corresponding documents.
  • The <changefreq> and <priority> optional tags are added for documents that have values in their Sitemap change frequency and Sitemap priority properties.

Creating sitemap indexes

A single XML sitemap can only list up to 50 000 pages (URLs). If you need to include more pages, prepare multiple sitemaps and create a sitemap index for your website:

  1. Add any number of sitemap documents, each one containing its own Google Sitemap web part.
  2. Separate your website’s pages between the sitemaps by configuring the content filtering properties of the web parts.
    • Each sitemap can contain a maximum of 50 000 items.
    • Avoid duplicate content — do not list the same page URLs in multiple sitemaps.
  3. Create the index as another document with a Google Sitemap web part.
    • Switch the Sitemap mode property of the web part to Sitemap index.
    • Configure the content filtering properties so that the sitemap index web part loads only the documents representing your sitemaps.
  4. Enter the path of your sitemap index document into the website’s Google sitemap path setting.
    • This ensures that crawlers process the sitemap index first.

The sitemap index points search engine crawlers to the other sitemaps, which then provide the lists of page URLs in the usual way.

Customizing the XML format of sitemaps

If you need to override the default XML format of a sitemap or index, you can specify a custom transformation for the Google Sitemap web part or GoogleSitemap control. This allows you to react to any changes in the Sitemap protocol.

For example, the default CMS.Root.GoogleSiteMap transformation uses the following code to define the sitemap structure:




<url>
  <%# GetSitemapItem("loc") %>
  <%# GetSitemapItem("lastmod") %>
  <%# GetSitemapItem("changefreq") %>
  <%# GetSitemapItem("priority") %>
</url>


The GetSitemapItem transformation method generates XML tags according to the sitemap protocol. The method’s parameter specifies the type of the tag, and the value is dynamically loaded from the data of the transformed documents.

In the final XML output, the web part automatically encloses the transformed items within either a <urlset> or <sitemapindex> element depending on the selected Sitemap mode.

Creating sitemaps for multilingual websites

Websites that offer content in multiple languages use different URLs for different language versions of the same document.

To include pages from all languages in your sitemap, you need to use one of the approaches described below.

Submitting separate sitemaps for different language versions of the website

One way to handle multilingual content is to present a separate sitemap for each language version of the website. This approach works best if your website is fully translated into all supported languages.

You do not need to create multiple sitemaps. When submitting your sitemap to search engines, add the appropriate language modifiers to the sitemap URL. The format must match the URL structure that you use to identify language versions on your website.

See Configuring URLs for multilingual websites for detailed information about the available options.

Multilingual URL structure

Description

Examples of full sitemap URLs

Separate domain names

Use the appropriate domain name when submitting the sitemap URL.

The sitemap automatically loads the page URLs for the language that matches the domain name.

  • http://domain.com/googlesitemap.xml

  • http://domain.fr/googlesitemap.xml

URL language prefixes

Include the language prefix when submitting the sitemap URL.

  • http://domain/en-us/googlesitemap.xml

  • http://domain/fr-fr/googlesitemap.xml

URL language parameters

Add the language query string parameter when submitting the sitemap URL.

Warning: This approach is not recommended. If your website does not use culture-specific domains or language prefixes, URL conflicts may occur between the different language versions of the same document.

  • http://domain/googlesitemap.xml?lang=en-us

  • http://domain/googlesitemap.xml?lang=fr-fr

The sitemap automatically lists the pages from the language version of the website specified by the full sitemap URL.

Important

To ensure that the sitemap dynamically presents pages for the language determined by the sitemap URL, you need to leave the Culture code property of your Google Sitemap web part with an empty value.

Preparing one sitemap for all languages used on the website

You can list the URLs of the pages from all language versions of the website inside a single sitemap. This approach is most suitable for websites that are not fully available in the non-default languages, i.e. if only a limited number of documents is translated.

  1. Create a separate sitemap document for each language, each one containing its own Google Sitemap web part.

    • Assign the language through the Culture code property of the web part.
    • Set the Combine with default culture property to No (prevents duplicate entries if your website uses the default content for untranslated documents).
  2. Add a sitemap index document and configure it to load the language-specific sitemap documents.

Each sitemap document lists the URLs of the pages available for a particular language. The sitemap index combines the output of all language-specific sitemaps. You can then submit the URL of the sitemap index to search engines without any language modifiers.