Sitemap.xml File

What is a Sitemap.xml File?

A sitemap.xml file tells Google which pages and files you think are important on your site, and also provides valuable information about them, such as when a page was last updated and any alternate-language versions of the page. The sitemap protocol also defines an optional <priority> value from 0.0 to 1.0 and a <changefreq> hint about how often a page changes, although Google ignores both. Search engines like Google read this file to crawl your website more efficiently.

News publishers, for example, use sitemaps to get new articles crawled quickly, updating the sitemap every time a new URL is added. When news breaks, this is how they share the link with Google, requesting fast indexing so the article becomes available and searchable on Google.

Crawling and ranking aren’t directly related: a sitemap helps Google discover your pages, but it doesn’t directly improve how they rank. You can use a sitemap to provide information about specific types of content on your pages, including video, image, and news content. For example:

  • A sitemap image entry can include the location of the images included in a page.
  • A sitemap video entry can specify the video rating, running time, and age-appropriateness rating.
  • A sitemap news entry can include the article title and publication date.
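As a rough illustration of an image entry, the Python sketch below builds a one-page sitemap that lists the page’s images, using only the standard xml.etree.ElementTree module. It assumes Google’s image sitemap namespace, http://www.google.com/schemas/sitemap-image/1.1; the function name build_image_sitemap and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed: Google's image sitemap extension namespace
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Serialize the sitemap namespace as the default and the image one as "image:"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(page_url: str, image_urls: list[str]) -> str:
    """Return sitemap XML with one <url> entry that lists the page's images."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    for image_url in image_urls:
        # Each image gets its own <image:image> element with an <image:loc> child
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_image_sitemap(
        "https://www.example.com/sample.html",
        ["https://www.example.com/image.jpg"],
    ))
```

Video and news entries follow the same pattern with their own extension namespaces and child elements.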

Does your website need a sitemap?

Google can crawl and index your website even if it doesn’t have a sitemap. A sitemap matters most for websites that are large, complex, or unusual in structure.

A sitemap works like a navigation map: it gives Googlebot directions to your key pages, such as newly added pages, your Contact and About pages, and other important parts of your site.

Proper internal linking means that every page that is important to your business can be reached through some form of navigation, whether that is your site’s menu or links you have placed in blog posts and pages. Even so, a sitemap can improve the crawling of larger or more complex sites, or of more specialized files.

The purpose of a sitemap is to help search engines discover URLs on your website, but Google does not guarantee that every item in your sitemap will be crawled and indexed. Still, in most cases your site will benefit from having a sitemap.

Your website needs a sitemap if:

  • Your site is really large. On a large site, Google’s crawlers are more likely to overlook some of your new or recently updated pages.
  • Your site has a large archive of content pages that are isolated or not well linked to each other.
  • Your site’s pages don’t naturally reference each other or have poor internal linking. Listing them in a sitemap helps ensure that Google doesn’t overlook some of them.
  • Your site is new and has few external links pointing to it. Googlebot and other web crawlers navigate the web by following links from one page to another, so Google might not discover your pages if no other sites link to them.
  • Your site has a lot of rich media content (video, images) or is shown in Google News. If provided, Google can take additional information from sitemaps into account for search, where appropriate.

Sitemap Tips

  • Use your preferred URL format (for example, consistently with or without “www”) for every URL in your XML sitemap
  • Include the correct protocol, “http” or “https”, in every URL in your XML sitemap
  • Add a link to your XML sitemap in your website’s footer
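One way to follow the first two tips is to normalize every URL before it goes into the sitemap. The Python sketch below assumes, purely for illustration, that the preferred form is HTTPS on www.example.com; normalize_url and the constants are hypothetical names, and a real site would also need to handle ports, trailing-slash conventions, and other hosts.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred form for this example: HTTPS on the "www." host
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
BARE_HOST = PREFERRED_HOST[len("www."):]  # "example.com"


def normalize_url(url: str) -> str:
    """Rewrite a URL on this site to the preferred protocol and host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in (PREFERRED_HOST, BARE_HOST):
        return url  # leave URLs on other hosts untouched
    path = parts.path or "/"
    # Keep the path and query string; drop any #fragment
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


if __name__ == "__main__":
    raw = ["http://example.com/about", "https://www.example.com/about", "HTTPS://WWW.EXAMPLE.COM"]
    # Normalizing before de-duplicating collapses the variants into one URL each
    print(sorted({normalize_url(u) for u in raw}))
```

Normalizing first and de-duplicating afterwards means the www/non-www and http/https variants of a page end up as a single sitemap entry.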

Your website does not need a sitemap if:

  • Your site is “small”. By small, Google means about 500 pages or fewer on your site. (Only pages you want to appear in search results count toward this total.)
  • Your site is comprehensively linked internally. This means that Google can find all the important pages on your site by following links starting from the homepage.
  • You don’t have many media files (images, videos, GIFs) or news pages that you want to show in search results. One benefit of sitemaps is that they help Google find and understand video and image files, or news articles, on your site. If your business doesn’t need its videos and images to appear in Google results, you might not need a sitemap.

 


Learn how you can create a sitemap

Building a sitemap and making it available to Google takes three steps:

    1. Decide which sitemap format you want to use.
    2. Create the sitemap, either automatically or manually.
    3. Make your sitemap available to Google.

Sitemap Formats

Google supports several sitemap formats:

  • XML
  • RSS, mRSS, and Atom 1.0
  • Text

Google accepts the standard sitemap protocol in all formats. However, Google has announced that it does not take the <priority> tag from the sitemap protocol into consideration (and it ignores <changefreq> as well). The main reason is that these tags don’t help Google crawl website pages more efficiently. For example, a webmaster might signal that content changes every day or every week when in reality it doesn’t; crawling on that schedule wastes Google’s crawl resources. Instead, Google prefers to crawl a page only when it has actually changed.

All formats limit a single sitemap to 50MB (uncompressed) or 50,000 URLs. If you have a larger file or more URLs, Google suggests breaking your list into multiple sitemaps. You can optionally create a sitemap index file and submit that single index file to Google. You can also submit multiple sitemaps and/or sitemap index files to Google.
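The splitting rule can be sketched in Python: chunk the URL list into groups of at most 50,000 and write a sitemap index (a <sitemapindex> element with one <sitemap><loc> entry per file, as defined by the sitemaps.org protocol) that points at each resulting file. This sketch only enforces the URL count, not the 50MB size limit, and the sitemap-N.xml file names are a hypothetical naming scheme.

```python
from xml.sax.saxutils import escape

# Limit from the sitemap protocol; the 50MB uncompressed size limit is not checked here
MAX_URLS_PER_SITEMAP = 50_000
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split a URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Return a sitemap index file that lists each child sitemap's location."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<sitemapindex xmlns="{SITEMAP_NS}">']
    for loc in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(loc)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    all_urls = [f"https://www.example.com/page-{n}.html" for n in range(120_001)]
    chunks = chunk_urls(all_urls)
    # Hypothetical naming scheme for the child sitemap files
    children = [f"https://www.example.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)]
    print([len(c) for c in chunks])  # [50000, 50000, 20001]
    print(build_sitemap_index(children))
```

Each chunk would then be written out as its own sitemap file, and only the index file needs to be submitted.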

XML

Below is a very basic XML sitemap that includes the location of a single URL:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/foo.html</loc>
    <lastmod>2022-06-04</lastmod>
  </url>
</urlset>

Visit sitemaps.org for more complex examples and full documentation.
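To create a sitemap automatically rather than by hand, a few lines of Python are enough to produce the same structure as the example above. The sketch below is one possible generator, not an official tool; build_sitemap is a hypothetical name, and it expects lastmod values already formatted as W3C Datetime strings such as 2022-06-04.

```python
from typing import Iterable, Optional, Tuple
from xml.sax.saxutils import escape


def build_sitemap(entries: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Build a basic <urlset> sitemap from (loc, lastmod) pairs; lastmod may be None."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for loc, lastmod in entries:
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(loc)}</loc>")  # escapes &, < and >
        if lastmod:
            lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_sitemap([("https://www.example.com/foo.html", "2022-06-04")]))
```

To publish the result, write the string to a file such as sitemap.xml with UTF-8 encoding, for example with open("sitemap.xml", "w", encoding="utf-8").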

RSS, mRSS, and Atom 1.0

If you have a blog with an RSS or Atom feed, you can submit the feed’s URL as a sitemap.

Most blog software creates a feed for you, but the feed only provides information on recent URLs.

  • Google accepts RSS 2.0 and Atom 1.0 feeds.
  • You can use an mRSS (media RSS) feed to provide Google details about video content on your site.

Text

If your sitemap includes only web page URLs, you can provide Google with a simple text file that contains one URL per line. For example:

https://www.example.com/file1.html
https://www.example.com/file2.html

Guidelines for text file sitemaps

  • Don’t put anything other than URLs in the sitemap file.
  • Encode your file using UTF-8 encoding.
  • Use .txt as the file extension; you can give the file any name you like.
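These guidelines translate directly into a small Python sketch: it strips blank entries, rejects anything that is not an absolute http(s) URL so the file contains nothing but URLs, and writes the result as UTF-8 to a .txt file. The function names are hypothetical, the URL check is deliberately simple, and the URLs are written exactly as given (no percent-encoding is applied).

```python
from pathlib import Path


def format_text_sitemap(urls: list[str]) -> str:
    """Return a text sitemap body: one absolute URL per line and nothing else."""
    cleaned = [u.strip() for u in urls if u.strip()]  # drop blank entries
    for u in cleaned:
        # Deliberately simple check: only absolute http(s) URLs are allowed
        if not u.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute http(s) URL: {u!r}")
    return "".join(u + "\n" for u in cleaned)


def write_text_sitemap(urls: list[str], path: str) -> None:
    """Write the text sitemap as UTF-8 to a file with a .txt extension."""
    target = Path(path)
    if target.suffix != ".txt":
        raise ValueError("text sitemaps should use the .txt extension")
    target.write_text(format_text_sitemap(urls), encoding="utf-8")
```

For example, write_text_sitemap(["https://www.example.com/file1.html", "https://www.example.com/file2.html"], "sitemap.txt") produces a two-line file.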

Sitemap extensions for additional media types

  • Google supports extended sitemap syntax for video, images, and Google News.
  • Use these extensions to describe video files, images, and other hard-to-parse content on your site to improve indexing.

Submit your sitemap.xml file to Google

Google doesn’t check a sitemap every time a site is crawled; a sitemap is checked only the first time Google notices it, and thereafter only when you notify Google that it has changed. Alert Google about a sitemap only when it’s new or updated; don’t submit or ping unchanged sitemaps multiple times.

If you have updated pages in the sitemap, mark them with the <lastmod> field. Other XML formats have a similar field, such as <updated> in Atom. The value uses the W3C Datetime format, for example 2022-06-04 or 2022-06-04T14:30:00+00:00.
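A common way to compute <lastmod> is from the time a page’s source file last changed. The Python sketch below assumes each URL maps to a file whose modification time reflects real content changes (not, for example, a redeploy that touches every file), since Google only relies on <lastmod> when it is consistently accurate. The function names are hypothetical; the output follows the W3C Datetime format.

```python
import os
from datetime import datetime, timezone


def format_lastmod(timestamp: float, date_only: bool = False) -> str:
    """Format a Unix timestamp as a W3C Datetime string in UTC."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    if date_only:
        return moment.date().isoformat()         # e.g. 2022-06-04
    return moment.isoformat(timespec="seconds")  # e.g. 2022-06-04T14:30:00+00:00


def lastmod_from_file(path: str, date_only: bool = False) -> str:
    """Use a file's modification time as its <lastmod> value (assumed accurate)."""
    return format_lastmod(os.path.getmtime(path), date_only)
```

Using UTC with an explicit +00:00 offset avoids ambiguity when the server’s local time zone differs from the site’s audience.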

Multiple ways you can submit your website sitemap

  • Submit a sitemap in Search Console using the Sitemaps report. This will allow you to see when Googlebot accessed the sitemap and also potential processing errors.
  • Use the Search Console API to programmatically submit a sitemap.
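  • Use the ping tool (note that Google announced in 2023 that it was deprecating this endpoint, so it may no longer respond). Send a GET request from your browser or the command line to this address, specifying the full URL of the sitemap, and make sure the sitemap file is accessible: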

https://www.google.com/ping?sitemap=FULL_URL_OF_SITEMAP

Example:

https://www.google.com/ping?sitemap=https://example.com/sitemap.xml 

  • Insert the following line anywhere in your robots.txt file, specifying the path to your sitemap. Google will find it the next time it crawls your robots.txt file:

            Sitemap: https://example.com/my_sitemap.xml 

  • Use WebSub if you use Atom/RSS for your sitemap and want to broadcast your changes to other search engines in addition to Google. 
  • Submitting a sitemap is merely a hint: it doesn’t guarantee that Google will download the sitemap or use the sitemap for crawling URLs on the site.
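To confirm that a robots.txt file actually advertises your sitemap, Python’s standard urllib.robotparser module can read its Sitemap: lines (site_maps() requires Python 3.8 or later). This is a local check on a robots.txt body you already have, with fetching left out; sitemaps_in_robots is a hypothetical helper name, and the check says nothing about whether Google has processed the file.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt: str) -> list[str]:
    """Return the Sitemap: URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present
    return parser.site_maps() or []


if __name__ == "__main__":
    robots = """User-agent: *
Disallow: /private/

Sitemap: https://example.com/my_sitemap.xml
"""
    print(sitemaps_in_robots(robots))  # ['https://example.com/my_sitemap.xml']
```

Because the Sitemap directive is independent of any User-agent group, the parser picks it up wherever it appears in the file.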

General sitemap guidelines

 
  • A sitemap can be posted anywhere on the site, but it affects only descendants of its parent directory. A sitemap posted at the site root can affect all files on the site, which is why the root is the recommended location for your sitemaps.
 
  • To reduce duplicate crawling of URLs, don’t include session IDs or other user-dependent identifiers in the URLs you list in your sitemap.
 
  • Sitemap files must be UTF-8 encoded, and URLs escaped appropriately.
 
 
  • Break up large sitemaps into smaller sitemaps: a sitemap can contain up to 50,000 URLs and must not exceed 50MB uncompressed. Use a sitemap index file to list all the individual sitemaps and submit this single file to Google rather than submitting individual sitemaps.
 
  • Use sitemap extensions for pointing to additional media types such as video, images, and news.
 
  • List only canonical URLs in your sitemaps. If you have multiple versions of a page, list in the sitemap only the one you prefer to appear in search results. If you have multiple versions of your site (for example, www and non-www), decide which is your preferred site, put the sitemap there, and add rel=canonical or redirects on the other site.
 
  • If you have different URLs for mobile and desktop versions of a page, it’s best to point to only one version in a sitemap. However, if you want to point to both URLs, annotate them to indicate which is the desktop version and which is the mobile version.
 
 
  • Sitemaps are a recommendation to Google about which pages you think are important; Google does not pledge to crawl every URL in a sitemap.
 
  • Google ignores <priority> and <changefreq> values.
 
  • Google uses the <lastmod> value if it’s consistently and verifiably accurate (for example, if it matches the page’s actual last modification).
 
  • The position of a URL in a sitemap is not important; Google does not crawl URLs in the order in which they appear in your sitemap.
  • Non-alphanumeric and non-Latin characters: your sitemap file must be UTF-8 encoded (you can generally choose this when you save the file). As with all XML files, any data values (including URLs) must use entity escape codes for the characters listed in the following table. A sitemap can contain only ASCII characters; it can’t contain extended ASCII characters, certain control codes, or special characters such as * and {}. If your sitemap URL contains these characters, you’ll receive an error when you try to add it.

Character       Symbol   Escape Code
Ampersand       &        &amp;
Single Quote    '        &apos;
Double Quote    "        &quot;
Greater Than    >        &gt;
Less Than       <        &lt;

  • In addition, all URLs (including the URL of your sitemap) must be encoded for readability by the web server on which they are located and URL-escaped. However, if you are using any sort of script, tool, or log file to generate your URLs (anything except typing them in by hand), this is usually already done for you. If you submit your sitemap and you receive an error that Google is unable to find some of your URLs, check to make sure that your URLs follow the RFC-3986 standard for URIs, the RFC-3987 standard for IRIs, and the XML standard.
 
  • Here is an example of a URL that uses a non-ASCII character (ü), as well as a character that requires entity escaping (&):

        https://www.example.com/ümlat.html&q=name

  • Here is that same URL encoded using ISO-8859-1, with the entity escaped:

        https://www.example.com/%FCmlat.html&amp;q=name

  • Here is that same URL using UTF-8 encoding, with the entity escaped:

        https://www.example.com/%C3%BCmlat.html&amp;q=name
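The two steps in these examples (percent-encode the URL, then entity-escape it for XML) can be combined in a short Python sketch using the standard urllib.parse.quote and xml.sax.saxutils.escape functions. The helper name sitemap_loc and the set of characters treated as safe are assumptions for illustration; it reproduces the encoded example URLs above, but it is not a full RFC 3986/3987 normalizer.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# URL delimiters left as-is; "%" is kept so already-encoded URLs are not double-encoded
URL_SAFE = ":/?#[]@!$&'()*+,;=%"
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def sitemap_loc(url: str, encoding: str = "utf-8") -> str:
    """Percent-encode non-ASCII characters, then apply XML entity escaping."""
    percent_encoded = quote(url, safe=URL_SAFE, encoding=encoding)
    # escape() replaces &, < and >; the mapping adds ' and "
    return escape(percent_encoded, QUOTE_ENTITIES)


if __name__ == "__main__":
    url = "https://www.example.com/ümlat.html&q=name"
    print(sitemap_loc(url))                      # https://www.example.com/%C3%BCmlat.html&amp;q=name
    print(sitemap_loc(url, encoding="latin-1"))  # https://www.example.com/%FCmlat.html&amp;q=name
```

UTF-8 is the default here because the sitemap file itself must be UTF-8; the latin-1 call only shows where the ISO-8859-1 variant comes from.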

About the Author

Zabi Niazi - Director of Search Marketing SEM and SEO

Hands-on execution & Revenue-focused digital marketer with expertise in Design & Operations centered around people, processes & technology engineering a Demand-Gen Engine capable of delivering innovative experiences that tell the brand story and map to the buyer's journey generating awareness, acquisition, retention, and advocacy.