What is a sitemap? The XML sitemap and robots.txt file

Posted on March 1, 2022 by the Segment Marketing Team

As anyone familiar with the complexities of search engine optimization (SEO) will know, an important part of improving a website’s search ranking involves allowing its pages to be crawled and indexed by automated programs known as bots or robots.

These bots are designed to ascertain the usability and relevance of certain websites for search engine users, delivering and ranking results according to their findings.

To ensure that your website is giving bots all of the information they need to read your website quickly and effectively, you will need to focus on two important files: the robots.txt file and the XML sitemap.

What is a robots.txt file?

This file is what is known as a crawl directive. In other words, it instructs search engine bots to crawl the website in a particular way, following a strict syntax. The purpose of the file is to tell search engines which URLs they are allowed to crawl on a given website.
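As a minimal sketch (the /private/ path is a hypothetical example), a robots.txt file that lets every bot crawl the whole site except one directory looks like this:

    # Applies to all crawlers
    User-agent: *

    # Keep bots out of the (hypothetical) /private/ directory;
    # everything else may be crawled
    Disallow: /private/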

It is important for companies to include this file on their websites as it is the first thing a bot will look for on entering your site. Even if you are keen for a bot to crawl all of your site pages, you will need a default robots.txt file to direct it in a way that benefits your SEO.

Where should the robots.txt file be placed?

The robots.txt file should always sit at the root of the website domain. In other words, if you own a website like https://www.segmentseo.com, the file will be found at https://www.segmentseo.com/robots.txt.

What are the benefits of using robots.txt?

Generally speaking, search bots enter a website with a predetermined “allowance” that dictates how many pages they are allowed to crawl, based on the size and reputation of the site. This is also known as the “crawl budget”.

With the robots.txt file, you can block search engines from entering problematic parts of your site that may not be adequately optimized for marketing purposes. If you decide to optimize the content at a later date, you are free to edit the file and allow them back in.
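For instance, if a site's internal search results pages (assumed here to live under a hypothetical /search/ path) are too thin to be worth a bot's crawl budget, a single directive keeps crawlers away; deleting that line later lets them back in:

    User-agent: *

    # Block thin internal search results (hypothetical path)
    Disallow: /search/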

What is an XML sitemap?

The XML sitemap is a document that lists every page on a website. Designed for the benefit of search engines, it offers information about the kind of content available and how it can be reached. Sometimes, it also provides information about when individual pages were last updated and how important certain parts of the site are for users or potential customers.

Put simply, an XML sitemap offers search engines a snapshot of all available website content at once. This allows them to find newly added content at lightning speed and makes the scope of the site easy to understand or follow.
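As a sketch, a two-page sitemap following the standard sitemaps.org protocol might look like the snippet below; the URLs, dates and priority values are placeholders:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- One <url> entry per page; <loc> is required -->
      <url>
        <loc>https://www.segmentseo.com/</loc>
        <!-- Optional: when the page was last updated -->
        <lastmod>2022-02-28</lastmod>
        <!-- Optional: relative importance within this site (0.0 to 1.0) -->
        <priority>1.0</priority>
      </url>
      <url>
        <loc>https://www.segmentseo.com/blog/</loc>
        <lastmod>2022-02-20</lastmod>
        <priority>0.8</priority>
      </url>
    </urlset>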

It is important not to confuse XML sitemaps with the sitemaps that average web users come across on a regular basis (known as HTML sitemaps). Whilst the latter are designed to help human visitors navigate a site, the former are purely for use by search engines.

Why are XML sitemaps important?

XML sitemaps are an efficient way to alert search engines when you create or update content. Indeed, they are a vital part of any decent SEO strategy, particularly if your website boasts a large number of pages.

Where should the XML sitemap reside?

It is important that you place your XML sitemap on a dedicated URL. Typically, website developers place the sitemap at, for example, https://www.segmentseo.com/sitemap.xml. However, if this is not possible, you can allocate a different location so long as it is referenced in your robots.txt file via the Sitemap directive. This will ensure that it remains discoverable by search engines.
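For example, if the sitemap lived at a non-standard path (the path below is hypothetical), the robots.txt file could point crawlers to it with a single line:

    User-agent: *
    Disallow:

    # Non-standard sitemap location (hypothetical path)
    Sitemap: https://www.segmentseo.com/files/sitemap.xml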
