XML Sitemap and Robots.TXT Guide for SEO Beginners


This can serve as an XML Sitemap and robots.txt guide for SEO beginners.

  1. robots.txt and sitemap.xml: PRACTICAL GUIDE FOR SEO BEGINNERS
  2. SEO Beginners: ROBOTS.TXT
  3. WHAT ARE WEB ROBOTS? Web robots (also known as Web Wanderers, Crawlers, or Spiders) are programs that traverse the Web automatically. Search engines such as Google use them to index web content, spammers use them to scan for email addresses, and they have many other uses.
  4. WHAT IS ROBOTS.TXT? Robots.txt is a plain text file that you upload to the root directory of your site. When web spiders (also called ants, bots, or indexers) visit your site, they first look for that text file and process it. Put differently, robots.txt tells spiders which pages they may and may not crawl.
  5. THE SIMPLEST VERSION OF ROBOTS.TXT
         User-agent: *
         Disallow:
     The first line, "User-agent: *", indicates that the following lines apply to all agents. The empty value after "Disallow:" means that nothing is off limits. This robots.txt file effectively does nothing: it allows all robots to see everything on the site.
  6. SOME MORE EXAMPLES OF ROBOTS.TXT
     To exclude all robots from the entire server:
         User-agent: *
         Disallow: /
     To allow all robots complete access:
         User-agent: *
         Disallow:
     (or just create an empty "/robots.txt" file, or don't use one at all)
  7. SOME MORE EXAMPLES OF ROBOTS.TXT
     To exclude all robots from part of the server:
         User-agent: *
         Disallow: /cgi-bin/
         Disallow: /tmp/
         Disallow: /~joe/
     To exclude a single robot:
         User-agent: BadBot
         Disallow: /
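The exclusion rules on slide 7 can be checked locally before the file is uploaded. The following is a minimal sketch, assuming Python's standard `urllib.robotparser` module. The host `example.com`, the agent name `AnyBot`, and the helper `can_fetch` are illustrative assumptions and do not come from the slides.

```python
from urllib.robotparser import RobotFileParser

# Rules taken from the slide: block BadBot everywhere,
# block everyone else from three directories.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""


def can_fetch(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse from a string, no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com and AnyBot are hypothetical names used only for illustration.
    for agent, url in [
        ("BadBot", "https://example.com/index.html"),
        ("AnyBot", "https://example.com/tmp/cache.html"),
        ("AnyBot", "https://example.com/index.html"),
    ]:
        print(agent, url, can_fetch(RULES, agent, url))
```

Parsing from a string keeps the check offline, so a draft file can be tested before it goes live.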
  8. SOME MORE EXAMPLES OF ROBOTS.TXT
     To allow a single robot:
         User-agent: Googlebot
         Disallow:

         User-agent: *
         Disallow: /
     You can disallow single pages:
         User-agent: *
         Disallow: /~joe/junk.html
         Disallow: /~joe/foo.html
         Disallow: /~joe/bar.html
  9. SOME MORE EXAMPLES OF ROBOTS.TXT
     You can specify the Sitemap location in your robots.txt file:
         User-agent: *
         Disallow: /
         Sitemap:
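To confirm that a crawler would actually pick up the Sitemap line, the file can be parsed with Python's standard `urllib.robotparser`, whose `site_maps()` method (Python 3.8 and later) returns the listed locations. This is only a sketch: the slide does not show a Sitemap URL, so `https://example.com/sitemap.xml` and the helper name `sitemap_urls` are assumptions.

```python
from urllib.robotparser import RobotFileParser

# The Sitemap URL is a hypothetical placeholder; the slide leaves it blank.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def sitemap_urls(robots_txt: str) -> list:
    """Return every Sitemap location declared in the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemap_urls(ROBOTS_TXT))
```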
  10. ABOUT THE ROBOTS <META> TAG You can use a special HTML <META> tag to tell robots not to index the content of a page and/or not to scan it for links to follow.
          <html>
          <head>
          <title>...</title>
          <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
          </head>
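One way to audit which pages carry this tag is to read the directives back out of the HTML. Below is a rough sketch using Python's standard `html.parser`. The class name `RobotsMetaParser`, the helper `robots_directives`, and the sample page are illustrative assumptions.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the page, lowercased."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Sample page mirroring the snippet on the slide.
PAGE = """<html> <head> <title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> </head>"""

if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))
```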
  11. SEO Beginners: SITEMAP.XML
  12. WHAT ARE SITEMAPS? Sitemaps tell search engines which pages are available for crawling. A Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL:
      - when it was last updated
      - how often it usually changes
      - how important it is, relative to other URLs in the site
  13. SITEMAPS XML FORMAT The Sitemap must:
      - Begin with an opening <urlset> tag and end with a closing </urlset> tag.
      - Specify the namespace (protocol standard) within the <urlset> tag.
      - Include a <url> entry for each URL, as a parent XML tag.
      - Include a <loc> child entry for each <url> parent tag.
      - All URLs in a Sitemap must be from a single host, such as or
      - The Sitemap file must be UTF-8 encoded.
      - It must contain no more than 50,000 URLs.
      - The file must not be larger than 50MB uncompressed (the limit was 10MB in earlier versions of the protocol).
  14. SAMPLE XML SITEMAP
          <?xml version="1.0" encoding="UTF-8"?>
          <urlset xmlns="">
            <url>
              <loc></loc>
              <lastmod>2005-01-01</lastmod>
              <changefreq>monthly</changefreq>
              <priority>0.8</priority>
            </url>
          </urlset>
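A Sitemap in this format can also be generated rather than written by hand. What follows is a minimal sketch with Python's standard `xml.etree.ElementTree` (the `xml_declaration` argument needs Python 3.8 or later). The namespace URI is the one published by the sitemaps.org protocol. The page URL `https://example.com/` and the function name `build_sitemap` are assumptions, since the slide leaves the URL blank.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries) -> bytes:
    """Serialize a list of dicts (keys: loc, lastmod, changefreq, priority)
    into a UTF-8 encoded Sitemap document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is required; the other three tags are optional metadata.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    # example.com is a hypothetical host used only for illustration.
    xml_bytes = build_sitemap([
        {
            "loc": "https://example.com/",
            "lastmod": "2005-01-01",
            "changefreq": "monthly",
            "priority": 0.8,
        }
    ])
    print(xml_bytes.decode("utf-8"))
```

Letting the library write the text also takes care of escaping characters such as `&` in URLs, which the protocol requires.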
  15. USING SITEMAP INDEX FILES (TO GROUP MULTIPLE SITEMAP FILES) The Sitemap index file must:
      - Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag.
      - Include a <sitemap> entry for each Sitemap as a parent XML tag.
      - Include a <loc> child entry for each <sitemap> parent tag.
      - The optional <lastmod> tag is also available for Sitemap index files.
      Note: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, can include Sitemaps on but not on or
  16. SAMPLE XML SITEMAP INDEX
          <?xml version="1.0" encoding="UTF-8"?>
          <sitemapindex xmlns="">
            <sitemap>
              <loc></loc>
              <lastmod>2004-10-01T18:23:17+00:00</lastmod>
            </sitemap>
            <sitemap>
              <loc></loc>
              <lastmod>2005-01-01</lastmod>
            </sitemap>
          </sitemapindex>
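An index file becomes necessary once a site exceeds the 50,000-URL limit per Sitemap. The sketch below shows one possible approach: split the URL list into chunks and write an index that points at one Sitemap per chunk. The file naming scheme `sitemap-N.xml`, the host `example.com`, and both function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locations) -> bytes:
    """Serialize a Sitemap index listing one <sitemap><loc> per location."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for location in sitemap_locations:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = location
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical site with 120,001 pages: needs three Sitemap files.
    pages = [f"https://example.com/page-{n}" for n in range(120_001)]
    chunks = chunk_urls(pages)
    locations = [
        f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
    ]
    print(build_sitemap_index(locations).decode("utf-8"))
```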
  17. SITEMAP FILE LOCATION The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at can include any URLs starting with but cannot include URLs starting with
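The location rule comes down to a prefix check: a URL belongs in a Sitemap only if it shares the Sitemap's scheme and host and sits at or below the Sitemap's directory. A minimal sketch of that check follows. The URLs in the slide are missing, so the `example.com` addresses and the function name `in_sitemap_scope` are hypothetical.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """Return True if `page_url` may be listed in the Sitemap at `sitemap_url`."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Scheme and host must match exactly.
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # Directory that contains the Sitemap file, with a trailing slash.
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    return page.path.startswith(base_dir)


if __name__ == "__main__":
    sitemap = "http://example.com/catalog/sitemap.xml"  # hypothetical location
    for page in (
        "http://example.com/catalog/show?item=23",
        "http://example.com/images/logo.png",
        "https://example.com/catalog/show?item=23",
    ):
        print(page, in_sitemap_scope(sitemap, page))
```

This is why a Sitemap is usually placed in the site root: from there, every URL on the host falls within its scope.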