
XML Sitemap and Robots.TXT Guide for SEO Beginners

I created this PPT for SEO trainees. This can serve as XML Sitemap and Robots.txt guide for SEO beginners.

Published in: Technology

Transcript of "XML Sitemap and Robots.TXT Guide for SEO Beginners"

  1. robots.txt and sitemap.xml: PRACTICAL GUIDE FOR SEO BEGINNERS
  2. SEO Beginners: ROBOTS.TXT
  3. WHAT ARE WEB ROBOTS?
     Web robots (also known as Web Wanderers, Crawlers, or Spiders) are programs that traverse the Web automatically. Search engines such as Google use them to index web content, spammers use them to scan for email addresses, and they have many other uses.
  4. WHAT IS ROBOTS.TXT?
     Robots.txt is a plain text file that you upload to the root directory of your site. When web spiders (also called ants, bots, or indexers) visit your site to index it, they first look for that file and process it. Put differently, robots.txt tells the spider which parts of the site it may and may not crawl.
  5. THE SIMPLEST VERSION OF ROBOTS.TXT
         User-agent: *
         Disallow:
     The first line ("User-agent: *") indicates that the following lines apply to all agents. An empty value after "Disallow:" means that nothing is restricted. This robots.txt file changes nothing: it allows all types of robots to see everything on the site.
  6. SOME MORE EXAMPLES OF ROBOTS.TXT
     To exclude all robots from the entire server:
         User-agent: *
         Disallow: /
     To allow all robots complete access:
         User-agent: *
         Disallow:
     (or just create an empty "/robots.txt" file, or don't use one at all)
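The difference between an empty "Disallow:" and "Disallow: /" can be checked with Python's standard urllib.robotparser module. This is only a minimal sketch: the agent name "AnyBot" and the page URL are placeholders chosen for illustration, not values from the slides.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, agent, url):
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
EMPTY_FILE = ""

# Placeholder URL, assumed for illustration only.
page = "http://www.example.com/private/page.html"

allow_result = can_crawl(ALLOW_ALL, "AnyBot", page)    # True: nothing is restricted
block_result = can_crawl(BLOCK_ALL, "AnyBot", page)    # False: the whole site is excluded
empty_result = can_crawl(EMPTY_FILE, "AnyBot", page)   # True: no rules at all
print(allow_result, block_result, empty_result)
```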
  7. SOME MORE EXAMPLES OF ROBOTS.TXT
     To exclude all robots from part of the server:
         User-agent: *
         Disallow: /cgi-bin/
         Disallow: /tmp/
         Disallow: /~joe/
     To exclude a single robot:
         User-agent: BadBot
         Disallow: /
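Disallow values are matched as path prefixes, which a short sketch with Python's standard urllib.robotparser can illustrate. The file names under each directory ("cache.html", "index.html", "search.cgi") are hypothetical, as is the agent name "AnyBot".

```python
from urllib.robotparser import RobotFileParser

# Rules from the slide: keep every robot out of three directories.
PARTIAL_RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

parser = RobotFileParser()
parser.parse(PARTIAL_RULES.splitlines())

base = "http://www.example.com"
# Hypothetical paths: only those starting with a disallowed prefix are blocked.
tmp_allowed = parser.can_fetch("AnyBot", base + "/tmp/cache.html")
cgi_allowed = parser.can_fetch("AnyBot", base + "/cgi-bin/search.cgi")
home_allowed = parser.can_fetch("AnyBot", base + "/index.html")
print(tmp_allowed, cgi_allowed, home_allowed)  # False False True
```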
  8. SOME MORE EXAMPLES OF ROBOTS.TXT
     To allow a single robot:
         User-agent: Googlebot
         Disallow:

         User-agent: *
         Disallow: /
     You can disallow single pages:
         User-agent: *
         Disallow: /~joe/junk.html
         Disallow: /~joe/foo.html
         Disallow: /~joe/bar.html
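When a file contains several User-agent groups, a robot follows the group that names it and falls back to the "*" group otherwise. A minimal sketch with Python's standard urllib.robotparser follows; "OtherBot" and the page URL are made-up names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules from the slide: Googlebot may crawl everything, everyone else nothing.
SINGLE_ROBOT_RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(SINGLE_ROBOT_RULES.splitlines())

page = "http://www.example.com/products.html"  # hypothetical page

googlebot_allowed = parser.can_fetch("Googlebot", page)  # matches its own group
otherbot_allowed = parser.can_fetch("OtherBot", page)    # falls back to "*"
print(googlebot_allowed, otherbot_allowed)  # True False
```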
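  9. SOME MORE EXAMPLES OF ROBOTS.TXT
     You can specify the Sitemap location in your robots.txt file:
         User-agent: *
         Disallow:
         Sitemap: http://www.example.com/sitemap.xml
     The Disallow value is left empty here so that the pages listed in the Sitemap can still be crawled; "Disallow: /" would block all of them.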
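A crawler-side view of the Sitemap line can be sketched with Python's standard urllib.robotparser, whose site_maps() method (available from Python 3.8) returns the Sitemap URLs found in the file. This sketch assumes an empty Disallow value so that the listed pages remain crawlable.

```python
from urllib.robotparser import RobotFileParser

# robots.txt text with a Sitemap line; the empty Disallow is an assumption
# made here so that the pages listed in the Sitemap are not blocked.
RULES_WITH_SITEMAP = """\
User-agent: *
Disallow:
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES_WITH_SITEMAP.splitlines())

# site_maps() returns a list of Sitemap URLs, or None if the file has none.
sitemap_urls = parser.site_maps()
print(sitemap_urls)  # ['http://www.example.com/sitemap.xml']
```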
  10. ABOUT THE ROBOTS <META> TAG
      You can use a special HTML <META> tag to tell robots not to index the content of a page, and/or not to scan it for links to follow.
          <html>
          <head>
          <title>...</title>
          <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
          </head>
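To see how a robot might read this tag, here is a small sketch built on Python's standard html.parser module. The class name RobotsMetaParser is hypothetical, and real crawlers may interpret the tag in more detail than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


PAGE = """<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
"""

meta_parser = RobotsMetaParser()
meta_parser.feed(PAGE)
directives = meta_parser.directives
print(sorted(directives))  # ['nofollow', 'noindex']
```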
  11. SEO Beginners: SITEMAP.XML
  12. WHAT ARE SITEMAPS?
      A Sitemap tells search engines which pages are available for crawling. It is an XML file that lists URLs for a site along with additional metadata about each URL:
      - when it was last updated
      - how often it usually changes
      - how important it is, relative to other URLs in the site
  13. SITEMAPS XML FORMAT
      The Sitemap must:
      - Begin with an opening <urlset> tag and end with a closing </urlset> tag.
      - Specify the namespace (protocol standard) within the <urlset> tag.
      - Include a <url> entry for each URL, as a parent XML tag.
      - Include a <loc> child entry for each <url> parent tag.
      In addition:
      - All URLs in a Sitemap must be from a single host, such as www.example.com or store.example.com.
      - The Sitemap file must be UTF-8 encoded.
      - It must contain no more than 50,000 URLs.
      - The file must be no larger than 50MB (52,428,800 bytes) uncompressed; the original version of the protocol set this limit at 10MB.
  14. SAMPLE XML SITEMAP
          <?xml version="1.0" encoding="UTF-8"?>
          <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
            <url>
              <loc>http://www.example.com/</loc>
              <lastmod>2005-01-01</lastmod>
              <changefreq>monthly</changefreq>
              <priority>0.8</priority>
            </url>
          </urlset>
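Some of the format rules can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree on the sample Sitemap; the function name check_sitemap and the wording of its messages are assumptions, and it covers only the root tag, the <loc> entries, the single-host rule, and the 50,000 URL limit.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
"""


def check_sitemap(xml_bytes):
    """Return a list of problems found; an empty list means none were detected."""
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{NS}}}urlset":
        return ["root element is not <urlset> in the Sitemap namespace"]

    problems = []
    urls = root.findall(f"{{{NS}}}url")
    if len(urls) > MAX_URLS:
        problems.append(f"more than {MAX_URLS} URLs")

    hosts = set()
    for url in urls:
        loc = url.find(f"{{{NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append("a <url> entry has no <loc>")
            continue
        hosts.add(urlsplit(loc.text.strip()).netloc)
    if len(hosts) > 1:
        problems.append("URLs come from more than one host")
    return problems


problems = check_sitemap(SAMPLE)
print(problems)  # []
```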
  15. USING SITEMAP INDEX FILES (TO GROUP MULTIPLE SITEMAP FILES)
      The Sitemap index file must:
      - Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag.
      - Include a <sitemap> entry for each Sitemap as a parent XML tag.
      - Include a <loc> child entry for each <sitemap> parent tag.
      The optional <lastmod> tag is also available for Sitemap index files.
      Note: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, http://www.yoursite.com/sitemap_index.xml can include Sitemaps on http://www.yoursite.com but not on http://www.example.com or http://yourhost.yoursite.com.
  16. SAMPLE XML SITEMAP INDEX
          <?xml version="1.0" encoding="UTF-8"?>
          <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
            <sitemap>
              <loc>http://www.example.com/sitemap1.xml.gz</loc>
              <lastmod>2004-10-01T18:23:17+00:00</lastmod>
            </sitemap>
            <sitemap>
              <loc>http://www.example.com/sitemap2.xml.gz</loc>
              <lastmod>2005-01-01</lastmod>
            </sitemap>
          </sitemapindex>
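Reading a Sitemap index is a matter of walking its <sitemap> entries. The sketch below uses Python's standard xml.etree.ElementTree and also applies the same-site rule for index files. The index URL http://www.example.com/sitemap_index.xml and the function name list_sitemaps are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical location of the index file itself.
INDEX_URL = "http://www.example.com/sitemap_index.xml"

INDEX_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
"""


def list_sitemaps(xml_bytes):
    """Return (loc, lastmod) pairs for every <sitemap> entry in an index."""
    root = ET.fromstring(xml_bytes)
    return [
        (entry.findtext("sm:loc", namespaces=NS),
         entry.findtext("sm:lastmod", namespaces=NS))
        for entry in root.findall("sm:sitemap", NS)
    ]


entries = list_sitemaps(INDEX_XML)

# Same-site rule: every listed Sitemap must live on the index file's host.
index_host = urlsplit(INDEX_URL).netloc
same_site = all(urlsplit(loc).netloc == index_host for loc, _ in entries)
print(entries, same_site)
```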
  17. SITEMAP FILE LOCATION
      The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but cannot include URLs starting with http://example.com/images/.
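The location rule amounts to a prefix test on the Sitemap's own URL. The following Python sketch shows one way to express it; the function names and the two page URLs are hypothetical, and it ignores edge cases such as query strings in the Sitemap URL.

```python
def allowed_prefix(sitemap_url):
    """Return the URL prefix that a Sitemap at sitemap_url may cover."""
    # Everything up to and including the last "/" of the Sitemap URL.
    return sitemap_url.rsplit("/", 1)[0] + "/"


def in_scope(sitemap_url, page_url):
    """Report whether page_url may be listed in the Sitemap at sitemap_url."""
    return page_url.startswith(allowed_prefix(sitemap_url))


SITEMAP = "http://example.com/catalog/sitemap.xml"

prefix = allowed_prefix(SITEMAP)
# Hypothetical page URLs used only to exercise the rule.
catalog_ok = in_scope(SITEMAP, "http://example.com/catalog/show?item=23")
images_ok = in_scope(SITEMAP, "http://example.com/images/logo.png")
print(prefix, catalog_ok, images_ok)  # http://example.com/catalog/ True False
```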
  18. THANK YOU
      ADITYA TODAWAL, PROJECT COORDINATOR (SEO), SEARCH RESULTS MEDIA – INTERNET MARKETING TORONTO
