Robots.txt File 
What is robots.txt?
• Robots.txt is a simple text file on your web site that tells search engine bots how to crawl and index the site or individual pages.
• By default, search engine bots crawl everything they can reach unless they are forbidden from doing so, and well-behaved bots check the robots.txt file before crawling a site.
• Declaring rules in robots.txt tells visiting bots that they are not allowed to index sensitive data, but it does not stop them from doing so. Legitimate bots follow the instructions they are given; malware robots ignore them, so do not rely on robots.txt as a security measure for your web site.

How to build a robots.txt file (Terms, Structure & Placement)
The terms used in robots.txt and their meanings are given below in tabular format.
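The standard terms (note that Allow, as discussed later, is a widely supported extension rather than part of the original standard):

Term          Meaning
User-agent:   The bot the rules that follow apply to; "*" matches every bot
Disallow:     A path prefix the named bot must not crawl; an empty value means "disallow nothing"
Allow:        An exception to a broader Disallow rule
Sitemap:      The full URL of the site's XML sitemap
#             Starts a comment; the rest of the line is ignored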
The robots.txt file is usually placed in the root folder of your web site, so that its URL resembles www.example.com/robots.txt in the web browser. Remember to use all lowercase letters for the filename.
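A quick way to confirm the file is reachable at that URL is simply to fetch it, for example in Python (www.example.com is a placeholder for your own domain):

import urllib.request

# Fetch and print a site's robots.txt; raises an HTTPError if it is missing.
with urllib.request.urlopen("https://www.example.com/robots.txt") as resp:
    print(resp.read().decode("utf-8"))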
You can define different restrictions for different bots by applying bot-specific rules, but be aware that the more complicated you make the file, the harder it becomes to spot its traps. Always specify bot-specific rules before the common rules, so that a bot reading the file either finds the rules addressed to its own name or else falls back on the common rules. You can check many other sites' robots.txt files to get a feel for how these are generally implemented:
http://www.searchenabler.com/robots.txt
http://www.google.com/robots.txt
http://searchengineland.com/robots.txt

Example scenarios for robots.txt
If you take a close look at the Search Enabler robots.txt, you will notice that we have blocked the following pages from search indexing (combined into a single file in the sketch below). Analyze which pages and links should be blocked on your own website; as a general rule, we advise hiding pages such as internal search results, user logins, profiles, logs and styling CSS sheets.
1. Disallow: /?s= — a dynamic search results page; there is no point in indexing it, and doing so would create duplicate content problems.
2. Disallow: /blog/2010/ — year-wise blog archive pages, blocked because different URLs pointing to the same web page lead to duplication errors.
3. Disallow: /login/ — a login page meant only for users of the SearchEnabler tool, so it is blocked from being crawled.

How does robots.txt affect search results?
By using the robots.txt file you can keep pages such as user profiles and other temporary folders out of the index, so your SEO effort is not diluted by junk pages that are useless in search results. In general, your results will be more precise and better valued.
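For reference, the three example rules above combine into a single file like this (a sketch; the live searchenabler.com file may contain more rules):

User-agent: *
Disallow: /?s=
Disallow: /blog/2010/
Disallow: /login/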
Default Robots.txt
The default robots.txt file simply tells every crawler that it is allowed to roam any directory of the web site to its heart's content:

User-agent: *
Disallow:

(which translates as "disallow nothing")

The question often asked here is why use it at all. It is not required, but it is recommended for the simple reason that search bots will request it anyway (meaning you'll see 404 errors in your log files from bots requesting your non-existent robots.txt page). Besides, having a default robots.txt ensures there won't be any misunderstandings between your site and a crawler.

Robots.txt Blocking Specific Folders / Content
The most common use of robots.txt is to keep crawlers away from private folders or content that gives them no additional information. This is done primarily to save the crawler's time: bots crawl on a budget, and if you ensure they don't waste it on unnecessary content, they will crawl your site deeper and quicker. Samples of robots.txt files blocking specific content (note: only a few of the most basic cases are shown):

User-agent: *
Disallow: /database/
(blocks all crawlers from the /database/ folder)

User-agent: *
Disallow: /*?
(blocks all crawlers from all URLs containing ?)

User-agent: *
Disallow: /navy/
Allow: /navy/about.html
(blocks all crawlers from the /navy/ folder but allows access to one page in that folder)

A note from John Mueller in the comments: the "Allow:" statement is not part of the robots.txt standard (it is, however, supported by many search engines, including Google).
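If you want to sanity-check what a file like the samples above actually blocks, Python's standard-library urllib.robotparser can evaluate the rules for you (a minimal sketch; the host and paths are placeholders):

from urllib import robotparser

# Load the (placeholder) site's robots.txt.
rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# Ask whether a given user agent may fetch a given URL.
print(rp.can_fetch("*", "https://www.example.com/database/secret.html"))  # False if /database/ is disallowed
print(rp.can_fetch("*", "https://www.example.com/navy/about.html"))       # True if explicitly allowed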
Robots.txt Allowing Access to Specific Crawlers
Some people choose to save bandwidth and allow access only to the crawlers they care about (e.g. Google, Yahoo! and MSN). In this case the robots.txt file should first block everyone and then name each permitted robot with an empty Disallow:

User-agent: *
Disallow: /

User-agent: googlebot
Disallow:

User-agent: slurp
Disallow:

User-agent: msnbot
Disallow:

(the first group blocks all crawlers from everything, while the following three groups name the three crawlers that are allowed to access the whole site)

Need Advanced Robots.txt Usage?
I tend to recommend that people refrain from doing anything too tricky in their robots.txt file unless they are 100% knowledgeable in the topic. A messed-up robots.txt file can wreck a project launch: many people spend weeks or months trying to figure out why their site is ignored by crawlers, until they realize (often with some external help) that they have misused their robots.txt file. A better solution for controlling crawler activity might be to rely on on-page solutions (robots meta tags). Aaron did a great job summing up the difference in his guide (bottom of the page).
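For comparison, a robots meta tag lives in the <head> of an individual page rather than in a site-wide file. A typical example, which keeps the page out of the index while still letting crawlers follow its links:

<meta name="robots" content="noindex, follow">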
Best Robots.txt Tools: Generators and Analyzers
While I do not encourage anyone to rely too heavily on robots.txt tools (you should either do your best to understand the syntax yourself or turn to an experienced consultant to avoid any issues), the robots.txt generators and checkers listed below will hopefully be of additional help.

Robots.txt generators
The common procedure:
1. choose the default / global commands (e.g. allow/disallow all robots);
2. choose the files or directories blocked for all robots;
3. choose user-agent-specific commands:
   a. choose the action;
   b. choose the specific robot to be blocked.

As a general rule of thumb, I don't recommend using robots.txt generators, for a simple reason: don't create any advanced (i.e. non-default) robots.txt file until you are 100% sure you understand what you are blocking with it. Still, here are the two most trustworthy generators to check:

• Google Webmaster Tools: the robots.txt generator allows you to create simple robots.txt files. What I like most about this tool is that it automatically adds all global commands to each user-agent-specific block, helping you avoid one of the most common mistakes (see the sketch after this list).
• The SEObook robots.txt generator unfortunately lacks the above feature, but it is really easy (and fun) to use.
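About that common mistake: a crawler obeys only the single user-agent group that matches it best, so a bot-specific group does not inherit the global rules, and they must be repeated. A sketch with hypothetical paths:

Wrong (Googlebot obeys only its own group, so /private/ stays open to it):

User-agent: *
Disallow: /private/

User-agent: googlebot
Disallow: /beta/

Right (the global rule is repeated inside the bot-specific group):

User-agent: *
Disallow: /private/

User-agent: googlebot
Disallow: /private/
Disallow: /beta/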
Robots.txt checkers
• Google Webmaster Tools: the robots.txt analyzer "translates" what your robots.txt dictates to the Googlebot.
• The Robots.txt Syntax Checker finds common errors in your file by checking for whitespace-separated lists, standards that are not widely supported, wildcard usage, etc.
• A Validator for Robots.txt Files also checks for syntax errors and confirms correct directory paths.