What is Robots.txt

It is great when search engines frequently visit your site and index your content, but there are often cases when indexing parts of your online content is not what you want. For instance, if you have two versions of a page (one for viewing in the browser and one for printing), you'd rather have the printing version excluded from crawling; otherwise you risk a duplicate content penalty. Also, if you have sensitive data on your site that you do not want the world to see, you will prefer that search engines not index those pages (although in that case the only sure way to keep sensitive data unindexed is to keep it offline, on a separate machine). Additionally, if you want to save some bandwidth by excluding images, stylesheets and JavaScript from indexing, you also need a way to tell spiders to keep away from those items.

One way to tell search engines which files and folders on your Web site to avoid is the Robots metatag. But since not all search engines read metatags, the Robots metatag can simply go unnoticed. A more reliable way to inform search engines of your wishes is a robots.txt file.

What Is Robots.txt?

Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but search engines generally obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (it is not a firewall or a kind of password protection); putting up a robots.txt file is something like putting a "Please, do not enter" note on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter.
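The "unlocked door" point is worth stressing: the note itself is public. Anyone, human or robot, can open http://mydomain.com/robots.txt in a browser and read it, and what they see is a plain list of the very paths you wanted kept quiet. A hypothetical example (the /private/ and /admin/ paths are made up for illustration):

```
User-agent: *
Disallow: /private/
Disallow: /admin/
```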
That is why we say that if you have really sensitive data, it is naïve to rely on robots.txt to protect it from being indexed and displayed in search results.

The location of robots.txt is very important. It must be in the main directory, because otherwise user agents (search engines) will not be able to find it: they do not search the whole site for a file named robots.txt. Instead, they look only in the main directory (i.e. http://mydomain.com/robots.txt), and if they don't find it there, they simply assume that the site has no robots.txt file and index everything they find along the way. So if you don't put robots.txt in the right place, do not be surprised when search engines index your whole site.

The concept and structure of robots.txt were developed more than a decade ago. If you are interested in learning more, visit http://www.robotstxt.org/ or go straight to the Standard for Robot Exclusion; in this article we will deal only with the most important aspects of a robots.txt file. Next we will continue with the structure of a robots.txt file.

Structure of a Robots.txt File

The structure of a robots.txt file is pretty simple (and barely flexible): it is a list of user agents and of disallowed files and directories. Basically, the syntax is as follows:

User-agent:
Disallow:

"User-agent" names a search engine's crawler, and "Disallow:" lists the files and directories to be excluded from indexing. In addition to "User-agent:" and "Disallow:" entries, you can include comment lines; just put the # sign at the beginning of the line:

# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/

The Traps of a Robots.txt File

When you start making complicated files, i.e.
you decide to allow different user agents access to different directories, problems can start if you do not pay special attention to the traps of a robots.txt file. Common mistakes include typos and contradictory directives. Typos are misspelled user agents or directories, missing colons after User-agent and Disallow, and so on. Typos can be tricky to find, but in some cases validation tools help.

The more serious problem is with logical errors. For instance:

User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/

The example above blocks all agents from the /temp directory and then, in a second record, sets more restrictive terms for Googlebot. The trap is that records are not combined: under the Standard for Robot Exclusion, a robot obeys only the record that matches its own name, and falls back to the record for * only when no specific record exists. So when Googlebot reads this file, it obeys its own record and ignores the one for *. That is why /temp/ must be repeated in the Googlebot record: if you listed it only under *, assuming Googlebot would inherit it, Googlebot would happily crawl /temp/. You see, the structure of a robots.txt file is simple, yet serious mistakes can still be made easily.

Tools to Generate and Validate a Robots.txt File

Given the simple syntax of a robots.txt file, you can always read it yourself to see if everything is OK, but it is much easier to use a validator, like this one: http://tool.motoricerca.info/robots-checker.phtml. These tools report common mistakes like missing slashes or colons, which, if undetected, compromise your efforts. For instance, if you have typed:

User agent: *
Disallow: /temp/

this is wrong, because the hyphen between "User" and "agent" is missing and the syntax is incorrect.

In those cases when you have a complex robots.txt file, i.e.
you give different instructions to different user agents or you have a long list of directories and subdirectories to exclude, writing the file manually can be a real pain. But do not worry: there are tools that will generate the file for you. What is more, there are visual tools that let you point to and select the files and folders to be excluded. And even if you do not feel like buying a graphical tool for robots.txt generation, there are online tools to assist you. For instance, the Server-Side Robots Generator offers a dropdown list of user agents and a text box for you to list the files you don't want indexed. Honestly, it is not much help unless you want to set specific rules for different search engines, because in any case it is up to you to type the list of directories, but it is better than nothing.

For more details, kindly visit http://www.mumbaiseo.co.cc/What-is-Robotstxt.html
For more queries, you may contact us at 9619240381 or email us at egrowtech@gmail.com
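As a footnote to the traps section above, the record-matching behavior can be checked programmatically. The sketch below uses Python's standard urllib.robotparser, which follows the original exclusion standard: a crawler is matched to its own record and falls back to the * record otherwise. The mydomain.com URLs are just the article's placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# The example file from the traps section: a general record plus a
# stricter record that applies only to Googlebot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot obeys its own record, so /images/ is off limits to it...
print(parser.can_fetch("Googlebot", "http://mydomain.com/images/logo.png"))    # False
# ...while other crawlers fall back to the * record and may fetch /images/,
print(parser.can_fetch("SomeOtherBot", "http://mydomain.com/images/logo.png")) # True
# and /temp/ stays blocked for everyone only because it is repeated
# in the Googlebot record as well.
print(parser.can_fetch("SomeOtherBot", "http://mydomain.com/temp/page.html"))  # False
print(parser.can_fetch("Googlebot", "http://mydomain.com/temp/page.html"))     # False
```

Deleting the `Disallow: /temp/` line from the Googlebot record flips the last call to True, which is exactly the logical error the article warns about.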