Robots.txt
Introduction
• The robots exclusion protocol (REP), or robots.txt, is a standard used
by websites to communicate with web crawlers and other web
robots.
• It is a text file webmasters create to tell search engine robots
which pages on their website they may crawl.
History
• The standard was proposed by Martijn Koster in February 1994, while he was working
for Nexor, on the www-talk mailing list, the main communication channel for WWW-related
activities at the time.
• Charles Stross claims to have provoked Koster to suggest robots.txt, after he wrote a
badly-behaved web crawler that caused an inadvertent denial of service attack on
Koster's server.
• For most of its history, /robots.txt was a de facto standard not owned by any
standards body; it was published as RFC 9309 in 2022. There are two historical descriptions:
• the original 1994 document "A Standard for Robot Exclusion".
• a 1997 Internet Draft specification, "A Method for Web Robots Control".
Examples
• Block all web crawlers from all content
User-agent: *
Disallow: /
• Block a specific web crawler from a specific folder
User-agent: Googlebot
Disallow: /no-google/
• Block a specific web crawler from a specific web page
User-agent: Googlebot
Disallow: /no-google/blocked-page.html
The "User-agent: *" line means the section applies to all robots. The "Disallow: /" line
tells the robot that it should not visit any pages on the site.
The * is a wildcard that represents any sequence of characters.
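The blocking rules above can be checked locally. The following is a minimal sketch using Python's standard-library urllib.robotparser; the helper name parse_rules, the bot names AnyBot and OtherBot, and the example.com URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Build a parser from robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# "Block all web crawlers from all content"
block_all = parse_rules("User-agent: *\nDisallow: /\n")

# "Block a specific web crawler from a specific folder"
block_folder = parse_rules("User-agent: Googlebot\nDisallow: /no-google/\n")

# Every robot is refused everywhere.
print(block_all.can_fetch("AnyBot", "http://www.example.com/index.html"))

# Googlebot is refused inside /no-google/ but allowed elsewhere.
print(block_folder.can_fetch("Googlebot", "http://www.example.com/no-google/page.html"))
print(block_folder.can_fetch("Googlebot", "http://www.example.com/public.html"))

# A robot that is not named in the file is not affected by the rule.
print(block_folder.can_fetch("OtherBot", "http://www.example.com/no-google/page.html"))
```

A well-behaved crawler would make a check of this kind before each request; nothing forces a crawler to do so.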
• To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
• Sitemap Parameter
User-agent: *
Disallow:
Sitemap: http://www.example.com/none-standard-location/sitemap.xml
• Crawl-delay directive
Several major crawlers support a Crawl-delay parameter, set to the number of seconds to wait
between successive requests to the same server.
User-agent: *
Crawl-delay:
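As a rough sketch of how a crawler might read the Sitemap and Crawl-delay lines, the example below again uses Python's urllib.robotparser (site_maps() needs Python 3.8 or later). The delay value of 10 seconds and the bot name AnyBot are assumptions for illustration only; the slide leaves the delay value blank.

```python
from urllib.robotparser import RobotFileParser

# The delay value (10) is an assumed example; the slide does not give one.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow:
Sitemap: http://www.example.com/none-standard-location/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds to wait between requests, or None if the file sets no delay.
delay = parser.crawl_delay("AnyBot")

# List of sitemap URLs declared in the file, or None if there are none.
sitemaps = parser.site_maps()

# An empty "Disallow:" line places no restriction on crawling.
may_crawl = parser.can_fetch("AnyBot", "http://www.example.com/page.html")

print(delay, sitemaps, may_crawl)
```

A crawler honoring the delay would typically sleep for that many seconds between requests to the same host.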
• Allow directive
If one wants to allow single files inside an otherwise disallowed directory, it is necessary to place
the Allow directive(s) first, followed by the Disallow.
Allow: /directory1/myfile.html
Disallow: /directory1/
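The effect of rule order can be illustrated with Python's standard-library urllib.robotparser, which applies the first rule that matches. This is only a sketch: the "User-agent: *" line, the helper name allowed, the bot name AnyBot, and the example.com URLs are assumptions added to make the snippet self-contained. Crawlers that pick the most specific (longest) matching rule instead would give the same answer for both orderings.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules, url, agent="AnyBot"):
    """Return True if `agent` may fetch `url` under `rules` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Order recommended on the slide: Allow first, then Disallow.
ALLOW_FIRST = (
    "User-agent: *\n"
    "Allow: /directory1/myfile.html\n"
    "Disallow: /directory1/\n"
)

# Reversed order, for comparison.
DISALLOW_FIRST = (
    "User-agent: *\n"
    "Disallow: /directory1/\n"
    "Allow: /directory1/myfile.html\n"
)

FILE_URL = "http://www.example.com/directory1/myfile.html"
OTHER_URL = "http://www.example.com/directory1/other.html"

# With Allow first, the single file is reachable and the rest stays blocked.
allow_first_file = allowed(ALLOW_FIRST, FILE_URL)
allow_first_other = allowed(ALLOW_FIRST, OTHER_URL)

# With Disallow first, a first-match parser never reaches the Allow rule.
disallow_first_file = allowed(DISALLOW_FIRST, FILE_URL)

print(allow_first_file, allow_first_other, disallow_first_file)
```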
• Host
Some crawlers (for example Yandex) support a Host directive, allowing websites with multiple mirrors
to specify their preferred domain.[26]
Host: www.example.com
Important Rules
• In most cases, a meta robots tag with the parameters "noindex, follow" should be used to
keep a page out of the index, since robots.txt only restricts crawling.
• It is important to note that malicious crawlers are likely to completely ignore robots.txt
and as such, this protocol does not make a good security mechanism.
• Each "Disallow:" line may list only one URL path.
• Each subdomain on a root domain uses separate robots.txt files.
• The filename of robots.txt is case sensitive. Use "robots.txt", not "Robots.TXT".
• Spacing is not an accepted way to separate query parameters. For example,
"/category/ /product page" would not be honored by robots.txt.
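Because each subdomain has its own file and the filename is always lowercase, a crawler can derive the robots.txt location from any page URL. The sketch below assumes a hypothetical helper named robots_txt_url and example.com hostnames; it is one plausible way to do this with Python's urllib.parse, not a prescribed method.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL for the host that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Two subdomains of the same root domain map to two different files.
www_robots = robots_txt_url("http://www.example.com/category/product?id=1")
blog_robots = robots_txt_url("http://blog.example.com/posts/first")

print(www_robots)
print(blog_robots)
```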
Thank You
