Presented By: Paridhi Infotech
http://www.paridhiinfotech.com
 Robots.txt is a text file webmasters create to instruct web robots (typically search
engine robots) how to crawl pages on their website. It is placed in the root folder of
the website.
Basic Samples:
 Blocking all web crawlers from all content
User-agent: *
Disallow: /
 Allowing all web crawlers access to all content
User-agent: *
Disallow:
Many more directives are available for keeping search engine bots from crawling a
particular section of a website. Read more here: https://www.robotstxt.org/
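As a quick check of the two samples above, the following Python sketch feeds each one to the standard library's urllib.robotparser and asks whether a crawler may fetch a page. The bot name ExampleBot and the example.com URL are placeholders chosen for illustration, not part of the original samples.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two basic samples from the slide.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Hypothetical crawler name and URL, used only for illustration.
url = "https://www.example.com/products/index.html"
print(may_crawl(BLOCK_ALL, "ExampleBot", url))
print(may_crawl(ALLOW_ALL, "ExampleBot", url))
```

The first call reports that crawling is blocked and the second that it is allowed, because an empty Disallow value places no restriction on the crawler.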
 The robots meta tag lets you utilize a granular, page-specific approach to controlling
how an individual page should be indexed and served to users in Google Search
results.
It is placed in the <head> section of a page.
Example:
<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noindex" /> (…)
</head>
<body>(…)</body>
</html>
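To verify what a page actually declares, one option is to parse its HTML and read the robots meta tag. The sketch below uses Python's built-in html.parser; the class and function names are made up for this example, and the input is a shortened version of the page above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma separated, e.g. "noindex, nofollow".
        self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def robots_directives(html: str) -> list:
    """Return the robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = """<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noindex" />
</head>
<body></body>
</html>"""
print(robots_directives(page))
```

A page without a robots meta tag yields an empty list, which search engines treat as no restriction.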
 X-Robots-Tag is an HTTP response header, sent by the web server, that controls how
a page as a whole, or a specific file type, is indexed.
Imagine you run a website that also hosts some .doc files, and you don’t want search
engines to index that file type. On Apache servers (with mod_headers enabled), add the
following lines to the server configuration or a .htaccess file:
<FilesMatch "\.doc$">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>
Or, to do this for both .doc and .pdf files:
<FilesMatch "\.(doc|pdf)$">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>
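To see which file names such a rule would cover, the following Python sketch imitates the matching step only; Apache itself does the real work. It assumes the pattern is written with an escaped dot, since an unescaped "." matches any character and would also catch a name such as "mydoc". The function name and the sample file names are invented for the example.

```python
import re
from typing import Optional

# Same idea as the second FilesMatch block, written with an escaped dot.
BLOCKED_FILE = re.compile(r"\.(doc|pdf)$")
HEADER_VALUE = "noindex, noarchive, nosnippet"


def x_robots_tag_for(filename: str) -> Optional[str]:
    """Return the X-Robots-Tag value for filename, or None if no header applies."""
    return HEADER_VALUE if BLOCKED_FILE.search(filename) else None


# Hypothetical file names, used only for illustration.
for name in ("report.pdf", "notes.doc", "index.html", "mydoc"):
    print(name, "->", x_robots_tag_for(name))
```

Only report.pdf and notes.doc receive the header; index.html and mydoc are left alone.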
Several types of directives tell search engine bots which pages and other content
they are allowed to crawl and index. The most commonly used are the robots.txt file
and the meta robots tag.
There are two different kinds of directives:
o Crawler Directives
o Indexer Directives
I’ll briefly explain the difference below.
Crawler Directives:
Robots.txt – uses the User-agent, Allow, Disallow and Sitemap directives to specify
which parts of a site each search engine bot is and is not allowed to crawl.
 Allow
 Disallow
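The four directives can be tried out together with Python's urllib.robotparser. This is a sketch under assumptions: the robots.txt content, the bot name OtherBot, and the example.com URLs are invented for illustration, and site_maps() needs Python 3.8 or later. Python's parser applies the first matching rule, so the Allow line is placed before the broader Disallow line; search engines may resolve conflicts differently (Google, for instance, prefers the most specific rule).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining all four directives.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /private/public-report.html
Disallow: /private/

User-agent: *
Disallow: /tmp/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "https://www.example.com"
checks = [
    ("Googlebot", base + "/private/public-report.html"),
    ("Googlebot", base + "/private/secret.html"),
    ("OtherBot", base + "/private/secret.html"),
    ("OtherBot", base + "/tmp/cache.html"),
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))

# Sitemap lines are collected separately from the crawl rules.
print(parser.site_maps())
```

Googlebot is limited by its own group only, while every other bot falls back to the "*" group.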
Indexer Directives:
 Meta Robots tag – lets you prevent search engines from showing
particular pages of a site in search results.
 Nofollow – lets you specify links that should not pass on authority or PageRank.
 X-Robots-Tag – lets you control how specified file types are indexed.
 The X-Robots-Tag differs from the robots.txt file and the meta robots tag in that
it is sent as part of the HTTP response header, so it can control indexing of a page
as a whole as well as of specific elements on a page.
 For example, to keep a specific image or video out of the index, you can set the
X-Robots-Tag in the HTTP response for that file.
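One way to picture this is a small helper that decides, per requested path, whether the response should carry the header. The Python sketch below is only an illustration: the function name and the sample paths are assumptions, and a real server or framework would attach the returned headers to its response.

```python
import mimetypes


def x_robots_headers(path: str) -> list:
    """Return extra response headers: noindex for images and videos, none otherwise."""
    content_type, _ = mimetypes.guess_type(path)
    if content_type and content_type.split("/")[0] in ("image", "video"):
        return [("X-Robots-Tag", "noindex")]
    return []


# Hypothetical request paths, used only for illustration.
for path in ("/media/photo.jpg", "/media/clip.mp4", "/index.html"):
    print(path, x_robots_headers(path))
```

Images and videos cannot carry a meta robots tag, which is why the header is the practical choice for them.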
 https://moz.com/learn/seo/robotstxt
 https://ahrefs.com/blog/meta-robots/
 https://www.searchenginejournal.com/everything-you-need-to-know-about-the-x-robots-tag/
Website: http://www.paridhiinfotech.com
Email: info@paridhiinfotech.com
Paridhi Infotech

Controlling Crawlers with Robots.txt, Meta Tags & X-Robots-Tag
