SlideShare a Scribd company logo
Assisting bots in crawling the web better Hello, I’m @KahWee
To index or not to index Robots looks at robots.txt to check which are the paths allowed and disallowed for indexing User-Agent: * Disallow: /folder1/ Allow: /folder1/myfile.html Sometimes only portions of the page contain actual content
Machines want to be smarter
Where’s the actual content?
We all know it’s here
The bot thinks it’s here
How to locate the actual content? Search engines tries to make an intelligent guess where the main content is. This approach is implicit.
Specifying actual content explicitly Google uses this in HTML using a comment. <!--googleon: all-->  <div id="actual"> Loremipsum </div> <!--googleoff: all> This is proprietary.
Proposal to specify using CSS <link rel="stylesheet" media="robots" href="robots.css"> Or @media robots { * { display: none }   #actual { display: block } }

More Related Content

Similar to Assisting bots in crawling the web better

Developing Gadgets
Developing GadgetsDeveloping Gadgets
Developing Gadgets
Quirk
 
1-04: HTML Elements
1-04: HTML Elements1-04: HTML Elements
1-04: HTML Elements
apnwebdev
 
1 04-html elements
1 04-html elements1 04-html elements
1 04-html elements
apnwebdev
 
Meta tags
Meta tagsMeta tags
Meta tags
hapy
 
Meta tags
Meta tagsMeta tags
Meta tags
hapy
 
Seoptimizing
SeoptimizingSeoptimizing
Seoptimizing
Gaurav Vashisht
 
Introducing YUI
Introducing YUIIntroducing YUI
Introducing YUI
Christian Heilmann
 
Advanced SEO for Developers (Mix08)
Advanced SEO for Developers (Mix08)Advanced SEO for Developers (Mix08)
Advanced SEO for Developers (Mix08)
Nathan Buggia
 
HTML5 - techMaine Presentation 5/18/09
HTML5 - techMaine Presentation 5/18/09HTML5 - techMaine Presentation 5/18/09
HTML5 - techMaine Presentation 5/18/09
pemaquid
 
HTML5: New Possibilities for Publishing
HTML5: New Possibilities for PublishingHTML5: New Possibilities for Publishing
HTML5: New Possibilities for Publishing
iFactory
 
BasicHTML
BasicHTMLBasicHTML
BasicHTML
rcsampang
 
Html5 accessibility
Html5 accessibilityHtml5 accessibility
Html5 accessibility
Virginia DeBolt
 
Enterprise Google Gadgets Integrated with Alfresco - Open Source ECM
Enterprise Google Gadgets Integrated with Alfresco - Open Source ECM Enterprise Google Gadgets Integrated with Alfresco - Open Source ECM
Enterprise Google Gadgets Integrated with Alfresco - Open Source ECM
Alfresco Software
 
BlogPaws 2010 - SEO Tips: JaneA Kelley
BlogPaws 2010 - SEO Tips: JaneA KelleyBlogPaws 2010 - SEO Tips: JaneA Kelley
BlogPaws 2010 - SEO Tips: JaneA Kelley
BlogPaws
 
React JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
React JS and Search Engines - Patrick Stox at Triangle ReactJS MeetupReact JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
React JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
patrickstox
 
Controlling crawler for better Indexation and Ranking
Controlling crawler for better Indexation and RankingControlling crawler for better Indexation and Ranking
Controlling crawler for better Indexation and Ranking
Rajesh Magar
 
On-page SEO for Drupal
On-page SEO for DrupalOn-page SEO for Drupal
On-page SEO for Drupal
Svilen Sabev
 
Html5: What is it?
Html5: What is it? Html5: What is it?
Html5: What is it?
joeydehnert
 
JavaScript SEO Ungagged 2019 Patrick Stox
JavaScript SEO Ungagged 2019 Patrick StoxJavaScript SEO Ungagged 2019 Patrick Stox
JavaScript SEO Ungagged 2019 Patrick Stox
patrickstox
 
Advanced Seo Web Development Tech Ed 2008
Advanced Seo Web Development Tech Ed 2008Advanced Seo Web Development Tech Ed 2008
Advanced Seo Web Development Tech Ed 2008
Nathan Buggia
 

Similar to Assisting bots in crawling the web better (20)

Developing Gadgets
Developing GadgetsDeveloping Gadgets
Developing Gadgets
 
1-04: HTML Elements
1-04: HTML Elements1-04: HTML Elements
1-04: HTML Elements
 
1 04-html elements
1 04-html elements1 04-html elements
1 04-html elements
 
Meta tags
Meta tagsMeta tags
Meta tags
 
Meta tags
Meta tagsMeta tags
Meta tags
 
Seoptimizing
SeoptimizingSeoptimizing
Seoptimizing
 
Introducing YUI
Introducing YUIIntroducing YUI
Introducing YUI
 
Advanced SEO for Developers (Mix08)
Advanced SEO for Developers (Mix08)Advanced SEO for Developers (Mix08)
Advanced SEO for Developers (Mix08)
 
HTML5 - techMaine Presentation 5/18/09
HTML5 - techMaine Presentation 5/18/09HTML5 - techMaine Presentation 5/18/09
HTML5 - techMaine Presentation 5/18/09
 
HTML5: New Possibilities for Publishing
HTML5: New Possibilities for PublishingHTML5: New Possibilities for Publishing
HTML5: New Possibilities for Publishing
 
BasicHTML
BasicHTMLBasicHTML
BasicHTML
 
Html5 accessibility
Html5 accessibilityHtml5 accessibility
Html5 accessibility
 
Enterprise Google Gadgets Integrated with Alfresco - Open Source ECM
Enterprise Google Gadgets Integrated with Alfresco - Open Source ECM Enterprise Google Gadgets Integrated with Alfresco - Open Source ECM
Enterprise Google Gadgets Integrated with Alfresco - Open Source ECM
 
BlogPaws 2010 - SEO Tips: JaneA Kelley
BlogPaws 2010 - SEO Tips: JaneA KelleyBlogPaws 2010 - SEO Tips: JaneA Kelley
BlogPaws 2010 - SEO Tips: JaneA Kelley
 
React JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
React JS and Search Engines - Patrick Stox at Triangle ReactJS MeetupReact JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
React JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
 
Controlling crawler for better Indexation and Ranking
Controlling crawler for better Indexation and RankingControlling crawler for better Indexation and Ranking
Controlling crawler for better Indexation and Ranking
 
On-page SEO for Drupal
On-page SEO for DrupalOn-page SEO for Drupal
On-page SEO for Drupal
 
Html5: What is it?
Html5: What is it? Html5: What is it?
Html5: What is it?
 
JavaScript SEO Ungagged 2019 Patrick Stox
JavaScript SEO Ungagged 2019 Patrick StoxJavaScript SEO Ungagged 2019 Patrick Stox
JavaScript SEO Ungagged 2019 Patrick Stox
 
Advanced Seo Web Development Tech Ed 2008
Advanced Seo Web Development Tech Ed 2008Advanced Seo Web Development Tech Ed 2008
Advanced Seo Web Development Tech Ed 2008
 

Recently uploaded

Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 

Recently uploaded (20)

Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 

Assisting bots in crawling the web better

  • 1. Assisting bots in crawling the web better Hello, I’m @KahWee
  • 2. To index or not to index Robots looks at robots.txt to check which are the paths allowed and disallowed for indexing User-Agent: * Disallow: /folder1/ Allow: /folder1/myfile.html Sometimes only portions of the page contain actual content
  • 3. Machines want to be smarter
  • 5. We all know it’s here
  • 6. The bot thinks it’s here
  • 7. How to locate the actual content? Search engines tries to make an intelligent guess where the main content is. This approach is implicit.
  • 8. Specifying actual content explicitly Google uses this in HTML using a comment. <!--googleon: all-->  <div id="actual"> Loremipsum </div> <!--googleoff: all> This is proprietary.
  • 9. Proposal to specify using CSS <link rel="stylesheet" media="robots" href="robots.css"> Or @media robots { * { display: none } #actual { display: block } }
  • 10. Benefits Using CSS, we have more granular control of what to show to robots and what not to explicitly. Simple enough to enable development of better content-analysis-based applications. Potentially translate to better search, letting machines understand the open web better.