Thursday, October 21st, 2010.
Dealing with Crawlers
Search Engine Optimization: The Good, The Bad, & The Ugly.
WELCOME
CONTACT: 2 Saint Genny Street, Kafr Abdou. Suite 103 - 105. Alexandria 21311, Egypt. Phone: +2 (03) 546-7622. URL: www.eSpace.com.eg
What to Discuss?
  • Dealing with the Crawlers: the good & the bad robots
  • What is robots.txt file? The definition
  • Structure of robots.txt file for SEO purpose: the syntax
  • Standard: User-agent, Disallow
  • Nonstandard: Crawl-delay, Allow, Sitemap
  • Extended: Request-rate, Visit-time, Comment
  • Effective use of robots.txt file: best practices
  • Be aware of rel="nofollow": comment spammers
  • User Generated Spam: how to avoid
Dealing with the Crawlers: The good & the bad robots
1. What is a robots.txt file?
A text file placed in the site's root directory. It tells search engines which sections of the site you don't want them to crawl and index.
  • Make sure that you place the robots.txt file in the main directory.
  • Restrict crawling where it's not needed.
  • Common robot traps: forms, logins, session IDs, frames.
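Because the file must sit in the main directory, its address can be derived from any page URL on the same host. The following is a minimal sketch of that rule; the function name `robots_url` and the example hosts are assumptions made for illustration, not part of the original slides.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; path, query and fragment are discarded
    # because robots.txt is looked up at the root of the host.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical hosts: each host name resolves to its own robots.txt file.
print(robots_url("http://www.example.com/shop/item?id=7"))
print(robots_url("https://blog.example.com/2010/10/post.html"))
```

Note that the two example URLs yield two different files, since the lookup is per host.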
2. Structure of robots.txt
A robots.txt file is easy to create because its structure is simple: it lists user agents and the files and directories to be excluded from crawling and indexing.
Standard
  • User-agent:
  • Disallow:
Nonstandard
  • Crawl-delay:
  • Allow:
  • Sitemap:
Extended Standard
  • Request-rate:
  • Visit-time:
  • Comment:
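The three groups of fields can be sketched as a small lookup that labels each line of a robots.txt file. This is only an illustration of the grouping used in these slides; the names `FIELD_GROUPS` and `classify` and the sample lines are assumptions.

```python
# Field names grouped the way the slides group them (lower-cased for matching).
FIELD_GROUPS = {
    "user-agent": "standard",
    "disallow": "standard",
    "crawl-delay": "nonstandard",
    "allow": "nonstandard",
    "sitemap": "nonstandard",
    "request-rate": "extended",
    "visit-time": "extended",
    "comment": "extended",
}


def classify(line: str):
    """Return the group of a 'Field: value' line, or None if unrecognised."""
    name, sep, _value = line.partition(":")
    if not sep:
        return None  # not a field line at all
    return FIELD_GROUPS.get(name.strip().lower())


# Hypothetical sample lines.
for sample in ("User-agent: *", "Crawl-delay: 10", "Visit-time: 0600-0845"):
    print(sample, "->", classify(sample))
```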
2.a Standard: { User-agent }
To set a value for all crawlers, use:
  • User-agent: *
To set a value for a specific search engine robot, name that robot in the User-agent field.
A complete updated list of bots can be found at: [link missing in source]
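How a crawler picks between the catch-all group and a group written for one robot can be tried out with Python's standard `urllib.robotparser`. This is a sketch under assumptions: the rules, the paths, the host and the robot names (`Googlebot` as the named robot, `OtherBot` as an arbitrary one) are illustrative and do not come from the slides.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for every crawler, one for a named robot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /no-google/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
SITE = "http://www.example.com"  # hypothetical host

# The named robot follows its own group; every other robot falls back to "*".
print(rules.can_fetch("Googlebot", SITE + "/no-google/page.html"))
print(rules.can_fetch("OtherBot", SITE + "/no-google/page.html"))
print(rules.can_fetch("OtherBot", SITE + "/private/page.html"))
```

With this parser, a robot that has its own group is checked against that group only, so the catch-all rules do not apply to it.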

  • Disallow: /users/login/
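The effect of this Disallow line can be checked with Python's standard `urllib.robotparser`, which matches the rule as a path prefix. A minimal sketch: the surrounding `User-agent: *` line, the host, the robot name `ExampleBot` and the helper `is_allowed` are assumptions added for the illustration.

```python
from urllib.robotparser import RobotFileParser

# The Disallow line from the slide, placed in an assumed catch-all group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /users/login/
"""


def is_allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Report whether the hypothetical robot may fetch path on the example host."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://www.example.com" + path)


for path in ("/users/login/", "/users/login/reset", "/users/login", "/users/profile"):
    print(path, is_allowed(path))
```

Because the rule ends with a slash, `/users/login` without the trailing slash is not covered by it in this parser.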
  • Sitemap: http://www.domain.com/dir/s/names-sitemap.xml.gz
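The nonstandard fields (Crawl-delay, Allow, Sitemap) can be read back with Python's standard `urllib.robotparser`; `site_maps()` requires Python 3.8 or later. This is a sketch under assumptions: only the Sitemap URL comes from the slide, while the delay value, the paths and the robot name `ExampleBot` are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Only the Sitemap line is from the slide; the other values are hypothetical.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Allow: /private/press/
Disallow: /private/
Sitemap: http://www.domain.com/dir/s/names-sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

SITE = "http://www.domain.com"
print(parser.crawl_delay("ExampleBot"))   # seconds to wait between requests
print(parser.site_maps())                 # list of Sitemap URLs, or None
print(parser.can_fetch("ExampleBot", SITE + "/private/press/news.html"))
print(parser.can_fetch("ExampleBot", SITE + "/private/notes.html"))
```

The Allow line is placed before the broader Disallow line because this parser applies the first rule that matches; other crawlers may resolve such overlaps differently.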
  • Comment: because Yahoo sucks :P
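The extended fields can be explored with a short script. Python's standard `urllib.robotparser` reads Request-rate through `request_rate()`; in this sketch Visit-time and Comment are collected by hand with a small helper. The helper name `extended_fields`, the field values and the robot name `ExampleBot` are assumptions, not taken from the slides.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical values for the three extended fields.
ROBOTS_TXT = """\
User-agent: *
Request-rate: 1/5
Visit-time: 0600-0845
Comment: please crawl gently
Disallow: /private/
"""


def extended_fields(text: str) -> dict:
    """Collect Visit-time and Comment values by splitting each line by hand."""
    found = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        key = name.strip().lower()
        if sep and key in ("visit-time", "comment"):
            found[key] = value.strip()
    return found


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

rate = parser.request_rate("ExampleBot")  # named tuple: (requests, seconds)
print(rate.requests, "request(s) per", rate.seconds, "seconds")
print(extended_fields(ROBOTS_TXT))
```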
  • Use more secure methods for sensitive content.
  • Avoid allowing search result-like pages to be crawled.
  • Avoid allowing URLs created as a result of proxy services to be crawled.
  • Create a separate robots.txt file for each subdomain.
  • Turn on comment moderation.
  • Disallow hyperlinks in comments.
  • Block comment pages using robots.txt or META tags.
  • Think twice before enabling a guestbook or comments.
  • Use a blacklist to prevent repetitive spamming attempts.
  • Add a "report spam" feature to user profiles and friend invitations.
  • Monitor your site for spammy pages.
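Where hyperlinks in comments are kept rather than disallowed, one common measure is to mark them with rel="nofollow" so that comment spammers gain nothing from the link. The following is a rough sketch only: it uses a regular expression, which is fragile on real-world HTML, and the function name `add_nofollow` and the sample comment are assumptions for illustration.

```python
import re

# Matches an opening <a ...> tag and captures its attribute text.
_ANCHOR = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
_HAS_REL = re.compile(r"\brel\s*=", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Add rel="nofollow" to every link in a comment that has no rel attribute."""
    def fix(match):
        attrs = match.group(1)
        if _HAS_REL.search(attrs):
            return match.group(0)  # leave an existing rel attribute untouched
        return f'<a{attrs} rel="nofollow">'
    return _ANCHOR.sub(fix, comment_html)


# Hypothetical user comment.
print(add_nofollow('Nice post! <a href="http://spam.example/">cheap pills</a>'))
```

A production site would more likely rely on an HTML sanitiser or the comment system's own setting than on a regular expression.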
  • On Page Factors.
  • Make use of free webmaster tools.
THANK YOU