Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - Brighton SEO Sep 2017

Methods of blocking or restricting Google's crawl of your site are often confused or chosen incorrectly. In this presentation, Chris outlines a deceptively simple method for picking the right tool for the job.

  1. #BrightonSEO 15th September 2017 | StrategiQ (www.strategiq.co) | Chris Green (@chrisgreen87) | http://bit.ly/snog-marry-avoid
     Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World
  2. How do we know the best way to manage Googlebot’s crawl/indexing?
  3. There are many methods (we’re spoilt for choice, really)
  4. But the two most commonly misused are Robots.txt and Meta Robots directives
  5. Why?
  6. To the casual observer, they’re very similar ways of doing the same thing…
  7. To block Google
  8. But that’s not a helpful way of thinking about them
  9. In many circumstances, it can stop you getting the most out of your site
  10. I’m going to run through a framework to help change this thinking
  11. I’m going to run through a framework to help you make the right choices
  12. But some words of warning
  13. This is advanced stuff. One foot wrong & you could cause serious damage
  14. They’re not always the first choice. These are only part of your toolkit
  15. There are so many “ifs” & “buts”
  16. If we can finish today with slightly more understanding
  17. And a different approach, then we’re onto a winner!
  19. Time to introduce the robots
  20. Robots.txt
  21. Meta Robots
  22. X-Robots
  23. Possibly the most important SEO tools
  24. But which do you...
  25. Snog
  26. Marry
  27. Or avoid?
  29. But what does a slightly s**t BBC 3 show have to do with SEO?
  30. One site’s “snog” is…
  31. One site’s “snog” is another’s “marry”
  32. One site’s “snog” is another’s “marry”, or perhaps even “avoid”
  33. There are lots of thoughts on how to use these
  34. There are lots of thoughts on how to use these; many are wrong
  35. I’m going to show you a way of simplifying things
  36. To pick the right tool for the job
  37. Know the problem you’re trying to fix!
  38. Is it a crawl problem? Google isn’t seeing enough of your site
  39. Or an index problem? Google’s indexing too much of it
  40. We fix crawl problems with Robots.txt
  41. User-agent: *
      Disallow: /*mad-spider-trap*
  42. And we fix index problems with Meta Robots
  43. www.domain.com/crap-page
      <meta name="robots" content="NOINDEX, FOLLOW">
  44. Identifying the problem
  45. Index problems are simple to spot
  50. Does your site look too big (in Google’s eyes)?
  51. But ID’ing a crawl problem... can be trickier
  53. Look for spider traps: https://www.portent.com/blog/seo/field-guide-to-spider-traps-an-seo-companion.htm
  54. Where does this “cost” you crawl budget?
  55. A word on crawl budget
  56. It’s “a thing”: http://searchengineland.com/google-explains-crawl-budget-means-webmasters-267597
  57. But Google doesn’t publicise a site’s crawl budget
  58. You can work out a version of it yourself (thanks to Yoast for this): https://yoast.com/crawl-budget-optimization/
  59. Look at GSC Crawl Stats: average pages crawled per day
  60. Look at GSC: how big Google thinks your site is
  61. Pages / average pages crawled per day = Crawl Score
  62. 9,781 / 1,458 = 6.7, i.e. 6.7x as many pages as get crawled each day
  63. If you have a score of 10, you have 10x the pages that Google is crawling daily
  64. A pretty big crawl problem!
  65. But how big is big?
  66. Under 1,000 pages, crawl budget is less of a problem: https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html?m=1
  67. 1,000 to 10,000 pages is moderate
  68. 10,000+ pages… things start to get “fun”
  69. Crawl & index problems aren’t mutually exclusive
  70. Index bloat at scale can hurt crawl
  71. Crawl issues can stop or slow the repair of index issues
  72. Some example scenarios
  73. eCommerce filters which are getting indexed (badly)
  75. www.domain.com/shop/mens/trainers/size-12/red/
      <meta name="robots" content="NOINDEX, FOLLOW">
  76. User-agent: *
      Disallow: /*size*
      Disallow: /*red*
  78. Not until the index issue is cleared up*
  79. *Unless...
  80. User-agent: *
      Noindex: /*size*
      Disallow: /*size*
      Noindex: /*red*
      Disallow: /*red*
  81. Google isn’t “cool” with this: https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html
  83. But it’s been proven to work: https://www.deepcrawl.com/blog/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/ and http://ohgm.co.uk/de-index-pages-blocked-robots-txt/
  84. Only ~0.3% of the Majestic Million use this method
  85. Don’t be too aggressive, though!
  86. Be aware that some filtered pages can be worth indexing
  87. Blog taxonomies
  89. www.domain.com/blog/blog-category/
      www.domain.com/blog/tags-bloody-tags/
      <meta name="robots" content="NOINDEX, FOLLOW">
  90. eCommerce site without indexed filters but a 6x+ crawl score
  91. User-agent: *
      Disallow: /*filters*
      (& meta robots just in case)
  92. Other misc pages? Just noindex
  93. Anything else?
  95. Back to my original premise
  96. Meta Robots is my “marry”
  98. Robots.txt is my “snog”
  100. It really can make the difference
  101. Meta robots replaced with robots.txt disallow
  102. But it is easy to screw it up
  103. What about X-Robots?
  104. For when meta robots isn’t possible…
  105. For when meta robots isn’t possible… assuming you can edit .htaccess
  106. It’s not my “avoid”, though
  107. What is?
  108. The lazy option!
  109. User-agent: *
       Disallow: /*pointless*
       Disallow: /*disallow-rules*
       Disallow: /*instead-of*
       Disallow: /*fixing-the-problem.html
  110. www.domain.com/200000-filtered-combos
       <meta name="robots" content="NOINDEX, NOFOLLOW">
  111. User-agent: *
       Disallow: /we-should-write-better-content/ # but don't want to prioritise
  112. The “best choice” depends on your limitations
  113. Do you have all the access you need?
  114. Or enough buy-in?
  115. Otherwise, some workarounds are better than doing nothing
  116. Meta Robots via GTM: https://moz.com/blog/seo-changes-using-google-tag-manager
  117. Robots.txt when there’s no other option:
       User-agent: *
       Disallow: /better-than-doing-nothing/
  118. Key takeaways
  119. Is it a crawl or an index problem?
  120. Check what you can change
  121. Check what you can change, and what you can’t…
  122. Make the “best case” fix
  123. Implement, crawl & check again!
  124. Use this flowchart to help: http://bit.ly/bseo-flow
  125. Thank you. http://bit.ly/snog-marry-avoid @chrisgreen87
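Slides 59 to 63 build a rough crawl score from two Google Search Console (GSC) figures: the number of pages Google sees on the site, and the average pages crawled per day. A minimal Python sketch of that arithmetic, assuming you have copied both numbers out of GSC by hand (the function name `crawl_score` is hypothetical):

```python
def crawl_score(pages_in_google: int, avg_crawled_per_day: float) -> float:
    """Pages Google sees on the site divided by average pages crawled per day.

    A score of 6.7 means the site has roughly 6.7x as many pages as
    Googlebot crawls on a typical day.
    """
    if avg_crawled_per_day <= 0:
        raise ValueError("average pages crawled per day must be positive")
    return pages_in_google / avg_crawled_per_day


# The worked example from slide 62: 9,781 pages, 1,458 crawled per day.
score = crawl_score(9_781, 1_458)
print(f"Crawl score: {score:.1f}")  # Crawl score: 6.7
```

Read loosely, the score is roughly how many days Googlebot would need to get round the whole site once at its current rate, ignoring recrawls of pages it has already seen.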
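Slides 66 to 68 give rough size bands for how much crawl budget matters. The sketch below encodes those thresholds in Python; the band labels and the name `crawl_budget_concern` are illustrative assumptions, and placing exactly 10,000 pages in the top band is a guess at how the slides' overlapping ranges divide.

```python
def crawl_budget_concern(page_count: int) -> str:
    """Place a page count in the rough bands from slides 66 to 68."""
    if page_count < 1_000:
        return "low"       # under 1,000 pages: crawl budget is less of a problem
    if page_count < 10_000:
        return "moderate"  # 1,000 to 10,000 pages
    return "high"          # 10,000+ pages: things start to get "fun"


for pages in (800, 9_781, 250_000):
    print(pages, crawl_budget_concern(pages))
```

The bands are only a first filter: slide 90's example is a site whose problem shows up in its crawl score rather than its raw size.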
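The Disallow rules on slides 41, 76 and 91 rely on the `*` wildcard, which Google supports but Python's standard `urllib.robotparser` does not (it does plain prefix matching). The sketch below is a simplified matcher, assuming Google's documented behaviour: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and rules match from the start of the path. It ignores `Allow` lines and longest-match precedence, so treat it as a quick sanity check rather than a full parser; the names `rule_to_regex` and `is_disallowed` are hypothetical.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally from the start of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """True if any non-empty Disallow rule matches the path (Allow rules ignored)."""
    # An empty "Disallow:" line blocks nothing, so skip empty rules.
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules if rule)


rules = ["/*size*", "/*red*"]  # the rules from slide 76
for path in ("/shop/mens/trainers/size-12/red/",
             "/shop/mens/trainers/",
             "/shop/featured/"):
    print(path, is_disallowed(path, rules))
```

The first path is blocked as intended and the second is not, but the third shows why slide 85 warns against being too aggressive: `/*red*` also blocks `/shop/featured/`, because "featured" contains "red".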
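Slide 123's advice to implement, crawl and check again can be partly automated. A hedged Python sketch using the standard `html.parser` module collects the directives a page sends through `<meta name="robots">` (or `name="googlebot"`) tags and, optionally, an X-Robots-Tag header value you have fetched separately. The class and function names are hypothetical, and it does not handle header values prefixed with a user-agent name (e.g. `googlebot: noindex`).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from robots/googlebot meta tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() in {"robots", "googlebot"}:
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.directives.append(directive.strip().lower())


def page_directives(html: str, x_robots_tag: str = "") -> set[str]:
    """Combine meta robots directives with an X-Robots-Tag header value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = [d.strip().lower() for d in x_robots_tag.split(",") if d.strip()]
    return set(parser.directives) | set(header)


html = '<head><meta name="robots" content="NOINDEX, FOLLOW"></head>'
print(sorted(page_directives(html)))  # ['follow', 'noindex']
```

Run against the curly-quoted markup as it appears on the original slides (`name=“robots”`), this parser finds no directives, because the curly quotes become part of an unquoted attribute value.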
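Slides 104 and 105 position X-Robots-Tag as the fallback when a meta robots tag isn't possible, set through `.htaccess` on an Apache server. If a site happens to be served by a Python WSGI application instead, a comparable header can be added in code. The sketch below is a hypothetical middleware (`add_x_robots_tag` and the `should_noindex` predicate are illustrative names, not an existing library API) that attaches `X-Robots-Tag: noindex, follow` to matching paths, such as PDFs, which have no HTML `<head>` to hold a meta tag.

```python
from typing import Callable, Iterable

WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


def add_x_robots_tag(app: WSGIApp,
                     should_noindex: Callable[[str], bool],
                     value: str = "noindex, follow") -> WSGIApp:
    """Wrap a WSGI app so responses for matching paths carry X-Robots-Tag."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def start_response_with_tag(status, headers, exc_info=None):
            if should_noindex(path):
                # Drop any existing X-Robots-Tag so the header isn't duplicated.
                headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", value))
            return start_response(status, headers, exc_info)

        return app(environ, start_response_with_tag)

    return wrapped


# A tiny demo app: PDFs get the header, other paths don't.
def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


wrapped_app = add_x_robots_tag(demo_app, lambda path: path.endswith(".pdf"))
```

Doing this in middleware keeps the rule in one place, much as a single `.htaccess` block would, rather than scattering it across templates.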
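The key takeaways (slides 119 to 122) start from one question, crawl problem or index problem, then pick the best fix your access allows. The Python function below is one hedged reading of that logic, not a reproduction of the flowchart linked on slide 124. The `Access` fields and the fallback order are assumptions drawn from slides 40 to 43 and 104 to 117, and the answer for a crawl problem without robots.txt access is a guess the slides don't cover.

```python
from dataclasses import dataclass
from enum import Enum


class Problem(Enum):
    CRAWL = "crawl"  # Google isn't seeing enough of your site
    INDEX = "index"  # Google is indexing too much of it


@dataclass
class Access:
    """What you can actually change (slides 113 and 114)."""
    edit_html: bool = False           # can add <meta name="robots">
    edit_server_config: bool = False  # can set X-Robots-Tag (e.g. via .htaccess)
    use_gtm: bool = False             # can inject meta robots via Google Tag Manager
    edit_robots_txt: bool = False


def pick_tool(problem: Problem, access: Access) -> str:
    if problem is Problem.CRAWL:
        if access.edit_robots_txt:
            return "robots.txt Disallow"
        return "no direct fix: get robots.txt access"  # assumption
    # Index problem: prefer meta robots, then X-Robots-Tag, then workarounds.
    if access.edit_html:
        return "meta robots noindex"
    if access.edit_server_config:
        return "X-Robots-Tag noindex"
    if access.use_gtm:
        return "meta robots noindex via GTM (workaround)"
    if access.edit_robots_txt:
        return "robots.txt Disallow (better than doing nothing)"
    return "no fix available: get more access or buy-in"


print(pick_tool(Problem.INDEX, Access(edit_server_config=True, edit_robots_txt=True)))
# X-Robots-Tag noindex
```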
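Slides 76 to 78 warn against adding a Disallow rule until the index issue is cleared up: once a URL is blocked, Googlebot can no longer fetch it to see its noindex, so it can linger in the index. The `Noindex:` robots.txt line on slide 80 was never a documented directive, and Google announced it would stop honouring it from 1 September 2019, so this sketch assumes the meta robots route. It flags which still-indexed paths a proposed rule would block, using the same simplified wildcard matching as Google (`*` and a trailing `$` only, Allow rules ignored). The helper names are hypothetical, and the list of indexed paths is assumed to come from your own export.

```python
import re


def blocks(rule: str, path: str) -> bool:
    """Simplified Google-style match: '*' is a wildcard, trailing '$' anchors."""
    if not rule:
        return False  # an empty Disallow blocks nothing
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern + ("$" if anchored else ""), path) is not None


def noindex_hidden_by(rule: str, still_indexed: list[str]) -> list[str]:
    """Still-indexed paths whose noindex Google could no longer see."""
    return [path for path in still_indexed if blocks(rule, path)]


still_indexed = [
    "/shop/mens/trainers/size-12/red/",
    "/shop/mens/trainers/size-9/",
    "/shop/mens/trainers/",
]
print(noindex_hidden_by("/*size*", still_indexed))
# ['/shop/mens/trainers/size-12/red/', '/shop/mens/trainers/size-9/']
```

A non-empty result suggests keeping the noindex in place and waiting for those URLs to drop out before shipping the Disallow.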