Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - Brighton SEO Sep 2017

Methods of blocking/restricting Google's crawl of your site are so often confused or incorrectly selected - in this presentation Chris outlines a deceptively simple method for picking the right tool for the job.

Published in: Marketing

  1. #BrightonSEO 15th September 2017 @chrisgreen87 www.strategiq.co StrategiQ Chris Green @chrisgreen87 http://bit.ly/snog-marry-avoid Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Webcrawling World
  2. How do we know the best way to manage Googlebot’s crawl/indexing?
  3. There are many methods (we’re spoilt for choice really)
  4. But the two most commonly misused are Robots.txt and Meta Robots directives
  5. Why?
  6. To the casual observer they’re very similar ways of doing the same thing…
  7. To block Google
  8. But that’s not a helpful way of thinking of them
  9. In many circumstances it can stop you getting the most out of your site
  10. I’m going to run through a framework to help change this thinking
  11. I’m going to run through a framework to help you make the right choices
  12. But some words of warning
  13. This is advanced stuff. One foot wrong & you could cause serious damage
  14. They’re not always the first choice. These are only part of your toolkit
  15. There are so many “ifs” & “buts”
  16. If we can finish today with slightly more understanding
  17. And a different approach, then we’re onto a winner!
  19. Time to introduce the robots
  20. Robots.txt
  21. Meta Robots
  22. X-Robots
  23. Possibly the most important SEO tools
  24. But which do you...
  25. Snog
  26. Marry
  27. Or avoid?
  29. But what does a slightly s**t BBC 3 show have to do with SEO?
  30. One site’s “snog” is...
  31. One site’s “snog” is another’s “marry”
  32. One site’s “snog” is another’s “marry”. Or perhaps even “avoid”
  33. There are lots of thoughts on how to use these
  34. There are lots of thoughts on how to use these - many are wrong
  35. I’m going to show you a way of simplifying things
  36. To pick the right tool for the job
  37. Know the problem you’re trying to fix!
  38. Is it a crawl problem? Google isn’t seeing enough of your site
  39. Or an index problem? Google’s indexing too much of it
  40. We fix crawl problems with Robots.txt
  41. User-agent: *
      Disallow: /*mad-spider-trap*
  42. And we fix index problems with Meta Robots
  43. www.domain.com/crap-page
      <meta name="robots" content="NOINDEX, FOLLOW">
  44. Identifying the problem
  45. Index problems are simple to spot
  50. Does your site look too big? (in Google’s eyes)
  51. But ID’ing a crawl problem... can be trickier
  53. Look for spider traps https://www.portent.com/blog/seo/field-guide-to-spider-traps-an-seo-companion.htm
  54. Where does this “cost” you on crawl budget?
  55. A word on crawl budget.
  56. It’s “a thing” http://searchengineland.com/google-explains-crawl-budget-means-webmasters-267597
  57. But Google doesn’t publicise a site’s crawl budget
  58. You can work out a version of it yourself. Thanks to Yoast for this - https://yoast.com/crawl-budget-optimization/
  59. Look at GSC Crawl Stats: average pages crawled per day
  60. Look at GSC: how big Google sees your site as
  61. Pages / Avg crawled per day = Crawl Score
  62. 9,781 / 1,458 = 6.7, i.e. 6.7x as many pages as are getting crawled each day
  63. If you have 10, you have 10x the pages that Google is crawling daily
  64. A pretty big crawl problem!
  65. But, how big is big?
  66. < 1,000 pages, crawl budget is less of a problem https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html?m=1
  67. 1,000 - 10,000 is moderate
  68. 10,000+ pages… things start to get “fun”
  69. Crawl & Index problems aren’t mutually exclusive
  70. Index bloat at scale can hurt crawl
  71. Crawl issues can stop or slow the repair of index issues
  72. Some example scenarios
  73. eCommerce filters which are getting indexed (badly)
  75. www.domain.com/shop/mens/trainers/size-12/red/
      <meta name="robots" content="NOINDEX, FOLLOW">
  76. User-agent: *
      Disallow: /*size*
      Disallow: /*red*
  78. Not until the index issue is cleared up*
  79. *Unless...
  80. User-agent: *
      Noindex: /*size*
      Disallow: /*size*
      Noindex: /*red*
      Disallow: /*red*
  81. Google isn’t “cool” with this https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html
  83. But it’s proved to work https://www.deepcrawl.com/blog/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/ http://ohgm.co.uk/de-index-pages-blocked-robots-txt/
  84. Only ~0.3% of the Majestic Million use this method
  85. Don’t be too aggressive though!
  86. Be aware that some filtered pages can be worth indexing
  87. Blog taxonomies
  89. www.domain.com/blog/blog-category/
      www.domain.com/blog/tags-bloody-tags/
      <meta name="robots" content="NOINDEX, FOLLOW">
  90. eCommerce site without indexed filters but x6+ crawl score
  91. User-agent: *
      Disallow: /*filters*
      (& meta robots just in case)
  92. Other misc pages? Just noindex
  93. Anything else?
  95. Back to my original premise
  96. Meta Robots is my “marry”
  98. Robots.txt is my “snog”
  100. It really can make the difference
  101. Meta robots replaced with robots.txt disallow:
  102. But it is easy to screw it up
  103. What about x-robots?
  104. For when meta robots isn’t possible…
  105. For when meta robots isn’t possible… assuming you can edit .htaccess
  106. It’s not my “avoid” though
  107. What is?
  108. The lazy option!
  109. User-agent: *
       Disallow: /*pointless*
       Disallow: /*disallow-rules*
       Disallow: /*instead-of*
       Disallow: /*fixing-the-problem.html
  110. www.domain.com/200000-filtered-combos
       <meta name="robots" content="NOINDEX, NOFOLLOW">
  111. User-agent: *
       Disallow: /we-should-write-better-content/ #but don’t want to prioritise
  112. The “best choice” depends on your limitations
  113. Do you have all the access you need?
  114. Or enough buy-in?
  115. Otherwise, some workarounds are better than doing nothing
  116. Meta Robots via GTM https://moz.com/blog/seo-changes-using-google-tag-manager
  117. Robots.txt when no other option
       User-agent: *
       Disallow: /better-than-doing-nothing/
  118. Key takeaways
  119. Is it a crawl or index problem?
  120. Check what you can change
  121. Check what you can change. And what you can’t…
  122. Make the “best case” fix
  123. Implement, crawl & check again!
  124. Use this flowchart to help http://bit.ly/bseo-flow
  125. Thank you. http://bit.ly/snog-marry-avoid @chrisgreen87
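
Slides 59-68 describe a "crawl score": the number of pages Google sees on the site divided by the average pages crawled per day, read against rough site-size bands. A minimal Python sketch of that arithmetic is below. The function names `crawl_score` and `size_band` are illustrative and do not come from the talk, and the treatment of exactly 1,000 and 10,000 pages is an assumption, since the slides only give approximate bands.

```python
def crawl_score(total_pages: int, avg_crawled_per_day: float) -> float:
    """Pages Google sees on the site divided by average pages crawled per day."""
    if avg_crawled_per_day <= 0:
        raise ValueError("avg_crawled_per_day must be positive")
    return total_pages / avg_crawled_per_day


def size_band(total_pages: int) -> str:
    """Rough site-size bands from the slides; boundary handling is assumed."""
    if total_pages < 1_000:
        return "less of a problem"
    if total_pages <= 10_000:
        return "moderate"
    return "fun"


# Figures from the worked example in the slides: 9,781 pages, 1,458 crawled per day.
score = crawl_score(9_781, 1_458)
print(f"crawl score: {score:.1f}")
print(f"size band: {size_band(9_781)}")
```

Both inputs would be read by hand from Google Search Console, as the slides suggest; the sketch only does the division and the banding.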

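The wildcard Disallow rules in slides 76 and 109 are easy to make too broad, which is the risk slide 85 warns about. The sketch below is a simplified, hand-rolled matcher for checking which paths a set of rules would block before deploying them. It assumes Google-style pattern semantics (`*` matches any run of characters, a trailing `$` anchors the end of the URL) and deliberately ignores Allow rules, user-agent groups and rule precedence. The helper names and the `/blog/hundred-tips/` path are hypothetical.

```python
import re
from typing import Iterable


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a Disallow path with * wildcards and an optional trailing $."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_rules: Iterable[str]) -> bool:
    """True if any rule matches from the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


rules = ["/*size*", "/*red*"]  # the filter rules from the slides
paths = [
    "/shop/mens/trainers/size-12/red/",  # the filtered URL the rules target
    "/shop/mens/trainers/",              # the unfiltered category page
    "/blog/hundred-tips/",               # hypothetical page containing "red"
]
for path in paths:
    print(path, is_disallowed(path, rules))
```

The third path is blocked because "hundred" contains "red", which shows how a loose pattern can catch pages that were never meant to be excluded.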