The Real Problems Behind Indexing | 5 Hours of Technical SEO

In his talk at 5 Hours of Technical SEO, organized by SEMRush, Bartosz Góralewicz spoke to Nik Ranger, Cindy Krum, and Will Critchlow about the most common issues preventing large websites from getting indexed by Google.

Published in: Marketing

  1. 1. Bartosz Góralewicz, Nik Ranger, Cindy Krum, Will Critchlow
  2. 2. Helping Fortune 500 companies rank better and get more traffic. Bartosz Góralewicz, @bart_goralewicz, www.onely.com. We're deeply specialized: Technical SEO, JavaScript SEO, Rendering SEO (!), Indexing Issues (!), Web Performance. Link to this deck -> on the last slide.
  3. 3. #TechnicalSEOconfessions: Let me tell you a personal story.
  4. 4. We have been researching indexing for a while…
  5. 5. … truth be told… it gets boring sometimes… More proxies. Proxies. Scraping sitemaps. Google banning our proxies. Websites banning our proxies because we scrape sitemaps. More bans, more captchas.
  6. 6. ... Locked down in the house, my mind started to play tricks on me…
  7. 7. I felt like Google started to challenge me.
  8. 8. Am I a robot?
  9. 9. Then I finally understood why SEOs do that… I felt a strong urge to do something that we all hate so much. Sorry :(
  10. 10. A CORRELATION STUDY!
  11. 11. If it correlates = it’s true!
  12. 12. I called Tomek, our head of R&D. We have 1 million URLs in our database. AMAZING! “Let’s see what correlates. I’ll talk about this at 5 Hours of Technical SEO.” #AmazingContent
  13. 13. Result: 160 columns filled with data. Which one correlates the most with is_indexed?
  14. 14. Results? No correlation. No correlation. No correlation. No correlation. No correlation.
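
The check described on slides 13 and 14 can be sketched roughly as follows. This is a minimal illustration, not the study's actual code: the deck does not say which correlation measure was used, so Pearson's r is an assumption here, and the column names `word_count` and `load_time_ms` and all values are hypothetical toy data. Only `is_indexed` comes from the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(rows, target="is_indexed"):
    """Return (column, r) pairs sorted by absolute correlation with the target."""
    columns = [name for name in rows[0] if name != target]
    ys = [row[target] for row in rows]
    scores = [(name, pearson([row[name] for row in rows], ys)) for name in columns]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy rows; the real data set had 160 columns and about 1 million URLs.
rows = [
    {"word_count": 120, "load_time_ms": 900, "is_indexed": 1},
    {"word_count": 80, "load_time_ms": 400, "is_indexed": 0},
    {"word_count": 300, "load_time_ms": 1500, "is_indexed": 1},
    {"word_count": 40, "load_time_ms": 700, "is_indexed": 0},
]
ranking = rank_by_correlation(rows)
for column, r in ranking:
    print(f"{column}: r = {r:+.2f}")
```

With only four toy rows the coefficients are meaningless; the point is the shape of the analysis, one score per column against `is_indexed`.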
  15. 15. Long story short… there is no single problem and no single solution for indexing problems.
  16. 16. Indexing is way more complex than just crawl budget.
  17. 17. On Google = indexed
  18. 18. Percentage of URLs NOT indexed: 30%, 76%, 15%, 81%, 14%, 14%, 38%, 71%, 98%
  19. 19. 4 kinds of indexing problems*: URL indexing problems; mobile-first related indexing problems; JavaScript related indexing problems; layout based indexing problems. Every kind of indexing problem comes from a different origin and requires a different solution. *Sorted by the level of complexity, ascending.
  20. 20. How indexing works: Discovery -> Queue -> Crawl -> Rendering -> Index selection -> Indexing -> Ranking. #SEJSummit
  21. 21. How indexing works: Discovery -> Queue -> Crawl -> Rendering -> Index selection -> Indexing -> Ranking. Partial indexing issue = URL not indexed AFTER it was crawled*. *Please don’t start a Twitter war after this slide. #SEJSummit
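
The pipeline on slides 20 and 21 can be written down as an ordered enumeration. This is only a sketch of the slide's diagram: the stage names come from the slide, while the `Stage` type, the `describe` helper, and its return strings are hypothetical names chosen for illustration.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages in the order shown on the 'How indexing works' slide."""
    DISCOVERY = 1
    QUEUE = 2
    CRAWL = 3
    RENDERING = 4
    INDEX_SELECTION = 5
    INDEXING = 6
    RANKING = 7


def describe(last_stage_reached: Stage) -> str:
    """Classify a URL by the furthest stage it reached (hypothetical helper)."""
    if last_stage_reached >= Stage.INDEXING:
        return "indexed"
    if last_stage_reached >= Stage.CRAWL:
        # The case slide 21 singles out: not indexed AFTER it was crawled.
        return "crawled but not indexed"
    return "not crawled yet"


for stage in Stage:
    print(f"{stage.name}: {describe(stage)}")
```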
  22. 22. Why is indexing going to be more and more of a problem?
  23. 23. Google’s challenge (chart labels: 2010, 2010, 2012, 2012, 2014, 2020)
  24. 24. Source: https://twitter.com/methode/status/1261259179983081473
  25. 25. Index selection: a 1-minute crash course
  26. 26. Index selection for dummies. Limit: 100 people. Rendering, links, efficient crawling, content, indexing strategy. Source: patent “Method and apparatus for managing a backlog of pending URL crawls” (US8676783B1).
  27. 27. Index selection + limited resources = new challenges
  28. 28. 4 kinds of indexing problems*: URL indexing problems; mobile-first related indexing problems; JavaScript related indexing problems; layout based indexing problems. Every kind of indexing problem comes from a different origin and requires a different solution. *Sorted by the level of complexity, ascending.
  29. 29. Let’s start easy with a little warm-up.
  30. 30. URL indexing - example: one.ly/alba-shoes
  31. 31. URL indexing - example: one.ly/alba-shoes
  32. 32. Problem with the site: command - false negatives
  33. 33. Site: command, new challenges. Site:URL - watch out for false negatives*. *Fortunately, there are a few ways to avoid those and get 100% accuracy.
  34. 34. URL indexing - causes: content quality, crawler budget issues, index bloat. Examples: thin content, duplicate content, cannibalization, etc.
  35. 35. URL indexing. PARTIAL INDEXING AHEAD
  36. 36. Mobile-first related partial indexing: content not visible on mobile
  37. 37. Mobile-first related partial indexing - example: one.ly/yoox-pants (mobile vs. desktop)
  38. 38. Desktop content not visible on mobile: one.ly/yoox-pants
  39. 39. (no text on this slide)
  40. 40. Diagnosing mobile-first related indexing problems. 1. Simple way - side-by-side visual comparison
  41. 41. Diagnosing mobile-first related indexing problems. 2. Diffchecker
  42. 42. Make sure that all the content on desktop is on mobile as well.
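
The Diffchecker-style comparison from slide 41 can be approximated locally with Python's standard `difflib`. This is a sketch under assumptions: it expects the visible text of both versions to be already extracted into lists of lines, the helper name `missing_on_mobile` is hypothetical, and the sample text is invented for illustration rather than taken from the example page.

```python
import difflib


def missing_on_mobile(desktop_lines, mobile_lines):
    """Return lines that appear in the desktop text but not in the mobile text."""
    diff = difflib.ndiff(mobile_lines, desktop_lines)
    # In ndiff output, lines prefixed with "+ " exist only in the second sequence.
    return [line[2:] for line in diff if line.startswith("+ ")]


# Invented sample text, for illustration only.
desktop = [
    "Slim-fit trousers",
    "Composition: 98% cotton, 2% elastane",
    "Free returns within 30 days",
]
mobile = [
    "Slim-fit trousers",
    "Free returns within 30 days",
]

for line in missing_on_mobile(desktop, mobile):
    print("Only on desktop:", line)
```

Anything this reports is content that a mobile-first crawler would not see on the mobile version of the page.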
  43. 43. Thinking about that new, shiny JS framework?
  44. 44. JavaScript indexing trends over time: ≈ 25%
  45. 45. Before we move forward... we need to talk.
  46. 46. Remember those good old times, when only SOME websites were JS-powered?
  47. 47. In 2020, WordPress, Magento, Wix, and Shopify are usually JS-powered too! DUH!
  48. 48. Google Hangouts (August 23rd, 2019)
  49. 49. Is JavaScript SEO dying? JavaScript SEO is not dying. It's getting even more complex.
  50. 50. or… in simpler terms.
  51. 51. JavaScript SEO is not dying
  52. 52. It is getting even more f..cked up!
  53. 53. JavaScript SEO leveled up over the last few years.
  54. 54. JavaScript-related indexing problems - example: with JavaScript vs. without JavaScript
  55. 55. Diagnosing JS-related partial indexing problems
  56. 56. Diagnosing JS-related partial indexing problems
  57. 57. How to spot JavaScript indexing problems? JavaScript indexing problems = partial indexing: the URL is INDEXED, but the JavaScript-dependent content is NOT INDEXED.
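
One way to find the JavaScript-dependent content mentioned on slide 57 is to compare the text in the raw HTML response with the text in the rendered DOM. The sketch below assumes both are already available as strings (how the rendered HTML is obtained, for example from a headless browser, is out of scope). The names `TextExtractor`, `visible_text`, and `js_dependent_text` are hypothetical, and the sample markup is invented.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text chunks, skipping the bodies of <script> and <style> tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the list of non-empty text chunks found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def js_dependent_text(raw_html, rendered_html):
    """Return text chunks present after rendering but absent from the raw HTML."""
    raw_chunks = set(visible_text(raw_html))
    return [chunk for chunk in visible_text(rendered_html) if chunk not in raw_chunks]


# Invented markup, for illustration only.
raw_html = (
    '<html><body><h1>Checkers game</h1>'
    '<div id="reviews"></div><script>loadReviews()</script></body></html>'
)
rendered_html = (
    '<html><body><h1>Checkers game</h1>'
    '<div id="reviews"><p>Great gift for fans</p></div></body></html>'
)

print(js_dependent_text(raw_html, rendered_html))
```

The chunks it returns are the candidates to check for partial indexing, since they exist only once JavaScript has run.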
  58. 58. To understand JS-related indexing problems, we need to look under Google’s hood a bit: WRS*
  59. 59. To understand JS-related indexing problems, we need to look under Google’s hood a bit: WRS*. *Web Rendering Service
  60. 60. Google limits CPU consumption. Source: Google Webmaster Conference Product Summit, Mountain View, CA, http://services.google.com/fh/files/events/wmconf_product_summit_slides_publish.pdf
  61. 61. Rendering - a search engine's perspective
  62. 62. Confession time: “Father, 81% of my content is not indexed.”
  63. 63. Browser vs. BOR. Source: patent “Batch-optimized render and fetch architecture” (US20180276220A1).
  64. 64. How Batch-Optimized Rendering works, step by step. Source: patent “Batch-optimized render and fetch architecture” (US20180276220A1).
  65. 65. How Batch-Optimized Rendering works. Step 1: BOR skips all resources that are not essential to generate a preview of your page. Examples: tracking scripts (Google Analytics, Hotjar, etc.), ads, images*. Source: patent “Batch-optimized render and fetch architecture” (US20180276220A1).
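
Step 1 can be illustrated with a small resource filter. This is a toy sketch, not the patent's logic: the host list, the resource-type labels, and the names `is_skipped` and `essential_resources` are all assumptions made for illustration, and the slide's asterisk on "images" suggests a caveat that is not modeled here.

```python
from urllib.parse import urlparse

# Hypothetical skip rules for illustration; not taken from the patent.
SKIPPED_HOST_SUFFIXES = ("google-analytics.com", "hotjar.com", "doubleclick.net")
SKIPPED_TYPES = {"image"}


def is_skipped(url, resource_type):
    """Return True if the resource would be skipped under the toy rules above."""
    if resource_type in SKIPPED_TYPES:
        return True
    host = urlparse(url).hostname or ""
    return any(
        host == suffix or host.endswith("." + suffix)
        for suffix in SKIPPED_HOST_SUFFIXES
    )


def essential_resources(resources):
    """Keep only (url, type) pairs that are not skipped."""
    return [(url, kind) for url, kind in resources if not is_skipped(url, kind)]


# Invented request list for a single page.
resources = [
    ("https://example.com/app.js", "script"),
    ("https://www.google-analytics.com/analytics.js", "script"),
    ("https://static.hotjar.com/c/hotjar.js", "script"),
    ("https://example.com/hero.jpg", "image"),
    ("https://example.com/main.css", "stylesheet"),
]

for url, kind in essential_resources(resources):
    print(kind, url)
```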
  66. 66. Browser (load: 4.24s) vs. BOR (load: 1.91s)
  67. 67. How Batch-Optimized Rendering works. Step 2: Set the value of a Virtual Clock. Source: patent “Batch-optimized render and fetch architecture” (US20180276220A1).
  68. 68. How Batch-Optimized Rendering works. Step 3: (1) the Virtual Clock’s time runs out*; (2) the website’s layout is generated. *Simplification. Source: patent “Batch-optimized render and fetch architecture” (US20180276220A1).
  69. 69. Using this data to rank better: Virtual Clock, Layout. Source: patent “Batch-optimized render and fetch architecture” (US20180276220A1).
  70. 70. Virtual Clock = Rendering Budget*. *Simplification.
  71. 71. Cost of our website’s rendering (Virtual Clock): rendering pauses while waiting for scripts, CSS files, etc. A script/CSS-heavy website needs more “virtual time” on the virtual clock. Source: patent “Batch-optimized render and fetch architecture” (US20180276220A1).
  72. 72. BOR - a place where real time doesn’t matter. Source: patent “Batch-optimized render and fetch architecture” (US20180276220A1).
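
The Virtual Clock idea from slides 67 to 72 can be modeled with a toy simulation. Everything here is an assumption layered on the slides' own simplification: the model treats network waits as costing real time only and script or style work as costing virtual time, and the task list, the 200 ms budget, and the function name `render_with_virtual_clock` are hypothetical. It is not a description of how Google's renderer is implemented.

```python
def render_with_virtual_clock(tasks, budget_ms):
    """Process tasks in order until the virtual clock budget is used up.

    tasks: list of (kind, name, cost_ms) where kind is "fetch" or "run".
    Returns (completed_names, virtual_ms, real_ms).
    """
    virtual_ms = 0
    real_ms = 0
    completed = []
    for kind, name, cost_ms in tasks:
        if kind == "fetch":
            # Assumption: waiting on the network pauses rendering and costs
            # real time only; the virtual clock stands still.
            real_ms += cost_ms
        elif kind == "run":
            if virtual_ms + cost_ms > budget_ms:
                # The clock runs out: the layout is generated from what is done.
                break
            virtual_ms += cost_ms
            real_ms += cost_ms
        else:
            raise ValueError(f"unknown task kind: {kind}")
        completed.append(name)
    return completed, virtual_ms, real_ms


# Invented page load; all costs are made-up numbers.
tasks = [
    ("fetch", "main.css", 300),
    ("run", "parse main.css", 20),
    ("fetch", "app.js", 800),
    ("run", "execute app.js", 150),
    ("fetch", "reviews.json", 400),
    ("run", "render reviews", 100),
]

completed, virtual_ms, real_ms = render_with_virtual_clock(tasks, budget_ms=200)
print("completed:", completed)
print("virtual ms used:", virtual_ms, "| real ms elapsed:", real_ms)
```

In this run the slow fetches do not consume the budget, but the final "render reviews" step does not fit, so that content would be missing from the generated layout.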
  73. 73. Where is the limit?
  74. 74. How resource-hungry is your website? Superfast CPU vs. slower CPU
  75. 75. Measuring the Virtual Clock load* of your website: 2 options. *Ubersimplification.
  76. 76. Use TLDR: one.ly/tldr. Simulate BOR in your Chrome DevTools: one.ly/bor (detailed walkthrough).
  77. 77. The virtual clock’s time runs out -> the LAYOUT is generated. Source: patent “Batch-optimized render and fetch architecture” (US20180276220A1).
  78. 78. (no text on this slide)
  79. 79. (no text on this slide)
  80. 80. Pre-layout times. [Slide shows the raw HTML source of a page: a doctype, meta tags, an inline scroll-cookie script, and a long run of <link rel="preconnect"> and <link rel="dns-prefetch"> tags pointing at ad, analytics, and CDN hosts such as c.amazon-adsystem.com, pubads.g.doubleclick.net, www.google-analytics.com, and cdn.searchenginejournal.com.] Timeline labels: Before 2011; rendering; After 2011; Google Panda; content quality updates.
  81. 81. Layout vs. rendering: new findings
  82. 82. A lot of focus on… layout. Source: BOR patents (2012-2018)
  83. 83. Content location matters: “text appearing above-the-fold (e.g., visible without scrolling) may be considered more important than text below-the-line.” Source: patent “Batch-optimized render and fetch architecture” (US20180276220A1).
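
To see which text sits above the fold on a given page, the rendered positions of text blocks can be compared with the viewport height. The sketch below only partitions blocks; it does not claim to reproduce any weighting from the patent. The `TextBlock` type, `split_by_fold`, the pixel offsets, and the 800 px viewport are hypothetical values for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str
    top_px: int  # distance from the top of the rendered page, in pixels


def split_by_fold(blocks, viewport_height_px):
    """Partition block texts into (above_fold, below_fold) by their top offset."""
    above = [b.text for b in blocks if b.top_px < viewport_height_px]
    below = [b.text for b in blocks if b.top_px >= viewport_height_px]
    return above, below


# Invented layout measurements for a product page.
blocks = [
    TextBlock("Product title", 120),
    TextBlock("Price and shipping info", 420),
    TextBlock("Full description", 1350),
    TextBlock("You may also be interested in", 2600),
]

above, below = split_by_fold(blocks, viewport_height_px=800)
print("above the fold:", above)
print("below the fold:", below)
```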
  84. 84. Patent on scheduling resource crawls (filed in 2011): “The importance of the section is based on (...) prominence of the section within the rendered layout.” Diagram labels: ads, ads. Source: patent “Scheduling resource crawls” (US20130144858A1).
  85. 85. (no text on this slide)
  86. 86. Some sections may get more “Link Juice”* from Google: “(…) link positioned under the ‘More Top Stories’ heading on the cnn.com has a high probability of being selected.” *Wink, wink, John Mu ;) Source: Google patent “Ranking documents based on user behavior and/or feature data” (US10152520B1).
  87. 87. Google seems to struggle with indexing “related items” and “you may also be interested in” sections.
  88. 88. …more findings
  89. 89. Going even further beyond JavaScript… Ekhm…
  90. 90. …which brings us to the 4th kind of partial indexing problems
  91. 91. http://one.ly/target
  92. 92. URL is indexed
  93. 93. (no text on this slide)
  94. 94. (no text on this slide)
  95. 95. Not indexed
  96. 96. Patent on scheduling resource crawls (filed in 2011): “The importance of the section is based on (...) prominence of the section within the rendered layout.” Diagram labels: ads, ads. Source: patent “Scheduling resource crawls” (US20130144858A1).
  97. 97. …all those partial indexing problems are not THAT serious.
  98. 98. … but
  99. 99. Let’s recap first
  100. 100. Every kind of indexing problem*: URL indexing problems; mobile-first related indexing problems; JavaScript related indexing problems; layout based indexing problems. Every kind of indexing problem comes from a different origin and requires a different solution. *Sorted by the level of complexity, ascending.
  101. 101. How are indexing problems killing your traffic?
  102. 102. Let’s investigate Target.com again.
  103. 103. Labels: JS + mobile, Quality, Quality, Indexed
  104. 104. Shipping info
  105. 105. Main content
  106. 106. Patent on scheduling resource crawls (filed in 2011): “The importance of the section is based on (...) prominence of the section within the rendered layout.” Diagram labels: ads, ads. Source: patent “Scheduling resource crawls” (US20130144858A1).
  107. 107. https://www.target.com/p/nhl-chicago-blackhawks-checkers-game/-/A-54589615 (labels: 87%, 62.24%, 87%)
  108. 108. https://www.target.com/p/nhl-chicago-blackhawks-checkers-game/-/A-54589615 (label: 43.04%)
  109. 109. https://www.target.com/p/nhl-chicago-blackhawks-checkers-game/-/A-54589615 (labels: 87%, JS + mobile, Indexed, 87%, 62.24%)
  110. 110. https://www.target.com/p/nhl-chicago-blackhawks-checkers-game/-/A-54589615 (labels: 43.04%, Quality)
  111. 111. Summary
  112. 112. Summary
  113. 113. Summary
  114. 114. Let’s talk about the results…
  115. 115. Indexed content = rankings.
  116. 116. THANK YOU www.onely.com
