Crawling & Indexing for JavaScript Heavy Sites brightonSEO 2021

  1. Crawling & Indexing for JavaScript Heavy Sites {TametheBots} @davewsmart #brightonSEO
  2. In the beginning ... There was the web. It served flat documents, and the SEOs said it was good.
  3. But then we wanted to do stuff ... We wanted to buy things, blog, chat on forums. So dynamic server-side scripting was born, and SEOs were like “Cool!”
  4. So then we wanted to interact ... Right in the browser! We wanted to break away from flat documents. JavaScript was born, and the SEOs said ...
  6. Which is kinda understandable. Search engines took a while to catch up. But the universe is different now* (*for some search engines!)
  7. Evergreen to the rescue! Google used to render with an OLD browser. In May ‘19 it got a BIG upgrade. It now runs up-to-date Chromium!
  8. Evergreen to the rescue! Now executes modern JavaScript! No more testing against Chrome 41!
  9. Evergreen to the rescue! What about Bing? October ‘19, evergreen too!
  10. So it works just like my browser? Nearly, but not quite … The goals are different.
  11. So it works just like my browser? Humans want to see some pages. Search engines want to gather billions.
  12. How do they do that? By accessing the web differently, fetching only what they need, somewhat at their own leisure.
  13. How do they do that? From:
  14. Myths about the render queue. It’s not days, it’s minutes. Pretty much everything goes through the Web Rendering Service (WRS).
  15. Some ground rules. A page needs a proper URL. Googlebot needs <a href=""> to get around.
  16. Some ground rules. Non-200 pages, or pages with noindex, will not get rendered.
  17. Some ground rules. JavaScript and API resources must not be blocked by robots.txt (if you want them to work).
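
The robots.txt ground rule can be checked offline before a release. The following is a minimal sketch in Python, not something from the talk: the robots.txt content, the example.com URLs, and the `blocked_resources` helper are all hypothetical. It uses the standard library's `urllib.robotparser`, which applies simple prefix matching and may not reproduce Googlebot's wildcard handling exactly, so treat the result as a first check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that (probably unintentionally) blocks script and API paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /static/js/
Disallow: /api/
"""


def blocked_resources(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical resources a JavaScript-rendered page depends on.
urls = [
    "https://example.com/",
    "https://example.com/static/js/app.js",
    "https://example.com/api/latest-news",
]

for url in blocked_resources(ROBOTS_TXT, urls):
    print("Blocked for Googlebot:", url)
```

Any script or API URL reported here is one the renderer could not fetch, so the content depending on it would likely be missing from the rendered page.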
  18. So, what’s the fuss about? Some is about nothing. But there are potential lumps in the custard.
  19. Permission Denied! Service Workers, Location, Notifications ...
  20. WebSockets? Nope! Not designed to provide initial content. Googlebot will tell you it supports them, but then fails anyway.
  21. Web Workers? Kinda, mostly! Great for offloading heavy processes. But … some unpredictable behaviour, especially if they perform fetches.
  22. Solution? Fail gracefully! Make sure the important content loads anyway.
  23. How to test? ● Mobile-Friendly Test ● Rich Results Test ● URL Inspection Tool (live)
  24. We need to talk about caching. Google caches “aggressively”. They probably won’t listen to your Cache-Control headers.
  25. We need to talk about caching. It is what allows their scale, and it’s a good thing for you. Less fetched = more budget to fetch the good stuff.
  26. But you need to work with it. Images, CSS, JavaScript & API calls can all be cached.
  27. But you need to work with it. Some things it doesn’t matter for. Some things it does!
  28. JavaScript files ● Your site needs /app.js ● You update /app.js
  29. JavaScript files ● Googlebot tries the new page with the old /app.js
  30. Who ya gonna call? Cache-busters! ● “Fingerprint” your files. ● /app.5787ee49.js
  31. Who ya gonna call? Cache-busters! Many frameworks & bundlers can do this for you!
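
To illustrate what fingerprinting means in practice, here is a minimal sketch in Python of a build step that derives the filename from the file's content. The `fingerprinted_name` helper, the choice of SHA-256, and the eight-character digest are assumptions for illustration; bundlers use their own hashing schemes, and the talk does not say how the example name /app.5787ee49.js was produced.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path, content, digest_chars=8):
    """Insert a short content hash before the file extension.

    Because the hash depends only on the bytes of the file, the URL changes
    exactly when the content changes, so a cached copy of the old file can
    never be served under the new URL.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


# Two hypothetical versions of the same script get two different URLs.
old_url = fingerprinted_name("/app.js", b"console.log('v1');")
new_url = fingerprinted_name("/app.js", b"console.log('v2');")
print(old_url)
print(new_url)
```

The page's HTML then references the new URL, so a crawler holding the old /app.js in cache has to fetch the new file to render the updated page.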
  32. What about CSS? ● Less critical ● But can cause mobile usability issues
  33. What about CSS? Same cache-busting solution.
  34. What about APIs? If your content is loaded in via an API call, this can be cached too. You have a decision to make.
  35. What about APIs? Is freshness actually needed?
  36. What about APIs? Something like related products? It might not be, so perhaps just let it cache.
  37. What about when it matters? If freshness does matter ... Timestamp the call, e.g. /api/latest-news?ts=123456. Or use POST, not GET.
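
The timestamp approach amounts to making every request URL unique, so a cached response for an earlier URL cannot be reused. On a real site this would normally happen in the browser-side JavaScript that makes the call; the Python sketch below only illustrates the URL shape. The `cache_busted_url` helper and its `now` parameter are hypothetical, and the `ts` parameter name is taken from the slide's example.

```python
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted_url(url: str, now: Optional[float] = None, param: str = "ts") -> str:
    """Return url with a timestamp query parameter, replacing any existing one."""
    parts = urlsplit(url)
    # Keep the other query parameters, dropping a stale timestamp if present.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    stamp = int(time.time() if now is None else now)
    query.append((param, str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# A fixed timestamp is passed here only to make the result reproducible.
print(cache_busted_url("/api/latest-news", now=123456))
```

The trade-off is that every call now misses every cache, including your own, which is why the earlier slides suggest reserving this for content where freshness really matters.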
  38. How to test? Live tools are made to bypass cache. Use the URL Inspection Tool (not the live test). Look at the rendered HTML.
  39. In your Search Console
  40. Is the content up to date? Is the content there? Does the content look as up to date as the last crawl date?
  41. I’m a human, not a browser! OK, HTML isn’t always easy to read. Click the copy button, then go to that page.
  42. I’m a human, not a browser! Open DevTools, right-click on <html> in the Elements panel, select Edit as HTML. Hat tip to Oliver H.G. Mason (@ohgm), who has also described this solution, in a less clumsy way, at: spection-tool/#view-tested-page
  43. I’m a human, not a browser! Select all the code in the box, paste the code from the URL Inspection Tool & press Enter. The page should now be as Google rendered it.
  44. Measure it in your log files! No changes means Googlebot is hitting mostly pages and robots.txt.
  45. Measure it in your log files! I pushed a change, with new filenames. All the resources!
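
One way to measure this is to count Googlebot requests by resource type before and after a release. The sketch below is in Python and rests on several assumptions not stated in the talk: that the server writes combined-format access logs, that API calls live under a hypothetical /api/ prefix, and that matching "Googlebot" in the user-agent string is good enough (user agents can be spoofed, so a real audit would also verify the requesting IPs). The sample log lines are invented.

```python
import re
from collections import Counter

# Pulls the request path and user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")


def classify(path):
    """Bucket a request path into a coarse resource type."""
    path = path.split("?", 1)[0].lower()
    if path == "/robots.txt":
        return "robots.txt"
    if path.endswith(".js"):
        return "javascript"
    if path.endswith(".css"):
        return "css"
    if path.endswith(IMAGE_EXTENSIONS):
        return "image"
    if path.startswith("/api/"):  # hypothetical API prefix
        return "api"
    return "page"


def googlebot_hits(lines):
    """Count requests per resource type whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[classify(match.group("path"))] += 1
    return counts


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Invented sample lines; in practice, iterate over the lines of the access log file.
SAMPLE_LOG = [
    f'66.249.66.1 - - [10/Sep/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Sep/2021:10:00:01 +0000] "GET /app.5787ee49.js HTTP/1.1" 200 9000 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Sep/2021:10:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [10/Sep/2021:10:00:03 +0000] "GET /app.5787ee49.js HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

counts = googlebot_hits(SAMPLE_LOG)
print(dict(counts))
```

Comparing these counts across days should show the pattern the slides describe: mostly pages and robots.txt while nothing changes, then a burst of JavaScript, CSS, and API fetches after new filenames ship.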
  46. Some final thoughts. The dev team knows and cares about users; they might not know about Googlebot.
  47. Some final thoughts. Be their friendly guide, not their nemesis.
  48. Some final thoughts. I am not a JavaScript salesman. Sometimes JavaScript isn’t the best way. If a pure HTML solution is better & you can advocate for it, do!
  49. Some final thoughts. TheOldWays™ are still valid!
  50. Some final thoughts. The web is growing though! JavaScript hasn’t killed the document web, it’s added to it. As SEOs, we might be called on to support that.
  51. Some great resources: Google’s Dev Docs for JavaScript related SEO: Opening DevTools: SEO Mythbusters Video on JavaScript: Martin Splitt’s JavaScript Hangouts, ask live questions! Keep an eye out here:
  52. Bye Bye! I’ve been Dave Smart, sorry about that :D You can reach me at @davewsmart on Twitter, or find me at Ta-Ta For Now!