Alexis + Max - We Love SEO 19 - Bot X

  1. @AlexisKSanders Creating a bot experience as good as your user experience. Alexis Sanders / Max Prin
  2. part 1: improving user & bot relations
  3. you are not a robot. (probably)
  4. we can’t do math calculations as fast,
  5. “I eat cereal.” “I eat numbers for breakfast.”
  6. we're more likely to miss something in large datasets,
  7. yeah, apparently robots are better than 5-year-olds…
  8. we aren’t able to work 24/7,
  9. (real talk: my iRobot has done more cleaning than anyone in my home…) (real data)
  10. and that’s okay.
  11. it’s okay to not be a robot.
  12. human vs. bot. human: complex tasks; creativity, imagination, language; heuristic analysis; better with people. bot: calculations; endless loops; filtering big data; better with other bots.
  13. all of this is to say humans and bots are different
  14. diversity is a good thing
  15. we can use both of our strengths to create better work.
  16. for today, I have two themes: 1. large sites 2. automated testing
  17. theme I: large sites
  18. >= 100,000 pages
  19. large sites are interesting in SEO because they have the same issues as other sites, but at a larger scale
  20. 6 SEO challenges (especially for large sites): crawling; crawl efficiency; indexing; rendering; unique content; awareness
  21. how can we address these?
  22. crawling: logs and log visualization tools
  23. logs show a site’s interactions with bots: 2019-10-03 00:00:00 0.0.0.7 GET /knock-knock
  24. let’s start with exploratory testing. what is that…. that is job security.
  25. what are we looking for (by user-agent)? • anomalies • segment by folder • crawling rates • http response codes
  26. we must answer: 1. are bots crawling your site in a way you’d expect (and want) them to? 2. are your top KPI-driving pages being crawled?
  27. resulting changes may include: • more internal linking • removing dead internal links • resolving status codes • canonicalizing page sets (e.g., pdfs) • bots crawling non-existent pages • pages no one knew existed
  28. XML sitemaps
  29. action: break up XML sitemaps in ways that are meaningful for a human to analyze later (and submit each one individually in GSC).
  30. effect of a meaningful XML sitemap: the relative effect of the treatment was an increase of +33%. The 95% confidence interval of this percentage is [19.0%, 45.0%]. The probability of this effect being caused by chance is small, so it is statistically significant.
  31. crawl efficiency
  32. crawling efficiency: o important pages close to root? o no crawl traps? o no orphan pages? o all pages have a purpose? o duplicate content? o redirects consolidated? o canonical tags? o no useless parameters?
  33. crawling & indexing: GSC index coverage reports
  34. there is a wealth of information hidden within the “excluded” section
  35. tip: go through each section of the “excluded” coverage reports and identify any themes
  36. bonus life tip: have or make a master list of all URLs on the site
  37. why a master list of all URLs? 1. site migration 2. auditing 3. knowing/agreeing on what’s priority 4. to identify what is not being crawled & indexed 5. automation
  38. to make a master URL list: o crawlers o XML sitemap o GSC o analytics platform o dev team o google SERP
  39. rendering
  40. we must answer: 1. is JS used to load important content? 2. performance data when changes are implemented? 3. added solutions? 4. (bonus) are images important?
  41. how to tell if your content is being rendered? 1. check direct quotes in SERP 2. use google’s mobile-friendly testing tool 3. check the DOM (Inspect > Element)
  42. chart by Eric Wu (@eywu) on best solution for JS concerns
  43. content, b/c unique content is hard to do at scale
  44. tips: 1. prioritize (by value to your core users) 2. delegate towards strengths: • programmed = simple (maybe API) data input/output • humans = on people + relationships
  45. effect of adding computer-generated text: the relative effect of the treatment was an increase of +22%. The 95% confidence interval of this percentage is [13.0%, 30.0%]. The probability of this effect being caused by chance is small, so it is statistically significant.
  46. chatbots, two results: (1) the relative effect of the treatment was an increase of +22%; the probability of this effect being caused by chance is high, so it is not statistically significant. (2) the relative effect of the treatment was an increase of +78%; the probability of this effect being caused by chance is small, so it is statistically significant.
  47. +1 robots for scalability, +1 humans for emotional connection
  48. theme II: automation
  49. primary question: is the site doing what’s expected, 24/7?
  50. what is important to monitor: • robots.txt • status codes • http redirects live • meta robots (noindex) • canonical • XML sitemap • title tags • meta description
  51. solutions: white/grey box testing: dev/QA team. black box testing: everyone else.
  52. automated black box unit testing for SEO: leveraging tools or scripts that do basic SEO checks at preset intervals
  53. tools: little warden; custom scripts; visual ping; selenium + jenkins; uptime robot
  54. one last point: always check robots with robots.
  55. robots.txt is confusing: use technicalseo.com/tools/robots-txt/ to test it. (example: disallowing robots.txt itself. it still crawls robots.txt and the disallow is ignored.)
  56. to conclude (my part…),
  57. human + robot working together = happier human webmaster and user. the bot doesn’t care … it’s a robot
  58. Thank you very much! @AlexisKSanders
  59. APPENDIX • What’s here? (words that appear to be struck through on the slide are shown in brackets) Well, basically a bunch of [complaints] random thoughts in a [rant] constructive format about robots.txt and why [everyone] I (personally) find it so [confusing] intellectually stimulating.
  60. things I find confusing about robots.txt • allow versus disallow hierarchy (the more specific rule wins) • [undefined] verdicts, what does Google even do then… • how Google ad bot doesn’t follow the rules • implied * at end of every line • implied .com at beginning of every line • how $ and * are in robots.txt, but they’re not the same as regex • the whole noindex header on robots.txt being accepted, then ignored… why… • https://www.robotstxt.org/, the whole site • how we can only use robots.txt if URL structure makes sense • how disallowing the robots.txt is just ignored (it’s so meta) • when sites overuse robots.txt • why Google automatically crawls your blocked pages if the robots.txt goes down • how robots.txt is case sensitive (it’s so close… and yet… so far)
  61. Part 2: PWA + AMP = PWAMP @maxxeight
  62. <script type="application/ld+json">
      {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": "Max Prin",
        "url": "https://maxprin.com",
        "jobTitle": "Head of Technical SEO",
        "worksFor": "Merkle",
        "sameAs": [
          "https://twitter.com/maxxeight",
          "https://www.linkedin.com/in/maxprin"
        ]
      }
      </script>
  63. A user experience… that Google can understand. • Relevant • Mobile-friendly • Fast • Secure • Popular. Good SEO is based on: • Content • Web design • Site speed • SSL/HTTPS • Links
  64. (same text as slide 63)
  65. [no text]
  66. [no text]
  67. [no text]
  68. [no text]
  69. Responsive design optimized for mobile. “Hidden” content: tabs; accordions; “read more”; when loaded automatically (vs. on user click). <picture> and srcset:
      <picture>
        <source type="image/svg+xml" srcset="pyramid.svg">
        <source type="image/webp" srcset="pyramid.webp">
        <img src="pyramid.png" alt="large PNG image...">
      </picture>
      <img srcset="example-320w.jpg 320w, example-480w.jpg 480w, example-800w.jpg 800w"
           sizes="(max-width: 320px) 280px, (max-width: 480px) 440px, 800px"
           src="example-800w.jpg" alt="responsive web!">
  70. Native apps / Web apps
  71. Progressive Web Apps. Reliable and fast: • Mobile-friendly • Fast* • Secure (HTTPS). Engaging: • Home screen icon • Push notifications
  72. Progressive Web Apps: crawling and rendering • “Up-to-date” Googlebot • Lazy loading • “onclick” content • + links (mega menu, etc.)
  73. Lazy Loading (Images)
  74. Accelerated Mobile Pages: AMP HTML + JS; AMP Cache (CDN); pre-loading
  75. The “Real” AMP URL with Signed Exchanges
  76. Custom JavaScript in AMP with <amp-script>. Restrictions: • 10,000 bytes maximum per <amp-script> • 150,000 bytes maximum in total for all <amp-script> on the page
  77. AMP for e-commerce • <amp-sidebar> - Navigation • <amp-carousel> <amp-list> - Product layout • <amp-form> - Search • <amp-bind> - Filtering and sorting • <amp-access> - Login • <amp-accordion> - Images/details • <amp-form> <amp-carousel> - Comments/reviews • <amp-selector> - Tabs/thumbnails • <amp-bind> - Color/size selection • <amp-state> - Add to cart
  78. Purchases/payments in AMP • PaymentRequest API - Chrome only • <amp-form> - Information (name, address, etc.) but no payment • Redirect visitors to the site…
  79. [Diagram: rel="amphtml", rel="canonical", and rel="alternate" hreflang="fr" / hreflang="en" annotations linking six page versions: AMP French, AMP English, mobile (m.) French, mobile (m.) English, desktop French, desktop English]
  80. [Diagram: desktop French and desktop English pages linked by rel="alternate" hreflang="fr" and rel="alternate" hreflang="en"]
  81. AMP: initial acquisition • SERP • Instant loading • Limited functionality. PWA: interactivity/engagement • Advanced functionality • Dynamic • Slow on the first visit. How do we get the best of both?
  82. Combining AMP and PWA: AMP as an entry point into the PWA; AMP as a data source for the PWA; AMP with PWA features
  83. AMP as an entry point into the PWA: <amp-install-serviceworker>
  84. [no text]
  85. AMP and PWA with the same URLs?
  86. [Diagram: the six page versions from slide 79 (AMP French, AMP English, mobile (m.) French, mobile (m.) English, desktop French, desktop English) with their rel="amphtml", rel="canonical", and rel="alternate" hreflang="fr" / hreflang="en" annotations]
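  87. service-worker.js:
      self.addEventListener('fetch', event => {
        if (event.request.mode === 'navigate') {
          // Navigation: ask the server for the PWA version of the same URL.
          // Using URL keeps any existing query string valid.
          const url = new URL(event.request.url);
          url.searchParams.set('pwa', 'true');
          event.respondWith(fetch(url.href));
        } else {
          // Everything else: serve from cache, falling back to the network.
          event.respondWith(
            caches.match(event.request).then(response => response || fetch(event.request))
          );
        }
      });
  88. .htaccess:
      # Serve the PWA when the request carries ?pwa=true, or when the visitor
      # comes from the site itself or from its copy on the AMP cache.
      RewriteEngine on
      RewriteCond %{QUERY_STRING} pwa=true [OR]
      RewriteCond %{HTTP_REFERER} ^https://pwamp\.site/ [OR]
      RewriteCond %{HTTP_REFERER} ^https://pwamp-site\.cdn\.ampproject\.org/
      RewriteRule (.*) /pwa.php [L]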
  89. [no text]
  90. AMP images are not indexable. AMP pages are not rendered: • Bots only see <amp-img> (vs. <img>) • No access to the URL in src="" • Use <noscript>
  91. “What about SEO?” • Bots only crawl/index the AMP version • No wasted resources (crawling multiple URLs with the same content) • Clear signals (no worrying about canonical/alternate tags) • Pages are fast and pre-loaded in the SERP (AMP viewer or “Real URL”) • Pages are “mobile-friendly”
  92. PWAMP - Examples and resources: https://pwamp.site https://www.howpwampworks.com (by @aleyda)
  93. TechnicalSEO.com
  94. Thank you! @maxxeight
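
Slides 22 to 26 suggest reading server logs by user-agent, segmented by folder and HTTP response code. The sketch below shows one way that could look. The log layout (date, time, IP, method, path, status, quoted user-agent) is an assumption made for illustration, since the sample line on slide 23 is truncated, and the function names are hypothetical. Matching a bot by user-agent substring is also a simplification: user-agents can be spoofed, so a real audit would verify the bot another way.

```javascript
// Assumed (hypothetical) log layout, space-separated:
//   date time client-ip method path status "user-agent"
const LOG_LINE = /^(\S+) (\S+) (\S+) (\S+) (\S+) (\d{3}) "([^"]*)"$/;

// Parse one log line into an object, or return null if it does not fit the layout.
function parseLine(line) {
  const m = LOG_LINE.exec(line.trim());
  if (!m) return null;
  const [, date, time, ip, method, path, status, userAgent] = m;
  return { date, time, ip, method, path, status: Number(status), userAgent };
}

// First folder of a path: "/products/shoes" -> "/products/", "/knock-knock" -> "/".
function topFolder(path) {
  const clean = path.split('?')[0];
  const segments = clean.split('/').filter(Boolean);
  const isNested =
    segments.length > 1 || (segments.length === 1 && clean.endsWith('/'));
  return isNested ? `/${segments[0]}/` : '/';
}

// Count hits per "folder status" pair for one bot,
// matched by a case-insensitive substring of the user-agent.
function summarizeBot(lines, botToken) {
  const token = botToken.toLowerCase();
  const counts = {};
  for (const line of lines) {
    const hit = parseLine(line);
    if (!hit || !hit.userAgent.toLowerCase().includes(token)) continue;
    const key = `${topFolder(hit.path)} ${hit.status}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

Calling `summarizeBot(lines, 'googlebot')` on a day of logs gives a small table of folders and status codes, which is usually enough to spot the anomalies slide 25 lists, such as a folder where a bot mostly receives 404s.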

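Slide 60 lists several robots.txt behaviours that are easy to get wrong: allow versus disallow precedence, the implied `*` at the end of every rule, `$` and `*` not being regex, and case sensitivity. The sketch below is a simplified illustration of the longest-match precedence that Google documents, where the most specific matching rule wins and a tie goes to allow. It is not a full robots.txt parser: the `rules` array shape and the function names are assumptions, and user-agent grouping is left out.

```javascript
// Turn a robots.txt path pattern into a RegExp.
// "*" matches any run of characters; a trailing "$" anchors the end of the URL.
// Without "$" the pattern is a prefix match (the "implied * at the end").
function patternToRegExp(pattern) {
  const anchored = pattern.endsWith('$');
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}${anchored ? '$' : ''}`);
}

// rules: [{ type: 'allow' | 'disallow', pattern: '/path' }] for one user-agent group.
// The longest matching pattern wins; on a tie, allow wins. Matching is case sensitive.
function isAllowed(rules, path) {
  let best = null;
  for (const rule of rules) {
    if (rule.pattern === '') continue; // an empty pattern matches nothing
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    const longer = !best || rule.pattern.length > best.pattern.length;
    const tieGoesToAllow =
      best !== null &&
      rule.pattern.length === best.pattern.length &&
      rule.type === 'allow';
    if (longer || tieGoesToAllow) best = rule;
  }
  return best === null || best.type === 'allow';
}
```

With the rules `Disallow: /shop/`, `Allow: /shop/sale/` and `Disallow: /*.pdf$`, this sketch blocks `/shop/cart`, allows `/shop/sale/shoes` because the allow rule is longer, and allows `/Shop/cart` because matching is case sensitive.
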
Editor's Notes

  • https://www.techspot.com/news/75939-ai-powered-facial-recognition-robot-zaps-fun-where.html
  • https://www.theverge.com/circuitbreaker/2018/8/8/17665268/wheres-waldo-finding-robot-google-cloud-automl-ai
  • https://hbr.org/2018/07/collaborative-intelligence-humans-and-ai-are-joining-forces
  • http://news.asiantown.net/r/13962/robot-suit-gives-super-strength-to-the-elderly-in-japan
  • https://www.flaticon.com/authors/gregor-cresnar
    https://www.flaticon.com/authors/smalllikeart
  • OnCrawl, Screaming Frog log analyzer, Botify, DeepCrawl
  • (e.g., part of URLs, /#, URL like strings from html)
  • https://chrome.google.com/webstore/detail/search-analytics-for-shee/ieciiohbljgdndgfhgmdjhjgganlbncj
    https://google.github.io/CausalImpact/CausalImpact.html


    Fix to only be before the pre/post the next closest update

    sessions
  • https://chrome.google.com/webstore/detail/search-analytics-for-shee/ieciiohbljgdndgfhgmdjhjgganlbncj
    https://google.github.io/CausalImpact/CausalImpact.html


    Fix to only be before the pre/post the next closest update
    Update to organic visits
  • https://chrome.google.com/webstore/detail/search-analytics-for-shee/ieciiohbljgdndgfhgmdjhjgganlbncj
    https://google.github.io/CausalImpact/CausalImpact.html


    Fix to only be before the pre/post the next closest update
    Update to organic visits


    Chat leads, referred chat leads
  • https://chrome.google.com/webstore/detail/search-analytics-for-shee/ieciiohbljgdndgfhgmdjhjgganlbncj
    https://google.github.io/CausalImpact/CausalImpact.html


    Fix to only be before the pre/post the next closest update
    Update to organic visits
  • http://www.quickmeme.com/p/3w4cjf
  • https://webmasters.googleblog.com/2018/01/using-page-speed-in-mobile-search.html
    https://webmaster-fr.googleblog.com/2018/01/vitesse-chargement-pages-critere-positionnement.html
  • Why is the reach of web apps higher?
    Search engines (vs. app stores).
    Supported by all major browsers
    Low cost of acquisition
  • Capabilities
    Reliable and Fast
    App shell cached locally (on 1st load): Fast loading when offline or with slow connection (on subsequent loads)
    Mobile-friendly (responsive)
    Secure (HTTPS)
    Engaging
    App icon on device’s home screen
    Push notifications

    Technically, any website can easily be turned into a PWA (service-worker + manifest)
    But in general, a web app (a site built with a JS framework) is the best candidate to become a PWA.
  • Building a web app to be fast: lazy loading, API-based content (user clicks to load)
    Refer to JS SEO best practices
    Expand on lazy loading: IntersectionObserver, and also the lazyload attribute
  • intersectionObserver
    <noscript>
    lazyload attribute: https://mathiasbynens.be/demo/img-loading-lazy
  • AMP is fast for a lot of reasons that, technically, can be replicated outside of the AMP framework (lazy loading, limited JS, CDN, etc.)
    BUT what AMP has that ”normal” pages don’t have is the pre-loading in the SERP (AMP viewer)
    If Google starts pre-rendering the ”10 blue links”, then AMP has no reason to exist.

    https://amp.dev/about/how-amp-works/
    https://medium.com/@cramforce/why-amp-is-fast-7d2ff1f48597
    Lazy loading
    Extensive use of preconnect
    Prefetching of lazy loaded resources
    All async JavaScript
    Inline style sheets
    Zero HTTP requests block font downloads.
    Instant loading through prerendering
    Prerendering only downloads resources above the fold
    Prerendering does not render things that might be expensive in terms of CPU
    Intelligent resource prioritization
    Uncoupling of document layout from resource downloads
    Maximum size for style sheet
    FastDOM-style DOM change batching
    Optimized for low count of style recalculations and layout
    Mitigations for third party JS worst-practices such as document.write
    Runtime cost of analytics instrumentation is independent of number of used analytics providers
    Extensions don’t block page layout
    CDN delivery available to all AMP documents
    All resources and the document are loaded from the same origin through the same HTTP 2.0 tunnel
    Animations can be GPU accelerated
  • https://amp.dev/documentation/guides-and-tutorials/optimize-and-measure/signed-exchange/
    https://support.cloudflare.com/hc/en-us/articles/360029367652-Understanding-Amp-Real-URL
  • https://amp.dev/documentation/components/amp-script/
    <amp-iframe>
  • https://amp.dev/documentation/examples/e-commerce/amp_for_e-commerce_getting_started/
  • https://amp.dev/documentation/examples/e-commerce/amp_for_e-commerce_getting_started/
    https://amp.dev/documentation/examples/e-commerce/payments_in_amp/
  • https://amp.dev/documentation/examples/guides/internationalization/
  • Best of both and 1 URL?
  • https://amp.dev/documentation/guides-and-tutorials/learn/combine-amp-pwa/

    https://amp.dev/documentation/guides-and-tutorials/optimize-and-measure/amp_to_pwa/
  • https://amp.dev/documentation/guides-and-tutorials/integrate/amp-to-pwa/
  • User gets the AMP from the SERP
    Service worker is installed on device
    Once activated, SW caches the “app shell” and initial data

    User clicks on a (internal) link
    Service worker “hijacks” the click
    Pre-cached PWA loads instantly
  • https://amp.dev/documentation/guides-and-tutorials/integrate/amp-to-pwa/
  • ServiceWorker “hijacks” the click – Server handles the rest
  • Google and search engines only get the AMP version of your URLs/pages
    - Not the canonical or “normal” URL where images (img + src) can be found

    https://amp.dev/documentation/guides-and-tutorials/develop/media_iframes_3p/
    https://amp.dev/documentation/guides-and-tutorials/optimize-and-measure/server-side-rendering/
  • Bots only crawl/index the AMP version of the site

    No waste of crawling resources over multiple URLs for the same content

    Clear signaling (i.e. don’t worry about all of those canonical/alternate tags)

    Pages are fast and pre-loaded in the SERP (AMP viewer or “Real URL”)

    Pages are mobile-friendly
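
The lazy-loading notes above mention IntersectionObserver and `<noscript>`. A minimal sketch of that pattern follows, assuming each image keeps its real URL in a `data-src` attribute; the attribute name and the function names are illustrative and do not come from the slides. The observer constructor is passed in as an optional argument only so the function can be exercised outside a browser.

```javascript
// Copy the deferred URL into src so the browser starts loading the image.
function loadImage(img) {
  if (img.dataset && img.dataset.src) {
    img.src = img.dataset.src;
  }
}

// Load each image only once it scrolls into view.
// `images` is an array or NodeList of <img> elements carrying data-src.
function lazyLoadImages(images, ObserverCtor) {
  const Observer = ObserverCtor || globalThis.IntersectionObserver;
  if (!Observer) {
    // No IntersectionObserver support: load everything right away.
    images.forEach(loadImage);
    return null;
  }
  const observer = new Observer((entries) => {
    for (const entry of entries) {
      if (entry.isIntersecting) {
        loadImage(entry.target);
        observer.unobserve(entry.target); // each image only needs loading once
      }
    }
  });
  images.forEach((img) => observer.observe(img));
  return observer;
}
```

In a page this would be called as `lazyLoadImages(document.querySelectorAll('img[data-src]'))`. A crawler that does not scroll or run the script may never see the real URL, which is why the notes pair this with a plain `<img>` inside `<noscript>`. Browsers that support the native `loading="lazy"` attribute can skip the script entirely.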
