BrightonSEO - How to deal with RankBrain and AI in SEO?

1,937 views

How to deal with RankBrain and AI in SEO?
Talk given by Lionel Kappelhoff, VP Customer Success at OnCrawl, at the last BrightonSEO (April 2018).

Published in: Data & Analytics

BrightonSEO - How to deal with RankBrain and AI in SEO?

  1. 1. BEYOND TECHNICAL SEO, HOW TO DEAL WITH RANKBRAIN AND AI IN SEO @OnCrawl BrightonSEO 2018
  2. 2. #seocamp François Goube: OnCrawl CEO & founder, SEO expert, advisor for French VC firms, serial entrepreneur & GOOD IDEAS. Speaker today
  3. 3. Menu • Introduction • Crawl, indexation, ranking and AI principles • The tables of the law of Data Exploits • Technical SEO Methodology to make you a winner
  4. 4. #seocamp How a search engine works: 1. Crawl (discover) → 2. Index (organize) → 3. Rank (respond), followed by AI re-ranking
  5. 5. #seocamp How does Google work? Google's algorithms are computer programs designed to navigate through billions of pages, find the right clues and send you exactly the answer to your question (https://www.google.com/search/howsearchworks/). The life of a query begins long before you type it, with the crawling and indexing of the billions of documents that make up the Web.
  6. 6. Google consumes as much energy each year as the city of San Francisco (Journal du Geek, 12/12/2016)
  7. 7. #seocamp Google crawl budget: are the resources ($) that Google allocates to crawling your website being spent optimally?
  8. 8. #seocamp What Google says about crawl budget: if you observe that new pages are usually crawled the same day they are published, then you don't really have to worry about crawl budget […] if a site has fewer than a few thousand URLs, it will be crawled correctly most of the time […] we do not have a single term to describe everything this term seems to mean on the outside
  9. 9. #seocamp 100% of the sites in GSC have crawl data! Tracking Google's crawl behavior through log analysis can quickly reveal an anomaly in the bot's behavior. Is crawl budget related to rankings and visits? The more often the index is updated, the better Google knows whether the page is "the best response to a query".
  10. 10. #seocamp How to organize a crawl effectively: • Define a crawl budget to spend • Prioritize the URLs to explore according to important (and changing) factors • Schedule: important pages first • Adapt the budget according to needs to reduce costs • Optimize. Behind this method there are only algorithms; they use the quality data of your site to make choices. Mobile-first is coming!
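
The budget-and-priority idea on this slide can be sketched as a priority queue. This is only an illustration under stated assumptions, not how Googlebot schedules anything: the function name `schedule_crawl`, the priority scores and the per-page costs are all hypothetical.

```python
import heapq

def schedule_crawl(pages, budget):
    """Pick URLs to crawl, most important first, without exceeding the budget.

    pages:  dict mapping URL -> (priority_score, cost); hypothetical inputs.
    budget: total cost the crawler may spend in this session.
    """
    # heapq is a min-heap, so negate the score to pop the most important page first.
    heap = [(-score, url, cost) for url, (score, cost) in pages.items()]
    heapq.heapify(heap)

    selected, spent = [], 0
    while heap:
        _, url, cost = heapq.heappop(heap)
        if spent + cost > budget:
            continue  # too expensive for what is left; a cheaper page may still fit
        selected.append(url)
        spent += cost
    return selected

# Hypothetical site: (priority, cost) per URL.
pages = {
    "/": (1.0, 1),
    "/category/shoes": (0.8, 1),
    "/product/123": (0.5, 2),
    "/old-archive": (0.1, 3),
}
crawl_plan = schedule_crawl(pages, budget=4)
print(crawl_plan)
```

Low-priority, expensive pages such as `/old-archive` are the first to fall outside the budget, which is the behavior the slide describes.
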
  11. 11. #seocamp Monitor your crawl frequency: analyze your log files. Crawl frequency is a health indicator for your SEO. Create alerts when crawl frequency drops or rises sharply.
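
A minimal sketch of this kind of monitoring, assuming access logs in the common "combined" format and a purely illustrative 50% day-over-day alert threshold. The helper names and sample lines are hypothetical. Matching on the user-agent string alone can be spoofed, so a production setup would also verify the bot's IP.

```python
import re
from collections import Counter
from datetime import datetime

# Date part of a combined-log timestamp, e.g. [10/Apr/2018:06:25:14 +0000]
DATE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")

def googlebot_hits_per_day(log_lines):
    """Count log lines per day whose user agent mentions Googlebot."""
    hits = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = DATE_RE.search(line)
        if match:
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            hits[day] += 1
    return hits

def crawl_alerts(hits, threshold=0.5):
    """Flag days whose hit count moved by more than `threshold` vs. the previous logged day."""
    alerts = []
    days = sorted(hits)
    for prev, cur in zip(days, days[1:]):
        change = (hits[cur] - hits[prev]) / hits[prev]
        if abs(change) > threshold:
            alerts.append((cur.isoformat(), round(change, 2)))
    return alerts

# Hypothetical sample data.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def make_line(day, path, user_agent):
    return f'66.249.66.1 - - [{day}:06:25:14 +0000] "GET {path} HTTP/1.1" 200 5120 "-" "{user_agent}"'

sample = [make_line("10/Apr/2018", f"/page-{i}", GOOGLEBOT_UA) for i in range(4)]
sample.append(make_line("10/Apr/2018", "/", "Mozilla/5.0 (Windows NT 10.0)"))
sample.append(make_line("11/Apr/2018", "/", GOOGLEBOT_UA))

hits = googlebot_hits_per_day(sample)
alerts = crawl_alerts(hits)
print(alerts)
```
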
  12. 12. Your log files contain the only data that accurately reflects how search engines browse your website - MOZ Blog
  13. 13. #seocamp To optimize crawl budget spending, it is necessary to return the most beautiful data. The algorithms will "decide" according to your technical quality, popularity and semantic scores, and user behaviors. We know that some metrics are more important than others in triggering an increase in crawl frequency. Understand that algorithms use variables.
  14. 14. #seocamp Indexation: how does Google choose whether to index a page? Internal metrics (page quality): • Volume of content / keywords • Topics and entity detection • Title, Hn, Schema.org, … • Payload, OnCrawl InRank®. External metrics (page authority): • PageRank • Majestic Trust & Citation Flow
  15. 15. #seocamp Optimize indexation: return the most beautiful data! The algorithms will decide according to your semantic alignment, semantic quality and content quality. Web scaling principles: • Use massive data interpretation: natural language analysis, word embeddings • Use massive data correlation: the web is full of entities • The ontology also includes all the semantics that describe the relationships between terms or between named entities
  16. 16. #seocamp Rankings: once indexed, Google scores each page of your website. Many different metrics are computed: PageRank, quality scores, website trust… Google attaches many attributes to each page: metadata, title, Schema.org, n-grams, payload, content interpretation
  17. 17. #seocamp Optimize ranking: return the most beautiful data! The algorithms will decide according to your semantic alignment, semantic quality and content quality. Web scaling principles: • Use massive data aggregation: human / computed • Use massive data interrelationship / qualification: human / computed
  18. 18. #seocamp Algorithms, data aggregation, human/supervised validation, massive usage of data models, knowledge and interrelationship analysis. Machine learning can be part of the process by adding multiple iteration cycles. Test and learn! This is artificial intelligence.
  19. 19. #seocamp
  20. 20. #seocamp Everybody works for Google
  21. 21. This is how machine learning works
  22. 22. #seocamp What we know about re-ranking: interpretation of the request → matching with the knowledge base → assumption of what the user is looking for (context) → does the user like the result? Yes: perfect, I keep my ranking. No: I'll try a new ranking next time.
  23. 23. And so what?
  24. 24. You need to help Google !
  25. 25. #seocamp You need to help Google. Why? How? Maximize exploration sessions and point the bot (and users!) in the right direction: • Reduce errors • Rectify technical issues • Reinforce your content • Create depth shortcuts • Organize linking by objectives • Speed up the website. More pages, crawled more often, in less time 🧐
  26. 26. #seocamp Page importance (Patent US20110179178): A query-independent score (also called a document score) is computed for each URL by URL page rankers. The page rankers compute a page importance score for a given URL […] the page importance score is computed by considering not only the number of URLs that reference a given URL but also the page importance score of such referencing URLs.
  27. 27. #seocamp Page importance (Patent US20110179178): Page importance score data is provided to URL managers, which pass a page importance score for each URL to robots and content processing servers. One example of a page importance score is PageRank, which is the page importance metric used in the Google search engine. The crawl rate is SEO data driven.
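
The recursive definition quoted from the patent (a page is important when important pages link to it) is the idea behind PageRank. Below is a minimal power-iteration sketch on a hypothetical four-page link graph. The damping factor of 0.85 and the iteration count are conventional illustrative choices, and this is not a reproduction of Google's or OnCrawl's actual computation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute a PageRank-style importance score.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its score over the whole site.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical internal link graph.
links = {
    "home": ["a", "b"],
    "a": ["home"],
    "b": ["home", "a"],
    "c": ["home"],
}
rank = pagerank(links)
for page, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph `home` ends up with the highest score because every other page links to it, while `c`, which receives no links, keeps only the baseline share.
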
  28. 28. #seocamp Page importance can be optimized by playing on the right metrics: • depth and page location in the site • PageRank, Majestic Trust Flow, Citation Flow • internal PageRank: OnCrawl InRank • type of document: PDF, HTML, TXT • sitemap.xml inclusion • quality/spread of anchors • number of words, few near-duplicates • parent page importance. From the excellent Dawn Anderson article on Search Engine Land. https://patents.google.com/patent/US8042112B1/en?q=(page)&q=(importance)&q=url&q=scheduling&assignee=Google+Inc.&oq=(page)+(importance)+assignee:(Google+Inc.)+url+scheduling
  29. 29. And so what?
  30. 30. #seocamp What we know Google does not like digging too deep into a site
  31. 31. #seocamp What we know: Google is sensitive to the volume of content
  32. 32. #seocamp What we know: Google is sensitive to internal popularity (OnCrawl InRank®). More links = better positions. More links = better crawl frequency.
  33. 33. #seocamp What we know: Google is sensitive to CTR and bounce rate (BR). Lower BR = more bot hits. Better CTR = better positions.
  34. 34. #seocamp To remember: your priority pages should be linked from the home page, or ideally sit 1-2 levels from it. To be crawled frequently, a page must be fast and have great content … Return the most beautiful data!
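
One way to check the "1-2 levels from the home page" rule is a breadth-first search over the internal link graph. This is a sketch under assumptions: the link graph is hypothetical, and the function name `click_depths` is not part of any tool mentioned in the talk.

```python
from collections import deque

def click_depths(links, home="/"):
    """Return the minimum number of clicks needed to reach each page from `home`."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit in a BFS is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/shoes", "/about"],
    "/shoes": ["/shoes/chuck-taylor"],
    "/shoes/chuck-taylor": ["/shoes/chuck-taylor/reviews"],
}
depths = click_depths(links)
too_deep = sorted(url for url, depth in depths.items() if depth > 2)
print(too_deep)
```

Pages listed in `too_deep` are candidates for a navigation shortcut or a link from a page closer to the home page.
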
  35. 35. #seocamp Indexation and Content Interpretation
  36. 36. #seocamp What we know: Google classifies types of request: • Transactional • Informational • Navigational. Depending on the type of request you want to rank for, Google will crawl you more or less often. Google does not understand the content, but it seeks to understand the concepts by detecting named entities: • It creates term and word weight matrices • It deduces page themes for the ranking
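
The slide does not say which weighting scheme Google uses. As a stand-in, TF-IDF is a classic way to build a term weight matrix, sketched here on three hypothetical one-line documents. The tokenization (lowercase, split on whitespace) is deliberately naive.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return {doc_name: {term: weight}} using a basic TF-IDF weighting.

    docs: dict mapping a document name to its text.
    """
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    weights = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        weights[name] = {
            # term frequency * inverse document frequency
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights

# Hypothetical page contents.
docs = {
    "shoes": "converse chuck taylor shoes for men",
    "news": "election news and election results",
    "event": "brightonseo speakers and talks",
}
weights = tf_idf(docs)
top_term = max(weights["news"], key=weights["news"].get)
print(top_term)
```

Terms that are frequent on one page but rare across the site get the highest weight, which gives a rough picture of the page's theme.
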
  37. 37. #seocamp Transactional: here the user wants to get to a website where there will be more interaction. Pages on “converse men's chuck taylor”: cold content, transactional request → average crawl frequency. Needs well-thought-out linking based on brand/product entity relationships
  38. 38. #seocamp Informational: this is when the user is looking for a specific bit of information. Pages on “Clinton” or “Trump” relate to a hot informational subject → high crawl frequency. Needs powerful linking to pages and use of named entities
  39. 39. #seocamp Navigational: the user is looking to reach a particular website. Pages on “BrightonSEO speakers” are known to be part of the BrightonSEO website → low crawl frequency. Needs good semantic SEO optimization, Trust and Citation
  40. 40. #seocamp Word Embeddings is the collective name for a set of language modeling and feature learning techniques in natural language processing – NLP - where words or phrases from the vocabulary are mapped to vectors of real numbers
  41. 41. #seocamp NLP as RankBrain's foundation: our (RankBrain) algorithm is able to represent strings of text in very high-dimensional space and “see” how they relate to one another
  42. 42. #seocamp Natural language processing & RankBrain: Google maintains a knowledge base on named entities and understands the relationships between entities
  43. 43. #seocamp Word Embeddings: in a search engine algorithm, a tool is needed to calculate a "similarity" score between two documents. This score is strategic for producing a relevant ranking, but it is used in combination with a very large number of other signals, some major (like the popularity of the page), others minor (like the presence of a keyword in the page URL)
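
A similarity score between two documents is commonly computed as the cosine of the angle between their vectors. The sketch below uses simple word-count vectors as a stand-in for learned embeddings, which would be dense vectors produced by a trained model. The sample texts are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts (0.0 to 1.0)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

doc_a = "crawl budget and log analysis"
doc_b = "log analysis for crawl budget"
doc_c = "word embeddings in natural language processing"

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
print(round(sim_ab, 2), round(sim_ac, 2))
```

With real embeddings the same cosine formula applies; only the way the vectors are built changes, so that related words score as close even when they share no exact terms.
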
  44. 44. #seocamp Each entity or concept is vectorized so that machines can understand it
  45. 45. #seocamp Google can then evaluate the distance between two concepts (figure: each entity shown with the vectorized distances to its related entities)
  46. 46. #seocamp For each entity (Entity #1), Google knows: • Sentences that contain the entity • In which context / topic the entity is used • Often used with entity #2 in a paragraph • Often used with entity #2 in a site on the subject • Often used with entity #2 on the same page
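
The "often used with entity #2 on the same page" signal can be approximated on your own site by counting entity pairs per page. This sketch assumes the entities have already been extracted by some external tool; the page data and the function name `entity_cooccurrence` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def entity_cooccurrence(pages):
    """Count how many pages each pair of entities appears on together.

    pages: dict mapping URL -> set of entity names detected on that page.
    """
    pairs = Counter()
    for entities in pages.values():
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for first, second in combinations(sorted(entities), 2):
            pairs[(first, second)] += 1
    return pairs

# Hypothetical extraction results.
pages = {
    "/p1": {"Bill Gates", "Microsoft", "Melinda Gates"},
    "/p2": {"Bill Gates", "Microsoft"},
    "/p3": {"Apple", "iPhone"},
}
pairs = entity_cooccurrence(pages)
print(pairs.most_common(1))
```
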
  47. 47. For my SEO? WTF?
  48. 48. #seocamp Concretely? Query: “how old is the wife of bill gates?” → assumption of a request on age; type of relation = wife; individual / personality
  49. 49. #seocamp An example: “Apple” Phone Computer Brand /Apple Local store /”apple”
  50. 50. #seocamp An example: “Apple”: Phone, Computer, Brand /Apple, Local store /”apple”. It is entity detection that infers the context of the search and refines the results
  51. 51. #seocamp To remember: Google will tend to crawl pages by “package”: • On the same path (Discover) • On the same topic (Recrawl). The more content it meets containing the expected entities (related to the theme), the deeper it will crawl. The type of entity, its rarity or popularity will directly impact the crawl frequency. Internal linking must be designed around the relationships between the entities present in your content
  52. 52. #seocamp How to check my named entities? You can do it for each URL in the OnCrawl Toolbox Now integrated in the OnCrawl crawl reports!
  53. 53. how to maximize your efforts
  54. 54. #seocamp Understand your website. Map your website by: • Type of content • Page categories. Understand which entities are present in your: • Content • Anchor texts. Analyze how pages with entities are linked within your site
  55. 55. #seocamp Which steps? • Crawl your site • Categorize your pages • Extract named entities by page group • Identify pages with/without entities → adjust your content • Monitor the number of words per group of pages / per package. The goal is to define the “ideal content metrics” to maximize your crawlability. Data Explorer export, comparison with log data, filter by group:

      Average                           | Crawled pages by Google | Ignored pages
      Number of pages concerned         | 875 256                 | 1 340 872
      Number of words                   | 897                     | 457
      Number of entities in the content | 18                      | 3
      Number of entities in anchors     | 6                       | 0
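
The crawled-versus-ignored comparison can be sketched as follows, assuming crawl data and log data have already been joined into one record per page. The field names (`bot_hits`, `words`, `entities_in_content`, `entities_in_anchors`) and the sample values are hypothetical and do not reflect OnCrawl's export format.

```python
from statistics import mean

METRICS = ("words", "entities_in_content", "entities_in_anchors")

def group_averages(pages):
    """Average each content metric for pages crawled by the bot vs. ignored pages."""
    groups = {"crawled": [], "ignored": []}
    for page in pages:
        key = "crawled" if page["bot_hits"] > 0 else "ignored"
        groups[key].append(page)
    return {
        name: {metric: mean(page[metric] for page in members) for metric in METRICS}
        for name, members in groups.items()
        if members  # skip empty groups to avoid averaging nothing
    }

# Hypothetical joined crawl + log records.
pages = [
    {"url": "/a", "bot_hits": 12, "words": 900, "entities_in_content": 20, "entities_in_anchors": 6},
    {"url": "/b", "bot_hits": 3, "words": 800, "entities_in_content": 16, "entities_in_anchors": 4},
    {"url": "/c", "bot_hits": 0, "words": 400, "entities_in_content": 3, "entities_in_anchors": 0},
]
averages = group_averages(pages)
print(averages)
```

The averages for the crawled group give a first estimate of the "ideal content metrics" to aim for when reworking ignored pages.
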
  56. 56. #seocamp Quick wins: • Use named entities in your link anchors • Create packages of linked pages according to entity typology • Example of a media site
  57. 57. #seocamp OnCrawl detects entities
  58. 58. Track rankings and cross-analyze data with OnCrawl Rankings
  59. 59. #seocamp Conclusions: crawling, indexing, ranking and re-ranking are all based on artificial intelligence and machine learning principles. They are not so intelligent, because they need us to validate their models. Never forget that they are only algorithms: you have to know and work on the metrics they take into account in order to influence them!
  60. 60. #seocamp Conclusions: crawling consumes energy, so make bots' lives simpler: pay attention to depth, navigation shortcuts, duplicate content and especially load time and page weight in the current mobile-first index context. Follow crawl budget with your logs!
  61. 61. #seocamp Conclusions: indexing is based on internal/external metrics and content. The Knowledge Graph is Google's learning base on named entities. Understand NLP and word embeddings.
  62. 62. #seocamp Conclusions: ranking is the consistency of all these data with user intentions. It depends on quality scores (technical), relevance scores (semantic), plus knowledge of the user's behavior and intent and their personal search and browsing history.
  63. 63. #seocamp Conclusions: make users want to come back in order to improve CTR and bounce rate. Prioritize titles, meta descriptions, content, speed, UX/UI.
  64. 64. www.oncrawl.com OnCrawl helps e-commerce & online media make better SEO decisions and grow their revenues by providing access to the most advanced SEO software: Semantic SEO Crawler, Comprehensive Log Analyser, API & Platform to combine all of a website's data
  65. 65. Try OnCrawl for free Start your free trial