How Google Works: A Ranking Engineer's Perspective By Paul Haahr

From the SMX West Conference in San Jose, California, March 1-3, 2016. SESSION: How Google Works: A Google Ranking Engineer's Story. PRESENTATION: How Google Works - Given by Paul Haahr, @haahr - Google, Software Engineer. #SMX #33A

  1. How Google Works: A Ranking Engineer’s Perspective. Paul Haahr, SMX West, March 3, 2016
  2. Google Search Today
  3. Mobile First
  4. Features • spelling suggestions • autocomplete • related searches • related questions • calculator • knowledge graph • answers • featured snippets • maps • images • videos • in-depth articles • movie showtimes • sports scores • weather • flight status • package tracking • …
  5. Ranking
  6. 10 Blue Links
  7. What documents do we show? What order do we show them in?
  8. Life of a Query
  9. Two Parts of a Search Engine • Ahead of time (before the query) • Query processing
  10. Before the Query • Crawl the web • Analyze the crawled pages • Extract links • Render contents • Annotate semantics • … • Build an index
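The "before the query" work in slide 10 amounts to a pipeline: fetch pages, pull out their links and text, and hand the text off to indexing. Below is a minimal sketch of the analysis step only; it is an illustration, not a description of Google's actual crawler, renderer, or semantic annotators.

```python
# Minimal page-analysis step for a toy crawl pipeline: extract visible text and
# outgoing links from one fetched HTML page. Illustrative only.
from urllib.parse import urljoin
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collects href targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links, self.text_parts = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def analyze_page(url, html):
    """Return (text, outlinks) for a crawled page."""
    parser = LinkAndTextExtractor()
    parser.feed(html)
    text = " ".join(" ".join(parser.text_parts).split())  # normalize whitespace
    outlinks = [urljoin(url, href) for href in parser.links]
    return text, outlinks
```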
  11. The Index • Like the index of a book • For each word, a list of pages it appears on • Broken up into groups of millions of pages • At Google, these are called “shards” • 1000s of shards for the web index • Plus per-document metadata
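Slide 11's index can be pictured as a toy inverted index split across shards: for each word, the list of documents containing it. The hash-based sharding and the shard count of 4 below are arbitrary choices for illustration; per the talk, real web-index shards number in the thousands and hold millions of pages each.

```python
# Toy inverted index split across a handful of shards.
from collections import defaultdict

NUM_SHARDS = 4  # Google uses thousands of shards for the web index.


def shard_for(doc_id):
    # Assignment by hash is an illustrative choice, stable within one run.
    return hash(doc_id) % NUM_SHARDS


def build_shards(documents):
    """documents: {doc_id: text}. Returns one word -> [doc_id] posting map per shard."""
    shards = [defaultdict(list) for _ in range(NUM_SHARDS)]
    for doc_id, text in documents.items():
        postings = shards[shard_for(doc_id)]
        for word in set(text.lower().split()):
            postings[word].append(doc_id)
    return shards
```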
  12. Query Processing • Query understanding and expansion • Retrieval and scoring • Post-retrieval adjustments
  13. Query Understanding • Does the query name any known entities? • [san jose convention center] • [matt cutts] • Are there useful synonyms? • [gm trucks]: “gm” → “general motors” • [gm corn]: “gm” → “genetically modified” • Context matters
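A rough sketch of the context-dependent synonym expansion described in slide 13. The tiny rule table is invented for this example; real query understanding learns such associations from data rather than hard-coding them.

```python
# Context-dependent synonym expansion: the same term expands differently
# depending on the other words in the query.
SYNONYMS = {
    # ("term", "context term seen elsewhere in the query") -> expansion
    ("gm", "trucks"): "general motors",
    ("gm", "corn"): "genetically modified",
}


def expand_query(query):
    terms = query.lower().split()
    expanded = []
    for term in terms:
        others = [t for t in terms if t != term]
        replacement = next(
            (SYNONYMS[(term, o)] for o in others if (term, o) in SYNONYMS), None
        )
        expanded.append(replacement or term)
    return " ".join(expanded)


print(expand_query("gm trucks"))  # -> "general motors trucks"
print(expand_query("gm corn"))    # -> "genetically modified corn"
```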
  14. Retrieval and Scoring • Send the query to all the shards • Each shard • Finds matching pages • Computes a score for query+page • Sends back the top N pages by score • Combine all the top pages • Sort by score
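The fan-out in slide 14 can be sketched as scatter/gather: each shard scores its own matching pages and returns its top N, and the results are merged and re-sorted globally. The keyword-hit scoring function here is a stand-in, not Google's scoring.

```python
# Scatter/gather retrieval over sharded document maps.
import heapq


def score(query_terms, text):
    # Placeholder query+page score: raw keyword hits.
    words = text.lower().split()
    return sum(words.count(t) for t in query_terms)


def search_shard(shard_docs, query_terms, n=10):
    """shard_docs: {doc_id: text}. Returns the shard's top-n (score, doc_id) pairs."""
    scored = ((score(query_terms, text), doc_id) for doc_id, text in shard_docs.items())
    return heapq.nlargest(n, (s for s in scored if s[0] > 0))


def search(all_shards, query, n=10):
    query_terms = query.lower().split()
    merged = []
    for shard_docs in all_shards:  # in production this fan-out runs in parallel
        merged.extend(search_shard(shard_docs, query_terms, n))
    return sorted(merged, reverse=True)[:n]
```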
  15. Post-retrieval adjustments • Host clustering, sitelinks • Is there too much duplication? • Spam demotions, manual actions • …
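One concrete form of host clustering, sketched under the assumption that it simply caps how many results a single host may contribute; the cap of 2 is arbitrary, and the talk does not describe the real adjustment logic.

```python
# Cap per-host contributions so one site does not dominate the result page.
from urllib.parse import urlparse


def cluster_by_host(ranked_urls, max_per_host=2):
    kept, per_host = [], {}
    for url in ranked_urls:  # ranked_urls is already sorted best-first
        host = urlparse(url).netloc
        if per_host.get(host, 0) < max_per_host:
            kept.append(url)
            per_host[host] = per_host.get(host, 0) + 1
    return kept
```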
  16. What do ranking engineers do? (version 1) Write code for those servers
  17. Scoring Signals
  18. Signal • A piece of information used in scoring • Query independent – feature of page • PageRank, language, mobile friendliness, ... • Query dependent – feature of page & query • keyword hits, synonyms, proximity, …
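A sketch of how query-independent and query-dependent signals might be combined into one page score. The linear combination and its weights are invented for illustration; the talk does not say how signals are actually combined.

```python
# Combine query-independent and query-dependent signals into a single score.
def page_score(page, query_terms):
    # Query-independent signals: known before any query arrives.
    static = 0.5 * page["pagerank"] + 0.2 * (1.0 if page["mobile_friendly"] else 0.0)

    # Query-dependent signals: computed at query time from page and query together.
    words = page["text"].lower().split()
    keyword_hits = sum(words.count(t) for t in query_terms)
    dynamic = 0.3 * keyword_hits

    return static + dynamic
```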
  19. What do ranking engineers do? (version 2) Look for new signals. Combine old signals in new ways.
  20. Metrics
  21. “If you can not measure it, you can not improve it.” –Lord Kelvin (sort of)
  22. Key Metrics • Relevance • Does a page usefully answer the user’s query? • Ranking’s top-line metric • Quality • How good are the results we show? • Time to result (faster is better) • ...
  23. Higher results matter • “Position weighted” • “Reciprocally ranked” metrics • Position 1 is worth 1 • Position 2 is worth ½ • Position 3 is worth ⅓ • Position 4 is worth ¼ • …
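The reciprocal weighting on slide 23 is easy to state in code: a result at position k contributes with weight 1/k, so wins near the top of the page count for more.

```python
# Position-weighted ("reciprocally ranked") metric: position k has weight 1/k.
def reciprocally_ranked_score(ratings):
    """ratings: per-position gains for one query, best position first."""
    return sum(r / position for position, r in enumerate(ratings, start=1))


# A perfect result at position 1 is worth 1, at position 2 worth 0.5, and so on.
print(reciprocally_ranked_score([1, 0, 0]))  # 1.0
print(reciprocally_ranked_score([0, 1, 0]))  # 0.5
```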
  24. What do ranking engineers do? (version 3) Optimize for our metrics
  25. But where do the metrics come from?
  26. Evaluation
  27. How do we measure ourselves? • Live Experiments • Human Rater Experiments
  28. Live Experiments
  29. Live Experiments • A/B experiments on real traffic • Similar to what many other websites do • Look for changes in click patterns • Harder to understand than you might expect • A lot of traffic is in one experiment or another
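A bare-bones picture of the live-experiment setup in slide 29, assuming users are deterministically hashed into a control or experiment arm and the arms are compared on a click metric. Both the hashing scheme and the use of plain click-through rate are assumptions for illustration; as the next slide shows, interpreting real click data is much harder.

```python
# Deterministic A/B bucketing plus a simple click metric per arm.
import hashlib


def experiment_arm(user_id, experiment_name):
    # Illustrative assignment: hash the (experiment, user) pair and take parity.
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).hexdigest()
    return "experiment" if int(digest, 16) % 2 else "control"


def click_through_rate(impressions):
    """impressions: list of dicts with a boolean 'clicked' field."""
    if not impressions:
        return 0.0
    return sum(1 for imp in impressions if imp["clicked"]) / len(impressions)
```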
  30. Interpreting Live Experiments • Both pages P1 and P2 answer user’s need • For P1, answer is on the page • For P2, answer is on the page and in the snippet • Algorithm A puts P1 before P2 ⇒ user clicks on P1 ⇒ “good” • Algorithm B puts P2 before P1 ⇒ no click ⇒ “bad” • Do we really think A is better than B?
  31. Human Rater Experiments
  32. Human Rater Experiments • Show real people experimental search results • Ask how good the results are • Ratings aggregated across raters • Published guidelines explain criteria for raters • Tools support doing this in an automated way
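A minimal sketch of the aggregation step in slide 32: several raters judge the same (query, result) pair and their scores are combined. The numeric scores and the use of a plain mean are assumptions; the talk only says that ratings are aggregated across raters.

```python
# Average individual rater scores per (query, result) pair.
from collections import defaultdict
from statistics import mean


def aggregate_ratings(ratings):
    """ratings: list of (query, result_url, score) triples from individual raters."""
    by_item = defaultdict(list)
    for query, url, score in ratings:
        by_item[(query, url)].append(score)
    return {item: mean(scores) for item, scores in by_item.items()}
```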
  33. Result Rating Task
  34. Two Scales • Needs Met • Does this page address the user’s need? • Our current relevance metric • Page Quality • How good is the page?
  35. Mobile First
  36. Mobile First Rating “Needs Met rating tasks ask [raters] to focus on mobile user needs and think about how helpful and satisfying the result is for the mobile users.”
  37. How do we make it mobile-centric? • More mobile queries than desktop in samples • Pay attention to user’s location • Tools display mobile user experience • Raters visit websites on smartphones
  38. Needs Met Rating
  39. Needs Met Rating • Fully Meets • Highly Meets • Moderately Meets • Slightly Meets • Fails to Meet (Following examples are from the Rater Guidelines)
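If the Needs Met labels on slide 39 are mapped to numbers, rated results can feed a position-weighted relevance metric like the one on slide 23. The particular numeric values below are assumptions made for illustration; the talk names only the labels.

```python
# Needs Met labels mapped to illustrative numeric scores, then combined with
# reciprocal position weights into a per-query relevance score.
NEEDS_MET_SCORES = {
    "Fully Meets": 1.0,
    "Highly Meets": 0.75,
    "Moderately Meets": 0.5,
    "Slightly Meets": 0.25,
    "Fails to Meet": 0.0,
}


def query_relevance(rated_results):
    """rated_results: Needs Met labels for one query, in ranked order."""
    return sum(
        NEEDS_MET_SCORES[label] / pos
        for pos, label in enumerate(rated_results, start=1)
    )
```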
  40. Fully Meets
  41. (Very) Highly Meets
  42. Highly Meets
  43. (More) Highly Meets
  44. Moderately Meets
  45. Slightly Meets
  46. Fails to Meet
  47. Page Quality Rating
  48. Page Quality Concepts • Expertise • Authoritativeness • Trustworthiness
  49. High Quality Pages • A satisfying amount of high quality main content • The page and website are expert, authoritative, and trustworthy for the topic of the page • The website has a good reputation for the topic of the page
  50. Low Quality Pages • The quality of the main content is low • There is an unsatisfying amount of main content • The author does not have expertise or is not trustworthy or authoritative for the topic • The website has a negative reputation • The secondary content is distracting or unhelpful
  51. Optimizing Our Metrics
  52. Ranking engineers • Team of a few hundred computer scientists • Focused on our metrics and signals • Run lots of experiments • Make lots of changes
  53. Development Process • Idea • Repeat until ready: • Write code • Generate data • Run experiments • Analyze • Launch report by Quantitative Analyst • Launch review
  54. What do ranking engineers do? (version 4) Move results with good ratings up. Move results with bad ratings down.
  55. What Goes Wrong? (And how do we fix it?)
  56. Two kinds of problems • Systematically bad ratings • Metrics don’t capture things we care about
  57. Bad Ratings
  58. [texas farm fertilizer] • User is looking for a brand of fertilizer • Unlikely to want to go to the manufacturer’s headquarters • On average, raters rated a map of the headquarters as almost “Highly Meets”
  59. Patterns of Losses • Look for things we think are bad in results • Either live or from experiments • Create examples for rater guidelines
  60. New rater example
  61. Missing Metrics
  62. Low Quality Content in 2009-2011 • Lots of complaints about low quality content • But our relevance metric kept going up • Low quality pages can be very relevant • We thought we were doing great • ⇒ We weren’t measuring what we needed to
  63. Quality Metric • Gets directly at the quality issue • Not the same as relevance • Enabled development of quality-related signals
  64. When the Metrics Miss Something
  65. What do ranking engineers do? (version 5) Fix rater guidelines or develop new metrics (when necessary)
  66. Thank you!
  67. Questions?
