Getting Started with Query Understanding

A user types “black clutch” into your search engine. Do they mean the handbag, the automobile part, or something else entirely?

Search is about matching the intent of the user with the information they need. For decades, “relevance” in information retrieval systems has meant techniques like BM25, TF-IDF, field boosting, and document boosting. These simple heuristics have served us well, but they ultimately fall short because they do not model the semantics of user intent. Our systems don’t actually understand what users want; they just hope a few magic numbers will get us close enough.
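
To make those “magic numbers” concrete, here is a minimal TF-IDF-style scorer in the spirit of the Σ term-frequency/corpus-frequency formula on slides 18 and 19. It is a sketch, not BM25 and not any search engine's actual formula: it assumes naive whitespace tokenization, a tiny in-memory corpus, the common log-scaled IDF variant, and a hypothetical helper name, `tfidf_score`. Each query term contributes its frequency in the document, weighted down by how many corpus documents contain it.

```python
import math
from collections import Counter


def tfidf_score(query: str, doc: str, corpus: list[str]) -> float:
    """Hypothetical helper: sum over query terms of document term frequency times IDF.

    Tokenization is naive whitespace splitting, for illustration only.
    """
    doc_tf = Counter(doc.lower().split())
    corpus_tokens = [set(d.lower().split()) for d in corpus]
    n_docs = len(corpus_tokens)
    score = 0.0
    for term in query.lower().split():
        # Number of corpus documents that contain the term.
        doc_freq = sum(1 for tokens in corpus_tokens if term in tokens)
        if doc_freq == 0:
            continue  # terms never seen in the corpus contribute nothing
        idf = math.log(n_docs / doc_freq)
        score += doc_tf[term] * idf
    return score


corpus = ["pluto is a dwarf planet", "pluto the cartoon dog", "mars is a planet"]
print(tfidf_score("pluto planet", corpus[0], corpus))  # matches both query terms
print(tfidf_score("pluto planet", corpus[1], corpus))  # matches only "pluto"
print(tfidf_score("fanny pack", "bum bag with zip", corpus))  # 0.0: no shared words
```

The last call scores 0.0: a document that says “bum bag” is invisible to the query “fanny pack”, however good a match it is. This is the gap behind the talk's question “Where’s the user in this equation?”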

Query Understanding is about using real intelligence to put users first. In this session, we’ll talk about what Query Understanding is, why it’s important, and some practical strategies for making your search experience smarter.

  1. Getting Started with Query Understanding. Berlin Buzzwords, June 11, 2018
  2. Hello
  3. Hello
  4. Hello
  5. Hello
  6. Agenda: Hello
  7. Agenda: Why QU?
  8. Agenda: Why QU? / What is QU?
  9. Agenda: Why QU? / What is QU? / Problems
  10. Agenda: Why QU? / What is QU? / Problems / The Stakes
  11. Why Query Understanding?
  12. Our goal is relevance.
  13. IR 101: User, Information
  14. IR 101: User, Information, “pluto”
  15. How else can we achieve relevance?
  16. Why: Statistical Relevance
  17. Statistical Relevance: “pluto” → Math → 24.2
  18. Statistical Relevance: $\sum \frac{\text{[document] term frequency}}{\text{[corpus] frequency}}$
  19. Statistical Relevance: $\sum \frac{\text{[document] term frequency}}{\text{[corpus] frequency}}$
  20. Where’s the user in this equation?
  21. “fanny pack”
  22. “bum bag”
  23. Users and documents speak different languages.
  24. Why: Machine-Learned Relevance
  25. Machine-Learned Relevance: “fanny pack” + Document → Model → 0.8
  26. Machine-Learned Relevance: “fanny pack” + Document → yes/no
  27. Machine-Learned Relevance. Usual caveats apply: data is hard; training is hard; productionizing is hard.
  28. “dress shirt”
  29. Nobody uses ranking models to filter spam. Neither should you.
  30. What is Query Understanding?
  31. Query Understanding is focusing on queries.
  32. Query Understanding is focusing on intent.
  33. What: “blue canvas tote” → Color, Material, Category
  34. What: “blue canvas tote” → color_id:1, material_id:2, category_id:3
  35. What: Knowledge: “dress shirt” → Clothing > Shirts > Dress Shirts
  36. What: Knowledge: “dress shirt” → Clothing > Shirts > Dress Shirts
  37. HOLD UP
  38. 70 BILLION
  39. What: “ithaca is gorges t-shirt red”, “harry potter”
  40. What
  41. “fanny pack”
  42. What: Knowledge
  43. Problems
  44. Problems: Precision Problems (garbage results) vs. Recall Problems (not enough results)
  45. Problems: Precision Problems
  46. “formation”
  47. “beyonce”
  48. “solange”
  49. Precision: “beyond” → “beyon”, “beyonce” → “beyon”; “format” → “format”, “formation” → “format”; “solana” → “solan”, “solange” → “solan”
  50. Precision Problem: Stemming proper nouns.
  51. Precision Problem: Stemming proper nouns. 1. Easy - Human-powered exceptions.
  52. Your users don’t care how smart you are.
  53. Precision Problem: Stemming proper nouns. 1. Easy - Human-powered exceptions. 2. Medium - Harvest data.
  54. Precision
  55. Precision Problem: Stemming proper nouns. 1. Easy - Human-powered exceptions. 2. Medium - Harvest data. 3. Hard - Part-of-Speech (POS) Tagger, Named Entity Recognition (NER).
  56. Precision
  57. Precision
  58. “dress shirt”
  59. Precision Problem: Not respecting phrases.
  60. Precision Problem: Not respecting phrases. 1. Easy - Human-powered phrase list.
  61. Precision Problem: Not respecting phrases. 1. Easy - Human-powered phrase list. 2. Medium - Heuristic-powered phrase list.
  62. Precision: Pointwise Mutual Information (PMI): $\mathrm{PMI} = \log \frac{P(\text{“dress shirt”})}{P(\text{“dress”}) \cdot P(\text{“shirt”})}$
  63. Precision
  64. Precision Problem: Not respecting phrases. 1. Easy - Human-powered phrase list. 2. Medium - Heuristic-powered phrase list. 3. Hard - Supervised learning.
  65. Precision: “dress” + “shirt” → Model → Phrase!
  66. Precision: black dress shirt small
  67. Precision: black dress shirt small
  68. “dress”
  69. Precision: Finished Goods vs. Craft Supplies
  70. Precision Problem: Ambiguous keywords.
  71. Precision Problem: Ambiguous keywords. Solution: Query Classification.
  72. Precision: Knowledge: “dress” → Craft Supplies or Finished Goods?
  73. Precision: Knowledge: “dress” → Clothing > Women’s > Dresses
  74. Precision Problem: Ambiguous keywords. 1. Easy - Human-powered mappings.
  75. Precision Problem: Ambiguous keywords. 1. Easy - Human-powered mappings. 2. Medium - Data-powered mappings.
  76. Precision: $P(\text{Finished Goods} \mid \text{“dress”}) = \frac{\#\,\text{Finished Goods clicks for “dress”}}{\#\,\text{clicks for “dress”}}$
  77. Precision: stem(dress) == stem([clothing.womens.]dresses)
  78. Precision Problem: Ambiguous keywords. 1. Easy - Human-powered mappings. 2. Medium - Data-powered mappings. 3. Hard - Supervised learning.
  79. Precision: Knowledge: “dress” → Model → Craft Supplies or Finished Goods
  80. Precision
  81. Precision
  82. Problems: Recall Problems
  83. “fanny pack”
  84. Recall: Knowledge: Clothing > Bags & Purses > Bum Bags; query “fanny pack”
  85. Recall: Knowledge: Clothing > Bags & Purses > Bum Bags, Fanny Pack; query “fanny pack”
  86. Recall Problem: Nomenclature mismatch. 1. Easy - Human-powered synonyms.
  87. Recall Problem: Nomenclature mismatch. 1. Easy - Human-powered synonyms. 2. Medium - Existing data sources, e.g. WordNet, Wikipedia.
  88. Recall
  89. Recall Problem: Nomenclature mismatch. 1. Easy - Human-powered synonyms. 2. Medium - Existing data sources, e.g. WordNet, Wikipedia. 3. Hard - Automatically detect synonyms.
  90. Recall
  91. Precision
  92. “red nascar”
  93. “nascar”
  94. Recall Problem: Missing data. Knowledge: Colors > Blue, Red, Green; query “red nascar”
  95. Recall Problem: Missing data. 1. Easy - Ask humans.
  96. Recall Problem: Missing data. 1. Easy - Ask humans. 2. Medium - Heuristics.
  97. Recall
  98. Recall
  99. Recall Problem: Missing structured data. 1. Easy - Ask content creators. 2. Medium - Heuristics. 3. Hard - ML/AI.
  100. Recall: red, green, ...
  101. Problems: Patterns
  102. Patterns: 1. Easy - Humans.
  103. Patterns
  104. Patterns: 1. Easy - Humans. 2. Medium - Data/Heuristics.
  105. Patterns: 1. Easy - Humans. 2. Medium - Data/Heuristics. 3. Hard - AI/ML.
  106. Patterns
  107. Patterns: Knowledge
  108. Stakes: HIGH STAKES
  109. “clutch”
  110. Query Understanding + UX
  111. Autosuggest, Facet Ranking, Restrict Facets, Suggested Refinement, Boost Results, Filter Results (on a scale labeled High Confidence / Low Confidence)
  112. Denouement
  113. Query Understanding helps us achieve a baseline of relevance.
  114. Query Understanding is focusing on intent.
  115. What: Knowledge
  116. @giokincade, gio@relatedworks.io, www.relatedworks.io, medium.com/related-works-inc

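Slide 34 turns “blue canvas tote” into structured filters (color_id:1, material_id:2, category_id:3). The simplest route there is a hand-maintained lookup table, the “Easy - Humans” end of the talk's scale. The sketch below assumes such a table; the `ATTRIBUTE_LEXICON` name, its entries, and the `to_structured` helper are hypothetical, with the IDs borrowed from the slide.

```python
# Hypothetical hand-curated lexicon: surface term -> (field, id).
# IDs mirror the example on slide 34; a real lexicon would be far larger.
ATTRIBUTE_LEXICON = {
    "blue": ("color_id", 1),
    "canvas": ("material_id", 2),
    "tote": ("category_id", 3),
}


def to_structured(query: str) -> tuple[dict[str, int], list[str]]:
    """Split a query into recognized attribute filters and leftover free-text terms."""
    filters: dict[str, int] = {}
    leftovers: list[str] = []
    for term in query.lower().split():
        if term in ATTRIBUTE_LEXICON:
            field, value = ATTRIBUTE_LEXICON[term]
            filters[field] = value
        else:
            leftovers.append(term)  # unrecognized terms fall back to keyword search
    return filters, leftovers


print(to_structured("blue canvas tote"))
# ({'color_id': 1, 'material_id': 2, 'category_id': 3}, [])
print(to_structured("Blue canvas tote bag"))
# ({'color_id': 1, 'material_id': 2, 'category_id': 3}, ['bag'])
```

Single-token lookups like this miss multi-word values such as “dress shirt”, which is the phrase problem the talk takes up under Precision.
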
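Slide 62 names pointwise mutual information as the medium-difficulty way to build a phrase list: a word pair that co-occurs far more often than chance would predict is a phrase candidate. Below is a rough sketch that estimates PMI for adjacent word pairs in a query log. The query log, the `pmi_phrases` name, and the `min_count` cutoff are illustrative assumptions; deciding which PMI score counts as “a phrase” would still need tuning.

```python
import math
from collections import Counter


def pmi_phrases(queries: list[str], min_count: int = 2) -> dict[tuple[str, str], float]:
    """Estimate PMI = log(P(a b) / (P(a) * P(b))) for adjacent word pairs in a query log."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for query in queries:
        tokens = query.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs only
    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())
    scores: dict[tuple[str, str], float] = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # PMI estimates for very rare pairs are noisy and inflated
        p_ab = count / n_bigrams
        p_a = unigrams[a] / n_unigrams
        p_b = unigrams[b] / n_unigrams
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


# Hypothetical query log.
query_log = ["dress shirt", "dress shirt", "black dress shirt", "red dress",
             "linen shirt", "red dress", "black shirt"]
for pair, score in sorted(pmi_phrases(query_log).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

On this toy log “red dress” outscores “dress shirt”, a reminder that PMI rewards pairs of rarer words, which is why minimum counts and thresholds matter in practice.
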
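Slide 76 estimates how likely a query is to belong to a category from click data: the share of a query's clicks that land in that category. A minimal sketch, assuming a hypothetical click log of (query, category of the clicked item) pairs; the record format and the `category_probabilities` name are illustrative.

```python
from collections import Counter, defaultdict


def category_probabilities(click_log: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Estimate P(category | query) as clicks in that category over all clicks for the query."""
    clicks: dict[str, Counter] = defaultdict(Counter)
    for query, category in click_log:
        clicks[query.lower().strip()][category] += 1
    probabilities: dict[str, dict[str, float]] = {}
    for query, counts in clicks.items():
        total = sum(counts.values())
        probabilities[query] = {category: n / total for category, n in counts.items()}
    return probabilities


# Hypothetical click log: (query, category of the clicked item).
click_log = [("dress", "Finished Goods")] * 3 + [("dress", "Craft Supplies")]
print(category_probabilities(click_log)["dress"])
# {'Finished Goods': 0.75, 'Craft Supplies': 0.25}
```

Queries with only a handful of clicks yield noisy estimates, so one would likely want a minimum number of clicks before trusting a mapping.
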
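For the nomenclature mismatch on slides 83 to 86, the easy fix is a human-curated synonym list. The sketch below assumes a hypothetical phrase-to-synonyms mapping and rewrites a query into variants that could be searched together (for example, OR-ed). Matching is on whole words only, so “fanny packs” is left alone unless the plural is listed too.

```python
# Hypothetical hand-curated synonym list (phrase -> alternatives).
SYNONYMS: dict[str, list[str]] = {
    "fanny pack": ["bum bag"],
    "bum bag": ["fanny pack"],
}


def expand_query(query: str, synonyms: dict[str, list[str]]) -> list[str]:
    """Return the normalized query plus one variant per synonym of each listed phrase it contains."""
    normalized = " ".join(query.lower().split())
    padded = f" {normalized} "  # pad with spaces so only whole words match
    variants = [normalized]
    for phrase, alternatives in synonyms.items():
        if f" {phrase} " in padded:
            for alternative in alternatives:
                variants.append(padded.replace(f" {phrase} ", f" {alternative} ").strip())
    return variants


print(expand_query("Black  Fanny Pack", SYNONYMS))
# ['black fanny pack', 'black bum bag']
```
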
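Slide 111 lines up UX treatments, from Autosuggest to Filter Results, against a confidence scale. One plausible reading is that the more a treatment constrains results, the more confidence it requires. The sketch below encodes that reading; the direction of the scale, the thresholds, and the `choose_treatment` name are all assumptions, not values from the talk.

```python
# Hypothetical thresholds: the more a treatment constrains results,
# the more confident the query classifier must be before applying it.
TREATMENTS: list[tuple[float, str]] = [
    (0.95, "filter results"),
    (0.85, "boost results"),
    (0.70, "suggested refinement"),
    (0.55, "restrict facets"),
    (0.40, "facet ranking"),
    (0.0, "autosuggest"),
]


def choose_treatment(confidence: float) -> str:
    """Return the most aggressive treatment whose threshold the confidence meets."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    for threshold, treatment in TREATMENTS:
        if confidence >= threshold:
            return treatment
    return TREATMENTS[-1][1]  # not reached: the last threshold is 0.0


for confidence in (0.97, 0.6, 0.1):
    print(confidence, choose_treatment(confidence))
```

For an ambiguous query such as “clutch”, a classifier split between handbags and car parts would sit low on this scale, so the sketch would fall back to gentler treatments instead of filtering.
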