Getting Started with Query Understanding
Berlin Buzzwords, June 11, 2018

A user types “black clutch” into your search engine. Do they mean the handbag, the automobile part, or something else entirely?

Search is about matching the intent of the user with the information they need. For decades, “relevance” in information retrieval systems has meant things like BM25, TF-IDF, field boosting, and document boosting. These simple heuristics and strategies have served us well, but they ultimately fall short because they fail to semantically model intent. Our systems don’t actually understand what users want; they just hope a few magic numbers will get us close enough.
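
To make the limitation concrete, here is a minimal sketch of a purely statistical score in the spirit of the deck's "Statistical Relevance" slides: for each query term, the document's term frequency divided by the term's corpus frequency, summed over the query. The function name `score`, the toy documents, and the whitespace tokenization are assumptions for illustration, not the speaker's implementation or any search engine's actual formula.

```python
from collections import Counter


def score(query, doc, corpus_freq):
    """Sum over query terms of (document term frequency / corpus frequency)."""
    doc_tf = Counter(doc.lower().split())
    return sum(
        doc_tf[term] / corpus_freq[term]
        for term in query.lower().split()
        if corpus_freq[term] > 0  # skip terms the corpus has never seen
    )


# Hypothetical three-document corpus.
docs = [
    "leather fanny pack with zipper",
    "canvas bum bag with zipper",
    "blue canvas tote",
]
corpus_freq = Counter(term for d in docs for term in d.lower().split())

scores = [score("fanny pack", d, corpus_freq) for d in docs]
print(scores)
```

The "bum bag" document scores zero for the query "fanny pack" even though it describes the same product. Nothing in the score knows that the user and the document are using different words for one thing.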

Query Understanding is about using real intelligence to put users first. In this session, we’ll talk about what Query Understanding is, why it’s important, and some practical strategies for making your search experience smarter.


Getting Started with Query Understanding

  1. Getting Started with Query Understanding Berlin Buzzwords June 11, 2018
  2. Hello
  3. Hello
  4. Hello
  5. Hello
  6. Agenda Hello
  7. Why QU? Agenda
  8. Why QU? What is QU? Agenda
  9. Why QU? What is QU? Problems Agenda
  10. Why QU? What is QU? Problems The Stakes Agenda
  11. Why Query Understanding
  12. Our goal is relevance.
  13. Information User IR 101
  14. Information User IR 101 “pluto”
  15. How else can we achieve relevance?
  16. Statistical Relevance Why
  17. Statistical Relevance “pluto” Math 24.2
  18. Σ [document] term frequency / [corpus] frequency Statistical Relevance
  19. Σ [document] term frequency / [corpus] frequency Statistical Relevance
  20. Where’s the user in this equation?
  21. “fanny pack”
  22. “bum bag”
  23. Users and documents speak different languages.
  24. Machine-Learned Relevance Why
  25. Machine-Learned Relevance “Fanny pack” 0.8 Document Model
  26. Machine-Learned Relevance “Fanny pack” yes/no Document
  27. Usual caveats apply: - Data is hard. - Training is hard. - Productionizing is hard. Machine-Learned Relevance
  28. “dress shirt”
  29. Nobody uses ranking models to filter spam. Neither should you.
  30. What is Query Understanding
  31. Query Understanding is focusing on queries.
  32. Query Understanding is focusing on intent.
  33. “blue canvas tote” Color Material Category What
  34. “blue canvas tote” color_id:1 material_id:2 category_id:3 What
  35. What Clothing Shirts Dress Shirts Knowledge “dress shirt”
  36. What Clothing Shirts Dress Shirts Knowledge “dress shirt”
  37. HOLD UP
  38. 70 BILLION
  39. What “ithaca is gorges t-shirt red” “harry potter”
  40. What
  41. “fanny pack”
  42. What Knowledge
  43. Problems
  44. Precision Problems Garbage results Recall Problems Not enough results
  45. Precision Problems Problems
  46. “formation”
  47. “beyonce”
  48. “solange”
  49. “beyond” “beyon” Precision “beyonce” “beyon” “format” “format” “formation” “format” “solana” “solan” “solange” “solan”
  50. Precision Problem: Stemming proper nouns
  51. Precision Problem: Stemming proper nouns 1. Easy - Human-powered exceptions.
  52. Your users don’t care how smart you are.
  53. Precision Problem: Stemming proper nouns 1. Easy - Human-powered exceptions. 2. Medium - Harvest data.
  54. Precision
  55. Precision Problem: Stemming proper nouns 1. Easy - Human-powered exceptions. 2. Medium - Harvest data. 3. Hard - Part of Speech (POS) Tagger, Named Entity Recognition (NER).
  56. Precision
  57. Precision
  58. “dress shirt”
  59. Precision Problem: Not respecting phrases.
  60. Precision Problem: Not respecting phrases. 1. Easy - Human-powered phrase list.
  61. Precision Problem: Not respecting phrases. 1. Easy - Human-powered phrase list. 2. Medium - Heuristic-powered phrase list.
  62. Precision Pointwise Mutual Information (PMI) P(“dress shirt”) / (P(“dress”) * P(“shirt”))
  63. Precision
  64. Precision Problem: Not respecting phrases. 1. Easy - Human-powered phrase list. 2. Medium - Heuristic-powered phrase list. 3. Hard - Supervised learning.
  65. Precision dress Phrase! Model shirt
  66. Precision black dress shirt small
  67. Precision black dress shirt small
  68. “dress”
  69. Precision Finished Goods Craft Supplies
  70. Precision Problem: Ambiguous keywords.
  71. Precision Problem: Ambiguous keywords. Solution: Query Classification.
  72. Precision Craft Supplies Finished Goods Knowledge “dress”
  73. Precision Clothing Women’s Dresses Knowledge “dress”
  74. Precision Problem: Ambiguous keywords. 1. Easy - Human-powered mappings.
  75. Precision Problem: Ambiguous keywords. 1. Easy - Human-powered mappings. 2. Medium - Data-powered mappings.
  76. Precision P(Finished Goods | “dress”) = (# of Finished Goods clicks for “dress”) / (# of clicks for “dress”)
  77. Precision stem(dress) == stem([clothing.womens.]dresses)
  78. Precision Problem: Ambiguous keywords. 1. Easy - Human-powered mappings. 2. Medium - Data-powered mappings. 3. Hard - Supervised learning.
  79. Precision Craft Supplies Finished Goods Knowledge Model “dress”
  80. Precision
  81. Precision
  82. Recall Problems Problems
  83. “fanny pack”
  84. Recall Clothing Bags & Purses Knowledge “fanny pack” Bum Bags
  85. Recall Clothing Bags & Purses Knowledge “fanny pack” Fanny Pack Bum Bags
  86. Recall Problem: Nomenclature mismatch. 1. Easy - Human-powered synonyms.
  87. Recall Problem: Nomenclature mismatch. 1. Easy - Human-powered synonyms. 2. Medium - Existing data sources, e.g. WordNet, Wikipedia.
  88. Recall
  89. Recall Problem: Nomenclature mismatch. 1. Easy - Human-powered synonyms. 2. Medium - Existing data sources, e.g. WordNet, Wikipedia. 3. Hard - Automatically detect synonyms.
  90. Recall
  91. Precision
  92. “red nascar”
  93. “nascar”
  94. Recall Problem: Missing data. Colors Blue Knowledge “red nascar” Red Green
  95. Recall Problem: Missing data. 1. Easy - Ask humans.
  96. Recall Problem: Missing data. 1. Easy - Ask humans. 2. Medium - Heuristics.
  97. Recall
  98. Recall
  99. Recall Problem: Missing structured data. 1. Easy - Ask content creators. 2. Medium - Heuristics. 3. Hard - ML/AI.
  100. Recall red green ...
  101. Patterns Problems
  102. Patterns 1. Easy - Humans.
  103. Patterns
  104. Patterns 1. Easy - Humans. 2. Medium - Data/Heuristics.
  105. Patterns 1. Easy - Humans. 2. Medium - Data/Heuristics. 3. Hard - AI/ML.
  106. Patterns
  107. Patterns Knowledge
  108. Stakes HIGH STAKES
  109. “clutch”
  110. Query Understanding + UX
  111. Autosuggest Facet Ranking Restrict Facets Suggested Refinement Boost Results Filter Results High Confidence Low Confidence
  112. Denouement
  113. Query Understanding helps us achieve a baseline of relevance.
  114. Query Understanding is focusing on intent.
  115. What Knowledge
  116. @giokincade gio@relatedworks.io www.relatedworks.io medium.com/related-works-inc
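
The two "Medium" data-driven techniques that the transcript gives as formulas (the PMI ratio on slide 62 and the click-share estimate on slide 76) can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the speaker's code: the function names `pmi_scores` and `category_probabilities`, the toy query log, and the click list are all hypothetical, and PMI is taken here as the natural log of the ratio shown on the slide.

```python
import math
from collections import Counter


def pmi_scores(queries):
    """PMI for each adjacent word pair: log(P(a b) / (P(a) * P(b)))."""
    unigrams, bigrams = Counter(), Counter()
    for query in queries:
        tokens = query.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        (a, b): math.log(
            (count / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni))
        )
        for (a, b), count in bigrams.items()
    }


def category_probabilities(clicked_categories):
    """P(category | query) as that category's share of clicks for the query."""
    counts = Counter(clicked_categories)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical query log: "dress shirt" co-occurs more often than chance.
query_log = [
    "dress shirt",
    "black dress shirt",
    "black bag",
    "dress shirt small",
    "black shoes",
]
phrase_scores = pmi_scores(query_log)

# Hypothetical click log for the query "dress".
dress_clicks = ["Finished Goods"] * 3 + ["Craft Supplies"]
dress_probs = category_probabilities(dress_clicks)
```

In this toy log, ("dress", "shirt") gets a higher PMI than ("black", "dress"), which is the signal a heuristic phrase list would threshold on. The click-share estimate puts 0.75 on Finished Goods for "dress". Real logs would need minimum-count cutoffs, since PMI is unstable for rare pairs.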
