Inside Search
Metro, NYC
November 15, 2017
Hello
EXERCISE
Tell me about you.
1. What’s your position?
2. What kind of organization are you from?
3. Why are you interested in search?
Agenda
Don’t worry.
1. IR Basics
2. Search in the Modern Era
Information Retrieval 101
IR 101: User and Information
IR 101: User → “pluto” → Documents
IR 101: pluto?
Inverted Index
Documents:
1: “pluto and goofy...”
2: “pluto the dwarf planet...”
3: “stranger things...”
Tokenize “pluto and goofy” → pluto, and, goofy
Inverted index (term → documents):
pluto → 1, 2
planet → 2
stranger → 3
Look up “pluto” → documents 1, 2
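The inverted index above can be sketched in a few lines of Python. This is an illustrative sketch, not any particular engine's implementation: the function names are hypothetical, the document texts are the shortened slide examples without the “...”, and tokenization is a plain whitespace split with no lowercasing.

```python
from collections import defaultdict


def tokenize(text):
    # Whitespace split only; no lowercasing or stemming yet.
    return text.split()


def build_inverted_index(docs):
    """Map each term to the set of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


# Shortened versions of the three example documents, keyed by their slide numbers.
docs = {
    1: "pluto and goofy",
    2: "pluto the dwarf planet",
    3: "stranger things",
}

index = build_inverted_index(docs)
print(sorted(index["pluto"]))   # [1, 2]
print(sorted(index["planet"]))  # [2]
```

Answering a one-word query is then a single dictionary lookup instead of a scan over every document.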
Ranking
Term Frequency-Inverse Document Frequency (TF-IDF), simplified on these slides to:
TF-IDF = Term Frequency / Document Frequency
Documents:
1: “pluto and goofy...”
2: “pluto the dwarf planet...pluto”
Query “pluto” (Document Frequency = 2):
1: Term Frequency = 1, TF-IDF = 1/2 = 0.5
2: Term Frequency = 2, TF-IDF = 2/2 = 1
Ranking: 2 (TF-IDF = 1) ranks above 1 (TF-IDF = 0.5).
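The single-term ranking above can be reproduced with a short sketch. It implements the slides' simplified score (term frequency divided by document frequency); the function name is hypothetical and the documents are the shortened slide examples.

```python
from collections import Counter


def simplified_tfidf(term, docs):
    """Score documents containing `term` by term frequency / document frequency,
    the simplified TF-IDF used on the slides."""
    counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}
    df = sum(1 for c in counts.values() if c[term] > 0)  # documents containing the term
    if df == 0:
        return {}
    return {doc_id: c[term] / df for doc_id, c in counts.items() if c[term] > 0}


docs = {
    1: "pluto and goofy",
    2: "pluto the dwarf planet pluto",
}

scores = simplified_tfidf("pluto", docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {1: 0.5, 2: 1.0}
print(ranking)  # [2, 1]
```

Production engines usually replace the plain ratio with a logarithmic inverse document frequency and add length normalization (BM25 is a common example); the ratio here only reproduces the slide arithmetic.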
IR 101
Index time: Forward Index → Tokenize → Inverted Index
Query time: Matching → Ranking
Multiple Terms
Query “pluto planet” → pluto, planet
Inverted index:
pluto → 1, 2
planet → 2
stranger → 3
OR vs. AND:
pluto OR planet → 1, 2
pluto AND planet → 2
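OR and AND matching over the inverted index amount to a set union and a set intersection of the posting lists. A minimal sketch, using the slide's index written out by hand; the function name and the `mode` parameter are hypothetical.

```python
def match(query, index, mode="OR"):
    """Return the IDs of documents matching the query terms under OR or AND."""
    postings = [index.get(term, set()) for term in query.split()]
    if not postings:
        return set()
    if mode == "AND":
        return set.intersection(*postings)  # documents containing every term
    return set.union(*postings)  # documents containing at least one term


# The inverted index from the slides (term -> document IDs).
index = {"pluto": {1, 2}, "planet": {2}, "stranger": {3}}

print(sorted(match("pluto planet", index, "OR")))   # [1, 2]
print(sorted(match("pluto planet", index, "AND")))  # [2]
```

OR favours recall (more documents match), AND favours precision (fewer, more specific matches).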
Multiple Terms: Scoring
score = Σ over query terms of (Term Frequency / Document Frequency)
score = tf-idf(pluto) + tf-idf(planet)
Document Frequency(pluto) = 2, Document Frequency(planet) = 1
1: “pluto and goofy...”: tf-idf(pluto) = 1/2, tf-idf(planet) = 0/1, score = 1/2 + 0 = 0.5
2: “pluto the dwarf planet...”: tf-idf(pluto) = 1/2, tf-idf(planet) = 1/1, score = 1/2 + 1 = 1.5
Ranking: 2 (score = 1.5) ranks above 1 (score = 0.5).
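The summed score can be checked with a small sketch that adds up term frequency / document frequency for every query term. The function name is hypothetical; the documents are the shortened slide examples.

```python
from collections import Counter


def score_query(query, docs):
    """Sum term frequency / document frequency over the query terms for each document."""
    counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}
    terms = query.split()
    df = {t: sum(1 for c in counts.values() if c[t] > 0) for t in terms}
    scores = {}
    for doc_id, c in counts.items():
        # Terms missing from every document (df == 0) contribute nothing.
        s = sum(c[t] / df[t] for t in terms if df[t] > 0)
        if s > 0:
            scores[doc_id] = s
    return scores


docs = {
    1: "pluto and goofy",
    2: "pluto the dwarf planet",
}
print(score_query("pluto planet", docs))  # {1: 0.5, 2: 1.5}
```

Because planet is rarer (Document Frequency = 1), a match on it is worth more than a match on pluto, which is what lifts document 2.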
Measurement
Precision: How good are the results?
Precision = # Good Results / # Results
Recall: How many of the good documents are in the results?
Recall = # Good Results / # Good Documents
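Both formulas translate directly into code. A minimal sketch; the relevance judgments in the example are hypothetical, and returning 0.0 for an empty result set or an empty set of good documents is a convention chosen here.

```python
def precision(results, good):
    """# good results / # results."""
    retrieved = set(results)
    if not retrieved:
        return 0.0
    return len(retrieved & set(good)) / len(retrieved)


def recall(results, good):
    """# good results / # good documents."""
    relevant = set(good)
    if not relevant:
        return 0.0
    return len(set(results) & relevant) / len(relevant)


# Hypothetical judgments: the engine returns documents 1 and 2,
# but only documents 2 and 5 are actually good for the query.
results = [1, 2]
good = {2, 5}
print(precision(results, good))  # 0.5
print(recall(results, good))     # 0.5
```

Returning a bad document (1) hurts precision; missing a good one (5) hurts recall.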
Measurement: What’s good?
Query “pluto planet”: 2, 1 (“pluto and goofy…”, “pluto the dwarf planet...”)
Query “pluto”: 5, 1 (“Pluto lost its status…”, “pluto the dwarf planet...”)
Measurement: Precision, Recall
EXERCISE
Find a precision problem and a recall problem.
IR 101: Analysis
Query “pluto”: 5, 1 (“Pluto lost its status…”, “pluto the dwarf planet...”)
Tokenize: “Pluto lost its status...” → Pluto, lost, its, status
Transform each token: Pluto, lost, its, status → pluto, lost, its, status
Analysis
1. Tokenize
2. Transform each Token
What happens if I search for Pluto? pluto → 1, 2
Analysis
Index time: Forward Index → Tokenize → Analysis → Inverted Index
Query time: Tokenize → Analysis → Matching → Ranking
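Running the same analysis at index time and at query time is what lets “Pluto” and “pluto” meet. A minimal sketch, assuming that the “5” shown next to “Pluto lost its status…” is that document's ID; the function names are hypothetical, and the only transform is lowercasing.

```python
from collections import defaultdict


def analyze(text):
    # 1. Tokenize on whitespace, 2. transform each token (here: lowercase).
    return [token.lower() for token in text.split()]


def build_index(docs, analyzer):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index


def search(query, index, analyzer):
    # Analyze the query the same way as the documents, then OR the postings.
    results = set()
    for term in analyzer(query):
        results |= index.get(term, set())
    return results


docs = {
    1: "pluto and goofy",
    2: "pluto the dwarf planet",
    5: "Pluto lost its status",
}

raw_index = build_index(docs, str.split)  # no analysis: "Pluto" and "pluto" differ
analyzed_index = build_index(docs, analyze)

print(sorted(search("pluto", raw_index, str.split)))     # [1, 2]
print(sorted(search("Pluto", analyzed_index, analyze)))  # [1, 2, 5]
```

Without analysis, document 5 is a recall problem; with lowercasing on both sides it is found for either spelling of the query.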
Analysis
What happens if I search for planets? planet → 2
planets, planets
Precision ↔ recall: lemmatization, stemming, aggressive stemming
planets, journalism
Analysis Chain: Whitespace Tokenizer → Lowercase → Stem
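An analysis chain is a tokenizer followed by token filters applied in order. A minimal sketch of Whitespace Tokenizer → Lowercase → Stem; the stemmer is a hypothetical toy that only strips a plural “s”, not Porter, Snowball, or any real engine's filter.

```python
def whitespace_tokenizer(text):
    return text.split()


def lowercase(tokens):
    return [t.lower() for t in tokens]


def naive_plural_stem(tokens):
    # Toy stemmer (hypothetical): drop one trailing "s" from tokens longer
    # than three characters, but leave "ss" endings alone.
    return [
        t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
        for t in tokens
    ]


def analysis_chain(text, token_filters=(lowercase, naive_plural_stem)):
    """Tokenize, then pass the tokens through each filter in order."""
    tokens = whitespace_tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analysis_chain("Planets"))                # ['planet']
print(analysis_chain("Pluto lost its status"))  # ['pluto', 'lost', 'its', 'statu']
```

“Planets” now matches the indexed “planet”, at the cost of mangling “status” into “statu”. The mangled form still matches because both index and query pass through the same chain, but each extra stemming rule trades some precision for recall.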
EXERCISE
Find 3 analysis-related problems.
Think about:
1. Tokenization
2. Normalization
3. Stemming
Beyond the 70s
Machine-Learned Relevance
1. Supervised (Learning to Rank)
2. Unsupervised
Learning to Rank
TF-IDF ≠ relevance
“pluto” + Document → Model → 0.8
“pluto” + Document → Yes/No
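One simple way to learn a model like the “pluto” + Document → 0.8 box from Yes/No judgments is pointwise learning to rank with logistic regression. This is a hypothetical illustration, not the method from the talk: the two features (a TF-IDF score and whether the title contains the query) and the four training examples are invented, and training is plain stochastic gradient descent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_pointwise(examples, epochs=2000, lr=0.1):
    """examples: (features, label) pairs, label 1 = "Yes" (relevant), 0 = "No"."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(model, x):
    """Probability-like relevance score in (0, 1), e.g. the slide's 0.8."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical judgments for the query "pluto":
# features = [tf_idf_score, title_contains_query]
examples = [
    ([1.0, 1.0], 1),
    ([1.5, 1.0], 1),
    ([0.5, 0.0], 0),
    ([0.2, 0.0], 0),
]

model = train_pointwise(examples)
relevant = predict(model, [1.2, 1.0])
irrelevant = predict(model, [0.3, 0.0])
print(round(relevant, 3), round(irrelevant, 3))
```

Even this toy shows the problems listed next: it needs labelled judgments, it has to be trained, and the features and weights must be served alongside the index.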
Learning to Rank: Problems
1. Requires lots of data
2. Difficult to train
3. Difficult to administer/serve
4. Bias
SKIN WINS
Unsupervised
How do you know it’s any good?
Beyond the 70s
Search is UX
Search ≠ Ranking
Tools
1. Facets
2. Results
3. Autosuggest
4. Suggestions
Reasons to focus on UX
1. Low-Intent Traffic
2. Leverage
3. Mobile
4. Lower Stakes
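Facets, the first of the UX tools, are at heart counts of field values over the current result set. A minimal sketch, assuming hypothetical catalogue records with a `format` field; the titles, field name, and function name are invented for illustration.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results carry each value of `field`; records without it are skipped."""
    return Counter(record[field] for record in results if field in record)


results = [
    {"title": "Pluto the dwarf planet", "format": "Book"},
    {"title": "Pluto and Goofy", "format": "DVD"},
    {"title": "Pluto lost its status", "format": "Article"},
    {"title": "Exploring Pluto", "format": "Book"},
    {"title": "Untitled record"},
]

for value, count in facet_counts(results, "format").most_common():
    print(f"{value} ({count})")
```

Showing these counts next to the results lets a user narrow an ambiguous query such as “pluto” without having to rephrase it.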
EXERCISE
Suggest 3 UX enhancements.
Think about:
1. Exploration
2. Disambiguation
3. Mobile
Tools:
1. Facets
2. Results
3. Autosuggest
4. Suggestions
Beyond the 70s
Query Understanding
keywords ≠ ideas
Search results for “dress shirt”
Query Understanding: keywords → ideas
“dress shirt” → Category: Clothing > Shirts > Dress Shirts
Query Understanding
Tokenization
Stemming
Synonyms
Language Detection
Spelling Correction
Entity Recognition
Query Expansion
Query Relaxation
Query Classification
Query Parsing
Query Segmentation
Knowledge Graphs
Query Understanding
Too many tools. Too many theoretical problems.
Start with data:
1. What are people looking for?
2. When are they not successful?
Query Understanding: Search Exit
Search results for “dress”
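Starting with data can be as simple as measuring, per query, how often a search ends the visit. A minimal sketch, assuming a hypothetical log of (query, next action) pairs in which “exit” means the user left right after seeing the results (our reading of “Search Exit”); the log format, action names, and `min_searches` threshold are all invented.

```python
from collections import Counter


def exit_rates(events, min_searches=2):
    """Share of searches for each query that ended in an exit, for queries
    searched at least `min_searches` times."""
    searches = Counter()
    exits = Counter()
    for query, action in events:
        q = query.strip().lower()  # treat "Dress" and "dress" as the same query
        searches[q] += 1
        if action == "exit":
            exits[q] += 1
    return {q: exits[q] / n for q, n in searches.items() if n >= min_searches}


events = [
    ("dress", "exit"), ("dress", "click"), ("Dress", "exit"), ("dress", "exit"),
    ("dress shirt", "click"), ("dress shirt", "click"),
    ("fabric", "exit"),
]

rates = exit_rates(events)
worst_first = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
print(worst_first)  # [('dress', 0.75), ('dress shirt', 0.0)]
```

Sorting by exit rate (ideally weighted by search volume) gives a short list of important or underperforming queries to investigate first.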
Query Understanding
“fabric” Supply! Model
“fabric” Supply! You
Search results for “fanny pack”
Search results for “bum bag”
Users and documents speak different languages.
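One common way to bridge that gap is query-time synonym expansion, so that “fanny pack” also finds items described as a “bum bag”. A minimal sketch, assuming a hand-maintained, hypothetical synonym table and whole-phrase substring matching against short, invented product titles.

```python
SYNONYMS = {
    "fanny pack": ["bum bag"],
    "bum bag": ["fanny pack"],
}


def expand_query(query, synonyms):
    """Return the normalized query followed by its synonyms."""
    q = " ".join(query.lower().split())  # lowercase and collapse whitespace
    return [q] + [s for s in synonyms.get(q, []) if s != q]


def search(query, docs, synonyms):
    variants = expand_query(query, synonyms)
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(v in text.lower() for v in variants)
    )


docs = {1: "Leather bum bag", 2: "Nylon fanny pack", 3: "Canvas tote"}

print(expand_query("Fanny  Pack", SYNONYMS))  # ['fanny pack', 'bum bag']
print(search("fanny pack", docs, SYNONYMS))   # [1, 2]
```

Substring matching is crude; in a real engine the synonyms would be applied to analyzed tokens so they go through the same analysis chain as the documents.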
Hat?
Query Understanding: User Data, Metadata
Search results for “red nascar”
EXERCISE
Optimize some important or underperforming queries.
Problems:
1. Precision
2. Recall
3. Exploration
Tools:
1. Analysis
2. Understanding
3. Metadata
4. UX
Search as a System
gio@relatedworks.io
www.relatedworks.io
This is a seminar/course I gave at the Metropolitan Library Council aimed at introducing layperson librarians to information retrieval.
