Inside Search

This is a seminar/course I gave at the Metropolitan Library Council aimed at introducing layperson librarians to information retrieval.

  1. Inside Search: Metro, NYC, November 15, 2017
  2. Hello
  3. Hello
  4. Hello
  5. Hello
  6. Hello
  7. EXERCISE
  8. Exercise: Tell me about you. 1. What’s your position? 2. What kind of organization are you from? 3. Why are you interested in search?
  9. Agenda
  10. Don’t worry
  11. Agenda: 1. IR Basics 2. Search in the Modern Era
  12. Information Retrieval 101
  13. IR 101: Information, User
  14. IR 101: Information, User, “pluto”, Documents
  15. IR 101: pluto?
  16. Inverted Index
  17. Inverted Index: documents 1 “pluto and goofy...”, 2 “pluto the dwarf planet...”, 3 “stranger things...”
  18. Inverted Index: “pluto and goofy” → pluto, and, goofy
  19. Inverted Index: pluto → 1, 2; planet → 2; stranger → 3
  20. Inverted Index: pluto → 1, 2; planet → 2; stranger → 3
  21. Inverted Index: 1, 2
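The inverted index from slides 16 to 21 can be sketched in a few lines. This is a minimal sketch, assuming the only processing is whitespace splitting plus lowercasing; the document texts are the slides' truncated snippets, and `build_inverted_index` and `search` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# The three example documents from the slides (snippets are truncated on the slides).
DOCS = {
    1: "pluto and goofy",
    2: "pluto the dwarf planet",
    3: "stranger things",
}

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: split on whitespace and lowercase; real analyzers do more.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, term):
    """Look up a single term; terms that never occur match nothing."""
    return index.get(term.lower(), [])

index = build_inverted_index(DOCS)
print(index["pluto"])          # [1, 2]
print(index["planet"])         # [2]
print(search(index, "pluto"))  # [1, 2]
```

Looking up “pluto” touches one postings list instead of scanning every document, which is the point of inverting the index.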
  22. Ranking: Term Frequency × Inverse Document Frequency = Term Frequency / Document Frequency
  23. Ranking: 1 “pluto and goofy...”, 2 “pluto the dwarf planet...pluto”
  24. Ranking: 1 Term Frequency = 1; 2 Term Frequency = 2; Document Frequency = 2
  25. Ranking: 1 TF-IDF = 1/2 = 0.5; 2 TF-IDF = 2/2 = 1
  26. Ranking: 1 “pluto and goofy...”, TF-IDF = 0.5; 2 “pluto the dwarf planet...pluto”, TF-IDF = 1
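The ranking slides use a simplified TF-IDF: term frequency divided by document frequency. The sketch below reproduces only that arithmetic for the query “pluto”. Many engines instead weight terms with an IDF such as log(N / df), so real scores differ; the helper names are hypothetical.

```python
from collections import Counter

# Document 2 repeats "pluto", as on the ranking slides.
DOCS = {
    1: "pluto and goofy",
    2: "pluto the dwarf planet pluto",
}

def tf(term, text):
    """Term frequency: how many times the term occurs in the document."""
    return Counter(text.lower().split())[term]

def df(term, docs):
    """Document frequency: how many documents contain the term at least once."""
    return sum(1 for text in docs.values() if term in text.lower().split())

def simple_tfidf(term, text, docs):
    """The slides' simplified score: term frequency divided by document frequency."""
    d = df(term, docs)
    return tf(term, text) / d if d else 0.0

scores = {doc_id: simple_tfidf("pluto", text, DOCS) for doc_id, text in DOCS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {1: 0.5, 2: 1.0}
print(ranking)  # [2, 1]
```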
  27. IR 101: Forward Index → Tokenize → Inverted Index (index time); Matching → Ranking (query time)
  28. Multiple Terms (IR 101)
  29. Multiple Terms: “pluto planet”
  30. Multiple Terms: “pluto planet” → pluto, planet
  31. Multiple Terms: pluto → 1, 2; planet → 2; stranger → 3
  32. Multiple Terms: OR, AND
  33. Multiple Terms: pluto → 1, 2; planet → 2; result → 1, 2
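Slides 32 and 33 contrast OR and AND matching for “pluto planet”. A minimal sketch over the slides' postings lists: OR takes the union of the postings, AND takes the intersection. The `match` function and its `mode` parameter are hypothetical.

```python
# Postings from the slides' inverted index: term -> IDs of documents containing it.
INDEX = {
    "pluto": [1, 2],
    "planet": [2],
    "stranger": [3],
}

def match(index, query, mode="OR"):
    """Return sorted doc IDs matching any (OR) or all (AND) of the query terms."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    if mode == "AND":
        result = set.intersection(*postings)
    elif mode == "OR":
        result = set.union(*postings)
    else:
        raise ValueError("mode must be 'OR' or 'AND'")
    return sorted(result)

print(match(INDEX, "pluto planet", "OR"))   # [1, 2]
print(match(INDEX, "pluto planet", "AND"))  # [2]
```

OR favors recall (more documents match), AND favors precision (fewer, tighter matches).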
  34. Multiple Terms: score = ∑ over query terms of (Term Frequency / Document Frequency)
  35. Multiple Terms: score = tf-idf(pluto) + tf-idf(planet)
  36. Multiple Terms: 1 “pluto and goofy...”, 2 “pluto the dwarf planet...”
  37. Multiple Terms: Document Frequency(pluto) = 2, Document Frequency(planet) = 1; 1 tf-idf(pluto) = 1/2, tf-idf(planet) = 0/1; 2 tf-idf(pluto) = 1/2, tf-idf(planet) = 1/1
  38. Multiple Terms: 1 ½ + 0 = 0.5; 2 ½ + 1 = 1.5
  39. Multiple Terms: 1 “pluto and goofy...”, TF-IDF = 0.5; 2 “pluto the dwarf planet...”, TF-IDF = 1.5
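The multi-term score on slides 34 to 39 sums the simplified tf / df over each query term. A minimal sketch under the same assumptions as the single-term case (whitespace tokens, the slides' tf / df instead of a logarithmic IDF); `doc_freq` and `score` are hypothetical names.

```python
from collections import Counter

# Documents as on the multiple-terms slides (snippets are truncated on the slides).
DOCS = {
    1: "pluto and goofy",
    2: "pluto the dwarf planet",
}

def doc_freq(term, docs):
    """Number of documents that contain the term."""
    return sum(1 for text in docs.values() if term in text.split())

def score(query, text, docs):
    """Sum tf / df over every query term; terms with df = 0 add nothing."""
    counts = Counter(text.split())
    total = 0.0
    for term in query.split():
        d = doc_freq(term, docs)
        if d:
            total += counts[term] / d
    return total

scores = {doc_id: score("pluto planet", text, DOCS) for doc_id, text in DOCS.items()}
print(scores)  # {1: 0.5, 2: 1.5}
```

Because “planet” occurs in only one document, it contributes a full 1/1 to document 2, which is why the rarer term pulls that document ahead.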
  40. Measurement (IR 101)
  41. Measurement: Precision (How good are the results?) and Recall (How many of the good documents are in the results?)
  42. Measurement: Precision = # Good Results / # Results; Recall = # Good Results / # Good Documents
  43. Measurement: What’s good?
  44. Measurement: “pluto planet” → 2, 1: “pluto and goofy…”, “pluto the dwarf planet...”
  45. Measurement: “pluto” → 5, 1: “Pluto lost its status…”, “pluto the dwarf planet...”
  46. Measurement: Precision, Recall
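The precision and recall definitions on slides 41 and 42 translate directly into set arithmetic. This sketch uses hypothetical relevance judgments (the slides do not list them), and returning 0.0 for an empty denominator is a convention chosen here, not something the slides specify.

```python
def precision(results, relevant):
    """Fraction of returned results that are good: # good results / # results."""
    results = set(results)
    if not results:
        return 0.0
    return len(results & set(relevant)) / len(results)

def recall(results, relevant):
    """Fraction of good documents that were returned: # good results / # good documents."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(results) & relevant) / len(relevant)

# Hypothetical judgments: a query returns documents 1 to 4,
# but a reviewer decides only documents 2 and 5 are good.
results = [1, 2, 3, 4]
relevant = [2, 5]
print(precision(results, relevant))  # 0.25
print(recall(results, relevant))     # 0.5
```

Returning more documents tends to raise recall and lower precision, which is the trade-off the exercise that follows asks about.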
  47. EXERCISE
  48. Exercise: Find a precision problem and a recall problem.
  49. Analysis (IR 101)
  50. Analysis: “pluto” → 5, 1: “Pluto lost its status…”, “pluto the dwarf planet...”
  51. Analysis: “Pluto lost its status...” → Pluto, lost, its, status
  52. Analysis: Pluto, lost, its, status → pluto, lost, its, status
  53. Analysis: 1. Tokenize 2. Transform each token
  54. Analysis: What happens if I search for Pluto? pluto → 1, 2
  55. Analysis: Forward Index → Tokenize → Analysis → Inverted Index (index time); Tokenize → Analysis → Matching → Ranking (query time)
  56. Analysis: What happens if I search for planets? planet → 2
  57. Analysis: planets → planet
  58. Analysis: from precision to recall: lemmatization, stemming, aggressive stemming (planets, journalism)
  59. Analysis Chain: Whitespace Tokenizer → Lowercase → Stem
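Slides 53 to 59 describe analysis as tokenizing and then transforming each token, with the same chain (whitespace tokenizer, lowercase, stem) applied at index time and at query time. A minimal sketch of that chain; the stemmer is a deliberately naive plural-stripping rule (an assumption, not Porter or Snowball), and all function names are hypothetical.

```python
def whitespace_tokenize(text):
    """Split on runs of whitespace."""
    return text.split()

def lowercase(tokens):
    """Normalize case so that 'Pluto' and 'pluto' become the same term."""
    return [t.lower() for t in tokens]

def naive_stem(tokens):
    """Toy stemmer: strip a trailing plural 's', skipping short words and -ss/-us endings."""
    stemmed = []
    for t in tokens:
        if len(t) > 3 and t.endswith("s") and not t.endswith(("ss", "us")):
            stemmed.append(t[:-1])
        else:
            stemmed.append(t)
    return stemmed

def analyze(text):
    """Analysis chain: whitespace tokenizer, then lowercase, then stem."""
    return naive_stem(lowercase(whitespace_tokenize(text)))

print(analyze("Pluto lost its status"))  # ['pluto', 'lost', 'its', 'status']

# Running the same chain on documents and queries lets "Planets" match "planet".
doc_terms = set(analyze("pluto the dwarf planet"))
print(set(analyze("Planets")) <= doc_terms)  # True
```

The special cases in `naive_stem` hint at why stemming is a precision/recall dial: every rule that conflates more word forms also risks conflating words that should stay apart.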
  60. EXERCISE
  61. Exercise: Think about 1. Tokenization 2. Normalization 3. Stemming. Find 3 analysis-related problems.
  62. Beyond the 70s
  63. Machine-Learned Relevance (Beyond the 70s)
  64. Machine-Learned Relevance: 1. Supervised (Learning to Rank) 2. Unsupervised
  65. Learning to Rank: TF-IDF ≠ relevance
  66. Learning to Rank: “pluto” + Document → Model → 0.8
  67. Learning to Rank: “pluto” + Document → Yes/No
  68. Learning to Rank
  69. Learning to Rank Problems: 1. Requires lots of data
  70. Learning to Rank Problems: 1. Requires lots of data 2. Difficult to train 3. Difficult to administer/serve
  71. Learning to Rank Problems: 1. Requires lots of data 2. Difficult to train 3. Difficult to administer/serve 4. Bias
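Slides 66 and 67 show a model that takes a query and a document and outputs a score (0.8) or a Yes/No judgment. One common way to build such a model is pointwise learning to rank; the sketch below fits a plain logistic regression with batch gradient descent. The two features, the training examples, and all names are hypothetical, chosen so that a high TF-IDF score alone does not imply relevance.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_pointwise(examples, epochs=500, lr=0.5):
    """Fit logistic regression on (features, label) pairs; label 1 = Yes, 0 = No."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in examples:
            pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            err = pred - label  # gradient of the log loss with respect to the logit
            for i, x in enumerate(features):
                grad_w[i] += err * x
            grad_b += err
        n = len(examples)
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def score(model, features):
    """Predicted probability that the document is relevant to the query."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

# Hypothetical features: [tf-idf score, 1 if the query term appears in the title].
examples = [
    ([1.0, 1.0], 1),
    ([0.5, 1.0], 1),
    ([1.5, 0.0], 0),  # highest tf-idf, yet judged not relevant
    ([0.5, 0.0], 0),
]
model = train_pointwise(examples)
print(score(model, [0.5, 1.0]))
print(score(model, [1.5, 0.0]))
```

Even this toy shows the listed problems in miniature: it needs labeled judgments, training choices (epochs, learning rate) matter, and any bias in the labels is learned along with the signal.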
  72. SKIN WINS
  73. Unsupervised
  74. Unsupervised: How do you know it’s any good?
  75. Search is UX (Beyond the 70s)
  76. Search is UX: Search ≠ Ranking
  77. Search is UX
  78. Search is UX Tools: 1. Facets
  79. Search is UX Tools: 1. Facets 2. Results
  80. Search is UX Tools: 1. Facets 2. Results 3. Autosuggest
  81. Search is UX Tools: 1. Facets 2. Results 3. Autosuggest 4. Suggestions
  82. Search is UX: Reasons to focus on UX: 1. Low-intent traffic 2. Leverage 3. Mobile 4. Lower stakes
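Facets, the first UX tool on slides 78 to 81, are usually counts of a metadata field over the current result set, which the user can click to narrow the results. A minimal sketch with a hypothetical result set and hypothetical helper names.

```python
from collections import Counter

# Hypothetical result set: each hit carries a "category" field from its metadata.
results = [
    {"title": "Oxford dress shirt", "category": "Dress Shirts"},
    {"title": "Linen dress shirt", "category": "Dress Shirts"},
    {"title": "Shirt dress", "category": "Dresses"},
    {"title": "Tie bar", "category": "Accessories"},
]

def facet_counts(hits, field):
    """Count hits per value of a metadata field, most common first (ties keep first-seen order)."""
    return Counter(hit[field] for hit in hits if field in hit).most_common()

def apply_facet(hits, field, value):
    """Narrow the result set to hits whose field equals the selected facet value."""
    return [hit for hit in hits if hit.get(field) == value]

print(facet_counts(results, "category"))
# [('Dress Shirts', 2), ('Dresses', 1), ('Accessories', 1)]
print(len(apply_facet(results, "category", "Dresses")))  # 1
```

The counts let a user see at a glance that a query like “dress shirt” also pulls in dresses, and recover with one click instead of a rewritten query.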
  83. EXERCISE
  84. Exercise: Suggest 3 UX enhancements. Think about 1. Exploration 2. Disambiguation 3. Mobile. Tools: 1. Facets 2. Results 3. Autosuggest 4. Suggestions
  85. Query Understanding (Beyond the 70s)
  86. Query Understanding: keywords ≠ ideas
  87. Search results for “dress shirt”
  88. Query Understanding: keywords → ideas
  89. Query Understanding: “dress shirt” → Category: Clothing > Shirts > Dress Shirts
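Slide 89 maps the keywords “dress shirt” to a category. The slides do not say how that mapping is built; one simple, data-driven option (an assumption) is a majority vote over the categories of products users clicked for each query. The click log, the `min_share` threshold, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (query, category path of the clicked product).
CLICKS = [
    ("dress shirt", "Clothing > Shirts > Dress Shirts"),
    ("dress shirt", "Clothing > Shirts > Dress Shirts"),
    ("dress shirt", "Clothing > Dresses"),
    ("dress", "Clothing > Dresses"),
]

def build_query_classifier(clicks, min_share=0.5):
    """Map each normalized query to the category that received most of its clicks.

    Queries whose top category holds less than min_share of the clicks stay unclassified.
    """
    by_query = defaultdict(Counter)
    for query, category in clicks:
        by_query[query.strip().lower()][category] += 1
    classifier = {}
    for query, counts in by_query.items():
        category, n = counts.most_common(1)[0]
        if n / sum(counts.values()) >= min_share:
            classifier[query] = category
    return classifier

def classify(classifier, query):
    """Return the learned category for a query, or None if it was never classified."""
    return classifier.get(query.strip().lower())

classifier = build_query_classifier(CLICKS)
print(classify(classifier, "Dress Shirt"))  # Clothing > Shirts > Dress Shirts
print(classify(classifier, "fanny pack"))   # None
```

A search engine could then boost or filter results to the predicted category, which keeps “dress shirt” from being treated as the unrelated keywords “dress” and “shirt”.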
  90. Query Understanding: Tokenization, Stemming, Synonyms, Language Detection, Spelling Correction, Entity Recognition, Query Expansion, Query Relaxation, Query Classification, Query Parsing, Query Segmentation, Knowledge Graphs
  91. Query Understanding: Too many tools. Too many theoretical problems.
  92. Query Understanding: Start with data: 1. What are people looking for? 2. When are they not successful?
  93. Query Understanding: Search Exit
  94. Search results for “dress”
  95. Query Understanding: “fabric” Supply! Model
  96. Query Understanding
  97. Query Understanding
  98. Query Understanding: “fabric” Supply! You
  99. Search results for “fanny pack”
  100. Search results for “bum bag”
  101. Users and documents speak different languages.
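The “fanny pack” and “bum bag” slides show users and documents using different words for the same thing. A common fix is synonym-based query expansion: rewrite the query so it also matches the documents' vocabulary. A minimal sketch; the synonym group, the documents, and the substring matching are hypothetical simplifications.

```python
# Hypothetical synonym groups; in practice these might be mined from user data.
SYNONYM_GROUPS = [
    {"fanny pack", "bum bag"},
]

DOCS = {1: "leather bum bag", 2: "canvas fanny pack", 3: "nylon belt"}

def expand_query(query, groups=SYNONYM_GROUPS):
    """Return the query plus every phrase from any synonym group that contains it."""
    q = query.strip().lower()
    expansions = {q}
    for group in groups:
        if q in group:
            expansions |= group
    return sorted(expansions)

def search(query, docs=DOCS):
    """Return IDs of documents containing any expanded phrase (simple substring match)."""
    phrases = expand_query(query)
    return sorted(doc_id for doc_id, text in docs.items() if any(p in text for p in phrases))

print(expand_query("Fanny Pack"))  # ['bum bag', 'fanny pack']
print(search("fanny pack"))        # [1, 2]
```

Expansion trades some precision for recall, so synonym lists are worth checking against the queries users actually type.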
  102. Hat?
  103. Query Understanding: User Data, Metadata
  104. Query Understanding
  105. Search results for “red nascar”
  106. EXERCISE
  107. Exercise: Optimize some important or underperforming queries. Problems: 1. Precision 2. Recall 3. Exploration. Tools: 1. Analysis 2. Understanding 3. Metadata 4. UX
  108. Search as a System
  109. gio@relatedworks.io, www.relatedworks.io
