
Elastic Relevance Presentation, Feb 4, 2020



In this talk we’ll cover the basics of search relevancy in Elasticsearch, from how relevancy is calculated and modeled to modifying query structure, setting up analyzer chains, and measuring incremental improvements. The talk highlights several real-world relevancy scenarios encountered in consulting work at KMW Technology, a leading provider of search professional services to major organizations.

Published in: Software


  1. 1. Relevance Tuning In Elasticsearch Rudi Seitz KMW Technology Elastic Boston User Group
  2. 2. Outline • Intro to Relevance • Crash Course: Scoring • Relevance Tuning Case Study • Testing Relevance • Discussion
  3. 3. What is Relevance? • A subjective measure of how useful a document is to a user who searched for something • Does it satisfy the user’s information need? • If I search for “cats”… • Probably relevant: the movie “Cats,” the stage musical “Cats,” cat pictures, cat blogs, cat food, Felis catus • Vaguely relevant: dogs • Not relevant: CAT scanners, catsup, cement mixers
  4. 4. I’m relevant and I hate it!
  5. 5. What Is Relevance Tuning? • Adjusting the content of search results so that the most relevant documents are included • Adjusting the order of search results so that the most relevant results appear on top
  6. 6. #1 #?
  7. 7. Why Tune Relevance? • FANTASY: “Once I get the data into my search engine, it does all the work of finding the best matches for my queries.” • TRUTH: “We have to configure the search engine to rank results in a way that is meaningful to the user.”
  8. 8. Search Engine Doesn't Know… • Which fields are important • How users will search those fields • Which query terms are the most significant • Whether term order is significant • Which terms mean the same thing • What priorities the user has based on location, season, task, etc. • What priorities the provider has re: sales, promotions, sponsorships, etc. • Whether freshness, popularity, ratings are important
  9. 9. Relevance Problems • Search for “Rocky” returns “Rocky Road To Dublin” before the movie “Rocky” • Search in MA for “coffee” returns “Coffee Day” (chain in India) before “George Howell Coffee” • Search for product by SKU returns permutations • Search for “bikes” fails to find “bicycle” • Search for “The The” (band) returns no results
  10. 10. Precision and Recall • High Precision: "Everything I see is useful to me" • High Recall: “Everything I might want is included” • Relevance tuning is a tradeoff between precision and recall
  11. 11. Precision And Recall • Precision = Relevant Results / All Results • “Only 5 out of 10 results returned were useful to me. There was a lot of noise.” • Recall = Relevant Results / All Relevant Documents • “Only 5 out of 10 useful documents in the index were returned. There were lots of things missing.”
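The two ratios on this slide are simple enough to capture in a couple of lines. A minimal Python sketch (the function and argument names are illustrative, not from the talk):

```python
def precision(relevant_returned, total_returned):
    """Fraction of the returned results that are relevant."""
    return relevant_returned / total_returned

def recall(relevant_returned, total_relevant_in_index):
    """Fraction of all relevant documents in the index that were returned."""
    return relevant_returned / total_relevant_in_index

# The slide's example: 5 of the 10 returned results were useful,
# and only 5 of the 10 useful documents in the index came back.
print(precision(5, 10))  # 0.5
print(recall(5, 10))     # 0.5
```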
  12. 12. When relevance you want to tune, All iffy results you should prune To achieve good precision, Unless your decision’s That recall is more of a boon. Precision and Recall
  13. 13. How do we tune relevance? • Enrich documents with metadata that's useful to search • Search the right fields • Configure field analyzers to match the way users search • Set field weights • Match phrases • Handle typos • Apply synonyms and stemming • Reward exact/complete matches • Reward freshness, popularity, ratings, etc.
  14. 14. Scoring • A search engine has to find relevant documents without knowing what they mean • A search engine assigns a numerical score to each match using a "blind" but effective statistical heuristic • Results are displayed in order by score • To tune relevance we need to understand the search engine’s built-in method of scoring
  15. 15. Query: Corpus: Indus Script, 3500 B.C.
  16. 16. All You Need To Know About Scoring: Six Examples
  17. 17. Query 1: “dog” Doc 1: “dog” Doc 2: “dog dog” Doc 3: “dog dog dog”
  18. 18. Query 1: “dog” Doc 1: “dog” Doc 2: “dog dog” Doc 3: “dog dog dog” GET test/_search { "query": { "match": { "title": "dog" } } } "mappings": { "properties": { "title": { "type": "text", "analyzer": "standard" } } }
  19. 19. Query 1: “dog” Doc 1: “dog” → 0.167 Doc 2: “dog dog” → 0.183 Doc 3: “dog dog dog” → 0.189 High Term Frequency is good Term Frequency (TF)
  20. 20. Query 2: “dog dog cat” Doc 1: “cat” Doc 2: “dog”
  21. 21. Query 2: “dog dog cat” Doc 1: “cat” → 0.6 Doc 2: “dog” → 1.3 Scores for each term are summed
  22. 22. Query 3: “dog dog cat” Doc 1: “dog” Doc 2: “dog” Doc 3: “dog” Doc 4: “dog” Doc 5: “dog” Doc 6: “dog” Doc 7: “cat”
  23. 23. Query 3: “dog dog cat” Doc 1: “dog” → 0.4 Doc 2: “dog” → 0.4 Doc 3: “dog” → 0.4 Doc 4: “dog” → 0.4 Doc 5: “dog” → 0.4 Doc 6: “dog” → 0.4 Doc 7: “cat” → 1.5 Matches for rarer terms are better Document Frequency (DF)
  24. 24. Query 3.5: “dog^7 cat” Doc 1: “dog” → 1.6 Doc 2: “dog” → 1.6 Doc 3: “dog” → 1.6 Doc 4: “dog” → 1.6 Doc 5: “dog” → 1.6 Doc 6: “dog” → 1.6 Doc 7: “cat” → 1.5 We can boost terms GET test/_search { "query": { "query_string": { "query": "dog^7 cat", "fields": ["title"] } } }
  25. 25. Query 4: “dog cat” Doc 1: “dog dog dog dog dog dog dog” Doc 2: “cat cat cat cat cat cat cat” Doc 3: “dog cat”
  26. 26. Query 4: “dog cat” Doc 1: “dog dog dog dog dog dog dog” → 0.8 Doc 2: “cat cat cat cat cat cat cat” → 0.8 Doc 3: “dog cat” → 1.2 Matching more query terms is good Term Saturation
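“Term saturation” means repeated occurrences of a term earn diminishing returns. A Python sketch of the BM25 term-frequency component (the formula shown in the deck's later explain output; defaults match Elasticsearch's `k1`=1.2, `b`=0.75) makes the curve visible:

```python
def bm25_tf(freq, k1=1.2, b=0.75, dl=1.0, avgdl=1.0):
    # BM25 term-frequency component: grows with freq but levels off,
    # so the eighth "dog" adds far less than the second.
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

for freq in [1, 2, 3, 10, 100]:
    print(freq, round(bm25_tf(freq), 3))
# 1 0.455, 2 0.625, 3 0.714, 10 0.893, 100 0.988: bounded above by 1.0
```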
  27. 27. Query 5: “dog” Doc 1: “dog cat zebra” Doc 2: “dog cat”
  28. 28. Query 5: “dog” Doc 1: “dog cat zebra” → 0.16 Doc 2: “dog cat” → 0.19 Matches in shorter fields are better
  29. 29. Query 6: “orange dog” Doc 1: { "type" : "dog", "color" : "brown" } Doc 2: { "type" : "dog", "color" : "brown" } Doc 3: { "type" : "cat", "color" : "brown" } Doc 4: { "type" : "cat", "color" : "orange" } GET test/_search { "query": { "multi_match": { "query": "orange dog", "fields": ["type", "color"], "type": "most_fields" } } } brown dog brown dog brown cat orange cat
  30. 30. Query 6: “orange dog” Doc 1: { "type" : "dog", "color" : "brown" } → 0.6 Doc 2: { "type" : "dog", "color" : "brown" } → 0.6 Doc 3: { "type" : "cat", "color" : "brown" } Doc 4: { "type" : "cat", "color" : "orange" } → 1.2
  31. 31. Query 6: “orange dog” Doc 1: { "type" : "dog", "color" : "brown" } → 1.3 Doc 2: { "type" : "dog", "color" : "brown" } → 1.3 Doc 3: { "type" : "cat", "color" : "brown" } Doc 4: { "type" : "cat", "color" : "orange" } → 1.2 We can boost fields GET test/_search { "query": { "multi_match": { "query": "orange dog", "fields": ["type^2", "color"], "type": "most_fields" } } }
  32. 32. Ties “when two documents have the same score, they will be sorted by their internal Lucene doc id (which is unrelated to the _id) by default” “The internal doc_id can differ for the same document inside each replica of the same shard so it's recommended to use another tiebreaker for sort in order to get consistent results. For instance you could do: sort: ["_score", "datetime"] to force top_hits to rank documents based on score first and use datetime as a tiebreaker.” "sort": [ { "_score": { "order": "desc" }}, { "date": { "order": "desc" }} ]
  33. 33. Comparing Field Scores • Raw scores across fields are not directly comparable • Term frequencies, document frequencies, and average field length all differ across fields • Field analyzers can generate additional tokens that affect scoring • A "good" match in one field might score in the range 0.1 to 0.2 while a good match in another field might score in the range 1 to 2. There’s no universal relevance scale. • A multiplicative boost of 10 doesn't mean “field1 is 10 times more important than field2” • Boosts can compensate for scale discrepancies
  34. 34. TF x IDF A search engine handles the chore Of ranking each match, good or poor: If a document’s TF Divided by DF Is huge, it will get the top score.
  35. 35. How does TFxIDF affect query scoring? • High score: A document with many occurrences of a rare term • Low score: A document with few occurrences of a common term • TFxIDF depends on the corpus • A term stops being rare once more documents are added that contain it • Documents that don't match a query can still affect the order of results
  36. 36. Practical Scoring Function
  37. 37. BM25
  38. 38. TF Saturation
  39. 39. Explain _score: 0.18952842 _source: title: "dog dog dog" _explanation: value: 0.18952842 description: "weight(title:dog in 0) [PerFieldSimilarity], result of:" details: - value: 0.18952842 description: "score(freq=3.0), product of:" details: - value: 2.2 description: "boost" details: [] - value: 0.13353139 description: "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:" details: - value: 3 description: "n, number of documents containing term" details: [] - value: 3 description: "N, total number of documents with field" details: [] - value: 0.6451613 description: "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:" details: - value: 3.0 description: "freq, occurrences of term within document" details: [] - value: 1.2 description: "k1, term saturation parameter" details: [] - value: 0.75 description: "b, length normalization parameter" details: [] - value: 3.0 description: "dl, length of field" details: [] - value: 2.0 description: "avgdl, average length of field" details: [] GET test/_search?format=yaml { "explain": true, "query": { "match": { "title": "dog" } } }
  40. 40. Explain _score: 0.18952842 _source: title: "dog dog dog" _explanation: 0.18952842 = "weight(title:dog in 0) [PerFieldSimilarity], result of:" 0.18952842 ="score(freq=3.0), product of:" 2.2 = "boost" 0.13353139 = "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:" 3 = "n, number of documents containing term" 3 = "N, total number of documents with field" 0.6451613 = "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))"" 3.0 = "freq, occurrences of term within document" 1.2 = "k1, term saturation parameter" 0.75 = "b, length normalization parameter" 3.0 = "dl, length of field" 2.0 = "avgdl, average length of field"
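The explain output above can be reproduced by hand. A Python sketch of the same BM25 arithmetic, plugging in the numbers from the explain tree:

```python
import math

def bm25_score(freq, n, N, dl, avgdl, k1=1.2, b=0.75, boost=2.2):
    # Same arithmetic as the explain output above:
    # score = boost * idf * tf
    idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return boost * idf * tf

# "dog dog dog" in a 3-document index where every title contains "dog":
print(bm25_score(freq=3.0, n=3, N=3, dl=3.0, avgdl=2.0))  # ≈ 0.18952842
```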
  41. 41. The Mysterious Boost PUT /bm25test { "settings": { "index": { "similarity": { "my_similarity": { "type": "BM25", "k1": 1.3, "b": 0.75 }}}}, "mappings": { "properties": { "title": { "type": "text", "similarity": "my_similarity" }}}} PUT /bm25test/_doc/1 { "title" : "dog" } GET /bm25test/_search { "query": { "match": { "title": "dog" } }, "explain": true } "_explanation" : { "value" : 0.2876821, "description" : "weight(title:dog in 0) …", "details" : [ { "value" : 0.2876821, "description" : "score(freq=1.0), product of:", "details" : [ { "value" : 2.3, "description" : "boost", "details" : [ ] },
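Where does the 2.3 come from? In these explain outputs the otherwise-unexplained "boost" is the BM25 numerator constant (k1 + 1) multiplied by any query-time boost, which is why raising k1 from the default 1.2 to 1.3 moves it from 2.2 to 2.3. A quick check in Python (a sketch of the arithmetic, not the Lucene code):

```python
import math

def bm25_boost(k1, query_boost=1.0):
    # The "mysterious" boost factor: (k1 + 1) times any query-time boost.
    return query_boost * (k1 + 1)

print(bm25_boost(1.2))  # 2.2, the default k1 seen in the earlier explain
print(bm25_boost(1.3))  # 2.3, after setting k1=1.3 above

# Sanity check on the 0.2876821 score: one doc, one term, freq=1,
# dl == avgdl, so tf = 1/(1 + k1) and the (k1 + 1) boost cancels it,
# leaving just idf = ln(1 + 0.5/1.5) = ln(4/3).
idf = math.log(1 + (1 - 1 + 0.5) / (1 + 0.5))
tf = 1 / (1 + 1.3)
print(round(bm25_boost(1.3) * idf * tf, 7))  # 0.2876821
```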
  42. 42. Scoring: Extra Credit
  43. 43. Query 7: “dog” Doc 1: “dog” Doc 2: “dog” Doc 3: “dog” GET test/_search { "query": { "match": { "title": "dog" } } }
  44. 44. Query 7: “dog” Doc 1: “dog” → 0.28 Doc 2: “dog” → 0.18 Doc 3: “dog” → 0.18
  45. 45. Query 7: “dog” Doc 1: “dog” → 0.28 Doc 2: “dog” → 0.18 Doc 3: “dog” → 0.18 Statistics are per-shard PUT /test { "settings": { "number_of_shards": 2 } } PUT /test/_doc/1?routing=0 { "title" : "dog" } PUT /test/_doc/2?routing=1 { "title" : "dog" } PUT /test/_doc/3?routing=1 { "title" : "dog" }
  46. 46. Query 7: “dog” Doc 1: “dog” → 0.13 Doc 2: “dog” → 0.13 Doc 3: “dog” → 0.13 We can do a Distributed Frequency Search GET /test/_search?search_type=dfs_query_then_fetch { "query": { "match": { "title": "dog" } } }
  47. 47. Replicas And Scoring • Replicas of the same shard may have different statistics • Documents marked for deletion but not yet physically removed (when their segments are merged) still contribute to statistics • Replicas may be out of sync re: physical deletion • Specifying a user or session ID in the shard copy preference parameter helps route requests to the same replicas
  48. 48. Updates and Scoring • Updates to an existing document behave like adding a completely new document as far as DF statistics, until segments are merged: • “n, number of documents containing term” increases • “N, total number of documents with field” increases
  49. 49. Updates and Scoring PUT test/_doc/1 { "title": "dog cat" } GET test/_search?format=yaml { "query" : { "match" : { "title": "dog" } }, "explain": true } PUT test/_doc/1?refresh { "title": "dog zebra" } GET test/_search?format=yaml { "query" : { "match" : { "title": "dog" } }, "explain": true } POST test/_forcemerge GET test/_search?format=yaml { "query" : { "match" : { "title": "dog" } }, "explain": true } _score: 0.2876821 "n, number of documents containing term”: 1 "N, total number of documents with field”: 1 _score: 0.2876821 "n, number of documents containing term”: 1 "N, total number of documents with field”: 1 _score: 0.18232156 "n, number of documents containing term”: 2 "N, total number of documents with field”: 2
  50. 50. Query 4: “dog cat” Doc 1: “dog dog dog dog dog dog dog” → 0.8 Doc 2: “cat cat cat cat cat cat cat” → 0.8 Doc 3: “dog cat” → 1.2 Matching more query terms is good. But what also benefits Doc 3 here?
  51. 51. Query 4 redux: “dog cat” Doc 1: “dog dog” → 0.6 Doc 2: “cat cat” → 0.6 Doc 3: “dog cat” → 0.9 Matching more query terms is good
  52. 52. Query 4 redux: “dog cat” Doc 1: {“pet1”: “dog”, “pet2”: “dog”} Doc 2: {“pet1”: “dog”, “pet2”: “cat”} GET test/_search { "query": { "multi_match": { "query": "dog cat", "fields": ["pet1", "pet2"], "type": "most_fields" } } }
  53. 53. Query 4 redux: “dog cat” Doc 1: {“pet1”: “dog”, “pet2”: “dog”} → 0.87 Doc 2: {“pet1”: ”dog”, “pet2”: “cat”} → 0.87 Matching more query terms within the same field is good. But there's no advantage when the matches happen across fields.
  54. 54. Query 4 redux: “dog cat” Doc 1: {“pet1”: “dog”, “pet2”: “dog”} → 0.18 Doc 2: {“pet1”: “dog”, “pet2”: “cat”} → 0.87 We can simulate a single field using cross_fields. GET test/_search { "query": { "multi_match": { "query": "dog cat", "fields": ["pet1", "pet2"], "type": "cross_fields" } } }
  55. 55. Query 8: “orange dog” Doc 1: {“type”: “dog”, “description”: “A sweet and loving pet that is always eager to play. Brown coat. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non nibh sagittis, mollis ex a, scelerisque nisl. Ut vitae pellentesque magna, ut tristique nisi. Maecenas ut urna a elit posuere scelerisque. Suspendisse vel urna turpis. Mauris viverra fermentum ullamcorper. Duis ac lacus nibh. Nulla auctor lacus in purus vulputate, maximus ultricies augue scelerisque.”} Doc 2: {“type”: “cat”, “description”: “Puzzlingly grumpy. Occasionally turns orange.”} GET test/_search { "query": { "multi_match": { "query": "orange dog", "fields": ["type", "description"], "type": "most_fields" } } }
  56. 56. Query 8: “orange dog” Doc 1: {“type”: “dog”, “description”: “A sweet and loving pet that is always eager to play. Brown coat. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non nibh sagittis, mollis ex a, scelerisque nisl. Ut vitae pellentesque magna, ut tristique nisi. Maecenas ut urna a elit posuere scelerisque. Suspendisse vel urna turpis. Mauris viverra fermentum ullamcorper. Duis ac lacus nibh. Nulla auctor lacus in purus vulputate, maximus ultricies augue scelerisque.”} → 1.06 Doc 2: {“type”: “cat”, “description”: “Puzzlingly grumpy. Occasionally turns orange.”} → 0.69 “Shortness” is relative to the field's average
  57. 57. Query 9: “abcd efghijklmnopqrstuvwxyz” Doc 1: “abcd” Doc 2: “efghijklmnopqrstuvwxyz”
  58. 58. Query 9: “abcd efghijklmnopqrstuvwxyz” Doc 1: “abcd” → 0.69 Doc 2: “efghijklmnopqrstuvwxyz” → 0.69 Term length is not significant.
  59. 59. Case Study: SKUs I searched for a product by SKU— I was looking to purchase a shoe— But the website I used Seemed very confused And offered me nothing to view. 123AB-543D-234C
  60. 60. Requirements 1. Exact match: 123AB-543D-234C 2. Without punctuation: 123AB 543D 234C 3. Without spaces: 123AB543D234C 4. Section: 123AB 5. Section prefix: 123A 6. Typo: 123AB543D234D 7. Replacement products 8. Tie-breakers: Popularity, Freshness
  61. 61. Step 1: Standard Analyzer PUT /test { "mappings": { "properties": { "sku": { "type": "text", "analyzer": "standard" } } } }
  62. 62. Query: “123AB-543D-234C” Doc 1: “123AB-543D-234C” ✓ Doc 2: “123AB-234C-543D” ✗
  63. 63. Query: “123AB-543D-234C” Doc 1: “123AB-543D-234C” → 0.54 Doc 2: “123AB-234C-543D” → 0.54 Term order is ignored
  64. 64. Debug With Analyze API GET skutest/_analyze?filter_path=*.token&format=yaml { "text": ["123AB-543D-234C"], "analyzer" : "standard" } --- tokens: - token: "123ab" - token: "543d" - token: "234c"
  65. 65. Analysis Chain [diagram: one analysis chain matches ✓, the other does not ✗]
  66. 66. Step 2: Shingles GET skutest/_analyze?filter_path=*.token&format=yaml { "text": ["123AB-543D-234C"], "tokenizer": "standard", "filter": ["lowercase", {"type":"shingle", "max_shingle_size":4}] } --- tokens: - token: "123ab" - token: "123ab 543d" - token: "123ab 543d 234c" - token: "543d" - token: "543d 234c" - token: "234c"
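The shingle filter's output is easy to mimic. A small Python sketch (not the Lucene implementation) that reproduces the token list above:

```python
def shingles(tokens, max_shingle_size=4):
    # For each starting token, emit the word n-grams of size 1 up to
    # max_shingle_size, matching the order shown in the _analyze output.
    out = []
    for i in range(len(tokens)):
        for size in range(1, max_shingle_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out

print(shingles(["123ab", "543d", "234c"]))
# ['123ab', '123ab 543d', '123ab 543d 234c', '543d', '543d 234c', '234c']
```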
  67. 67. PUT test { "settings": { "analysis": { "filter": { "shingle4" : { "type" : "shingle", "max_shingle_size" : 4 } }, "analyzer": { "custom_sku": { "filter": ["lowercase", "shingle4"], "tokenizer" : "standard" } } } }, "mappings": { "properties": { "sku": { "type": "text", "analyzer": "custom_sku" } } } }
  68. 68. [diagram: analysis chains before shingles, Doc 1 ✓, Doc 2 ✗]
  69. 69. [diagram: analysis chains with shingles, Doc 1 ✓, Doc 2 ✓]
  70. 70. Query: “123AB-543D-234C” Doc 1: “123AB-543D-234C” → 0.84 Doc 2: “123AB-234C-543D” → 0.68 Term order is respected
  71. 71. Query: “123AB-543D-234C” Doc 1: “123AB-543D-234C” → 0.96 Doc 2: [“123AB-234C-543D-234C-1”, “123AB-234C-543D-234C-2”, “123AB-234C-543D-234C-3” ] → 1.01 Exact match isn’t respected enough!
  72. 72. Step 3: Reward Exact Matches PUT sku6 { "settings": { "analysis": { "filter": { "shingle4" : { "type" : "shingle", "max_shingle_size" : 4 } }, "analyzer": { "custom_sku": { "filter": ["lowercase", "shingle4"], "tokenizer" : "standard" }, "lowercase": { "filter": ["lowercase"], "tokenizer" : "keyword" } } } },
  73. 73. Multifields "mappings": { "properties": { "sku": { "type": "text", "fields": { "exact": { "type": "text", "analyzer": "lowercase" }, "shingle": { "type": "text", "analyzer": "custom_sku" } } } } } }
  74. 74. Query: “123AB-543D-234C” Doc 1: “123AB-543D-234C” → 1.84 Doc 2: [“123AB-234C-543D-234C-1”, “123AB-234C-543D-234C-2”, “123AB-234C-543D-234C-3” ] → 1.01 GET /test/_search { "query": { "multi_match": { "query": "123AB-543D-234C", "fields": ["sku.exact", "sku.shingle"], "type": "most_fields" } } }
  75. 75. Step 4: ngrams GET skutest/_analyze?filter_path=*.token&format=yaml { "text": ["123AB-543D-234C"], "tokenizer": "standard", "filter": ["lowercase", {"type":"edge_ngram", "min_gram": 3, "max_gram": 8}] } --- tokens: - token: "123" - token: "123a" - token: "123ab" - token: "543" - token: "543d" - token: "234" - token: "234c"
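Edge n-grams are just leading prefixes of each token. A Python sketch (illustrative, not the Lucene filter) that reproduces the token list above with `min_gram` 3 and `max_gram` 8:

```python
def edge_ngrams(token, min_gram=3, max_gram=8):
    # Leading prefixes of the token, from min_gram up to max_gram
    # characters (capped at the token's length).
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

tokens = []
for t in ["123ab", "543d", "234c"]:
    tokens += edge_ngrams(t)
print(tokens)
# ['123', '123a', '123ab', '543', '543d', '234', '234c']
```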
  76. 76. Step 5: Omitting Spaces GET skutest/_analyze?filter_path=*.token&format=yaml { "text": ["123AB-543D-234C"], "filter": [{"type":"word_delimiter", "catenate_all":"true", "generate_word_parts":"false", "generate_number_parts":"false", "preserve_original":"false", "split_on_numerics":"false", "split_on_case_change":"false" }], "tokenizer": "keyword" } --- tokens: - token: "123AB543D234C"
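With `catenate_all`, the word_delimiter filter glues the alphanumeric runs back into one token, dropping the punctuation. A rough Python equivalent of just that behavior (a simplification; the real filter has many more options):

```python
import re

def catenate_all(text):
    # Join every alphanumeric run, discarding delimiters like '-' or ' ',
    # approximating word_delimiter with catenate_all=true.
    return "".join(re.findall(r"[A-Za-z0-9]+", text))

print(catenate_all("123AB-543D-234C"))  # 123AB543D234C
print(catenate_all("123AB 543D 234C"))  # 123AB543D234C
```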
  77. 77. Step 6: Synonyms PUT test { "settings": { "analysis": { "filter": { "shingle4" : { "type" : "shingle", "max_shingle_size" : 4 }, "synonym" : { "type" : "synonym", "synonyms" : ["8971-34DA-65JQ => 123AB-543D-234C"] } }, "analyzer": { "custom_sku": { "filter": ["lowercase", "synonym", "shingle4"], "tokenizer" : "standard" }, "lowercase": { "filter": ["synonym", "lowercase"], "tokenizer" : "keyword" } } } },
  78. 78. Step 7: Typos / Fuzziness GET /sku8/_search { "query": { "bool": { "should": [ { "multi_match": { "query": "123AB543D234D", "fields": ["sku.exact", "sku.catenated", "sku.shingle"], "type": "most_fields", "boost": 5 } }, { "multi_match": { "query": "123AB543D234D", "fields": ["sku.exact", "sku.catenated"], "type": "most_fields", "fuzziness": 1 } }] } } }
  79. 79. Fuzziness and Scoring PUT /test/_doc/1 { "title": "dog" } PUT /test/_doc/2 { "title": "elephant" } GET /test/_validate/query?rewrite=true { "query": { "match" : { "title": { "query": "dog", "fuzziness": 2 }}}} GET /test/_search { "query": { "fuzzy" : { "title": { "value": "dog", "fuzziness": 2, "rewrite": "constant_score" } } } } Query Lucene Query Edits dog title:dog dg (title:dog)^0.5 D do (title:dog)^0.5 D dgo (title:dog)^0.666666 T dox (title:dog)^0.666666 S dogg (title:dog)^0.666666 I doggg (title:dog)^0.333333 I, I elepha (title:elephant)^0.6666666 D, D elephan (title:elephant)^0.85714287 D elephantt (title:elephant)^0.875 I elephanttt (title:elephant)^0.75 I, I
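The per-term boosts in the rewritten Lucene queries above are consistent with `1 - edits / min(len(query term), len(indexed term))`. One caveat: Elasticsearch counts a transposition like “dgo” as a single edit by default (`fuzzy_transpositions`), which plain Levenshtein distance does not. A sketch using ordinary Levenshtein distance (`edit_distance` and `fuzzy_boost` are illustrative names):

```python
def edit_distance(a, b):
    # Standard Levenshtein distance via dynamic programming
    # (insertions, deletions, substitutions; no transpositions).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzy_boost(query_term, index_term):
    # Consistent with the rewrite table above: closer terms, larger boost.
    d = edit_distance(query_term, index_term)
    return 1.0 - d / min(len(query_term), len(index_term))

print(fuzzy_boost("dg", "dog"))              # 0.5
print(round(fuzzy_boost("dox", "dog"), 6))   # 0.666667
print(fuzzy_boost("elephantt", "elephant"))  # 0.875
```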
  80. 80. Query: “123AB-543D-234C” Doc 1: {“sku” : “123AB-543D-234D”, “likes”: 1000} → 0.6398282 Doc 2: {“sku” : “123AB-543D-234E”, “likes”: 5000} → 0.6398282
  81. 81. Step 8: Popularity GET /sku9/_search { "query": { "script_score": { "script": { "source": "_score + 0.00000001*doc['likes'].value" }, "query": { "bool" : { "should" : [ { "multi_match": { "query": "123AB543D234C", "fields": ["sku.exact", "sku.catenated", "sku.shingle"], "type": "most_fields", "boost" : 5 } }, { "multi_match" : { "query": "123AB543D234C", "fields": ["sku.exact", "sku.catenated"], "type": "most_fields", "fuzziness": 1 } }] } } } } }
  82. 82. Query: “123AB-543D-234C” Doc 1: {“sku” : “123AB-543D-234D”, “likes”: 1000} → 0.6398382 Doc 2: {“sku” : “123AB-543D-234E”, “likes”: 5000} → 0.6398782
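The tie-break arithmetic is just the script applied to the text score. A sketch mirroring the `script_score` source above:

```python
def popularity_score(text_score, likes):
    # Mirrors the painless script: _score + 0.00000001 * doc['likes'].value
    return text_score + 1e-8 * likes

# The two SKUs tied at 0.6398282 on text alone; likes break the tie.
print(round(popularity_score(0.6398282, 1000), 7))  # 0.6398382
print(round(popularity_score(0.6398282, 5000), 7))  # 0.6398782
```

The multiplier is tiny on purpose: popularity should only separate documents whose text scores are effectively equal, not outrank better text matches.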
  83. 83. Query: “123AB” Doc 1: {“sku” : “123AB-543D-234G”, “date”: “2018-01-01”} → 0.46582156 Doc 2: {“sku” : “123AB-543D-234H”, “date”: “2019-01-01”} → 0.46582156
  84. 84. Step 9: Freshness
  85. 85. Query: “123AB” Doc 1: {“sku” : “123AB-543D-234G”, “date”: “2018-01-01”} → 0.46585178 Doc 2: {“sku” : “123AB-543D-234H”, “date”: “2019-01-01”} → 0.46588513 "script_score": { "script": { "source": "_score + 0.0001*decayDateLinear('2020-02-04', '1095d', '0d', 0.0, doc['date'].value)" },
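The freshness bump can be checked by hand: with a decay target of 0.0 and no offset, linear decay reduces to `(scale - distance) / scale`. A Python sketch of that simplified case (a simplification of `decayDateLinear`, which also supports offsets and other decay targets):

```python
from datetime import date

def decay_date_linear(origin, scale_days, doc_date):
    # Linear decay from 1.0 at the origin down to 0.0 at scale_days away
    # (the slide's decay=0.0, offset='0d' case).
    dist = abs((doc_date - origin).days)
    return max(0.0, (scale_days - dist) / scale_days)

origin = date(2020, 2, 4)
base = 0.46582156  # the tied text score from the previous slide
for d in (date(2018, 1, 1), date(2019, 1, 1)):
    print(round(base + 0.0001 * decay_date_linear(origin, 1095, d), 7))
# the 2019 document now edges out the 2018 one
```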
  86. 86. Case Study: SKUs I searched for a product by SKU— I was looking to purchase a shoe. Results were returned And the price I soon learned. “I’ll take one,” I said, “Make it two!” 123AB-543D-234C
  87. 87. Testing
  88. 88. Quepid: Before
  89. 89. Quepid: Scoring
  90. 90. Quepid: After
  91. 91. Quepid: Custom Scorers
  92. 92. Ranking Evaluation API
  93. 93. Stopword Filtering GET test/_analyze { "analyzer" : "english", "text" : "To be, or not to be, that is the..." } { "tokens" : [ ] }
  94. 94. “I am not a fan of stopword filtering.” — W. Shakespeare
  95. 95. Stemming GET certona/_analyze?filter_path=*.token&format=yaml { "analyzer": "english", "text": "dog dogs dog's dogged dogging doggy doggie doggies doggy's" } --- tokens: - token: "dog" - token: "dog" - token: "dog" - token: "dog" - token: "dog" - token: "doggi" - token: "doggi" - token: "doggi" - token: "doggi" dogged person = dog person?
  96. 96. Next Steps • Suggestions • Highlighting • Search as you type • Multi-language support • Session-Based Relevancy • Signals Boosting / Adaptive Relevancy • Learning To Rank (LTR) • Relevancy Profiles (User, Region, Time) • Named Entity Recognition • Query Classification
  97. 97. Discussion
