Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit - Tim Allison

  1. Quaerite – Search Relevance Toolkit. Tim Allison, tallison@apache.org, @_tallison. April 24, 2019, Haystack Conference. © 2019 The MITRE Corporation. All rights reserved. Approved for Public Release; Distribution Unlimited. Case Number 18-3138-5
  2. Debt of Gratitude
     ▪ Thank you Doug Turnbull, John Berryman and Open Source Connections for the inspiration, examples and training with tmdb, and for sharing your ground truth set!
  3. Yet Another Toolkit? Why!?
     ▪ How many parameters do we have?
     ▪ How many permutations of those parameters are available?
  4. Available Parameters
     ▪ 14 tokenizers: https://lucene.apache.org/solr/guide/7_1/tokenizers.html
     ▪ ~45 token filters (not including language-specific token filters – see next slide): https://lucene.apache.org/solr/guide/7_1/filter-descriptions.html
     ▪ Query parsers
     ▪ Query operators, minimum should match, should, must, not
     ▪ Token/field based scoring – best_fields, most_fields, cross_fields
     ▪ Field boosting
     ▪ Phrasal boosting/shingling
     ▪ Synonym lists, taxonomies
     ▪ Similarity scoring parameters (with BM25)
     ▪ Elevate
     ▪ External signal enrichment – manual or automatic (NLP – entity extraction, categorization, etc.)
     ▪ Reranking via machine learning (Learning to Rank)
     For internal MITRE use
  5. Each Token Filter Can Have Many Parameters
     <filter class="solr.WordDelimiterFilterFactory"
             protected="protwords.txt"
             generateWordParts="1"
             generateNumberParts="1"
             catenateWords="1"
             catenateNumbers="1"
             catenateAll="0"
             splitOnCaseChange="0"
             preserveOriginal="1"/>
  6. Overview – Offline testing toolkit
     Prerequisites:
     1. Reliable, generalizable ground truth
     2. Reliable, useful underlying data
     3. Offline metric has to have some connection to KPIs
     4. Expertise – you still have to know what you’re doing!!!
  7. Main Tools
     1. Run Experiments
     2. Generate Experiments
        ▪ All permutations (grid search)
        ▪ Random experiments (random search)
     3. Genetic Algorithm
        ▪ Cross-fold validation!!!
        ▪ Complementary to LTR: the main differences are the algorithm and that it runs offline to tune general settings rather than reranking the top n
  8. Odds and Ends
     ▪ Analyzer Comparison over (mostly) the index
     ▪ Significant Terms (yawn… for archaic versions of Solr)… with plans to add these as parameters in “generate experiments”
  9. Adding Porter Stemming: create account
     creat
       created: 709
       create: 551
       creating: 269
       creates: 153
       creat: 1
     account
       account: 3244
       accounts: 1924
       accounting: 1548
       accountants: 340
       accountant: 176
       accounted: 134
       accountability: 74
       accountable: 74
       accountancy: 65
       account's: 7
       accountant's: 7
  10. Status
     ▪ Alpha release 3/22/2019 (Solr only)
     ▪ Beta1 release this week (?)
       – This will include support for Elasticsearch
     ▪ Dream
       – Incorporate experiment generation/GA into Rated Ranking Evaluator (RRE)
       – Apache Incubator -> Top Level Project (TLP)
  11. Links
     ▪ Main site: https://github.com/mitre/quaerite
     ▪ Examples: https://github.com/mitre/quaerite/blob/master/quaerite-examples/README.md
     ▪ Contact
       – tallison@apache.org
       – @_tallison
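
The "Generate Experiments" slide names two strategies: all permutations (grid search) and random experiments (random search). The following is a minimal Python sketch of the difference, not Quaerite's implementation or configuration format. The parameter names and values in PARAM_SPACE, and the helper names grid_experiments and random_experiments, are assumptions made up for illustration.

```python
import itertools
import random

# Hypothetical parameter space. The names and values are illustrative only
# and are not taken from Quaerite's experiment files.
PARAM_SPACE = {
    "query_fields": ["title", "title^2 overview", "title^5 overview^2"],
    "minimum_should_match": ["1", "50%", "100%"],
    "tie_breaker": [0.0, 0.3, 1.0],
}


def grid_experiments(space):
    """Yield every combination of parameter values (grid search)."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_experiments(space, count, seed=0):
    """Return `count` experiments, each picking one random value per parameter."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(count)
    ]


grid = list(grid_experiments(PARAM_SPACE))
sample = random_experiments(PARAM_SPACE, count=5)

print(len(grid))    # 3 * 3 * 3 = 27 experiments
print(len(sample))  # 5 experiments
```

Even this toy space yields 27 grid experiments, and the count multiplies with every parameter added. That growth is the motivation for sampling randomly, or searching with a genetic algorithm, once tokenizers, filters and boosts all become tunable.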
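
The "Run Experiments" tool scores a search configuration against ground truth with an offline metric. The slides do not say which metrics Quaerite computes, so the sketch below uses precision at k purely as an assumed example. The queries, document ids and function names are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top k results that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(results, judgments, k=10):
    """Average precision at k over every query that has judgments."""
    scores = [
        precision_at_k(results.get(query, []), relevant, k)
        for query, relevant in judgments.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical ground truth: query -> set of relevant document ids.
judgments = {"star wars": {"11", "1891"}, "alien": {"348"}}

# Hypothetical ranked results returned by one experiment.
results = {"star wars": ["11", "99", "1891"], "alien": ["7", "8", "9"]}

score = mean_precision_at_k(results, judgments, k=3)
print(f"{score:.3f}")  # 0.333
```

Computing one such score per experiment is what makes experiments comparable. Whether the ranking by score means anything still depends on the prerequisites from the overview slide, especially the link between the offline metric and KPIs.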
