Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit - Tim Allison
1.
Quaerite – Search Relevance Toolkit
Tim Allison
tallison@apache.org, @_tallison
April 24, 2019
Haystack Conference
© 2019 The MITRE Corporation. All rights reserved.
Approved for Public Release; Distribution Unlimited. Case Number 18-3138-5
2.
Debt of Gratitude
▪ Thank you Doug Turnbull, John Berryman and OpenSource Connections for the inspiration, examples and training with tmdb, and for sharing your ground truth set!
3.
Yet Another Toolkit? Why!?
▪ How many parameters do we have?
▪ How many permutations of those parameters are available?
4.
Available Parameters
▪ 14 tokenizers: https://lucene.apache.org/solr/guide/7_1/tokenizers.html
▪ ~45 token filters (not including language-specific token filters – see next slide): https://lucene.apache.org/solr/guide/7_1/filter-descriptions.html
▪ Query parsers
▪ Query operators, minimum should match, should, must, not
▪ Token/field based scoring – best_fields, most_fields, cross_fields
▪ Field boosting
▪ Phrasal boosting/shingling
▪ Synonym lists, taxonomies
▪ Similarity scoring parameters (with BM25)
▪ Elevate
▪ External signal enrichment – manual or automatic (NLP – entity extraction, categorization, etc.)
▪ Reranking via machine learning (Learning to Rank)
5.
Each Token Filter Can Have Many Parameters
<filter class="solr.WordDelimiterFilterFactory"
        protected="protwords.txt"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        splitOnCaseChange="0"
        preserveOriginal="1"/>
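To make the size of the search space concrete, here is a minimal sketch that enumerates the on/off settings of the seven flags in the filter above. It assumes each flag is treated as a plain 0/1 toggle and leaves `protected` out because it names a file; the helper name `flag_settings` is made up for this illustration and is not part of Solr or Quaerite.

```python
from itertools import product

# Flag names copied from the WordDelimiterFilterFactory example above.
# "protected" is a file reference rather than a toggle, so it is left out.
FLAGS = [
    "generateWordParts",
    "generateNumberParts",
    "catenateWords",
    "catenateNumbers",
    "catenateAll",
    "splitOnCaseChange",
    "preserveOriginal",
]


def flag_settings(flags):
    """Yield every 0/1 assignment of the given flags as a dict (hypothetical helper)."""
    for values in product((0, 1), repeat=len(flags)):
        yield dict(zip(flags, values))


settings = list(flag_settings(FLAGS))
print(len(settings))  # 2 ** 7 = 128 settings for this one filter
```

One filter with seven toggles already yields 128 variants, before tokenizers, other filters, or query-time parameters are multiplied in.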
6.
Overview – Offline testing toolkit
Prerequisites:
1. Reliable, generalizable ground truth
2. Reliable, useful underlying data
3. Offline metric has to have some connection to KPIs
4. Expertise – you still have to know what you’re doing!!!
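As an illustration of what an offline metric computed against ground truth can look like, the sketch below scores ranked results with precision at k. The slides do not say which metrics the toolkit computes, so precision at k is only an assumed example, and the queries, document ids, and function names are invented for the illustration.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(results, ground_truth, k):
    """Average precision at k over every query that has judgments."""
    scores = [
        precision_at_k(results.get(query, []), relevant, k)
        for query, relevant in ground_truth.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical judgments and search results, for illustration only.
ground_truth = {"star wars": {"d1", "d3"}, "alien": {"d7"}}
results = {"star wars": ["d1", "d2", "d3"], "alien": ["d9", "d7", "d8"]}

print(mean_precision_at_k(results, ground_truth, k=2))
```

Whatever metric is used, prerequisite 3 still applies: the number is only useful if it moves with the KPIs the search application cares about.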
7.
Main Tools
1. Run Experiments
2. Generate Experiments
▪ All permutations (grid search)
▪ Random experiments (random search)
3. Genetic Algorithm
▪ Cross-fold validation!!!
▪ Complementary to LTR: the main differences are the algorithm, and that it runs offline to tune general settings rather than reranking the top n
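The two experiment-generation strategies can be sketched as follows. This is not Quaerite's implementation or file format: the parameter space, its values, and the function names are assumptions chosen for illustration (`qf`, `mm`, and `tie` are borrowed from Solr's eDisMax parameters).

```python
import random
from itertools import product

# Hypothetical parameter space; names and values are illustrative only.
PARAM_SPACE = {
    "qf": ["title^2 overview", "title^5 overview", "title overview^2"],
    "mm": ["1", "2", "75%"],
    "tie": [0.0, 0.1, 0.3],
}


def grid_experiments(space):
    """Grid search: one experiment per combination of parameter values."""
    names = list(space)
    return [
        dict(zip(names, combo))
        for combo in product(*(space[name] for name in names))
    ]


def random_experiments(space, n, seed=0):
    """Random search: n experiments, each value drawn independently."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n)
    ]


grid = grid_experiments(PARAM_SPACE)
sample = random_experiments(PARAM_SPACE, n=5)
print(len(grid), len(sample))
```

Grid search grows multiplicatively with every added parameter, which is why random search becomes the practical option once the space is large.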
8.
Odds and Ends
▪ Analyzer Comparison over (mostly) the index
▪ Significant Terms (yawn…for archaic versions of Solr)
▪ Planning to add these as parameters in “generate experiments”
9.
Adding Porter Stemming: create account
creat
  created: 709
  create: 551
  creating: 269
  creates: 153
  creat: 1
account
  account: 3244
  accounts: 1924
  accounting: 1548
  accountants: 340
  accountant: 176
  accounted: 134
  accountability: 74
  accountable: 74
  accountancy: 65
  account's: 7
  accountant's: 7
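The grouping shown above, surface forms collected under the stem they collapse to, can be sketched like this. The counts are a subset of those on the slide. The stemmer is a toy suffix stripper used as a stand-in, not the Porter algorithm, and all function names are invented for the illustration.

```python
from collections import defaultdict

# Term counts copied from the slide above (a subset of the listed forms).
TERM_COUNTS = {
    "created": 709, "create": 551, "creating": 269, "creates": 153, "creat": 1,
    "account": 3244, "accounts": 1924, "accounting": 1548, "accounted": 134,
}

# Toy suffix stripper: a stand-in for illustration, NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s", "e")


def toy_stem(term):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def group_by_stem(term_counts, stem):
    """Map each stem to the surface forms (and counts) that collapse to it."""
    groups = defaultdict(dict)
    for term, count in term_counts.items():
        groups[stem(term)][term] = count
    return dict(groups)


groups = group_by_stem(TERM_COUNTS, toy_stem)
for stem, forms in groups.items():
    print(stem, sum(forms.values()), forms)
```

Because `group_by_stem` takes the stemming function as an argument, the same grouping could be run with a real analyzer's output to compare which forms each analyzer conflates.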
10.
Status
▪ Alpha release 3/22/2019 (Solr only)
▪ Beta1 release this week (?)
  – This will include support for Elasticsearch
▪ Dream
  – Incorporate experiment generation/GA into Rated Ranking Evaluator (RRE)
  – Apache Incubator -> Top Level Project (TLP)
11.
Links
▪ Main site: https://github.com/mitre/quaerite
▪ Examples: https://github.com/mitre/quaerite/blob/master/quaerite-examples/README.md
▪ Contact
  – tallison@apache.org
  – @_tallison