Making the Case for Human Relevance Testing
Tara Diedrichsen & Tito Sierra
24 April 2019
Haystack Conference
Charlottesville
LexisNexis Confidential
Agenda
• About us
• What is human relevance testing?
• Why do human relevance testing?
• Establishing a human relevance testing program
• Lessons learned
About Us
Tito Sierra is VP of Global Product Management at LexisNexis. He leads product strategy for teams that focus on improving search and data-driven product development across LexisNexis products globally. Tito has worked on search in both academic and legal domains for over 15 years and is passionate about improving search quality for users.
Tara Diedrichsen is a Senior Global Product Manager in the Global Product Team at LexisNexis. She works with product teams around the globe on search optimization, including the adoption of new search algorithms as well as new metrics and testing methods. In this capacity she manages the human judgement relevance testing program at LexisNexis globally.
About LexisNexis
LexisNexis Legal & Professional is a leading
global provider of information and analytics to
professionals in legal, corporate, government
and non-profit organizations.
We’re a part of RELX Group, serving customers
in more than 130 countries with 10,000
employees worldwide. Our information network
contains 3 petabytes of legal and news data with
65 billion documents. That’s 150 times the size of
Wikipedia and doubling every three years.
Who are our users?
• Attorneys
• Librarians & Information Professionals
• Students
• Paralegals & Secretaries
• Media Professionals
• Financial Professionals
• Non-Profit & Development Professionals
…and more.
LexisNexis Flagship Research Products
Lexis Advance & Nexis
What is human relevance testing?
Offline search relevance quality evaluation using human raters.
Complements other methods for evaluating search quality, such as user engagement-based product analytics.
Different Approaches to Search Results Evaluation
• Results list: assess the overall relevance of the top-ranking results list
• Document level: assess the relevance of individual top-ranking documents
Why do human
relevance testing?
Why invest in search
relevance testing?
Search is the biggest driver
for both customer satisfaction
and dissatisfaction with our
products.
Human Relevance Testing at LexisNexis
[Figure: timeline from 2016 to 2019 listing Lexis Advance, Lexis Advance Quicklaw (Canada), Lexis Advance Pacific (Australia), Lexis Practice Advisor (US), Nexis Uni, Nexis, Lexis PSL (UK), Lexis Advance Malaysia, Lexis 360 (France)]
In 2016, LexisNexis initiated a
formal human relevance testing
program for our US flagship
legal research product Lexis
Advance.
Since 2018, the program has
expanded to include a growing
number of LexisNexis products
globally.
Applications of Human Relevance Testing
• Baseline search relevance quality evaluation for a product
• Relevance quality evaluation over time
• New algorithm evaluations
• New search engine evaluations
• Competitive benchmarking
Establishing a Human Relevance Testing Program
Human Relevance Testing Framework
A rigorous approach maximises the value of human relevance testing.
• Selecting and managing raters
• Selecting test queries
• Collecting ratings
• Generating scores
• Analysing & communicating results
Selecting and Managing Raters
Selection considerations:
• Subject matter expertise
• Cost/available budget
• Rater capacity
Management considerations:
• Rating guidelines and training
• Rater consistency
• Metrics (e.g. inter-rater reliability (IRR), average ratings)
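Rater consistency can be tracked with an inter-rater reliability statistic. The slides do not say which IRR statistic is used, so the sketch below assumes Cohen's kappa for two raters; the function name and the sample grades are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who graded the same items in the same order."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must grade the same non-empty item list")
    n = len(ratings_a)
    # Share of items on which the two raters gave the same grade.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's grade frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical grade throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades on a 0-3 scale for six documents.
rater_1 = [3, 2, 0, 1, 3, 0]
rater_2 = [3, 2, 1, 1, 3, 0]
kappa = cohens_kappa(rater_1, rater_2)
```

Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance, which may point to unclear rating guidelines.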
Selecting Test Queries
Query attributes:
• Actual user queries vs constructed queries
• Anonymized vs user specific
• Randomly sourced vs curated
• Suitability for relevance evaluation (e.g. navigational queries)
Rater capacity necessarily limits the number of queries that can be tested.
Collecting Ratings
Select a consistent rubric
for collecting ratings.
Gathering multiple ratings
per document can help
solidify findings.
Verbatim comments are
also extremely useful.
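When several raters grade the same document, their grades need to be combined before scoring. A minimal sketch, assuming a simple mean per (query, document) pair; the row layout, identifiers, and grades are hypothetical and not taken from the slides.

```python
from collections import defaultdict
from statistics import mean


def average_ratings(rating_rows):
    """Collapse (query, doc_id, rater, grade) rows into a mean grade per (query, doc_id)."""
    grades = defaultdict(list)
    for query, doc_id, _rater, grade in rating_rows:
        grades[(query, doc_id)].append(grade)
    return {key: mean(values) for key, values in grades.items()}


# Hypothetical ratings: two raters, two documents, one query.
rows = [
    ("breach of contract", "doc-1", "rater-a", 3),
    ("breach of contract", "doc-1", "rater-b", 2),
    ("breach of contract", "doc-2", "rater-a", 0),
    ("breach of contract", "doc-2", "rater-b", 1),
]
averaged = average_ratings(rows)
```

Other aggregation rules, such as the median or a majority vote, are equally plausible; the mean is used here only because it is the simplest.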
[Figure: individual query scores and averages for the query set]
Generating Scores
Compute query-level scores based on ratings data.
LexisNexis computes DCG (Discounted Cumulative Gain), an industry-standard measure of relevance quality.
The graded relevance value of a document is reduced (discounted) logarithmically in proportion to its position in the results, so documents further down the list contribute less to the score.
Reference material:
https://en.wikipedia.org/wiki/Discounted_cumulative_gain
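As an illustration, the sketch below computes DCG with the common formulation from the reference above, where the grade at position i is divided by log2(i + 1). The slides do not state which DCG variant or cutoff LexisNexis uses, so the formula choice, the optional cutoff k, and the sample grades are assumptions.

```python
import math


def dcg(grades, k=None):
    """DCG of graded relevance values listed in ranked order (position 1 first)."""
    top = grades if k is None else grades[:k]
    # Each grade is discounted by log2(position + 1), so position 1 is undiscounted.
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(top, start=1))


# Hypothetical grades (0-3 scale) for the top six results of one query.
ranked_grades = [3, 2, 3, 0, 1, 2]
score = dcg(ranked_grades)
```

Averaging these per-query scores over the whole test set gives a single figure for the query set, alongside the individual query scores.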
Low Scoring Query Analysis
Analysis of queries with low DCG scores delivers actionable insights into user pain points, gaps, and areas for improvement.
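A simple way to pick queries for this kind of review is to sort the per-query scores and take the lowest ones. This is a sketch only; the function name, the cutoff of two queries, and the scores are hypothetical.

```python
def lowest_scoring(query_scores, n=2):
    """Return the n (query, score) pairs with the lowest scores, worst first."""
    return sorted(query_scores.items(), key=lambda item: item[1])[:n]


# Hypothetical per-query DCG scores.
scores = {"q1": 6.9, "q2": 1.2, "q3": 4.0, "q4": 0.4}
worst = lowest_scoring(scores)
```

The selected queries, together with rater comments on them, are the natural starting point for a manual review.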
Communication of Results
Communication of results, including analysis and recommendations,
to development teams and business stakeholders is an important
final step before wrapping up a test.
Outcomes:
• Customer Focus
• Innovation & Experimentation
• Motivation
• Stakeholder Buy-In & Engagement
• Culture Change
New Algorithm Evaluation
Human relevance testing provides a robust mechanism for pre-release offline testing that can improve time to market while also driving quality.
[Process diagram: Establish DCG (re)baseline → Develop new algorithm → Use raters to grade new top documents → Calculate DCG scores and analyze results → Prepare change for release or iterate]
Outcomes:
• Innovation & Experimentation
• Quality
• Reuse
• Time to Market
New Algorithm Evaluation
[Chart labels: Experimental query plan; Baseline]
Results are compared against the DCG baseline in order to identify trends and determine next steps, e.g. iterate further on a new algorithm or move forward with A/B testing.
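The comparison against the baseline can be sketched as a per-query difference plus an overall average. The function name, query identifiers, and scores below are hypothetical, and the slides do not describe how LexisNexis summarizes the comparison.

```python
from statistics import mean


def compare_to_baseline(baseline, experiment):
    """Per-query DCG deltas (experiment minus baseline) and their mean, for shared queries."""
    shared = sorted(set(baseline) & set(experiment))
    if not shared:
        raise ValueError("no queries in common")
    deltas = {q: experiment[q] - baseline[q] for q in shared}
    return deltas, mean(list(deltas.values()))


# Hypothetical per-query DCG scores for the baseline and a new algorithm.
baseline_scores = {"q1": 4.0, "q2": 2.5, "q3": 6.0}
experiment_scores = {"q1": 5.0, "q2": 2.0, "q3": 6.5}
deltas, average_delta = compare_to_baseline(baseline_scores, experiment_scores)
```

A positive average can hide regressions on individual queries (q2 here), so the per-query deltas are worth reviewing before deciding whether to iterate or move to A/B testing.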
Setting Goals for Improving Search
Human relevance testing can be used
to set and track goals for improving
search.
Outcomes:
• Accountability
• Motivation
• Reuse
• Stakeholder Buy-In & Engagement
• Culture Change
Lessons Learned
• Identify raters and secure commitment before doing anything else
• Scope studies in line with rater capacity
• Budget time to do analysis and communicate results
• Tangible examples help drive engagement and buy-in
• Run regular feedback sessions
• A program-oriented approach to quality can drive culture change
This work can be expensive, but it is worth it!
Questions?
tara.diedrichsen@lexisnexis.com
tito.sierra@lexisnexis.com

More Related Content

What's hot

Training Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryTraining Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryNeo4j
 
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...Justin Basilico
 
Introduction to Kibana
Introduction to KibanaIntroduction to Kibana
Introduction to KibanaVineet .
 
RecSysTEL lecture at advanced SIKS course, NL
RecSysTEL lecture at advanced SIKS course, NLRecSysTEL lecture at advanced SIKS course, NL
RecSysTEL lecture at advanced SIKS course, NLHendrik Drachsler
 
AI & Big Data - Personalização da Jornada - PicPay - TDC
AI & Big Data - Personalização da Jornada - PicPay - TDCAI & Big Data - Personalização da Jornada - PicPay - TDC
AI & Big Data - Personalização da Jornada - PicPay - TDCRenan Moreira de Oliveira
 
Recsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem RevisitedRecsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem RevisitedXavier Amatriain
 
What’s next for deep learning for Search?
What’s next for deep learning for Search?What’s next for deep learning for Search?
What’s next for deep learning for Search?Bhaskar Mitra
 
눈으로 듣는 음악 추천 시스템-2018 if-kakao
눈으로 듣는 음악 추천 시스템-2018 if-kakao눈으로 듣는 음악 추천 시스템-2018 if-kakao
눈으로 듣는 음악 추천 시스템-2018 if-kakaochoi kyumin
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...Sudeep Das, Ph.D.
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningAnoop Deoras
 
Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4jNeo4j
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Bhaskar Mitra
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Anoop Deoras
 
Automação de Teste em Front End - Caipira Ágil
Automação de Teste em Front End - Caipira ÁgilAutomação de Teste em Front End - Caipira Ágil
Automação de Teste em Front End - Caipira ÁgilElias Nogueira
 
제 13회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [스포 적발 강력 1팀] : 네 리뷰가 스포라는 것을 스포한다.
제 13회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [스포 적발 강력 1팀] : 네 리뷰가 스포라는 것을 스포한다.제 13회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [스포 적발 강력 1팀] : 네 리뷰가 스포라는 것을 스포한다.
제 13회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [스포 적발 강력 1팀] : 네 리뷰가 스포라는 것을 스포한다.BOAZ Bigdata
 
stackconf 2021 | Weaviate Vector Search Engine – Introduction
stackconf 2021 | Weaviate Vector Search Engine – Introductionstackconf 2021 | Weaviate Vector Search Engine – Introduction
stackconf 2021 | Weaviate Vector Search Engine – IntroductionNETWAYS
 
Movie Recommender system
Movie Recommender systemMovie Recommender system
Movie Recommender systemPalakNath
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixJaya Kawale
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsErik Bernhardsson
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022Jim Dowling
 

What's hot (20)

Training Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryTraining Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL Library
 
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
 
Introduction to Kibana
Introduction to KibanaIntroduction to Kibana
Introduction to Kibana
 
RecSysTEL lecture at advanced SIKS course, NL
RecSysTEL lecture at advanced SIKS course, NLRecSysTEL lecture at advanced SIKS course, NL
RecSysTEL lecture at advanced SIKS course, NL
 
AI & Big Data - Personalização da Jornada - PicPay - TDC
AI & Big Data - Personalização da Jornada - PicPay - TDCAI & Big Data - Personalização da Jornada - PicPay - TDC
AI & Big Data - Personalização da Jornada - PicPay - TDC
 
Recsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem RevisitedRecsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem Revisited
 
What’s next for deep learning for Search?
What’s next for deep learning for Search?What’s next for deep learning for Search?
What’s next for deep learning for Search?
 
눈으로 듣는 음악 추천 시스템-2018 if-kakao
눈으로 듣는 음악 추천 시스템-2018 if-kakao눈으로 듣는 음악 추천 시스템-2018 if-kakao
눈으로 듣는 음악 추천 시스템-2018 if-kakao
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep Learning
 
Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4j
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
 
Automação de Teste em Front End - Caipira Ágil
Automação de Teste em Front End - Caipira ÁgilAutomação de Teste em Front End - Caipira Ágil
Automação de Teste em Front End - Caipira Ágil
 
제 13회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [스포 적발 강력 1팀] : 네 리뷰가 스포라는 것을 스포한다.
제 13회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [스포 적발 강력 1팀] : 네 리뷰가 스포라는 것을 스포한다.제 13회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [스포 적발 강력 1팀] : 네 리뷰가 스포라는 것을 스포한다.
제 13회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [스포 적발 강력 1팀] : 네 리뷰가 스포라는 것을 스포한다.
 
stackconf 2021 | Weaviate Vector Search Engine – Introduction
stackconf 2021 | Weaviate Vector Search Engine – Introductionstackconf 2021 | Weaviate Vector Search Engine – Introduction
stackconf 2021 | Weaviate Vector Search Engine – Introduction
 
Movie Recommender system
Movie Recommender systemMovie Recommender system
Movie Recommender system
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive Analytics
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
 

Similar to Haystack 2019 - Making the case for human judgement relevance testing - Tara Diedrichsen and Tito Sierra

Healthcare Portfolio | RemotePanda
Healthcare Portfolio | RemotePandaHealthcare Portfolio | RemotePanda
Healthcare Portfolio | RemotePandaRemotePanda
 
Blooming analytics! The germination of a new Jisc/HESA service for data-drive...
Blooming analytics! The germination of a new Jisc/HESA service for data-drive...Blooming analytics! The germination of a new Jisc/HESA service for data-drive...
Blooming analytics! The germination of a new Jisc/HESA service for data-drive...Jisc
 
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...Optimizely
 
Healthcare Portfolio | MindBowser
Healthcare Portfolio | MindBowserHealthcare Portfolio | MindBowser
Healthcare Portfolio | MindBowserMindbowser Inc
 
LNS Research's Global Executive Council
LNS Research's Global Executive CouncilLNS Research's Global Executive Council
LNS Research's Global Executive CouncilLNSResearch
 
Learning's Big Data Problem: Measuring & Analyzing Impact
Learning's Big Data Problem: Measuring & Analyzing ImpactLearning's Big Data Problem: Measuring & Analyzing Impact
Learning's Big Data Problem: Measuring & Analyzing ImpactWatershed
 
ClickZ Live: Smart Analytics
ClickZ Live: Smart AnalyticsClickZ Live: Smart Analytics
ClickZ Live: Smart AnalyticsKristin Low
 
Curlew Research Brussels 2014 Electronic Data & Knowledge Management
Curlew Research Brussels 2014 Electronic Data & Knowledge ManagementCurlew Research Brussels 2014 Electronic Data & Knowledge Management
Curlew Research Brussels 2014 Electronic Data & Knowledge ManagementNick Lynch
 
Tricentis-report_Forrester-Modernizing-Testing-to-Accelerate-Digital-Business...
Tricentis-report_Forrester-Modernizing-Testing-to-Accelerate-Digital-Business...Tricentis-report_Forrester-Modernizing-Testing-to-Accelerate-Digital-Business...
Tricentis-report_Forrester-Modernizing-Testing-to-Accelerate-Digital-Business...havoc2003
 
17568 hbr sas report_webview
17568 hbr sas report_webview17568 hbr sas report_webview
17568 hbr sas report_webviewR Sekar Ramajeyam
 
Search Discovery Analytics Benchmark
Search Discovery Analytics BenchmarkSearch Discovery Analytics Benchmark
Search Discovery Analytics BenchmarkBradford Harbert
 
Common sense is not common practice in alliances
Common sense is not common practice in alliancesCommon sense is not common practice in alliances
Common sense is not common practice in alliancesMike Nevin
 
Arathi Rao
Arathi RaoArathi Rao
Arathi Raoarti_12
 
Arathi Rao
Arathi RaoArathi Rao
Arathi Raoarti_12
 
Human Capital Growth Webinar: 2018 HCG talent development benchmark study
Human Capital Growth Webinar: 2018 HCG talent development benchmark studyHuman Capital Growth Webinar: 2018 HCG talent development benchmark study
Human Capital Growth Webinar: 2018 HCG talent development benchmark studyHuman Capital Growth
 
Are you getting the most out of your data?
Are you getting the most out of your data?Are you getting the most out of your data?
Are you getting the most out of your data?SAS Canada
 

Similar to Haystack 2019 - Making the case for human judgement relevance testing - Tara Diedrichsen and Tito Sierra (20)

Healthcare Portfolio | RemotePanda
Healthcare Portfolio | RemotePandaHealthcare Portfolio | RemotePanda
Healthcare Portfolio | RemotePanda
 
Blooming analytics! The germination of a new Jisc/HESA service for data-drive...
Blooming analytics! The germination of a new Jisc/HESA service for data-drive...Blooming analytics! The germination of a new Jisc/HESA service for data-drive...
Blooming analytics! The germination of a new Jisc/HESA service for data-drive...
 
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...
 
Healthcare Portfolio | MindBowser
Healthcare Portfolio | MindBowserHealthcare Portfolio | MindBowser
Healthcare Portfolio | MindBowser
 
Frontiers 2015, by 3 Pillar, CES, Rockbridge
Frontiers 2015, by 3 Pillar, CES, RockbridgeFrontiers 2015, by 3 Pillar, CES, Rockbridge
Frontiers 2015, by 3 Pillar, CES, Rockbridge
 
HSI_Intro_Short
HSI_Intro_ShortHSI_Intro_Short
HSI_Intro_Short
 
LNS Research's Global Executive Council
LNS Research's Global Executive CouncilLNS Research's Global Executive Council
LNS Research's Global Executive Council
 
Learning's Big Data Problem: Measuring & Analyzing Impact
Learning's Big Data Problem: Measuring & Analyzing ImpactLearning's Big Data Problem: Measuring & Analyzing Impact
Learning's Big Data Problem: Measuring & Analyzing Impact
 
ClickZ Live: Smart Analytics
ClickZ Live: Smart AnalyticsClickZ Live: Smart Analytics
ClickZ Live: Smart Analytics
 
Curlew Research Brussels 2014 Electronic Data & Knowledge Management
Curlew Research Brussels 2014 Electronic Data & Knowledge ManagementCurlew Research Brussels 2014 Electronic Data & Knowledge Management
Curlew Research Brussels 2014 Electronic Data & Knowledge Management
 
Tricentis-report_Forrester-Modernizing-Testing-to-Accelerate-Digital-Business...
Tricentis-report_Forrester-Modernizing-Testing-to-Accelerate-Digital-Business...Tricentis-report_Forrester-Modernizing-Testing-to-Accelerate-Digital-Business...
Tricentis-report_Forrester-Modernizing-Testing-to-Accelerate-Digital-Business...
 
The evolution of decision making
The evolution of decision makingThe evolution of decision making
The evolution of decision making
 
17568 hbr sas report_webview
17568 hbr sas report_webview17568 hbr sas report_webview
17568 hbr sas report_webview
 
Search Discovery Analytics Benchmark
Search Discovery Analytics BenchmarkSearch Discovery Analytics Benchmark
Search Discovery Analytics Benchmark
 
The rde methodology
The rde methodologyThe rde methodology
The rde methodology
 
Common sense is not common practice in alliances
Common sense is not common practice in alliancesCommon sense is not common practice in alliances
Common sense is not common practice in alliances
 
Arathi Rao
Arathi RaoArathi Rao
Arathi Rao
 
Arathi Rao
Arathi RaoArathi Rao
Arathi Rao
 
Human Capital Growth Webinar: 2018 HCG talent development benchmark study
Human Capital Growth Webinar: 2018 HCG talent development benchmark studyHuman Capital Growth Webinar: 2018 HCG talent development benchmark study
Human Capital Growth Webinar: 2018 HCG talent development benchmark study
 
Are you getting the most out of your data?
Are you getting the most out of your data?Are you getting the most out of your data?
Are you getting the most out of your data?
 

More from OpenSource Connections

How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessOpenSource Connections
 
The right path to making search relevant - Taxonomy Bootcamp London 2019
The right path to making search relevant  - Taxonomy Bootcamp London 2019The right path to making search relevant  - Taxonomy Bootcamp London 2019
The right path to making search relevant - Taxonomy Bootcamp London 2019OpenSource Connections
 
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullHaystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullOpenSource Connections
 
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonHaystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonOpenSource Connections
 
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...OpenSource Connections
 
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajHaystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajOpenSource Connections
 
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...OpenSource Connections
 
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlHaystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlOpenSource Connections
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesOpenSource Connections
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerOpenSource Connections
 
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...OpenSource Connections
 
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...OpenSource Connections
 
Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...OpenSource Connections
 
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...OpenSource Connections
 
Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...OpenSource Connections
 
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...OpenSource Connections
 
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah ViaOpenSource Connections
 

More from OpenSource Connections (20)

Encores
EncoresEncores
Encores
 
Test driven relevancy
Test driven relevancyTest driven relevancy
Test driven relevancy
 
How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for Success
 
The right path to making search relevant - Taxonomy Bootcamp London 2019
The right path to making search relevant  - Taxonomy Bootcamp London 2019The right path to making search relevant  - Taxonomy Bootcamp London 2019
The right path to making search relevant - Taxonomy Bootcamp London 2019
 
Payloads and OCR with Solr
Payloads and OCR with SolrPayloads and OCR with Solr
Payloads and OCR with Solr
 
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullHaystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
 
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonHaystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
 
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
 
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajHaystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
 
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
 
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Making the case for human judgement relevance testing - Tara Diedrichsen and Tito Sierra

  • 1. Making the Case for Human Relevance Testing. Tara Diedrichsen & Tito Sierra, 24 April 2019, Haystack Conference, Charlottesville
  • 3. Agenda: About us; What is human relevance testing?; Why do human relevance testing?; Establishing a human relevance testing program; Lessons learned
  • 5. About Us. Tito Sierra is VP of Global Product Management at LexisNexis. He leads product strategy for teams that focus on improving search and data-driven product development across LexisNexis products globally. Tito has worked on search in both academic and legal domains for over 15 years and is passionate about improving search quality for users. Tara Diedrichsen is a Senior Global Product Manager in the Global Product Team at LexisNexis. She works with product teams around the globe on search optimization, including the adoption of new search algorithms as well as new metrics and testing methods. In this capacity she manages the human judgement relevance testing program at LexisNexis globally.
  • 6. About LexisNexis. LexisNexis Legal & Professional is a leading global provider of information and analytics to professionals in legal, corporate, government and non-profit organizations. We’re a part of RELX Group, serving customers in more than 130 countries with 10,000 employees worldwide. Our information network contains 3 petabytes of legal and news data with 65 billion documents. That’s 150 times the size of Wikipedia and doubling every three years.
  • 7. Who are our users? Attorneys; Librarians & Information Professionals; Students; Paralegals & Secretaries; Media Professionals; Financial Professionals; Non-Profit & Development Professionals; …and more.
  • 8. LexisNexis Flagship Research Products: Lexis Advance & Nexis
  • 9. What is human relevance testing?
  • 10. What is human relevance testing? Offline search relevance quality evaluation using human raters. Complements other methods for evaluating search quality, such as user engagement-based product analytics.
  • 11. Different Approaches to Search Results Evaluation. Results list: assess overall relevance of the top-ranking results list. Document level: assess relevance of individual top-ranking documents.
  • 13. Why invest in search relevance testing? Search is the biggest driver for both customer satisfaction and dissatisfaction with our products.
  • 14. Human Relevance Testing at LexisNexis. In 2016, LexisNexis initiated a formal human relevance testing program for our US flagship legal research product Lexis Advance. Since 2018, the program has expanded to include a growing number of LexisNexis products globally. Products shown on the 2016 to 2019 timeline: Lexis Advance; Lexis Advance Quicklaw (Canada); Lexis Advance Pacific (Australia); Lexis Practice Advisor (US); Nexis Uni; Nexis; Lexis PSL (UK); Lexis Advance Malaysia; Lexis 360 (France).
  • 15. Applications of Human Relevance Testing: baseline search relevance quality evaluation for a product; relevance quality evaluation over time; new algorithm evaluations; new search engine evaluations; competitive benchmarking.
  • 17. Human Relevance Testing Framework. A rigorous approach maximises the value of human relevance testing. Steps: selecting and managing raters; selecting test queries; collecting ratings; generating scores; analysing & communicating results.
  • 18. Selecting and Managing Raters. Selection considerations: subject matter expertise; cost/available budget; rater capacity. Management considerations: rating guidelines and training; rater consistency; metrics (e.g. IRR, average ratings).
  • 19. Selecting Test Queries. Query attributes: actual user queries vs constructed queries; anonymized vs user specific; randomly sourced vs curated; suitability for relevance evaluation (e.g. navigational queries). Rater capacity necessarily limits the number of queries that can be tested.
  • 20. Collecting Ratings. Select a consistent rubric for collecting ratings. Gathering multiple ratings per document can help solidify findings. Verbatim comments are also extremely useful.
  • 21. Generating Scores. Compute query-level scores based on ratings data. LexisNexis computes DCG (Discounted Cumulative Gain), which is an industry-standard measure of relevance quality. The graded relevance value of a document is reduced (discounted) logarithmically according to its position in the results. Reference material: https://en.wikipedia.org/wiki/Discounted_cumulative_gain. Charts shown: individual query scores; averages for query set.
  • 22. Low Scoring Query Analysis. Analysis of queries with low DCG scores delivers actionable insights around user pain points/gaps and areas for improvement.
  • 23. Communication of Results. Communication of results, including analysis and recommendations, to development teams and business stakeholders is an important final step before wrapping up a test. Outcomes: Customer Focus; Innovation & Experimentation; Motivation; Stakeholder Buy-In & Engagement; Culture Change.
  • 24. New Algorithm Evaluation. Human relevance testing provides a robust mechanism for pre-release offline testing that can improve time to market while also driving quality. Process: establish DCG (re)baseline; develop new algorithm; use raters to grade new top documents; calculate DCG scores and analyze results; prepare change for release or iterate. Outcomes: Innovation & Experimentation; Quality; Reuse; Time to Market.
  • 25. New Algorithm Evaluation. Results are compared against the DCG baseline in order to identify trends and determine next steps, e.g. iterate further on a new algorithm or move forward with A/B testing. Chart series shown: baseline; experimental query plan.
  • 26. Setting Goals for Improving Search. Human relevance testing can be used to set and track goals for improving search. Outcomes: Accountability; Motivation; Reuse; Stakeholder Buy-in & Engagement; Culture change.
  • 28. Lessons Learned: identify raters and secure commitment before doing anything else; scope studies in line with rater capacity; budget time to do analysis and communicate results; tangible examples help drive engagement and buy-in; run regular feedback sessions; a program-oriented approach to quality can drive culture change. This work can be expensive, but it is worth it!
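
The scoring and baseline-comparison steps described in slides 21 to 25 can be illustrated with a minimal Python sketch. It uses the standard DCG formulation from the Wikipedia article cited on slide 21 (each rating divided by log2 of its position plus one). The slides do not say which DCG variant, rating scale, or tooling LexisNexis actually uses, so the 0 to 3 scale, the query names, and the ratings below are all hypothetical.

```python
import math
from statistics import mean


def dcg(ratings):
    """Discounted Cumulative Gain for one query.

    `ratings` holds graded relevance values in ranked order (top result first).
    Each rating is divided by log2(position + 1), so lower-ranked documents
    contribute less to the score.
    """
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(ratings, start=1))


def score_query_set(ratings_by_query):
    """Return a DCG score per query."""
    return {query: dcg(ratings) for query, ratings in ratings_by_query.items()}


# Hypothetical rater judgements on an assumed 0-3 scale, top four results per query.
baseline_ratings = {"query a": [3, 2, 0, 1], "query b": [1, 0, 0, 2]}
experiment_ratings = {"query a": [3, 3, 1, 0], "query b": [2, 1, 0, 0]}

baseline_scores = score_query_set(baseline_ratings)
experiment_scores = score_query_set(experiment_ratings)

# Per-query change against the baseline, plus the query-set averages.
deltas = {q: experiment_scores[q] - baseline_scores[q] for q in baseline_scores}
baseline_avg = mean(baseline_scores.values())
experiment_avg = mean(experiment_scores.values())

# The lowest-scoring query is a natural starting point for low-score analysis.
lowest_query = min(experiment_scores, key=experiment_scores.get)

for query, delta in deltas.items():
    print(f"{query}: baseline={baseline_scores[query]:.3f} "
          f"experiment={experiment_scores[query]:.3f} delta={delta:+.3f}")
print(f"average: baseline={baseline_avg:.3f} experiment={experiment_avg:.3f}")
print(f"lowest-scoring query in experiment: {lowest_query}")
```

Because raw DCG depends on how many results are rated, comparisons like this are only meaningful when the baseline and the experiment are scored at the same depth for every query.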