Mix and Match: Collaborative Expert-Crowd
Judging for Building Test Collections
Accurately & Affordably
Mucahid Kutlu, Tyler McDonnell, Aashish Sheshadri,
Tamer Elsayed, & Matthew Lease
UT Austin -&- Qatar U
Slides: slideshare.net/mattlease ml@utexas.edu @mattlease
What’s an Information School?
“The place where people & technology meet”
~ Wobbrock et al., 2009
“iSchools” now exist at 96 universities around the world
www.ischools.org
Roadmap
• Problem Statement
• Related Work
• Datasets
• Mix & Match: Methods & Results
Proceedings of the First Biennial Conference on Design of Experimental
Search & Information Retrieval Systems (DESIRES), Bertinoro, Italy, August 28-31, 2018.
Problem Statement
• Traditional relevance assessors & processes (e.g.,
TREC) remain most reliable and trusted
• Non-traditional relevance judging (e.g., crowd)
offers ease, affordability, & speed/scalability, but
more variability in quality
• How can we make the best use of both?
– Crowd may better judge some documents/topics than
others; can we divide the work appropriately?
– Use crowd in cases we expect their judgments would
match those of traditional judges (i.e., be “correct”)
Related Work
Systematic Review is e-Discovery
in Doctor’s Clothing
SIGIR 2016 Workshop on Medical IR (MedIR)
Joint work with Gordon V. Cormack (U. Waterloo), An Thanh Nguyen (U. Texas),
Thomas A. Trikalinos (Brown U.), & Byron C. Wallace (U. Texas)
Hybrid Man-Machine Relevance Judging
• Systematic review (medicine) and e-Discovery
(law / civil procedure) have traditionally relied
on trusted doctors/lawyers for judging
• Automatic relevance classification is more
efficient but less accurate
• Recent active learning work has investigated
hybrid man-machine judging combinations
– e.g., TAR & TREC Legal Track, recent CLEF track
Hybrid Crowd-Machine Labeling
• Dynamic labeling models select which example to
label next, how many crowd labels to collect, &
which examples to label automatically
– Work by Weld (UW) & Mausam (IIT), e.g., TurKontrol
– Work by Kamar and Horvitz (MSR), e.g., CrowdSynth
– Our work (2 slides ahead…)
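One way to picture the "how many crowd labels to collect" decision is a simple stopping rule. The sketch below is not TurKontrol, CrowdSynth, or the authors' model; it swaps in a fixed vote-margin rule for their decision-theoretic policies, and the function name, `min_margin`, and `max_labels` are all hypothetical.

```python
from collections import Counter


def collect_until_confident(label_stream, min_margin=2, max_labels=5):
    """Request crowd labels one at a time and stop early once one label leads.

    label_stream: iterable yielding one crowd label per request (assumed input).
    min_margin:   stop when the leading label is ahead by this many votes.
    max_labels:   hard cap on the number of labels requested for this example.
    Returns (consensus_label, number_of_labels_used).
    """
    counts = Counter()
    for n, label in enumerate(label_stream, start=1):
        counts[label] += 1
        ranked = counts.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        lead = ranked[0][1] - runner_up
        if lead >= min_margin or n >= max_labels:
            return ranked[0][0], n
    if not counts:
        raise ValueError("label_stream produced no labels")
    # Stream ran out before either stopping condition was met.
    return counts.most_common(1)[0][0], sum(counts.values())
```

Examples on which workers agree quickly consume few labels, while contested examples run up to the cap; those contested examples are natural candidates for an expert or an automatic classifier.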
Decision Theoretic Active Learning
Combining Crowd and Expert Labels using
Decision Theoretic Active Learning
Nguyen, Wallace, & Lease, AAAI HCOMP’15
Systematic Review
A Collaborative Approach to IR Evaluation. Aashish
Sheshadri. Master's Thesis, UT CS, May 2014
• Built a model to predict assessor disagreement
• Built a crowd simulator based on real data
– Simulated relevance judgments (of varying quality)
• Considered cost models for expert vs. crowd judges
• Evaluated cost vs. quality of hybrid NIST-Crowd
collaborative judging models.
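The slides do not give the thesis's cost models, so the following is only a minimal sketch of how an expert vs. crowd cost comparison could be set up. The per-judgment costs, the number of crowd labels per pair, and the function name are hypothetical assumptions.

```python
def hybrid_judging_cost(n_pairs, expert_share, expert_cost, crowd_cost, labels_per_pair):
    """Total cost when a share of (topic, document) pairs goes to experts.

    expert_share:    fraction of pairs judged by experts (0.0 to 1.0).
    expert_cost:     assumed cost of one expert judgment.
    crowd_cost:      assumed cost of one crowd label.
    labels_per_pair: redundant crowd labels collected per crowd-judged pair.
    """
    if not 0.0 <= expert_share <= 1.0:
        raise ValueError("expert_share must be between 0 and 1")
    n_expert = int(n_pairs * expert_share + 0.5)  # round to nearest pair
    n_crowd = n_pairs - n_expert
    return n_expert * expert_cost + n_crowd * labels_per_pair * crowd_cost


# Hypothetical prices: 1.00 per expert judgment, 0.05 per crowd label.
expert_only = hybrid_judging_cost(1000, 1.0, 1.00, 0.05, 5)
hybrid = hybrid_judging_cost(1000, 0.2, 1.00, 0.05, 5)
```

Plotting this cost against the quality achieved at each expert share gives the kind of cost vs. quality trade-off the bullets describe.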
This Work: Simplified, w/ Newer, Real Data
Datasets
TREC’09 Million Query Track
(ClueWeb’09)
• 3K MTurk judgments collected for TREC 2010
Relevance Feedback Track
– (Buckley, Smucker, & Lease, TREC’10 Notebook)
– (Grady & Lease, NAACL’10 MTurk Workshop)
– Judgments re-used in TREC Crowdsourcing Tracks
• 1st crowd judgments collected in my lab
– Relatively low quality: 65% with majority vote (MV) / 70% with Dawid-Skene (DS)
Why Is That Relevant? Collecting Annotator
Rationales for Relevance Judgments
with T. McDonnell, M. Kutlu, & T. Elsayed
HCOMP 2016, Best Paper Award
Crowd vs. Expert: What Can Relevance Judgment
Rationales Teach Us About Assessor Disagreement?
with M. Kutlu, T. McDonnell, Y. Barkallah, & T. Elsayed
ACM SIGIR 2018
• Scale up approach from prior HCOMP paper
• Mine rationales to understand disagreement
• But not discussed… worker behavioral data
Your Behavior Signals Your Reliability: Modeling
Crowd Behavioral Traces to Ensure
Quality Relevance Annotations
with T. Goyal, T. McDonnell, M. Kutlu, & T. Elsayed
AAAI HCOMP 2018
• Mine crowd worker analytics (behavioral data)
to predict label quality based on behavior
Data (available online)
1. TREC’09 Million Query Track (ClueWeb’09)
– Crowd judgments from TREC’10 RF Track (with Mark
Smucker), re-used in TREC Crowdsourcing Tracks
– 1st crowd judgments my lab ever collected; noisy
2. TREC’14 Web Track (ClueWeb’12)
– 25K MTurk judgments just collected with “rationales”
(Kutlu et al., SIGIR’18; Goyal et al., HCOMP’18)
– Better quality through design: 80% MV / 81% DS
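The quality figures for the two collections can be read as the share of pairs where the aggregated crowd label matches the expert label; that reading is an assumption here. A minimal sketch of the majority-vote (MV) version, with hypothetical function names and a deterministic tie-break that the slides do not specify:

```python
from collections import Counter


def majority_vote(labels):
    """Most frequent label; ties go to the smallest label (an assumption)."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


def consensus_accuracy(crowd_labels, expert_labels):
    """Fraction of shared pairs where the MV crowd label equals the expert label.

    crowd_labels:  dict mapping pair id -> list of crowd labels.
    expert_labels: dict mapping pair id -> expert (e.g., NIST) label.
    """
    shared = [pair for pair in crowd_labels if pair in expert_labels]
    if not shared:
        raise ValueError("no pairs have both crowd and expert labels")
    correct = sum(majority_vote(crowd_labels[p]) == expert_labels[p] for p in shared)
    return correct / len(shared)
```

Dawid-Skene (DS) would replace `majority_vote` with a model that weights workers by estimated reliability; it is not implemented in this sketch.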
Mix & Match: Method
• Aggregate crowd labels for consensus
– e.g., SQUARE benchmark of aggregation methods and
datasets (Sheshadri & Lease, HCOMP’13)
– http://ir.ischool.utexas.edu/square
• Prioritize (topic, document) pairs for judging
– StatAP (most important first)
– Disagreement oracle (avoid crowd disagreement)
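The two steps above, aggregating and then prioritizing, can be sketched together. The slides use a disagreement oracle; this sketch swaps in observed disagreement among crowd workers as a stand-in, which is an assumption, and `expert_budget` and the function names are hypothetical.

```python
from collections import Counter


def disagreement(labels):
    """Fraction of crowd labels that differ from the most common label."""
    top_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - top_count / len(labels)


def mix_and_match(crowd, expert, expert_budget):
    """Send the most contested pairs to the expert; use crowd consensus elsewhere.

    crowd:         dict mapping (topic, document) -> list of crowd labels.
    expert:        dict mapping (topic, document) -> expert label.
    expert_budget: number of pairs the expert is asked to judge.
    """
    # Highest disagreement first; sorted() is stable, so ties keep input order.
    ordered = sorted(crowd, key=lambda pair: disagreement(crowd[pair]), reverse=True)
    expert_pairs = set(ordered[:expert_budget])
    final = {}
    for pair, labels in crowd.items():
        if pair in expert_pairs:
            final[pair] = expert[pair]
        else:
            # Majority vote; on a tie, Counter keeps the first label seen.
            final[pair] = Counter(labels).most_common(1)[0][0]
    return final
```

A StatAP-style ordering would only change the sort key, replacing `disagreement` with a per-pair importance weight.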
Mix & Match: Results
• Analyzed various correlations with judging disagreement
– Disagreement model: (Sheshadri, 2014) Master’s Thesis
– More disagreement for ambiguous topic definitions & topics
requiring greater expertise (Kutlu et al., SIGIR’18)
• Best results when ordering by disagreement oracle
– Achieve Kendall’s τ = 0.9 when NIST performs only a subset of the
judgments: 55% (MQ’09) and 15-20% (WT’14)
• StatAP order beats random in WT’14, but not MQ’09
– With better judgments, seems simple & effective
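Kendall’s τ here compares how retrieval systems rank under hybrid judgments against how they rank under full NIST judgments. A minimal sketch of that comparison, assuming the tau-a variant (no tie correction) and hypothetical per-system scores such as MAP; the function name is illustrative and not taken from the paper:

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts keyed by system name."""
    systems = sorted(set(scores_a) & set(scores_b))
    n_pairs = len(systems) * (len(systems) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two systems scored in both dicts")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Positive product: both judgment sets order the two systems the same way.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / n_pairs
```

A value of 1.0 means both judgment sets rank every pair of systems identically; each swapped pair lowers the value.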
Mix & Match: Results Detail
Matthew Lease - ml@utexas.edu - @mattlease
Thank You!
slideshare.net/mattlease
Lab: ir.ischool.utexas.edu

More Related Content

What's hot

Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Amit Sheth
 
Finding the emic in systemic design: Towards systemic ethnography
Finding the emic in systemic design: Towards systemic ethnographyFinding the emic in systemic design: Towards systemic ethnography
Finding the emic in systemic design: Towards systemic ethnographyRSD7 Symposium
 
Introduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningIntroduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningNik Spirin
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science James Hendler
 
Scientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics PerspectiveScientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics PerspectiveMicah Altman
 
The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration James Hendler
 
Citizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsCitizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsAmit Sheth
 
What role do “power learners” play in online learning communities?
What role do “power learners” play in online learning communities?What role do “power learners” play in online learning communities?
What role do “power learners” play in online learning communities?@cristobalcobo
 
Computational Models in Systemic Design
Computational Models in Systemic DesignComputational Models in Systemic Design
Computational Models in Systemic DesignRSD7 Symposium
 
The NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoTThe NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoTPrasant Misra
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big DataDataWorks Summit
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...Matti Luhtala
 
Advancing Personalized Learning through Big Data and Artificial Intelligence
Advancing Personalized Learning through Big Data and Artificial IntelligenceAdvancing Personalized Learning through Big Data and Artificial Intelligence
Advancing Personalized Learning through Big Data and Artificial IntelligencePenn State EdTech Network
 
Systemic Design Labs (SDL): Incubating systemic design skills through experie...
Systemic Design Labs (SDL): Incubating systemic design skills through experie...Systemic Design Labs (SDL): Incubating systemic design skills through experie...
Systemic Design Labs (SDL): Incubating systemic design skills through experie...RSD7 Symposium
 
The Data Errors we Make by Sean Taylor at Big Data Spain 2017
The Data Errors we Make by Sean Taylor at Big Data Spain 2017The Data Errors we Make by Sean Taylor at Big Data Spain 2017
The Data Errors we Make by Sean Taylor at Big Data Spain 2017Big Data Spain
 

What's hot (20)

Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...
 
Web and Complex Systems Lab @ Kno.e.sis
Web and Complex Systems Lab @ Kno.e.sisWeb and Complex Systems Lab @ Kno.e.sis
Web and Complex Systems Lab @ Kno.e.sis
 
Finding the emic in systemic design: Towards systemic ethnography
Finding the emic in systemic design: Towards systemic ethnographyFinding the emic in systemic design: Towards systemic ethnography
Finding the emic in systemic design: Towards systemic ethnography
 
Introduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningIntroduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine Learning
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
 
Scientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics PerspectiveScientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics Perspective
 
The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration
 
Citizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsCitizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and Applications
 
What role do “power learners” play in online learning communities?
What role do “power learners” play in online learning communities?What role do “power learners” play in online learning communities?
What role do “power learners” play in online learning communities?
 
Computational Models in Systemic Design
Computational Models in Systemic DesignComputational Models in Systemic Design
Computational Models in Systemic Design
 
The NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoTThe NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoT
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big Data
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...
NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...
NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...
 
Lecture #02
Lecture #02 Lecture #02
Lecture #02
 
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...
 
Lecture #03
Lecture #03Lecture #03
Lecture #03
 
Advancing Personalized Learning through Big Data and Artificial Intelligence
Advancing Personalized Learning through Big Data and Artificial IntelligenceAdvancing Personalized Learning through Big Data and Artificial Intelligence
Advancing Personalized Learning through Big Data and Artificial Intelligence
 
Systemic Design Labs (SDL): Incubating systemic design skills through experie...
Systemic Design Labs (SDL): Incubating systemic design skills through experie...Systemic Design Labs (SDL): Incubating systemic design skills through experie...
Systemic Design Labs (SDL): Incubating systemic design skills through experie...
 
The Data Errors we Make by Sean Taylor at Big Data Spain 2017
The Data Errors we Make by Sean Taylor at Big Data Spain 2017The Data Errors we Make by Sean Taylor at Big Data Spain 2017
The Data Errors we Make by Sean Taylor at Big Data Spain 2017
 

Similar to Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately & Affordably

The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingMatthew Lease
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Matthew Lease
 
Crowdsourcing for Information Retrieval: From Statistics to Ethics
Crowdsourcing for Information Retrieval: From Statistics to EthicsCrowdsourcing for Information Retrieval: From Statistics to Ethics
Crowdsourcing for Information Retrieval: From Statistics to EthicsMatthew Lease
 
Metrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-ComputingMetrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-ComputingMatthew Lease
 
Adventures in Crowdsourcing: Research at UT Austin & Beyond
Adventures in Crowdsourcing: Research at UT Austin & BeyondAdventures in Crowdsourcing: Research at UT Austin & Beyond
Adventures in Crowdsourcing: Research at UT Austin & BeyondMatthew Lease
 
Crowdsourcing for Search Evaluation and Social-Algorithmic Search
Crowdsourcing for Search Evaluation and Social-Algorithmic SearchCrowdsourcing for Search Evaluation and Social-Algorithmic Search
Crowdsourcing for Search Evaluation and Social-Algorithmic SearchMatthew Lease
 
10[1].1.1.115.9508
10[1].1.1.115.950810[1].1.1.115.9508
10[1].1.1.115.9508okeee
 
Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Gregory Piatetsky-Shapiro
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsAkin Osman Kazakci
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesJason Hattrick-Simpers
 
Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Matthew Lease
 
Foresight by Online Communities - The Case of Renewable Energies
Foresight by Online Communities - The Case of Renewable EnergiesForesight by Online Communities - The Case of Renewable Energies
Foresight by Online Communities - The Case of Renewable EnergiesMichael Andreas Zeng
 
Toward Better Crowdsourcing Science
 Toward Better Crowdsourcing Science Toward Better Crowdsourcing Science
Toward Better Crowdsourcing ScienceMatthew Lease
 
Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...Josh Sheldon
 
Data Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical UniversityData Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical Universitybutest
 
[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...
[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...
[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...Daniele Malitesta
 
Contractor-Borner-SNA-SAC
Contractor-Borner-SNA-SACContractor-Borner-SNA-SAC
Contractor-Borner-SNA-SACwebuploader
 
Presentation 2019 08-30
Presentation 2019 08-30Presentation 2019 08-30
Presentation 2019 08-30Mahdi_Fahmideh
 
Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)
Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)
Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)Matthew Lease
 

Similar to Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately & Affordably (20)

The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject Crowdsourcing
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
 
Crowdsourcing for Information Retrieval: From Statistics to Ethics
Crowdsourcing for Information Retrieval: From Statistics to EthicsCrowdsourcing for Information Retrieval: From Statistics to Ethics
Crowdsourcing for Information Retrieval: From Statistics to Ethics
 
Metrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-ComputingMetrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-Computing
 
Adventures in Crowdsourcing: Research at UT Austin & Beyond
Adventures in Crowdsourcing: Research at UT Austin & BeyondAdventures in Crowdsourcing: Research at UT Austin & Beyond
Adventures in Crowdsourcing: Research at UT Austin & Beyond
 
Crowdsourcing for Search Evaluation and Social-Algorithmic Search
Crowdsourcing for Search Evaluation and Social-Algorithmic SearchCrowdsourcing for Search Evaluation and Social-Algorithmic Search
Crowdsourcing for Search Evaluation and Social-Algorithmic Search
 
10[1].1.1.115.9508
10[1].1.1.115.950810[1].1.1.115.9508
10[1].1.1.115.9508
 
Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analytics
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop Slides
 
Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)
 
Foresight by Online Communities - The Case of Renewable Energies
Foresight by Online Communities - The Case of Renewable EnergiesForesight by Online Communities - The Case of Renewable Energies
Foresight by Online Communities - The Case of Renewable Energies
 
Toward Better Crowdsourcing Science
 Toward Better Crowdsourcing Science Toward Better Crowdsourcing Science
Toward Better Crowdsourcing Science
 
Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...
 
Data Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical UniversityData Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical University
 
[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...
[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...
[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...
 
Contractor-Borner-SNA-SAC
Contractor-Borner-SNA-SACContractor-Borner-SNA-SAC
Contractor-Borner-SNA-SAC
 
CrowDM system
CrowDM systemCrowDM system
CrowDM system
 
Presentation 2019 08-30
Presentation 2019 08-30Presentation 2019 08-30
Presentation 2019 08-30
 
Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)
Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)
Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)
 

More from Matthew Lease

Automated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey ResponsesAutomated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey ResponsesMatthew Lease
 
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Matthew Lease
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopMatthew Lease
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Matthew Lease
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Matthew Lease
 
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Matthew Lease
 
Fact Checking & Information Retrieval
Fact Checking & Information RetrievalFact Checking & Information Retrieval
Fact Checking & Information RetrievalMatthew Lease
 
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...Matthew Lease
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Systematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s ClothingSystematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s ClothingMatthew Lease
 
The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)Matthew Lease
 
The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016Matthew Lease
 
The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)Matthew Lease
 
Toward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd WorkToward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd WorkMatthew Lease
 
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...Matthew Lease
 
Crowdsourcing Transcription Beyond Mechanical Turk
Crowdsourcing Transcription Beyond Mechanical TurkCrowdsourcing Transcription Beyond Mechanical Turk
Crowdsourcing Transcription Beyond Mechanical TurkMatthew Lease
 
Crowdsourcing & ethics: a few thoughts and refences.
Crowdsourcing & ethics: a few thoughts and refences. Crowdsourcing & ethics: a few thoughts and refences.
Crowdsourcing & ethics: a few thoughts and refences. Matthew Lease
 
Crowdsourcing & Human Computation Labeling Data & Building Hybrid Systems
Crowdsourcing & Human Computation Labeling Data & Building Hybrid SystemsCrowdsourcing & Human Computation Labeling Data & Building Hybrid Systems
Crowdsourcing & Human Computation Labeling Data & Building Hybrid SystemsMatthew Lease
 
Mechanical Turk is Not Anonymous
Mechanical Turk is Not AnonymousMechanical Turk is Not Anonymous
Mechanical Turk is Not AnonymousMatthew Lease
 
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...Matthew Lease
 

More from Matthew Lease (20)

Automated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey ResponsesAutomated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey Responses
 
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loop
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation
 
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
 
Fact Checking & Information Retrieval
Fact Checking & Information RetrievalFact Checking & Information Retrieval
Fact Checking & Information Retrieval
 
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Systematic Review is e-Discovery in Doctor’s Clothing
The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing - 2016
The Rise of Crowd Computing (December 2015)
Toward Effective and Sustainable Online Crowd Work
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Crowdsourcing Transcription Beyond Mechanical Turk
Crowdsourcing & ethics: a few thoughts and refences.
Crowdsourcing & Human Computation Labeling Data & Building Hybrid Systems
Mechanical Turk is Not Anonymous
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...

Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately & Affordably

  • 1. Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately & Affordably Mucahid Kutlu, Tyler McDonnell, Aashish Sheshadri, Tamer Elsayed, & Matthew Lease UT Austin -&- Qatar U Slides: slideshare.net/mattlease ml@utexas.edu @mattlease
  • 2. What’s an Information School?
      “The place where people & technology meet” ~ Wobbrock et al., 2009
      “iSchools” now exist at 96 universities around the world (www.ischools.org)
  • 3. Roadmap
      • Problem Statement
      • Related Work
      • Datasets
      • Mix & Match: Methods & Results
      Proceedings of the First Biennial Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES), Bertinoro, Italy, August 28-31, 2018.
  • 4. Problem Statement
      • Traditional relevance assessors & processes (e.g., TREC) remain the most reliable and trusted
      • Non-traditional relevance judging (e.g., crowd) offers ease, affordability, & speed/scalability, but more variability in quality
      • How can we make the best use of both?
          – Crowd may judge some documents/topics better than others; can we divide the work appropriately?
          – Use the crowd in cases where we expect their judgments to match those of traditional judges (i.e., be “correct”)
  • 6. Systematic Review is e-Discovery in Doctor’s Clothing
      SIGIR 2016 Workshop on Medical IR (MedIR)
      Joint work with Gordon V. Cormack (U. Waterloo), An Thanh Nguyen (U. Texas), Thomas A. Trikalinos (Brown U.), & Byron C. Wallace (U. Texas)
  • 10. Hybrid Man-Machine Relevance Judging
      • Systematic review (medicine) and e-Discovery (law / civil procedure) have traditionally relied on trusted doctors/lawyers for judging
      • Automatic relevance classification is more efficient but less accurate
      • Recent active learning work has investigated hybrid man-machine judging combinations
          – e.g., TAR & TREC Legal Track, recent CLEF track
  • 11. Hybrid Crowd-Machine Labeling
      • Dynamic labeling models select which example to label next, how many crowd labels to collect, & which examples to label automatically
          – Work by Weld (UW) & Mausam (IIT), e.g., TurKontrol
          – Work by Kamar and Horvitz (MSR), e.g., CrowdSynth
          – Our work (2 slides ahead…)
  • 13. Combining Crowd and Expert Labels using Decision Theoretic Active Learning
      Nguyen, Wallace, & Lease, AAAI HCOMP’15
      Systematic Review
  • 14. A Collaborative Approach to IR Evaluation. Aashish Sheshadri. Master's Thesis, UT CS, May 2014
      • Built a model to predict assessor disagreement
      • Built a crowd simulator based on real data
          – Simulated relevance judgments (of varying quality)
      • Considered cost models for expert vs. crowd judges
      • Evaluated cost vs. quality of hybrid NIST-Crowd collaborative judging models
  • 15. This Work: Simplified, w/ Newer, Real Data
      • Built a model to predict assessor disagreement
      • Built a crowd simulator based on real data
          – Simulated relevance judgments (of varying quality)
      • Considered cost models for expert vs. crowd judges
      • Evaluated cost vs. quality of hybrid NIST-Crowd collaborative judging models
  • 17. TREC’09 Million Query Track (ClueWeb’09)
      • 3K MTurk judgments collected for TREC 2010 Relevance Feedback Track
          – (Buckley, Smucker, & Lease, TREC’10 Notebook)
          – (Grady & Lease, NAACL’10 MTurk Workshop)
          – Judgments re-used in TREC Crowdsourcing Tracks
      • 1st crowd judgments collected in my lab
          – Relatively low quality: 65% MV / 70% DS
  • 18. Why Is That Relevant? Collecting Annotator Rationales for Relevance Judgments
      with T. McDonnell, M. Kutlu, & T. Elsayed
      HCOMP 2016, Best Paper Award
  • 19. Crowd vs. Expert: What Can Relevance Judgment Rationales Teach Us About Assessor Disagreement?
      with M. Kutlu, T. McDonnell, Y. Barkallah, & T. Elsayed, ACM SIGIR 2018
      • Scale up approach from prior HCOMP paper
      • Mine rationales to understand disagreement
      • But not discussed… worker behavioral data
  • 20. Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to Ensure Quality Relevance Annotations
      with T. Goyal, T. McDonnell, M. Kutlu, & T. Elsayed, AAAI HCOMP 2018
      • Mine crowd worker analytics (behavioral data) to predict label quality based on behavior
  • 21. Data (available online)
      1. TREC’09 Million Query Track (ClueWeb’09)
          – Crowd judgments from TREC’09 RF Track (with Mark Smucker), re-used in TREC Crowdsourcing Tracks
          – 1st crowd judgments my lab ever collected; noisy
      2. TREC’14 Web Track (ClueWeb’12)
          – 25K MTurk judgments just collected with “rationales” (Kutlu et al., SIGIR’18; Goyal et al., HCOMP’18)
          – Better quality through design: 80% MV / 81% DS
  • 22. Mix & Match: Method
      • Aggregate crowd labels for consensus
          – e.g., SQUARE benchmark of aggregation methods and datasets (Sheshadri & Lease, HCOMP’13)
          – http://ir.ischool.utexas.edu/square
      • Prioritize (topic, document) pairs for judging
          – StatAP (most important first)
          – Disagreement oracle (avoid crowd disagreement)
  • 23. Mix & Match: Results
      • Analyzed various correlations with judging disagreement
          – Disagreement model: (Sheshadri, 2014) Master’s Thesis
          – More disagreement for ambiguous topic definitions & topics requiring greater expertise (Kutlu et al., SIGIR’18)
      • Best results when ordering by disagreement oracle
          – Achieve Kendall’s τ = 0.9 when NIST performs only a subset of the judgments: 55% (MQ’09) and 15-20% (WT’14)
      • StatAP order beats random in WT’14, but not MQ’09
          – With better judgments, seems simple & effective
  • 24. Mix & Match: Results Detail
  • 25. Matthew Lease - ml@utexas.edu - @mattlease Thank You! slideshare.net/mattlease Lab: ir.ischool.utexas.edu
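
The method and evaluation on slides 22 and 23 (aggregate crowd labels, prioritize pairs, send only a subset to the expert, then compare system rankings with Kendall's τ) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration rather than the authors' implementation: the function names are hypothetical, consensus is plain majority voting, the slides' disagreement oracle is replaced by a simple proxy (the fraction of crowd labels that differ from the majority label), and τ is computed as tau-a with tied pairs counted as neither concordant nor discordant.

```python
from collections import Counter
from itertools import combinations


def majority_vote(labels):
    """Return the most frequent label; ties go to the smallest label."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def disagreement(labels):
    """Proxy score (assumption): share of labels that differ from the majority."""
    consensus = majority_vote(labels)
    return sum(1 for label in labels if label != consensus) / len(labels)


def mix_and_match(crowd_labels, expert_judge, expert_fraction):
    """Send the most disputed (topic, document) pairs to the expert.

    crowd_labels: dict mapping (topic, doc) -> list of crowd labels
    expert_judge: callable returning the expert label for a pair
    expert_fraction: share of pairs the expert judges (0.0 to 1.0)
    """
    # Most disputed pairs first; sorted() is stable, so ties keep input order.
    ordered = sorted(crowd_labels,
                     key=lambda pair: disagreement(crowd_labels[pair]),
                     reverse=True)
    n_expert = round(expert_fraction * len(ordered))
    judgments = {}
    for rank, pair in enumerate(ordered):
        if rank < n_expert:
            judgments[pair] = expert_judge(pair)
        else:
            judgments[pair] = majority_vote(crowd_labels[pair])
    return judgments


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two dicts of per-system scores."""
    concordant = discordant = 0
    systems = sorted(scores_a)
    for s, t in combinations(systems, 2):
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    crowd = {("t1", "d1"): [1, 1, 1], ("t1", "d2"): [1, 0, 1],
             ("t2", "d3"): [0, 0, 0], ("t2", "d4"): [0, 1, 0]}
    expert = {("t1", "d1"): 1, ("t1", "d2"): 0,
              ("t2", "d3"): 0, ("t2", "d4"): 0}
    hybrid = mix_and_match(crowd, expert.get, expert_fraction=0.5)
    print(hybrid == expert)
```

In a real experiment, the resulting hybrid judgments would be used to score each retrieval system, and `kendall_tau` would compare that system ranking against the ranking obtained from the full set of expert judgments.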