Mix and Match: Collaborative Expert-Crowd
Judging for Building Test Collections
Accurately & Affordably
Mucahid Kutlu, Tyler McDonnell, Aashish Sheshadri,
Tamer Elsayed, & Matthew Lease
UT Austin & Qatar U
Slides: slideshare.net/mattlease ml@utexas.edu @mattlease
What’s an Information School?
“The place where people & technology meet”
~ Wobbrock et al., 2009
“iSchools” now exist at 96 universities around the world
www.ischools.org
Roadmap
• Problem Statement
• Related Work
• Datasets
• Mix & Match: Methods & Results
Proceedings of the First Biennial Conference on Design of Experimental
Search & Information Retrieval Systems (DESIRES), Bertinoro, Italy, August 28-31, 2018.
Problem Statement
• Traditional relevance assessors & processes (e.g., TREC) remain the most reliable and trusted
• Non-traditional relevance judging (e.g., crowdsourcing) offers ease, affordability, & speed/scalability, but more variability in quality
• How can we make the best use of both?
– The crowd may judge some documents/topics better than others; can we divide the work appropriately?
– Use the crowd in cases where we expect its judgments to match those of traditional judges (i.e., to be “correct”)
Related Work
@mattlease
Systematic Review is e-Discovery in Doctor’s Clothing
SIGIR 2016 Workshop on Medical IR (MedIR)
Joint work with
Gordon V. Cormack (U. Waterloo), An Thanh Nguyen (U. Texas),
Thomas A. Trikalinos (Brown U.), Byron C. Wallace (U. Texas)
Hybrid Man-Machine Relevance Judging
• Systematic review (medicine) and e-Discovery
(law / civil procedure) have traditionally relied
on trusted doctors/lawyers for judging
• Automatic relevance classification is more
efficient but less accurate
• Recent active learning work has investigated
hybrid man-machine judging combinations
– e.g., technology-assisted review (TAR) & the TREC Legal Track; a recent CLEF track
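As a minimal sketch of one hybrid man-machine step, assuming a classifier that outputs a probability of relevance per document: send the documents it is least certain about to the trusted human judge and label the rest automatically. This is a single selection round in the style of uncertainty sampling, not the specific method used in TAR or the CLEF track; the function name, document ids, and budget are hypothetical, and a full active-learning loop would retrain the classifier after each batch of human labels.

```python
from typing import Dict, List, Tuple


def split_by_uncertainty(
    scores: Dict[str, float], human_budget: int
) -> Tuple[List[str], Dict[str, bool]]:
    """Route the `human_budget` least certain documents to a human judge.

    `scores` maps a (hypothetical) document id to the classifier's estimated
    probability of relevance. Uncertainty is |p - 0.5|: smaller means less sure.
    Returns (doc ids for the human judge, automatic labels for all other docs).
    """
    ranked = sorted(scores, key=lambda d: abs(scores[d] - 0.5))
    to_human = ranked[:human_budget]
    auto_labels = {d: scores[d] >= 0.5 for d in ranked[human_budget:]}
    return to_human, auto_labels


if __name__ == "__main__":
    scores = {"d1": 0.95, "d2": 0.52, "d3": 0.10, "d4": 0.45}
    human, auto = split_by_uncertainty(scores, human_budget=2)
    print(human)  # ['d2', 'd4']
    print(auto)   # {'d3': False, 'd1': True}
```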
Hybrid Crowd-Machine Labeling
• Dynamic labeling models select which example to
label next, how many crowd labels to collect, &
which examples to label automatically
– Work by Weld (UW) & Mausam (IIT), e.g., TurKontrol
– Work by Kamar and Horvitz (MSR), e.g., CrowdSynth
– Our work (2 slides ahead…)
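TurKontrol and CrowdSynth make these choices with decision-theoretic models. As a far simpler stand-in, the sketch below handles only the "how many crowd labels" decision, using a fixed margin-based stopping rule. `ask_worker`, `margin`, and `max_labels` are hypothetical names and values, and returning `None` when the budget runs out (e.g., to escalate the item to an expert) is an assumption of this sketch.

```python
from typing import Callable, Optional


def collect_until_confident(
    ask_worker: Callable[[], bool], margin: int = 2, max_labels: int = 7
) -> Optional[bool]:
    """Request crowd labels one at a time until one answer leads by `margin`.

    `ask_worker` is a hypothetical callback returning one worker's binary
    relevance label. Returns the leading label, or None if `max_labels`
    labels are collected without either answer reaching the margin.
    """
    yes = no = 0
    for _ in range(max_labels):
        if ask_worker():
            yes += 1
        else:
            no += 1
        if abs(yes - no) >= margin:
            return yes > no
    return None


if __name__ == "__main__":
    answers = iter([True, False, True, True])
    print(collect_until_confident(lambda: next(answers)))  # True
```

A margin rule spends extra labels only on items where workers disagree, which is the intuition behind deciding dynamically how many labels to collect.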
Decision Theoretic Active Learning
Combining Crowd and Expert Labels using
Decision Theoretic Active Learning
Nguyen, Wallace, & Lease, AAAI HCOMP’15
Systematic Review
A Collaborative Approach to IR Evaluation. Aashish Sheshadri. Master's Thesis, UT CS, May 2014
• Built a model to predict assessor disagreement
• Built a crowd simulator based on real data
– Simulated relevance judgments (of varying quality)
• Considered cost models for expert vs. crowd judges
• Evaluated cost vs. quality of hybrid NIST-Crowd collaborative judging models
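The expert vs. crowd cost trade-off can be made concrete with a simple linear cost model. This model is an assumption for illustration, not the one from the thesis: NIST judges a fraction of the (topic, document) pairs once each, and the crowd judges the rest with a fixed number of redundant labels per pair. The prices in the usage lines are hypothetical placeholders.

```python
def hybrid_cost(n_pairs: int, expert_fraction: float, expert_cost: float,
                crowd_cost: float, crowd_labels_per_pair: int) -> float:
    """Total cost of judging `n_pairs` pairs when an expert judges
    `expert_fraction` of them and the crowd judges the remainder,
    collecting `crowd_labels_per_pair` redundant labels for each."""
    if not 0.0 <= expert_fraction <= 1.0:
        raise ValueError("expert_fraction must be in [0, 1]")
    n_expert = round(n_pairs * expert_fraction)
    n_crowd = n_pairs - n_expert
    return n_expert * expert_cost + n_crowd * crowd_labels_per_pair * crowd_cost


if __name__ == "__main__":
    # Hypothetical prices: $1.00 per expert judgment, $0.05 per crowd label.
    all_expert = hybrid_cost(1000, 1.0, 1.00, 0.05, 5)
    hybrid = hybrid_cost(1000, 0.2, 1.00, 0.05, 5)
    print(f"{all_expert:.2f} vs {hybrid:.2f}")  # 1000.00 vs 400.00
```

Under these placeholder prices, routing 80% of pairs to the crowd cuts cost by 60%. Whether that is worthwhile depends on how much the crowd labels degrade evaluation quality, which is the other side of the cost vs. quality comparison.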
This Work: Simplified, w/ Newer, Real Data
Datasets
TREC’09 Million Query Track
(ClueWeb’09)
• 3K MTurk judgments collected for TREC 2010
Relevance Feedback Track
– (Buckley, Smucker, & Lease, TREC’10 Notebook)
– (Grady & Lease, NAACL’10 MTurk Workshop)
– Judgments re-used in TREC Crowdsourcing Tracks
• 1st crowd judgments collected in my lab
– Relatively low quality: 65% with majority vote (MV) / 70% with Dawid-Skene (DS)
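As a hedged illustration of the MV figure, the sketch below aggregates redundant binary crowd labels per (topic, document) pair by majority vote and scores the consensus against NIST labels. The toy data, function names, and tie-breaking rule (ties count as relevant) are assumptions of this sketch, not details from the track.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (topic id, document id)


def majority_vote(labels: Dict[Pair, List[int]]) -> Dict[Pair, int]:
    """Collapse each pair's binary crowd labels (1 = relevant) into one label.
    Ties, including a pair with no labels, resolve to 1 in this sketch."""
    consensus = {}
    for pair, votes in labels.items():
        counts = Counter(votes)
        consensus[pair] = 1 if counts[1] >= counts[0] else 0
    return consensus


def accuracy(pred: Dict[Pair, int], gold: Dict[Pair, int]) -> float:
    """Fraction of gold-labelled pairs whose predicted label matches the gold label."""
    shared = [p for p in gold if p in pred]
    if not shared:
        raise ValueError("no overlapping pairs to score")
    return sum(pred[p] == gold[p] for p in shared) / len(shared)


if __name__ == "__main__":
    crowd = {("t1", "d1"): [1, 1, 0], ("t1", "d2"): [0, 0, 1], ("t2", "d3"): [1, 0]}
    nist = {("t1", "d1"): 1, ("t1", "d2"): 1, ("t2", "d3"): 1}
    print(round(accuracy(majority_vote(crowd), nist), 3))  # 0.667
```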
Why Is That Relevant? Collecting Annotator
Rationales for Relevance Judgments
with T. McDonnell, M. Kutlu, & T. Elsayed
HCOMP 2016, Best Paper Award
Crowd vs. Expert: What Can Relevance Judgment
Rationales Teach Us About Assessor Disagreement?
with M. Kutlu, T. McDonnell, Y. Barkallah, & T. Elsayed
ACM SIGIR 2018
• Scale up the approach from the prior HCOMP paper
• Mine rationales to understand disagreement
• But not discussed… worker behavioral data
Your Behavior Signals Your Reliability: Modeling
Crowd Behavioral Traces to Ensure
Quality Relevance Annotations
with T. Goyal, T. McDonnell, M. Kutlu, & T. Elsayed
AAAI HCOMP 2018
• Mine crowd worker analytics (behavioral data) to predict label quality based on behavior
Data (available online)
1. TREC’09 Million Query Track (ClueWeb’09)
– Crowd judgments from the TREC’10 RF Track (with Mark Smucker), re-used in TREC Crowdsourcing Tracks
– 1st crowd judgments my lab ever collected; noisy
2. TREC’14 Web Track (ClueWeb’12)
– 25K MTurk judgments newly collected with “rationales” (Kutlu et al., SIGIR’18; Goyal et al., HCOMP’18)
– Better quality through design: 80% MV / 81% DS
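DS refers to Dawid-Skene aggregation, which weights each worker by estimated reliability instead of counting votes equally. The sketch below is a simplified "one-coin" variant for binary labels: it keeps a single accuracy per worker rather than the full per-worker confusion matrix of the original Dawid & Skene model. It is written for illustration and is not the SQUARE implementation; the function name and toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Label = Tuple[str, str, int]  # (item id, worker id, label in {0, 1})


def one_coin_dawid_skene(labels: List[Label], iters: int = 20) -> Dict[str, float]:
    """EM for a one-coin Dawid-Skene model on binary labels.

    Each worker gets one accuracy (probability their label is correct).
    Returns the posterior probability that each item is relevant (label 1).
    """
    items = sorted({i for i, _, _ in labels})
    workers = sorted({w for _, w, _ in labels})
    eps = 1e-3  # clamp probabilities away from 0 and 1 to keep products finite

    # Initialise posteriors with the fraction of 'relevant' votes (soft majority vote).
    post = {}
    for i in items:
        votes = [y for j, _, y in labels if j == i]
        post[i] = sum(votes) / len(votes)

    acc: Dict[str, float] = {}
    for _ in range(iters):
        # M-step: expected fraction of each worker's labels that match the true label.
        for w in workers:
            mine = [(i, y) for i, v, y in labels if v == w]
            agree = sum(post[i] if y == 1 else 1.0 - post[i] for i, y in mine)
            acc[w] = min(max(agree / len(mine), eps), 1.0 - eps)
        prior = min(max(sum(post.values()) / len(items), eps), 1.0 - eps)

        # E-step: posterior of relevance given the current worker accuracies.
        for i in items:
            p1, p0 = prior, 1.0 - prior
            for j, w, y in labels:
                if j != i:
                    continue
                p1 *= acc[w] if y == 1 else 1.0 - acc[w]
                p0 *= acc[w] if y == 0 else 1.0 - acc[w]
            post[i] = p1 / (p1 + p0)
    return post


if __name__ == "__main__":
    # Toy data: w1 and w2 always agree with each other; w3 always says the opposite.
    toy = []
    for doc, label in [("d1", 1), ("d2", 0), ("d3", 1), ("d4", 0)]:
        toy += [(doc, "w1", label), (doc, "w2", label), (doc, "w3", 1 - label)]
    print({d: round(p, 3) for d, p in one_coin_dawid_skene(toy).items()})
```

On this toy data the model learns accuracies near 1 for w1 and w2 and near 0 for w3. A worker whose estimated accuracy falls below 0.5 effectively has their votes inverted, which plain majority vote cannot do.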
Mix & Match: Method
• Aggregate crowd labels into a consensus label per (topic, document) pair
– e.g., SQUARE benchmark of aggregation methods and datasets (Sheshadri & Lease, HCOMP’13)
– http://ir.ischool.utexas.edu/square
• Prioritize (topic, document) pairs for NIST judging
– StatAP order (most important first)
– Disagreement oracle (avoid crowd/NIST disagreement)
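A minimal sketch of how the pieces could combine, under stated assumptions: NIST judges the `expert_budget` highest-priority pairs and crowd consensus labels fill in the rest. `priority` is a placeholder for an ordering such as StatAP's notion of the most important pairs. The disagreement oracle is modelled here, as an assumption, as priority 1 wherever crowd consensus differs from the NIST label, which is only computable in hindsight. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (topic id, document id)


def mix_and_match(
    pairs: List[Pair],
    priority: Callable[[Pair], float],
    crowd_consensus: Dict[Pair, int],
    expert_judge: Callable[[Pair], int],
    expert_budget: int,
) -> Dict[Pair, int]:
    """Build hybrid relevance judgments: the expert labels the `expert_budget`
    highest-priority pairs and crowd consensus labels cover the remainder."""
    ordered = sorted(pairs, key=priority, reverse=True)  # higher priority first
    qrels = {}
    for rank, pair in enumerate(ordered):
        if rank < expert_budget:
            qrels[pair] = expert_judge(pair)
        else:
            qrels[pair] = crowd_consensus[pair]
    return qrels


if __name__ == "__main__":
    nist = {("t1", "d1"): 1, ("t1", "d2"): 0, ("t1", "d3"): 1}
    crowd = {("t1", "d1"): 1, ("t1", "d2"): 1, ("t1", "d3"): 1}

    def oracle(p: Pair) -> float:
        # Hindsight-only priority: 1 where the crowd consensus is wrong.
        return float(crowd[p] != nist[p])

    print(mix_and_match(list(nist), oracle, crowd, nist.__getitem__, expert_budget=1))
```

With a budget of one NIST judgment, the oracle spends it on the single pair the crowd got wrong, so the hybrid judgments match NIST exactly; a realistic ordering can only approximate this.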
Mix & Match: Results
• Analyzed various correlates of judging disagreement
– Disagreement model: Sheshadri (2014), Master’s Thesis
– More disagreement for ambiguous topic definitions & for topics requiring greater expertise (Kutlu et al., SIGIR’18)
• Best results when ordering by the disagreement oracle
– Reach Kendall’s τ = 0.9 (vs. system rankings under full NIST judgments) when NIST performs only a subset of judgments: 55% (MQ’09) and 15-20% (WT’14)
• StatAP order beats random order on WT’14, but not on MQ’09
– With better crowd judgments, StatAP ordering seems simple & effective
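Kendall’s τ here compares two rankings of the participating systems, for example by a metric computed with full NIST judgments versus with the mixed NIST-crowd judgments; τ ≥ 0.9 is a conventional threshold for treating two rankings as equivalent. The sketch computes tau-a from per-system scores without any library; the system names and scores are hypothetical.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Kendall's tau-a between the system rankings induced by two score maps.

    A pair of systems is concordant if both score maps order it the same way and
    discordant if they disagree; tied pairs count as neither (tau-a, not tau-b).
    """
    if set(scores_a) != set(scores_b):
        raise ValueError("both score maps must cover the same systems")
    systems = sorted(scores_a)
    n = len(systems)
    if n < 2:
        raise ValueError("need at least two systems")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        prod = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if prod > 0:
            concordant += 1
        elif prod < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    full_nist = {"sysA": 0.30, "sysB": 0.20, "sysC": 0.10}
    hybrid = {"sysA": 0.31, "sysB": 0.12, "sysC": 0.14}
    print(round(kendall_tau(full_nist, hybrid), 3))  # 0.333
```

If ties between systems are common, `scipy.stats.kendalltau` (which computes the tie-corrected tau-b by default) is a drop-in alternative.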
Mix & Match: Results Detail
Matthew Lease - ml@utexas.edu - @mattlease
Thank You!
slideshare.net/mattlease
Lab: ir.ischool.utexas.edu

 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately & Affordably

  • 1. Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately & Affordably
    Mucahid Kutlu, Tyler McDonnell, Aashish Sheshadri, Tamer Elsayed, & Matthew Lease
    UT Austin & Qatar U
    Slides: slideshare.net/mattlease | ml@utexas.edu | @mattlease
  • 2. What’s an Information School?
    “The place where people & technology meet” ~ Wobbrock et al., 2009
    “iSchools” now exist at 96 universities around the world: www.ischools.org
  • 3. Roadmap
    • Problem Statement
    • Related Work
    • Datasets
    • Mix & Match: Methods & Results
    Proceedings of the First Biennial Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES), Bertinoro, Italy, August 28-31, 2018.
  • 4. Problem Statement
    • Traditional relevance assessors & processes (e.g., TREC) remain the most reliable and trusted
    • Non-traditional relevance judging (e.g., crowdsourcing) offers ease, affordability, & speed/scalability, but with more variability in quality
    • How can we make the best use of both?
      – The crowd may judge some documents/topics better than others; can we divide the work appropriately?
      – Use the crowd in cases where we expect its judgments to match those of traditional judges (i.e., be “correct”)
  • 6. Systematic Review is e-Discovery in Doctor’s Clothing
    Joint work with Gordon V. Cormack (U. Waterloo), An Thanh Nguyen (U. Texas), Thomas A. Trikalinos (Brown U.), & Byron C. Wallace (U. Texas)
    SIGIR 2016 Workshop on Medical IR (MedIR)
  • 10. Hybrid Man-Machine Relevance Judging
    • Systematic review (medicine) and e-Discovery (law / civil procedure) have traditionally relied on trusted doctors/lawyers for judging
    • Automatic relevance classification is more efficient but less accurate
    • Recent active learning work has investigated hybrid man-machine judging combinations
      – e.g., technology-assisted review (TAR), the TREC Legal Track, and a recent CLEF track
  • 11. Hybrid Crowd-Machine Labeling
    • Dynamic labeling models select which example to label next, how many crowd labels to collect, & which examples to label automatically
      – Work by Weld (UW) & Mausam (IIT), e.g., TurKontrol
      – Work by Kamar & Horvitz (MSR), e.g., CrowdSynth
      – Our work (2 slides ahead…)
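The “how many crowd labels to collect” decision can be illustrated with a deliberately simple stopping rule: keep requesting binary crowd labels until one answer leads by a fixed vote margin or a per-item cap is reached. This is a hypothetical sketch, not TurKontrol or CrowdSynth (both reason about expected utility versus labeling cost); `request_label`, `margin`, and `max_labels` are illustrative names and values of our own.

```python
from typing import Callable, List


def collect_until_confident(
    request_label: Callable[[], int],  # hypothetical callback: one crowd label, 1 = relevant, 0 = not
    margin: int = 2,                   # assumed vote lead needed to stop early
    max_labels: int = 7,               # assumed per-item labeling cap
) -> List[int]:
    """Request crowd labels one at a time; stop once one answer leads by `margin` votes.

    A simplified illustration only; real dynamic labeling models weigh the
    expected value of another label against its cost.
    """
    labels: List[int] = []
    while len(labels) < max_labels:
        labels.append(request_label())
        yes = sum(labels)
        no = len(labels) - yes
        if abs(yes - no) >= margin:
            break  # consensus is clear enough; stop paying for more labels
    return labels
```

For example, with a label stream of 1, 1, 0, 1 the rule stops after the first two labels, because two agreeing votes already meet the default margin of 2; a perfectly alternating stream runs until the cap.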
  • 13. Combining Crowd and Expert Labels using Decision Theoretic Active Learning
    Nguyen, Wallace, & Lease, AAAI HCOMP’15
    Systematic Review
  • 14. A Collaborative Approach to IR Evaluation (Aashish Sheshadri, Master's Thesis, UT CS, May 2014)
    • Built a model to predict assessor disagreement
    • Built a crowd simulator based on real data
      – Simulated relevance judgments (of varying quality)
    • Considered cost models for expert vs. crowd judges
    • Evaluated cost vs. quality of hybrid NIST-crowd collaborative judging models
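The crowd simulator in slide 14 can be approximated, under a strong simplifying assumption of our own, by letting each simulated worker reproduce the NIST label with a fixed per-worker accuracy and flip it otherwise. The thesis's simulator was built from real data; `simulate_crowd` and any accuracy values passed to it are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (topic_id, doc_id)


def simulate_crowd(
    gold: Dict[Pair, int],           # NIST binary labels per (topic, document) pair
    worker_accuracies: List[float],  # hypothetical probability that each worker matches gold
    seed: int = 0,
) -> Dict[Pair, List[int]]:
    """Return one simulated binary label per worker for every judged pair."""
    rng = random.Random(seed)  # seeded so that simulations are reproducible
    crowd: Dict[Pair, List[int]] = {}
    for pair, label in gold.items():
        crowd[pair] = [
            label if rng.random() < acc else 1 - label  # flip the gold label with prob. 1 - acc
            for acc in worker_accuracies
        ]
    return crowd
```

Varying `worker_accuracies` (e.g., a mix of careful and careless workers) yields simulated judgments of varying quality.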
  • 15. This Work: Simplified, w/ Newer, Real Data
    • Built a model to predict assessor disagreement
    • Built a crowd simulator based on real data
      – Simulated relevance judgments (of varying quality)
    • Considered cost models for expert vs. crowd judges
    • Evaluated cost vs. quality of hybrid NIST-crowd collaborative judging models
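A cost model for hybrid judging can be as simple as pricing each NIST judgment and each crowd judgment separately. In this sketch (an assumption, not the thesis's actual cost model), NIST judges a fraction of the (topic, document) pairs and the crowd judges the rest with fixed redundancy; `hybrid_cost`, all prices, and the `labels_per_pair` default are made up.

```python
def hybrid_cost(
    n_pairs: int,
    expert_fraction: float,    # share of (topic, document) pairs judged by NIST assessors
    expert_cost: float,        # hypothetical cost per NIST judgment
    crowd_cost: float,         # hypothetical cost per crowd judgment
    labels_per_pair: int = 5,  # hypothetical crowd redundancy per pair
) -> float:
    """Total judging cost when NIST takes a fraction of pairs and the crowd judges the rest."""
    if not 0.0 <= expert_fraction <= 1.0:
        raise ValueError("expert_fraction must be in [0, 1]")
    n_expert = round(n_pairs * expert_fraction)
    n_crowd = n_pairs - n_expert
    return n_expert * expert_cost + n_crowd * labels_per_pair * crowd_cost
```

With these made-up prices (1.0 per NIST judgment, 0.05 per crowd judgment), NIST judging 20% of 1,000 pairs and the crowd judging the other 800 five times each costs 200 + 200 = 400 units, versus 1,000 units for NIST alone. Pairing such costs with a quality measure gives the cost vs. quality trade-off the slide describes.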
  • 17. TREC’09 Million Query Track (ClueWeb’09)
    • 3K MTurk judgments collected for TREC 2010 Relevance Feedback Track
      – (Buckley, Smucker, & Lease, TREC’10 Notebook)
      – (Grady & Lease, NAACL’10 MTurk Workshop)
      – Judgments re-used in TREC Crowdsourcing Tracks
    • 1st crowd judgments collected in my lab
      – Relatively low quality: 65% MV / 70% DS
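MV and DS most likely stand for majority voting and the Dawid-Skene model, two of the aggregation methods benchmarked in SQUARE. Below is a minimal majority-vote sketch that also scores the consensus against NIST labels. Breaking ties toward “not relevant” is our arbitrary choice, and `majority_vote` and `accuracy` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(crowd: Dict[str, List[int]]) -> Dict[str, int]:
    """Aggregate binary crowd labels per item; ties go to 0 (not relevant), an arbitrary choice."""
    consensus: Dict[str, int] = {}
    for item, labels in crowd.items():
        counts = Counter(labels)  # missing labels count as 0
        consensus[item] = 1 if counts[1] > counts[0] else 0
    return consensus


def accuracy(predicted: Dict[str, int], gold: Dict[str, int]) -> float:
    """Fraction of gold-labeled items whose consensus label matches the gold (e.g., NIST) label."""
    return sum(predicted[item] == label for item, label in gold.items()) / len(gold)
```

For instance, votes of [1, 1, 0], [0, 0, 1], and [1, 0] yield consensus labels 1, 0, and 0 (the last by tie-breaking).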
  • 18. Why Is That Relevant? Collecting Annotator Rationales for Relevance Judgments
    with T. McDonnell, M. Kutlu, & T. Elsayed
    HCOMP 2016, Best Paper Award
  • 19. Crowd vs. Expert: What Can Relevance Judgment Rationales Teach Us About Assessor Disagreement?
    with M. Kutlu, T. McDonnell, Y. Barkallah, & T. Elsayed, ACM SIGIR 2018
    • Scale up the approach from the prior HCOMP paper
    • Mine rationales to understand disagreement
    • But not discussed… worker behavioral data
  • 20. Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to Ensure Quality Relevance Annotations
    with T. Goyal, T. McDonnell, M. Kutlu, & T. Elsayed, AAAI HCOMP 2018
    • Mine crowd worker analytics (behavioral data) to predict label quality based on behavior
  • 21. Data (available online)
    1. TREC’09 Million Query Track (ClueWeb’09)
      – Crowd judgments from the TREC’10 Relevance Feedback (RF) Track (with Mark Smucker), re-used in TREC Crowdsourcing Tracks
      – 1st crowd judgments my lab ever collected; noisy
    2. TREC’14 Web Track (ClueWeb’12)
      – 25K MTurk judgments just collected with “rationales” (Kutlu et al., SIGIR’18; Goyal et al., HCOMP’18)
      – Better quality through design: 80% MV / 81% DS
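DS most likely refers to the Dawid-Skene model, which learns a confusion matrix per worker instead of counting every vote equally. The following is a compact binary EM sketch initialized from soft majority-vote proportions. It is a textbook-style illustration rather than the SQUARE implementation; the `(item, worker, label)` data layout and the `smoothing` and `iterations` defaults are assumptions. Products of many probabilities can underflow for items with very many labels; a log-space version avoids that.

```python
from typing import Dict, List, Tuple

Label = Tuple[str, str, int]  # (item_id, worker_id, label) with label 1 = relevant, 0 = not


def dawid_skene(labels: List[Label], iterations: int = 50, smoothing: float = 0.01) -> Dict[str, float]:
    """Estimate P(relevant) per item with EM over per-worker 2x2 confusion matrices."""
    items = {i for i, _, _ in labels}
    workers = {w for _, w, _ in labels}

    # Initialize each item's posterior with its fraction of "relevant" votes (soft majority vote).
    votes: Dict[str, List[int]] = {i: [] for i in items}
    for i, _, y in labels:
        votes[i].append(y)
    post = {i: sum(v) / len(v) for i, v in votes.items()}

    for _ in range(iterations):
        # M-step: class prior and smoothed confusion counts counts[w][true][observed].
        prior = sum(post.values()) / len(post)
        counts = {w: [[smoothing, smoothing], [smoothing, smoothing]] for w in workers}
        for i, w, y in labels:
            counts[w][1][y] += post[i]
            counts[w][0][y] += 1.0 - post[i]
        conf = {
            w: [[c[t][o] / (c[t][0] + c[t][1]) for o in (0, 1)] for t in (0, 1)]
            for w, c in counts.items()
        }

        # E-step: posterior of each item's true label given all of its crowd labels.
        like = {i: [1.0 - prior, prior] for i in items}
        for i, w, y in labels:
            like[i][0] *= conf[w][0][y]
            like[i][1] *= conf[w][1][y]
        post = {i: l1 / (l0 + l1) for i, (l0, l1) in like.items()}

    return post
```

Thresholding each posterior at 0.5 gives a hard consensus label. Unlike majority voting, a worker who systematically disagrees with the others ends up with a nearly uninformative confusion matrix, so that worker's votes carry little weight.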
  • 22. Mix & Match: Method
    • Aggregate crowd labels for consensus
      – e.g., SQUARE benchmark of aggregation methods and datasets (Sheshadri & Lease, HCOMP’13)
      – http://ir.ischool.utexas.edu/square
    • Prioritize (topic, document) pairs for NIST judging
      – StatAP (most important first)
      – Disagreement oracle (avoid crowd disagreement)
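One way to read this method is as a merge: rank (topic, document) pairs by a priority score, spend the NIST budget on the top-ranked pairs, and keep the aggregated crowd label for everything else. The sketch below assumes binary labels and that NIST labels are available for simulation (presumably what makes the disagreement ordering an “oracle”). `mix_and_match`, `disagreement_oracle`, and `nist_fraction` are hypothetical names; a StatAP-based ordering would plug in a different `priority` function, which is not shown.

```python
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]  # (topic_id, doc_id)


def mix_and_match(
    crowd: Dict[Pair, int],             # aggregated crowd consensus label per pair
    nist: Dict[Pair, int],              # NIST label per pair (available in simulation)
    priority: Callable[[Pair], float],  # higher score = judged by NIST first
    nist_fraction: float,               # share of pairs the NIST budget covers
) -> Dict[Pair, int]:
    """Use NIST labels for the highest-priority pairs and crowd consensus for the rest."""
    ranked = sorted(crowd, key=priority, reverse=True)
    n_nist = round(len(ranked) * nist_fraction)
    merged = dict(crowd)
    for pair in ranked[:n_nist]:
        merged[pair] = nist[pair]
    return merged


def disagreement_oracle(crowd: Dict[Pair, int], nist: Dict[Pair, int]) -> Callable[[Pair], float]:
    """Oracle priority: pairs where crowd consensus disagrees with NIST are sent to NIST first."""
    return lambda pair: float(crowd[pair] != nist[pair])
```

With the oracle ordering, a NIST budget just large enough to cover every crowd-NIST disagreement reproduces the NIST labels exactly, which is why this ordering serves as a best case.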
  • 23. Mix & Match: Results
    • Analyzed various correlations with judging disagreement
      – Disagreement model: Sheshadri (2014), Master’s Thesis
      – More disagreement for ambiguous topic definitions & topics requiring greater expertise (Kutlu et al., SIGIR’18)
    • Best results when ordering by the disagreement oracle
      – Achieve Kendall’s τ = 0.9 when NIST performs only a subset of judgments: 55% (MQ’09) and 15-20% (WT’14)
    • StatAP order beats random in WT’14, but not in MQ’09
      – With better crowd judgments, StatAP ordering seems simple & effective
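Kendall’s τ here compares how systems are ranked under the full NIST judgments versus under the mixed judgments; τ ≥ 0.9 is a threshold commonly used in IR evaluation to treat two rankings as equivalent. A dependency-free tau-a sketch follows. The system names and scores in the test data are made up, and `scipy.stats.kendalltau` (which by default computes tau-b, adjusting for ties) is the usual library alternative.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Kendall's tau-a between two system rankings given per-system effectiveness scores.

    Assumes both dicts score the same systems. Returns (concordant - discordant) / total
    system pairs; pairs tied in either ranking count as neither.
    """
    systems = sorted(scores_a)
    n_pairs = 0
    balance = 0
    for s, t in combinations(systems, 2):
        n_pairs += 1
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            balance += 1  # concordant: both rankings order s and t the same way
        elif product < 0:
            balance -= 1  # discordant: the rankings disagree on s versus t
    if n_pairs == 0:
        raise ValueError("need at least two systems")
    return balance / n_pairs
```

In use, `scores_a` would hold each system’s score (e.g., MAP) under NIST judgments and `scores_b` its score under the mixed judgments; swapping the order of one pair out of four systems gives τ = (5 - 1) / 6 ≈ 0.67.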
  • 24. Mix & Match: Results Detail
  • 25. Thank You!
    Matthew Lease | ml@utexas.edu | @mattlease
    slideshare.net/mattlease
    Lab: ir.ischool.utexas.edu