Your Behavior Signals Your Reliability:
Modeling Crowd Behavioral Traces to Ensure
Quality Relevance Annotations
Tanya Goyal, Tyler McDonnell, Mucahid Kutlu, Tamer Elsayed, &
Matthew Lease
UT Austin -&- Qatar U
Slides: slideshare.net/mattlease ml@utexas.edu @mattlease
What’s an Information School?
“The place where people & technology meet”
~ Wobbrock et al., 2009
“iSchools” now exist at 96 universities around the world
www.ischools.org
Roadmap
• Behavioral data: what & why?
• Prediction Tasks & Models
• The Labeling Task: Search Relevance Judging
• Data & Evaluation (three scenarios)
• Discussion & Future Work
How to Assess Crowd Work?
• Two typical approaches
– 1. Compare labels vs. expert’s (e.g., “gold”)
– 2. Compare labels vs. peers (e.g., MV, EM)
• 3. Compare labels to model predictions
– e.g., Ryu & Lease, ASIS&T’11
• 4. Collect & assess worker behavioral data
Worker Behavioral Data (Analytics)
• Could reduce need for experts & redundant work
• Could combine with other QC methods
• Could address “cold-start” problem
– Predict quality from worker’s first label via behavior
Prior Work
• Instrumenting the crowd: using implicit behavioral measures
to predict task performance (Rzeszotarski & Kittur, UIST’11)
– Correlate crowd behavior with crowd vs. expert labels
– Each worker assigned “pass/fail”; DT predicts via behavior
• Quality management in crowdsourcing using gold judges
behavior (Kazai & Zitouni, WSDM’16)
– Correlate crowd behavior with expert behavior
• No shared data or source code
• MmmTurkey: A Crowdsourcing Framework for Deploying
Tasks and Recording Worker Behavior on Mechanical Turk
– Records clicks, scrolls, mouse movements, key presses, copy or
paste actions, and change in window focus (with time stamps)
– Dang, Hutson, & Lease, HCOMP’16
– http://github.com/CuriousG102/turkey
– https://github.com/budang/turkey-lite***
Prediction via Behavioral Models
• Two prediction tasks (w/o aggregation)
1. label correctness (classification)
2. worker accuracy (regression)
• Three purely behavior-based models
1. Random Forest with Aggregate Features (RF-AF)
2. Random Forest with Sequential Features (RF-SF)
3. K-means with Sequence Clusters (kmeans-SC)
• See paper for details
– Also a 4th hybrid model that additionally uses work history
RF with Aggregate Features (RF-AF)
• Rzeszotarski & Kittur (2011) use Action features (only),
e.g., task time, on-focus time, and raw event counts
• Kazai and Zitouni (2016) include Temporal features
between successive events within a HIT.
• We use both
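To make the two feature families concrete, here is a minimal sketch that computes a few aggregate features from one HIT's event log. The event layout (a time-sorted list of `(timestamp_seconds, event_type)` tuples) and the feature names are assumptions for illustration; they are not the paper's schema or the MmmTurkey log format.

```python
from collections import Counter
from statistics import mean


def aggregate_features(events):
    """Compute simple action and temporal features for one HIT.

    `events` is a time-sorted list of (timestamp_seconds, event_type)
    tuples. This layout is hypothetical.
    """
    times = [t for t, _ in events]
    # Temporal features: gaps between successive events within the HIT.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    features = {
        "task_time": times[-1] - times[0] if times else 0.0,
        "n_events": len(events),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "max_gap": max(gaps) if gaps else 0.0,
    }
    # Action features: raw count of each event type.
    for kind, n in Counter(kind for _, kind in events).items():
        features[f"count_{kind}"] = n
    return features


trace = [(0.0, "focus"), (1.5, "scroll"), (2.0, "scroll"), (6.0, "click")]
feats = aggregate_features(trace)
```

A feature dictionary like this could then be vectorized and passed to any tree-based learner; the choice of which gaps and counts to keep is left open here.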
8
RF with Sequential Features (RF-SF)
• For given task, workers likely to perform
actions in similar order
• Aggregate features don’t capture the order of
events occurring within a HIT
– e.g., a click followed by a scroll, etc.
• Feature templates: we extract all sequences of
length 2k + m, i.e., 2 fixed event sequences of
length k separated by m random events.
– for k = 2, m = 1, {Click, Click, <event>, Click, Scroll}
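The feature template above can be sketched as a sliding window over the event sequence: keep the first k and last k events, and replace the m events in between with a wildcard. The function name, the `<event>` wildcard token, and the use of raw counts as feature values are assumptions for illustration, not details confirmed by the slides.

```python
from collections import Counter

WILDCARD = "<event>"  # placeholder for the m unconstrained events


def sequential_features(events, k=2, m=1):
    """Count gapped patterns: k events, m wildcards, then k events."""
    width = 2 * k + m
    counts = Counter()
    for start in range(len(events) - width + 1):
        window = events[start:start + width]
        pattern = tuple(window[:k]) + (WILDCARD,) * m + tuple(window[k + m:])
        counts[pattern] += 1
    return counts


trace = ["Click", "Click", "Scroll", "Click", "Scroll", "KeyPress"]
feats = sequential_features(trace, k=2, m=1)
```

With k = 2 and m = 1, the first window of this trace yields the pattern {Click, Click, <event>, Click, Scroll} from the slide.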
The Labeling Task:
Judging Relevance of Search Results
Why Is That Relevant? Collecting Annotator
Rationales for Relevance Judgments
with T. McDonnell, M. Kutlu, & T. Elsayed
HCOMP 2016
Crowd vs. Expert: What Can Relevance Judgment
Rationales Teach Us About Assessor Disagreement?
with M. Kutlu, T. McDonnell, Y. Barkallah, & T. Elsayed
to appear at ACM SIGIR 2018 (in 4 days)
• Scale up approach from prior HCOMP paper
• Mine rationales to understand disagreement
• But not discussed… worker behavioral data
Behavioral Data in this Study
• 3,984 unique HITs (i.e., behavioral traces)
• 2,294 unique document-topic pairs
• 106 unique workers
• 1-5 labels per document (variable)
Evaluation: Prediction via Behavior
• Cross-validation; different workers in each train/test split
– “Cold start”: must predict for unseen workers
• Two prediction tasks (per HIT, given behavioral data)
1. Classification: is worker’s label correct or not?
2. Regression: what is worker accuracy?
• We define the true, time-varying worker accuracy as the % of the
worker’s last 5 labels that were correct
• Baselines
– Simple baseline: constant prediction – always predict label
correct, accuracy = mean worker accuracy (65.8%)
– Decision Tree (akin to Rzeszotarski & Kittur, UIST’11)
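Two pieces of this setup are easy to get subtly wrong, so a small sketch may help: the rolling accuracy target and the worker-disjoint splits. Both functions are illustrative assumptions. In particular, the slides do not say whether the 5-label window includes the current label or how a worker's first few labels are handled; here the window includes the current label and simply shrinks early on. The greedy fold assignment is likewise just one simple way to keep workers from spanning folds.

```python
from collections import defaultdict, deque


def rolling_accuracy(correct_flags, window=5):
    """Per-label fraction correct over the worker's most recent labels.

    Assumption: the window includes the current label and is shorter
    than `window` for the worker's first few labels.
    """
    recent = deque(maxlen=window)
    accuracies = []
    for flag in correct_flags:
        recent.append(1 if flag else 0)
        accuracies.append(sum(recent) / len(recent))
    return accuracies


def worker_disjoint_folds(worker_ids, n_folds=3):
    """Assign HIT indices to folds so that no worker spans two folds."""
    by_worker = defaultdict(list)
    for index, worker in enumerate(worker_ids):
        by_worker[worker].append(index)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: largest workers first, each into the smallest fold.
    ordered = sorted(by_worker, key=lambda w: (-len(by_worker[w]), str(w)))
    for worker in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_worker[worker])
    return folds


acc = rolling_accuracy([True, False, True, True, True, True])
ids = ["a", "a", "b", "c", "c", "c"]
folds = worker_disjoint_folds(ids, n_folds=2)
```

Because every test fold contains only workers absent from training, each prediction is a "cold start" prediction in the sense used above.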
Prediction Results (Behavior only)
Method                Classification (Accuracy)   Regression (MSE)
Baseline: Constant    65.6                        5.6
Decision Tree – AF    60.4                        6.7
Random Forest – AF    67.9                        4.7
Random Forest – SF    68.8                        4.8
• Constant baseline beats decision tree
• Aggregate vs. Sequential Features comparable
– Sequential slightly higher classification accuracy
• Notes
– Prediction based only on behavioral traces
– No aggregation; can use with single labeling
– More results in the paper
II. Aggregation via Behavioral Weighting
• Weighted voting based on
1. Predicted label confidence
2. Predicted worker accuracy
• Baselines
– Majority Vote (unweighted): ~64% accuracy
– EM (peer-agreement weighting): ~67%
• Behavior-only weighted aggregation (RF-SF)
– Weighting by predicted worker accuracy: ~69.5%
– Weighting by predicted label confidence: ~72%
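As a rough sketch of weighted voting, each label contributes its weight (a predicted label confidence or a predicted worker accuracy) to its class, and the class with the largest total wins. The function name, label strings, and example weights below are hypothetical; how the paper normalizes or thresholds the weights is not stated on the slide.

```python
from collections import defaultdict


def weighted_vote(labels, weights):
    """Return the label with the largest total weight.

    Ties are broken deterministically in favor of the label that
    sorts first.
    """
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(sorted(totals), key=lambda label: totals[label])


labels = ["rel", "nonrel", "nonrel"]
confidences = [0.9, 0.3, 0.4]  # hypothetical behavior-based predictions

majority = weighted_vote(labels, [1.0] * len(labels))
weighted = weighted_vote(labels, confidences)
```

In this toy case the unweighted majority picks "nonrel", while one high-confidence label outweighs two low-confidence ones and the weighted vote picks "rel".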
III. Dynamic Labeling via Behavior
• Can we intelligently decide when to collect more
labels given only observed behavior?
• Markov Decision Process (MDP)
– State is current estimated label quality
• Individual label quality estimated by RF-SF; aggregate label
quality following Dai et al. (2013)
– Decide at each step whether to get another label
• Weigh likely quality improvement vs. cost
• Given target quality parameter, stop once we believe it has been reached
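The stopping idea can be illustrated with a much simpler rule than a full MDP: keep requesting labels until the estimated quality of the aggregate label reaches the target. The sketch below is not the paper's MDP and not the estimator of Dai et al. (2013). It assumes binary labels, independent votes, and a per-vote probability of correctness strictly between 0 and 1 (standing in for a behavior-based prediction), and it ignores the cost term entirely.

```python
def label_posterior(votes, p_correct, prior=0.5):
    """P(true label == 1) given binary votes, assuming independence."""
    like_one, like_zero = prior, 1.0 - prior
    for vote, p in zip(votes, p_correct):
        like_one *= p if vote == 1 else 1.0 - p
        like_zero *= p if vote == 0 else 1.0 - p
    return like_one / (like_one + like_zero)


def collect_until_confident(vote_stream, target=0.7, max_labels=5):
    """Consume (vote, p_correct) pairs until the target quality is met.

    Returns (label, estimated_quality, labels_used).
    """
    votes, probs = [], []
    posterior = 0.5
    for vote, p in vote_stream:
        votes.append(vote)
        probs.append(p)
        posterior = label_posterior(votes, probs)
        quality = max(posterior, 1.0 - posterior)
        if quality >= target or len(votes) >= max_labels:
            break
    label = 1 if posterior >= 0.5 else 0
    return label, max(posterior, 1.0 - posterior), len(votes)


stream = [(1, 0.6), (0, 0.55), (1, 0.8), (1, 0.9)]
label, quality, used = collect_until_confident(stream, target=0.7)
```

Here two low-confidence, conflicting votes are not enough to reach 0.7, so a third label is requested and the fourth is never collected.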
Dynamic Labeling: Results
[Figure: dynamic labeling results at target quality 0.7; callout: “Selecting the example to label next”]
Discussion
• With strong task design, less need for QC
– i.e., worker filtering & aggregation
• Biggest challenge was small data scale
• Ethical issues of behavioral data collection
– Workplace “surveillance”; oDesk work diary
Contributions & Future Work
• Three models for quality prediction via behavior
– Classification/Regression, Aggregation, & Dyn. Labeling
– First to use behavioral data for aggregation & dynamic labeling
• Shared behavioral data for ~4K HITs
– http://ir.ischool.utexas.edu/webcrowd25k/
• Future Work
– Analyzing behavioral data at greater scale
– Hybrid aggregation (behavior + non-behavior)
– Transfer learning (i.e., application across tasks)
Matthew Lease - ml@utexas.edu - @mattlease
Thank You!
slideshare.net/mattlease
Lab: ir.ischool.utexas.edu

More Related Content

What's hot

Intro to data visualization
Intro to data visualizationIntro to data visualization
Intro to data visualization
Jan Aerts
 
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
Sc Huang
 
Advancing Personalized Learning through Big Data and Artificial Intelligence
Advancing Personalized Learning through Big Data and Artificial IntelligenceAdvancing Personalized Learning through Big Data and Artificial Intelligence
Advancing Personalized Learning through Big Data and Artificial Intelligence
Penn State EdTech Network
 
Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?
Jan Aerts
 
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
Duke Network Analysis Center
 
Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?
Jan Aerts
 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Daniel Roggen
 
NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...
NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...
NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...
The Statistical and Applied Mathematical Sciences Institute
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Anastasiia Kornilova
 
Recommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human CategorizationRecommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human Categorization
Christoph Trattner
 
Machine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyMachine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 Sydney
Alexandros Karatzoglou
 
NL-Graphs: A Hybrid Approach toward Interactively Querying Semantic Data
NL-Graphs: A Hybrid Approach toward Interactively Querying Semantic DataNL-Graphs: A Hybrid Approach toward Interactively Querying Semantic Data
NL-Graphs: A Hybrid Approach toward Interactively Querying Semantic Data
Suvodeep Mazumdar
 
Ability Study of Proximity Measure for Big Data Mining Context on Clustering
Ability Study of Proximity Measure for Big Data Mining Context on ClusteringAbility Study of Proximity Measure for Big Data Mining Context on Clustering
Ability Study of Proximity Measure for Big Data Mining Context on Clustering
KamleshKumar394
 

What's hot (13)

Intro to data visualization
Intro to data visualizationIntro to data visualization
Intro to data visualization
 
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
 
Advancing Personalized Learning through Big Data and Artificial Intelligence
Advancing Personalized Learning through Big Data and Artificial IntelligenceAdvancing Personalized Learning through Big Data and Artificial Intelligence
Advancing Personalized Learning through Big Data and Artificial Intelligence
 
Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?
 
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
 
Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?
 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
 
NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...
NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...
NCCU: The Story of Data Science and Machine Learning Workshop - A Tutorial in...
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Recommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human CategorizationRecommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human Categorization
 
Machine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyMachine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 Sydney
 
NL-Graphs: A Hybrid Approach toward Interactively Querying Semantic Data
NL-Graphs: A Hybrid Approach toward Interactively Querying Semantic DataNL-Graphs: A Hybrid Approach toward Interactively Querying Semantic Data
NL-Graphs: A Hybrid Approach toward Interactively Querying Semantic Data
 
Ability Study of Proximity Measure for Big Data Mining Context on Clustering
Ability Study of Proximity Measure for Big Data Mining Context on ClusteringAbility Study of Proximity Measure for Big Data Mining Context on Clustering
Ability Study of Proximity Measure for Big Data Mining Context on Clustering
 

Similar to Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to Ensure Quality Relevance Annotations

Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Matthew Lease
 
The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject Crowdsourcing
Matthew Lease
 
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
Srinath Perera
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
XanGwaps
 
The Human, the Eye and the Brain : Unifying Relevance Feedback for User Mode...
The Human, the Eye and the Brain : Unifying Relevance Feedback for  User Mode...The Human, the Eye and the Brain : Unifying Relevance Feedback for  User Mode...
The Human, the Eye and the Brain : Unifying Relevance Feedback for User Mode...
Sampath Jayarathna
 
Measuring the usefulness of Knowledge Organization Systems in Information Ret...
Measuring the usefulness of Knowledge Organization Systems in Information Ret...Measuring the usefulness of Knowledge Organization Systems in Information Ret...
Measuring the usefulness of Knowledge Organization Systems in Information Ret...
GESIS
 
Group13 kdd cup_report_submitted
Group13 kdd cup_report_submittedGroup13 kdd cup_report_submitted
Group13 kdd cup_report_submitted
Chamath Sajeewa
 
Understanding the Impact of the Role Factor in Collaborative Information Retr...
Understanding the Impact of the Role Factor in Collaborative Information Retr...Understanding the Impact of the Role Factor in Collaborative Information Retr...
Understanding the Impact of the Role Factor in Collaborative Information Retr...
UPMC - Sorbonne Universities
 
Weka bike rental
Weka bike rentalWeka bike rental
Weka bike rental
Pratik Doshi
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analytics
Akin Osman Kazakci
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
SugumarSarDurai
 
Kaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML InterpretabilityKaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML Interpretability
Alberto Danese
 
System U: Computational Discovery of Personality Traits from Social Media for...
System U: Computational Discovery of Personality Traits from Social Media for...System U: Computational Discovery of Personality Traits from Social Media for...
System U: Computational Discovery of Personality Traits from Social Media for...
Michelle Zhou
 
RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4
Khadija Atiya
 
Systems Modelling and Qualitative Data
Systems Modelling and Qualitative Data Systems Modelling and Qualitative Data
Systems Modelling and Qualitative Data
mikeyearworth
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
Khalid Salama
 
Discovering Common Motifs in Cursor Movement Data
Discovering Common Motifs in Cursor Movement DataDiscovering Common Motifs in Cursor Movement Data
Discovering Common Motifs in Cursor Movement Data
Yandex
 
How to Effectively Combine Numerical Features and Categorical Features
How to Effectively Combine Numerical Features and Categorical FeaturesHow to Effectively Combine Numerical Features and Categorical Features
How to Effectively Combine Numerical Features and Categorical Features
Domino Data Lab
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onword
Sulman Ahmed
 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
David Zibriczky
 

Similar to Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to Ensure Quality Relevance Annotations (20)

Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
 
The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject Crowdsourcing
 
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
 
The Human, the Eye and the Brain : Unifying Relevance Feedback for User Mode...
The Human, the Eye and the Brain : Unifying Relevance Feedback for  User Mode...The Human, the Eye and the Brain : Unifying Relevance Feedback for  User Mode...
The Human, the Eye and the Brain : Unifying Relevance Feedback for User Mode...
 
Measuring the usefulness of Knowledge Organization Systems in Information Ret...
Measuring the usefulness of Knowledge Organization Systems in Information Ret...Measuring the usefulness of Knowledge Organization Systems in Information Ret...
Measuring the usefulness of Knowledge Organization Systems in Information Ret...
 
Group13 kdd cup_report_submitted
Group13 kdd cup_report_submittedGroup13 kdd cup_report_submitted
Group13 kdd cup_report_submitted
 
Understanding the Impact of the Role Factor in Collaborative Information Retr...
Understanding the Impact of the Role Factor in Collaborative Information Retr...Understanding the Impact of the Role Factor in Collaborative Information Retr...
Understanding the Impact of the Role Factor in Collaborative Information Retr...
 
Weka bike rental
Weka bike rentalWeka bike rental
Weka bike rental
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analytics
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
Kaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML InterpretabilityKaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML Interpretability
 
System U: Computational Discovery of Personality Traits from Social Media for...
System U: Computational Discovery of Personality Traits from Social Media for...System U: Computational Discovery of Personality Traits from Social Media for...
System U: Computational Discovery of Personality Traits from Social Media for...
 
RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4
 
Systems Modelling and Qualitative Data
Systems Modelling and Qualitative Data Systems Modelling and Qualitative Data
Systems Modelling and Qualitative Data
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 
Discovering Common Motifs in Cursor Movement Data
Discovering Common Motifs in Cursor Movement DataDiscovering Common Motifs in Cursor Movement Data
Discovering Common Motifs in Cursor Movement Data
 
How to Effectively Combine Numerical Features and Categorical Features
How to Effectively Combine Numerical Features and Categorical FeaturesHow to Effectively Combine Numerical Features and Categorical Features
How to Effectively Combine Numerical Features and Categorical Features
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onword
 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
 

More from Matthew Lease

Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Matthew Lease
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loop
Matthew Lease
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Matthew Lease
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd
Matthew Lease
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation
Matthew Lease
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Matthew Lease
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?
Matthew Lease
 
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Matthew Lease
 
Fact Checking & Information Retrieval
Fact Checking & Information RetrievalFact Checking & Information Retrieval
Fact Checking & Information Retrieval
Matthew Lease
 
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
Matthew Lease
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Matthew Lease
 
Systematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s ClothingSystematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s Clothing
Matthew Lease
 
The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)
Matthew Lease
 
The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016
Matthew Lease
 
The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)
Matthew Lease
 
Toward Better Crowdsourcing Science
 Toward Better Crowdsourcing Science Toward Better Crowdsourcing Science
Toward Better Crowdsourcing Science
Matthew Lease
 
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work PlatformsBeyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Matthew Lease
 
Toward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd WorkToward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd Work
Matthew Lease
 
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Matthew Lease
 
Crowdsourcing: From Aggregation to Search Engine Evaluation
Crowdsourcing: From Aggregation to Search Engine EvaluationCrowdsourcing: From Aggregation to Search Engine Evaluation
Crowdsourcing: From Aggregation to Search Engine Evaluation
Matthew Lease
 

More from Matthew Lease (20)

Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loop
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?
 
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to Ensure Quality Relevance Annotations

  • 1. Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to Ensure Quality Relevance Annotations Tanya Goyal, Tyler McDonnell, Mucahid Kutlu, Tamer Elsayed, & Matthew Lease UT Austin -&- Qatar U Slides: slideshare.net/mattlease ml@utexas.edu @mattlease
  • 2. What’s an Information School? “The place where people & technology meet” ~ Wobbrock et al., 2009 • “iSchools” now exist at 96 universities around the world • www.ischools.org
  • 3. Roadmap • Behavioral data: what & why? • Prediction Tasks & Models • The Labeling Task: Search Relevance Judging • Data & Evaluation (three scenarios) • Discussion & Future Work
  • 4. How to Assess Crowd Work? • Two typical approaches – 1. Compare labels vs. expert’s (e.g., “gold”) – 2. Compare labels vs. peers (e.g., MV, EM) • 3. Compare labels to model predictions – e.g., Ryu & Lease, ASIS&T’11 • 4. Collect & assess worker behavioral data
  • 5. Worker Behavioral Data (Analytics) • Could reduce need for experts & redundant work • Could combine with other QC methods • Could address “cold-start” problem – Predict quality from worker’s first label via behavior
  • 6. Prior Work • Instrumenting the crowd: using implicit behavioral measures to predict task performance (Rzeszotarski & Kittur, UIST’11) – Correlate crowd behavior with crowd vs. expert labels – Each worker assigned “pass/fail”; DT predicts via behavior • Quality management in crowdsourcing using gold judges behavior (Kazai & Zitouni, WSDM’16) – Correlate crowd behavior with expert behavior • No shared data or source code • MmmTurkey: A Crowdsourcing Framework for Deploying Tasks and Recording Worker Behavior on Mechanical Turk – Records clicks, scrolls, mouse movements, key presses, copy or paste actions, and change in window focus (with time stamps) – Dang, Hutson, & Lease, HCOMP’16 – http://github.com/CuriousG102/turkey – https://github.com/budang/turkey-lite***
  • 7. Prediction via Behavioral Models • Two prediction tasks (w/o aggregation) 1. label correctness (classification) 2. worker accuracy (regression) • Three purely behavior-based models 1. Random Forest with Aggregate Features (RF-AF) 2. Random Forest with Sequential Features (RF-SF) 3. K-means with Sequence Clusters (kmeans-SC) • Also a 4th hybrid model using work history as well • See paper for details
  • 8. RF with Aggregate Features (RF-AF) • Rzeszotarski & Kittur (2011) use Action features (only), e.g., task time, on-focus time, and raw event counts • Kazai & Zitouni (2016) include Temporal features between successive events within a HIT • We use both
  • 9. RF with Sequential Features (RF-SF) • For a given task, workers are likely to perform actions in a similar order • Aggregate features don’t capture the order of events occurring within a HIT – e.g., a click followed by a scroll, etc. • Feature templates: we extract all sequences of length 2k + m, i.e., 2 fixed event sequences of length k separated by m arbitrary events – for k = 2, m = 1: {Click, Click, <event>, Click, Scroll}
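The feature template on slide 9 can be illustrated with a minimal sketch. It assumes a behavioral trace is simply an ordered list of event-name strings; the function name `sequence_features`, the `"<event>"` wildcard token, and the sliding-window counting are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def sequence_features(events, k=2, m=1, wildcard="<event>"):
    """Count gapped patterns of length 2k + m in one trace.

    Each pattern keeps the first k and last k events of a sliding window
    and replaces the m events in between with a wildcard token.
    """
    window = 2 * k + m
    counts = Counter()
    for start in range(len(events) - window + 1):
        head = tuple(events[start:start + k])
        tail = tuple(events[start + k + m:start + window])
        counts[head + (wildcard,) * m + tail] += 1
    return counts

# Hypothetical trace for a single HIT (event names are made up).
trace = ["Click", "Click", "Scroll", "Click", "Scroll", "KeyPress"]
features = sequence_features(trace, k=2, m=1)
```

Each distinct pattern becomes one count-valued feature, so a trace maps to a sparse vector that a random forest could consume alongside or instead of the aggregate features.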
  • 10. The Labeling Task: Judging Relevance of Search Results @mattlease
  • 11. Why Is That Relevant? Collecting Annotator Rationales for Relevance Judgments with T. McDonnell, M. Kutlu, & T. Elsayed HCOMP 2016
  • 12. Why Is That Relevant? Collecting Annotator Rationales for Relevance Judgments with T. McDonnell, M. Kutlu, & T. Elsayed HCOMP 2016
  • 13. Why Is That Relevant? Collecting Annotator Rationales for Relevance Judgments with T. McDonnell, M. Kutlu, & T. Elsayed HCOMP 2016
  • 14. Crowd vs. Expert: What Can Relevance Judgment Rationales Teach Us About Assessor Disagreement? with M. Kutlu, T. McDonnell, Y. Barkallah, & T. Elsayed to appear at ACM SIGIR 2018 (in 4 days) • Scale up approach from prior HCOMP paper • Mine rationales to understand disagreement • But not discussed… worker behavioral data
  • 15. Behavioral Data in this Study • 3,984 unique HITs (i.e., behavioral traces) • 2,294 unique document-topic pairs • 106 unique workers • 1-5 labels per document (variable)
  • 16. Evaluation: Prediction via Behavior • Cross-validation; different workers in each train/test split – “Cold start”: must predict for unseen workers • Two prediction tasks (per HIT, given behavioral data) 1. Classification: is worker’s label correct or not? 2. Regression: what is worker accuracy? • We define the true, time-varying worker accuracy as the % of the worker’s last 5 labels that were correct • Baselines – Simple baseline: constant prediction (always predict label correct; accuracy = mean worker accuracy, 65.8%) – Decision Tree (akin to Rzeszotarski & Kittur, UIST’11)
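As a rough illustration of the regression target on slide 16, the sketch below computes a time-varying accuracy from a worker's chronological list of correct/incorrect flags. Two details are assumptions the slide does not settle: the window includes the current label, and fewer than five labels are averaged over whatever is available. The name `rolling_accuracy` is hypothetical.

```python
def rolling_accuracy(correct_flags, window=5):
    """Fraction of the most recent `window` labels that were correct.

    correct_flags: chronological list of 1 (correct) / 0 (incorrect).
    Returns one accuracy value per label, aligned with the input.
    """
    accuracies = []
    for i in range(len(correct_flags)):
        # Assumption: the window ends at, and includes, the current label.
        recent = correct_flags[max(0, i + 1 - window):i + 1]
        accuracies.append(sum(recent) / len(recent))
    return accuracies

# Hypothetical worker history, oldest label first.
accuracies = rolling_accuracy([1, 0, 1, 1, 1, 0])
```

Under these assumptions the fifth label sees four of five correct and the sixth sees three of five, so the target drifts as the worker's recent performance changes.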
  • 17. Prediction Results (Behavior only)
      Method               | Classification (Accuracy) | Regression (MSE)
      Baseline: Constant   | 65.6                      | 5.6
      Decision Tree – AF   | 60.4                      | 6.7
      Random Forest – AF   | 67.9                      | 4.7
      Random Forest – SF   | 68.8                      | 4.8
    • Constant baseline beats decision tree • Aggregate vs. Sequential Features comparable – Sequential slightly higher classification accuracy • Notes – Prediction based only on behavioral traces – No aggregation; can use with single labeling – More results in the paper
  • 18. II. Aggregation via Behavioral Weighting • Weighted voting based on 1. Predicted label confidence 2. Predicted worker accuracy • Baselines – Majority Vote (unweighted): ~64% accuracy – EM (peer-agreement weighting): ~67% • Behavior-only weighted aggregation (RF-SF) – Weighted by predicted worker accuracy: ~69.5% – Weighted by predicted label confidence: ~72%
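The weighted voting on slide 18 can be sketched as follows. The weights stand in for whatever a behavioral model predicts per label (label confidence or worker accuracy); the function name `weighted_vote`, the label strings, and the numeric weights are made up for illustration.

```python
from collections import defaultdict

def weighted_vote(labels, weights):
    """Return the label with the largest total weight.

    labels:  one label per worker judgment for the same item.
    weights: one non-negative weight per judgment, e.g. a predicted
             label confidence or a predicted worker accuracy.
    """
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Three hypothetical judgments for one document-topic pair.
labels = ["relevant", "not_relevant", "not_relevant"]
behavior_weighted = weighted_vote(labels, [0.9, 0.3, 0.4])
majority = weighted_vote(labels, [1.0, 1.0, 1.0])
```

With equal weights the function reduces to majority vote, which makes the contrast with the behavior-weighted result easy to see: one high-confidence judgment can outweigh two low-confidence ones.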
  • 19. III. Dynamic Labeling via Behavior • Can we intelligently decide when to collect more labels given only observed behavior? • Markov Decision Process (MDP) – State is current estimated label quality • Individual label quality estimated by RF-SF; aggregate label quality following Dai et al. (2013) – Decide at each step whether to get another label • Weigh likely quality improvement vs. cost • Given target quality parameter, stop once the target is estimated to be reached
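The stopping idea on slide 19 can be approximated with a much simpler greedy rule. This is not the MDP from the talk and not the aggregate-quality estimate of Dai et al. (2013): it swaps in a naive Bayes posterior over binary labels that treats each label's predicted quality as an independent probability of being correct. All names, the 0.5 prior, and the five-label cap are assumptions for illustration.

```python
def posterior_relevant(votes, prior=0.5):
    """P(true label = 1) given (label, quality) pairs.

    Assumes binary labels, independent judgments, and that quality is
    the probability that the given label is correct.
    """
    p1, p0 = prior, 1.0 - prior
    for label, quality in votes:
        if label == 1:
            p1 *= quality
            p0 *= 1.0 - quality
        else:
            p1 *= 1.0 - quality
            p0 *= quality
    return p1 / (p1 + p0)

def collect_until_confident(label_stream, target=0.7, max_labels=5):
    """Request labels one at a time; stop once estimated quality >= target."""
    votes = []
    p = 0.5
    for vote in label_stream:
        votes.append(vote)
        p = posterior_relevant(votes)
        if max(p, 1.0 - p) >= target or len(votes) >= max_labels:
            break
    return (1 if p >= 0.5 else 0), max(p, 1.0 - p), len(votes)

# Hypothetical stream of (label, predicted quality) pairs for one item.
stream = [(1, 0.60), (1, 0.65), (0, 0.55)]
label, confidence, used = collect_until_confident(stream, target=0.7)
```

Here the first label alone leaves the estimate at 0.6, below the target, so a second label is requested; the two agreeing labels push the estimate past 0.7 and the third is never collected. A full MDP would additionally weigh the expected gain of another label against its cost.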
  • 20. Dynamic Labeling: Results [Figure: selecting the example to label next; target quality: 0.7]
  • 21. Discussion • With strong task design, less need for QC – i.e., worker filtering & aggregation • Biggest challenge was small data scale • Ethical issues of behavioral data collection – Workplace “surveillance”; oDesk work diary
  • 22. Contributions & Future Work • Three models for quality prediction via behavior – Classification/Regression, Aggregation, & Dynamic Labeling – First use of behavioral data for aggregation & dynamic labeling • Shared behavioral data for ~4K HITs – http://ir.ischool.utexas.edu/webcrowd25k/ • Future Work – Analyzing behavioral data at greater scale – Hybrid aggregation (behavior + non-behavior) – Transfer learning (i.e., application across tasks)
  • 23. Matthew Lease - ml@utexas.edu - @mattlease Thank You! slideshare.net/mattlease Lab: ir.ischool.utexas.edu