On Quality Control and Machine Learning
            in Crowdsourcing




                   Matt Lease
             School of Information
           University of Texas at Austin
             ml@ischool.utexas.edu
                   @mattlease
Quality Control
• Many factors matter
  – guidelines, experimental design, human factors,
    automation, …
• Only as strong as weakest link
  – automation is not a silver bullet
• Errors are not just due to lazy/stupid workers
  – Even in carefully designed and managed
    annotation projects, uncertain cases encountered

Human Factors (HF)
•   Questionnaire / Survey Design
•   Interface / Interaction Design
•   Incentives
•   Human Relations (HR): recruitment & retention
•   Long-term Commitment
    – rapport with co-workers
    – buy-in to organizational mission & value of work
    – opportunities for advancement in organization
• Oversight / Management / Organization
• Communication
HF Challenges & Consequences
• Not part of typical CS curriculum or expertise
   – crowdsourcing disrupts prior area boundaries
• NLP, IR, ML people traditionally don’t do HCI
   – now many of us dealing with such issues
• Consequences
   –   Errors from poor HF
   –   Stumbling into known problems, recreating solutions
   –   May see problems through limited vantage point
   –   May over-rely on automation
• Great opportunities for HCI collaboration

Minority Voice & Diversity
• Opportunity: more diversity than “experts”
• Risk: false reinforcement of majority view
  when minority is ignored, lost, or eliminated
• Questions
  – How to recognize when majority is wrong?
  – How to recognize alternative or better truths?
  – Is QC systematically eliminating diversity?
  – How diverse is the crowd really?

Automation
• Examples
  – Task Routing / Worker Selection
  – Adaptive Plurality, Decomposition
  – Post-hoc: Calibration, Filtering & Aggregation
• Separation of concerns / middleware
  – Users specify their task, and system handles QC
  – Many do not have interest, time, skill, or risk tolerance
    to manage low-level QC on their own
  – Critical to widespread/enterprise adoption
  – Accelerate field progress
     • divide problem space for different groups to work on
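The post-hoc filtering and aggregation step listed above can be sketched as follows. This is a minimal illustration, not a method taken from the slides: it assumes labels arrive as (worker, item, label) triples, uses plain majority voting, and drops workers whose agreement with the first-pass consensus falls below a hypothetical `min_agreement` threshold. All function and parameter names are invented for the example.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Map each item to its most frequent label.

    labels: iterable of (worker_id, item_id, label) triples.
    Ties go to the label seen first for that item.
    """
    votes = defaultdict(Counter)
    for _worker, item, label in labels:
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


def worker_agreement(labels, consensus):
    """Fraction of each worker's labels that match the consensus label."""
    hits, totals = Counter(), Counter()
    for worker, item, label in labels:
        totals[worker] += 1
        hits[worker] += label == consensus[item]
    return {worker: hits[worker] / totals[worker] for worker in totals}


def filter_and_aggregate(labels, min_agreement=0.5):
    """Two-pass QC: vote, drop low-agreement workers, then vote again.

    Items labeled only by dropped workers are absent from the result.
    """
    labels = list(labels)
    agreement = worker_agreement(labels, majority_vote(labels))
    kept = [t for t in labels if agreement[t[0]] >= min_agreement]
    return majority_vote(kept), agreement


if __name__ == "__main__":
    # Toy data: w1-w3 agree on every item, w4 disagrees on every item.
    data = [(w, item, lab)
            for w, labs in {"w1": "pnn", "w2": "pnn", "w3": "pnn", "w4": "npp"}.items()
            for item, lab in zip("abc", labs)]
    final, agreement = filter_and_aggregate(data)
    print(final)
    print(agreement)
```

Note that filtering by agreement with the majority is exactly the practice the Minority Voice & Diversity slide questions: a worker who is consistently right where the majority is wrong would be removed by this sketch.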
Automation: Questions
•   Who are the workers?
•   What is the labor model?
•   What are affordances of the platform?
•   How does that drive subsequent setup?
•   Appropriate inter-annotator agreement
    measures for crowdwork?
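As one concrete candidate for the agreement question above, here is a minimal sketch of Fleiss' kappa. The choice of measure is an assumption for illustration; the slides do not name one. Fleiss' kappa allows different workers to label different items, but it requires the same number of labels per item, which crowd data often violates; Krippendorff's alpha is a common alternative when labels are missing. The function name and input layout are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of n category labels.

    Every item must carry the same number n >= 2 of labels; the labels
    may come from different workers on different items.
    """
    n = len(ratings[0])
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of labels")

    num_items = len(ratings)
    category_totals = Counter()
    item_agreement = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Fraction of label pairs on this item that agree with each other.
        item_agreement.append(
            (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
        )

    observed = sum(item_agreement) / num_items
    # Chance agreement from the overall category proportions.
    expected = sum((t / (num_items * n)) ** 2 for t in category_totals.values())
    if abs(1.0 - expected) < 1e-12:
        return float("nan")  # undefined when only one category is ever used
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    print(fleiss_kappa([["y", "y", "y"], ["n", "n", "n"]]))  # full agreement
    print(fleiss_kappa([["y", "y", "n"], ["n", "n", "y"]]))  # below chance
```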



Lessons from Traditional Annotation
•   Need clear, detailed guidelines
•   Cannot predict all cases in advance
•   Guidelines evolve during annotation
•   Humans not merely better visual, audio sensors
    – e.g. imprecise directions & unforeseen examples
• Crowdsourcing Questions
    – How to handle examples for which current guidelines
      are ambiguous, unclear, or insufficient?
    – What role do annotators play?
    – How to facilitate interaction?
Worker Organization
• How might we organize workers for effective QC?
• Do workers participate in high-level discussions
  (telecommuters) or act like automata (HPU)?
• What organizational patterns might be used?
  – e.g. find-verify, fix-fix-verify, qualify-work
• How do different organizational patterns interact
  with automation and other QC factors?



Impact on Machine Learning: More
•   Labeled data
•   Uncertain data
•   Diverse data
•   Specific data
•   Ongoing data
•   Rapid data
•   Hybrid systems
•   On-demand evaluation
•   Datasets & Benchmarks
•   Tasks
Open Questions
• How do cheap, plentiful, rapid labels alter how we utilize
  supervised vs. semi-supervised vs. unsupervised methods?
   – Revisit task-specific learning curves
• Mask uncertainty via QC, or model, propagate, and expose it?
• How do we handle noise in active learning?
• How to best utilize a 24/7 global crowd for lifetime,
  continuous, never-ending learning systems?
   – Sample size vs. adaptation
• Can we develop a more formal, computational
  understanding of Wisdom of Crowds?
   – diversity, independence, decentralization, and aggregation
• Can we better connect consensus algorithms with more
  general feature-based and ensemble models?
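One classical starting point for a formal view of Wisdom of Crowds is the Condorcet jury theorem, which shows why the independence condition listed above matters. The sketch below is an illustration under strong assumptions that the slides do not state: binary labels, workers who err independently, and a shared per-worker accuracy `p`. The function name is hypothetical.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a strict majority of n workers is correct.

    Assumes each worker is independently correct with probability p
    on a binary task; n must be odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd n to avoid ties")
    needed = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 11, 21):
        print(n, round(majority_accuracy(0.6, n), 4))
```

Under these assumptions, adding workers helps when p > 0.5 and hurts when p < 0.5. Correlated errors break the independence assumption, so real crowds can fall short of this curve.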
Other Issues
• Hybrid systems match human-level competence
   – Achievable now at certain time/cost tradeoff, which can be
     navigated as function of context and need
• Diverse labeling particularly valuable when subjective
   – Traditional in-house annotators not diverse & few
• A middle way between traditional annotation and
  automated proxy metrics
   – e.g. translation quality & BLEU
   – More rapid than traditional annotation, more accurate
     than automated metrics
• Less re-use has the risk of less comparable evaluation
   – Enduring value of community evaluations like TREC

Thank You!
                 ir.ischool.utexas.edu/crowd
     Matt Lease, ml@ischool.utexas.edu, @mattlease
• Students
  –   Catherine Grady (iSchool)
  –   Hyunjoon Jung (ECE)
  –   Jorn Klinger (Linguistics)
  –   Adriana Kovashka (CS)
  –   Abhimanu Kumar (CS)
  –   Di Liu (iSchool)
  –   Hohyon Ryu (iSchool)
  –   William Tang (CS)
  –   Stephen Wolfson (iSchool)
• Omar Alonso, Microsoft Bing
• Support
  – John P. Commons
