SlideShare a Scribd company logo
Using Generative Augmentation
to improve ‘Learning
from Crowds’
Neetha Sherra
San Jose State University
CMPE 255-Introduction to Data Mining
Introduction
• A typical classification problem is supervised
• Example, the commonly referred to Iris dataset
• The two common ways to solve this problem
- Feed the data to an unsupervised ML model
- Crowdsource the labels
Crowdsourcing: Definition, pros and cons
• Crowdsourcing in general is a process wherein a dispersed group of
participants provide a service either as volunteers or for payment
• Advantages
- Cost-effective
- Time-saving
• Disadvantages
- Sparsity
- Low-quality
• The disadvantages can be addressed but nullifies the advantage of using
crowdsourcing (catch-22)
How does CrowdInG help?
• CrowdInG-Crowdsourced data through Informative Generative augmentation
uses generative AI to perform data augmentation on missing labels
• Its main goal is the accuracy of labels
- reflect the ground-truth
- true to the distribution of crowdsourced labels
• It is based on Generative Adversarial Networks (GANs)
- Generator
- Discriminator
CrowdInG framework
CrowdInG framework continued ..
• S = {𝑥𝑛, 𝑦𝑛}
- where n -> [1, N]
− 𝑥𝑛: feature vector of instance n
- 𝑦𝑛: annotation vector of instance n from R annotators (with missing values)
- 𝑒𝑟: feature vector of the r-th annotator (when available)
- 𝑧𝑛: unobserved ground-truth label
- Goal: a classifier that learns directly from S
• Generative module
- Classifier given instance x outputs its predicted label
- Generator takes the classifier output + feature vector + annotator vector
• Discriminative module
- Discriminator determines whether the annotation is authentic or generated
- Auxiliary network penalizes the generative network based on the classified + generated label
• The two modules are involved in a minimax game
Training and model optimization
• Entropy-based annotation selection
- Training bias because of annotation sparsity
- Equal sample sizes for original and generated annotations
• Two-step update for the generative module
- Generator and classifier are coupled
- Strong negative correlation between the entropy of a classifier’s output and its
accuracy
- Instances with low classification entropy are used to update the generator
- Updated generator is then used to update the rest of the instances for the
classifier
Experiments
• For evaluation three real-
world datasets were
employed with a subset of
low-quality annotators was
selected.
• The results of CrowdInG
were compared with a
state-of-the-art baselines
with the same classifier
design
• Outperforms models
designed for complex
confusions
Experiments
continued…
• To study the utility of
augmented annotations
and investigate
performance, observed
annotations were gradually
removed
• While there was a large
amount of sparsity on
removal of annotations,
CrowdInG still performs
consistently well
Conclusion
• Data sparsity is a huge challenge
• Demonstrates its effectiveness and provides a potential way forward
in the area of low-budget crowdsourcing
• Future potential
- Annotator education based on annotator-specific confusions
- Task assignment based on instance-specific confusions
References
Reference paper
https://arxiv.org/pdf/2107.10449.pdf
Title slide image source
https://www.gep.com/blog/mind/crowdsourcing-marketing
Thank You

More Related Content

Similar to CrowdInG_learning_from_crowds.pptx

Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An Intro
Si Krishan
 
predictive analysis and usage in procurement ppt 2017
predictive analysis and usage in procurement  ppt 2017predictive analysis and usage in procurement  ppt 2017
predictive analysis and usage in procurement ppt 2017
Prashant Bhatmule
 
التقنيات المستخدمة لتطوير المكتبات
التقنيات المستخدمة لتطوير المكتباتالتقنيات المستخدمة لتطوير المكتبات
التقنيات المستخدمة لتطوير المكتبات
Mohammed El Rafie Tarabay
 
Ai in finance
Ai in financeAi in finance
Ai in finance
QuantUniversity
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
Dev Raj Gautam
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
jagan477830
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
Egyptian Engineers Association
 
Recommender System Using AZURE ML
Recommender System Using AZURE MLRecommender System Using AZURE ML
Recommender System Using AZURE ML
Dev Raj Gautam
 
Utilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerceUtilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerce
Liangjie Hong
 
Summit EU Machine Learning
Summit EU Machine LearningSummit EU Machine Learning
Summit EU Machine Learning
MapR Technologies
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud Detection
Nitesh Kumar
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised Learning
FEG
 
Pricing like a data scientist
Pricing like a data scientistPricing like a data scientist
Pricing like a data scientist
Matthew Evans
 
Large scale Click-streaming and tranaction log mining
Large scale Click-streaming and tranaction log miningLarge scale Click-streaming and tranaction log mining
Large scale Click-streaming and tranaction log mining
itstuff
 
IEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slidesIEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slidesNish Parikh
 
Deep Credit Risk Ranking with LSTM with Kyle Grove
Deep Credit Risk Ranking with LSTM with Kyle GroveDeep Credit Risk Ranking with LSTM with Kyle Grove
Deep Credit Risk Ranking with LSTM with Kyle Grove
Databricks
 
GIS_presentation .pptx
GIS_presentation                    .pptxGIS_presentation                    .pptx
GIS_presentation .pptx
lahelex741
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit Risk
QuantUniversity
 
The importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systemsThe importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systems
Francesca Lazzeri, PhD
 

Similar to CrowdInG_learning_from_crowds.pptx (20)

Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An Intro
 
predictive analysis and usage in procurement ppt 2017
predictive analysis and usage in procurement  ppt 2017predictive analysis and usage in procurement  ppt 2017
predictive analysis and usage in procurement ppt 2017
 
التقنيات المستخدمة لتطوير المكتبات
التقنيات المستخدمة لتطوير المكتباتالتقنيات المستخدمة لتطوير المكتبات
التقنيات المستخدمة لتطوير المكتبات
 
Ai in finance
Ai in financeAi in finance
Ai in finance
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
 
Recommender System Using AZURE ML
Recommender System Using AZURE MLRecommender System Using AZURE ML
Recommender System Using AZURE ML
 
Utilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerceUtilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerce
 
Summit EU Machine Learning
Summit EU Machine LearningSummit EU Machine Learning
Summit EU Machine Learning
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud Detection
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised Learning
 
Pricing like a data scientist
Pricing like a data scientistPricing like a data scientist
Pricing like a data scientist
 
Large scale Click-streaming and tranaction log mining
Large scale Click-streaming and tranaction log miningLarge scale Click-streaming and tranaction log mining
Large scale Click-streaming and tranaction log mining
 
IEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slidesIEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slides
 
Deep Credit Risk Ranking with LSTM with Kyle Grove
Deep Credit Risk Ranking with LSTM with Kyle GroveDeep Credit Risk Ranking with LSTM with Kyle Grove
Deep Credit Risk Ranking with LSTM with Kyle Grove
 
GIS_presentation .pptx
GIS_presentation                    .pptxGIS_presentation                    .pptx
GIS_presentation .pptx
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit Risk
 
The importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systemsThe importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systems
 

Recently uploaded

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 

Recently uploaded (20)

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 

CrowdInG_learning_from_crowds.pptx

  • 1. Using Generative Augmentation to improve ‘Learning from Crowds’ Neetha Sherra San Jose State University CMPE 255-Introduction to Data Mining
  • 2. Introduction • A typical classification problem is supervised • Example, the commonly referred to Iris dataset • The two common ways to solve this problem - Feed the data to an unsupervised ML model - Crowdsource the labels
  • 3. Crowdsourcing: Definition, pros and cons • Crowdsourcing in general is a process wherein a dispersed group of participants provide a service either as volunteers or for payment • Advantages - Cost-effective - Time-saving • Disadvantages - Sparsity - Low-quality • The disadvantages can be addressed but nullifies the advantage of using crowdsourcing (catch-22)
  • 4. How does CrowdInG help? • CrowdInG-Crowdsourced data through Informative Generative augmentation uses generative AI to perform data augmentation on missing labels • Its main goal is the accuracy of labels - reflect the ground-truth - true to the distribution of crowdsourced labels • It is based on Generative Adversarial Networks (GANs) - Generator - Discriminator
  • 6. CrowdInG framework continued .. • S = {𝑥𝑛, 𝑦𝑛} - where n -> [1, N] − 𝑥𝑛: feature vector of instance n - 𝑦𝑛: annotation vector of instance n from R annotators (with missing values) - 𝑒𝑟: feature vector of the r-th annotator (when available) - 𝑧𝑛: unobserved ground-truth label - Goal: a classifier that learns directly from S • Generative module - Classifier given instance x outputs its predicted label - Generator takes the classifier output + feature vector + annotator vector • Discriminative module - Discriminator determines whether the annotation is authentic or generated - Auxiliary network penalizes the generative network based on the classified + generated label • The two modules are involved in a minimax game
  • 7. Training and model optimization • Entropy-based annotation selection - Training bias because of annotation sparsity - Equal sample sizes for original and generated annotations • Two-step update for the generative module - Generator and classifier are coupled - Strong negative correlation between the entropy of a classifier’s output and its accuracy - Instances with low classification entropy are used to update the generator - Updated generator is then used to update the rest of the instances for the classifier
  • 8. Experiments • For evaluation three real- world datasets were employed with a subset of low-quality annotators was selected. • The results of CrowdInG were compared with a state-of-the-art baselines with the same classifier design • Outperforms models designed for complex confusions
  • 9. Experiments continued… • To study the utility of augmented annotations and investigate performance, observed annotations were gradually removed • While there was a large amount of sparsity on removal of annotations, CrowdInG still performs consistently well
  • 10. Conclusion • Data sparsity is a huge challenge • Demonstrates its effectiveness and provides a potential way forward in the area of low-budget crowdsourcing • Future potential - Annotator education based on annotator-specific confusions - Task assignment based on instance-specific confusions
  • 11. References Reference paper https://arxiv.org/pdf/2107.10449.pdf Title slide image source https://www.gep.com/blog/mind/crowdsourcing-marketing