Using Generative Augmentation
to improve ‘Learning
from Crowds’
Neetha Sherra
San Jose State University
CMPE 255-Introduction to Data Mining
Introduction
• A typical classification problem is supervised: it needs labeled data
• Example: the commonly cited Iris dataset
• When labels are missing, there are two common ways to proceed
- Feed the data to an unsupervised ML model
- Crowdsource the labels
Crowdsourcing: Definition, pros and cons
• Crowdsourcing in general is a process wherein a dispersed group of
participants provide a service either as volunteers or for payment
• Advantages
- Cost-effective
- Time-saving
• Disadvantages
- Sparsity
- Low-quality
• The disadvantages can be addressed, but doing so nullifies the advantages of
crowdsourcing (a catch-22)
How does CrowdInG help?
• CrowdInG (Crowdsourced data through Informative Generative augmentation)
uses generative AI to augment crowdsourced data by generating the missing labels
• Its main goal is accurate generated labels, which should
- reflect the ground truth
- stay true to the distribution of the crowdsourced labels
• It is based on Generative Adversarial Networks (GANs)
- Generator
- Discriminator
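For reference, the generator and discriminator of a standard GAN play the minimax game below. This is the generic GAN objective, not CrowdInG's specific loss; the symbols $G$, $D$, $p_{\text{data}}$ and $p_z$ are the usual GAN notation and do not come from these slides.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to tell real samples from generated ones, while the generator tries to make that distinction impossible.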
CrowdInG framework
CrowdInG framework continued…
• $S = \{(x_n, y_n)\}_{n=1}^{N}$
- $x_n$: feature vector of instance $n$
- $y_n$: annotation vector of instance $n$ from $R$ annotators (with missing values)
- $e_r$: feature vector of the $r$-th annotator (when available)
- $z_n$: unobserved ground-truth label
- Goal: a classifier that learns directly from $S$
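A minimal sketch of how the dataset S might be laid out in code, assuming NumPy arrays. The toy values, the two-class setup, and the `MISSING = -1` sentinel are illustrative assumptions, not taken from the paper.

```python
import numpy as np

MISSING = -1  # hypothetical sentinel: annotator r gave no label for instance n

# Toy data: N = 4 instances, 2 features, R = 3 annotators, 2 classes.
X = np.array([[0.1, 1.2],
              [0.9, 0.3],
              [0.4, 0.8],
              [1.5, 0.2]])             # row n is the feature vector x_n
Y = np.array([[0, MISSING, 0],
              [1, 1, MISSING],
              [MISSING, MISSING, 0],
              [1, MISSING, MISSING]])  # row n is the annotation vector y_n

observed = Y != MISSING                # boolean mask of observed annotations
sparsity = 1.0 - observed.mean()       # fraction of missing annotations

print(f"N={X.shape[0]}, R={Y.shape[1]}, sparsity={sparsity:.2f}")
# prints: N=4, R=3, sparsity=0.50
```

The ground-truth labels z_n never appear in the data; only X and the partially filled Y are available for training.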
• Generative module
- Classifier: given an instance x, outputs its predicted label
- Generator: takes the classifier output, the instance feature vector, and the annotator feature vector
• Discriminative module
- Discriminator: determines whether an annotation is authentic or generated
- Auxiliary network: penalizes the generative module based on the classifier's predicted label and the generated annotation
• The two modules play a minimax game
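The data flow between the components can be sketched as below. This is a shape-level illustration only: random linear layers stand in for the real networks, the sizes `d`, `a`, `K` are hypothetical, the discriminator's exact inputs are an assumption, and the auxiliary network and all training logic are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d, a, K = 4, 3, 2   # hypothetical sizes: instance features, annotator features, classes

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Randomly initialised linear layers stand in for the real networks.
W_clf = rng.normal(size=(K, d))           # classifier weights
W_gen = rng.normal(size=(K, K + d + a))   # generator weights
w_dis = rng.normal(size=K + d + a)        # discriminator weights

def classifier(x):
    """Instance features -> predicted label distribution."""
    return softmax(W_clf @ x)

def generator(x, e_r):
    """Classifier output + instance features + annotator features -> annotation distribution."""
    return softmax(W_gen @ np.concatenate([classifier(x), x, e_r]))

def discriminator(annotation, x, e_r):
    """Score in (0, 1): how authentic the annotation looks (inputs assumed)."""
    return sigmoid(w_dis @ np.concatenate([annotation, x, e_r]))

x = rng.normal(size=d)      # one instance
e_r = rng.normal(size=a)    # one annotator
fake = generator(x, e_r)    # generated (soft) annotation
score = discriminator(fake, x, e_r)
print(fake, score)
```

In an adversarial setup, the discriminator would be trained to score observed annotations high and generated ones low, while the generative module is trained to raise the score of its generated annotations.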
Training and model optimization
• Entropy-based annotation selection
- Training bias because of annotation sparsity
- Equal sample sizes for original and generated annotations
• Two-step update for the generative module
- Generator and classifier are coupled
- Strong negative correlation between the entropy of a classifier’s output and its
accuracy
- Instances with low classification entropy are used to update the generator
- The updated generator is then used to update the classifier on the remaining
instances
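The entropy-based split behind the two-step update can be sketched as follows. This is a minimal illustration, assuming the classifier outputs are available as a probability matrix; `low_fraction` is a hypothetical hyperparameter, and the actual selection rule in the paper may differ.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy of each row of an (N, K) probability matrix."""
    return -(probs * np.log(probs + eps)).sum(axis=1)

def split_by_entropy(probs, low_fraction=0.5):
    """Return (low_idx, rest_idx): the most confident instances and the remainder."""
    order = np.argsort(entropy(probs))            # ascending: low entropy first
    n_low = int(round(low_fraction * len(order)))
    return order[:n_low], order[n_low:]

# Toy classifier outputs for 4 instances and 2 classes.
probs = np.array([[0.99, 0.01],
                  [0.55, 0.45],
                  [0.10, 0.90],
                  [0.50, 0.50]])

low_idx, rest_idx = split_by_entropy(probs, low_fraction=0.5)
# Step 1: update the generator using the instances in low_idx.
# Step 2: use the updated generator to update the classifier on rest_idx.
print(sorted(low_idx.tolist()), sorted(rest_idx.tolist()))
# prints: [0, 2] [1, 3]
```

Instances 0 and 2 have the most peaked predictions, so they are treated as the more reliable ones for the generator update.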
Experiments
• For evaluation, three real-world datasets were employed, and a subset of
low-quality annotators was selected
• The results of CrowdInG were compared with state-of-the-art baselines
that use the same classifier design
• CrowdInG outperforms models designed for complex confusions
Experiments continued…
• To study the utility of the augmented annotations and investigate
performance, observed annotations were gradually removed
• Although removing annotations made the data much sparser,
CrowdInG still performed consistently well
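The removal experiment described above can be mimicked with a small helper. This is a sketch under assumptions: annotations are dropped uniformly at random, `MISSING = -1` is a hypothetical sentinel, and the keep fractions are illustrative rather than the ones used in the paper.

```python
import numpy as np

MISSING = -1  # hypothetical sentinel for an unobserved annotation

def remove_annotations(Y, keep_fraction, rng):
    """Return a copy of Y in which only keep_fraction of the observed annotations remain."""
    Y_sparse = Y.copy()
    obs_rows, obs_cols = np.nonzero(Y != MISSING)
    n_drop = int(round((1.0 - keep_fraction) * len(obs_rows)))
    drop = rng.choice(len(obs_rows), size=n_drop, replace=False)
    Y_sparse[obs_rows[drop], obs_cols[drop]] = MISSING
    return Y_sparse

rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(5, 4))   # toy, fully observed binary annotations
for keep in (1.0, 0.75, 0.5, 0.25):
    Y_k = remove_annotations(Y, keep, rng)
    print(keep, int((Y_k != MISSING).sum()))
```

Each sparser copy would then be used to retrain and re-evaluate the model, so that accuracy can be tracked as a function of the fraction of annotations kept.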
Conclusion
• Data sparsity is a huge challenge
• CrowdInG demonstrates its effectiveness and provides a potential way forward
in the area of low-budget crowdsourcing
• Future potential
- Annotator education based on annotator-specific confusions
- Task assignment based on instance-specific confusions
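One simple way to look at annotator-specific confusions is a per-annotator confusion matrix. The sketch below is an illustration, not the paper's method: it assumes some reference labels `z` are available (for example the classifier's predictions), and `MISSING = -1` is a hypothetical sentinel.

```python
import numpy as np

MISSING = -1  # hypothetical sentinel for an unobserved annotation

def annotator_confusions(Y, z, n_classes):
    """Row-normalised matrices: C[r, i, j] estimates P(annotator r says j | reference label i)."""
    n_annotators = Y.shape[1]
    C = np.zeros((n_annotators, n_classes, n_classes))
    for r in range(n_annotators):
        seen = Y[:, r] != MISSING
        np.add.at(C[r], (z[seen], Y[seen, r]), 1.0)   # count (reference, annotation) pairs
    totals = C.sum(axis=2, keepdims=True)
    return np.divide(C, totals, out=np.zeros_like(C), where=totals > 0)

z = np.array([0, 0, 1, 1])             # reference labels (assumed available)
Y = np.array([[0, 1],
              [0, MISSING],
              [1, 0],
              [MISSING, 0]])
C = annotator_confusions(Y, z, n_classes=2)
print(C[0])   # annotator 0 agrees with the reference on every label it gave
print(C[1])   # annotator 1 flips every label it gave
```

Off-diagonal mass in an annotator's matrix points to the class pairs that annotator tends to confuse, which is the kind of signal annotator education could target.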
References
Reference paper
https://arxiv.org/pdf/2107.10449.pdf
Title slide image source
https://www.gep.com/blog/mind/crowdsourcing-marketing
Thank You
