CrowdInG_learning_from_crowds.pptx

Using Generative Augmentation to improve ‘Learning from Crowds’
Neetha Sherra
San Jose State University
CMPE 255-Introduction to Data Mining

Introduction
• A typical classification problem is supervised, so it requires labeled data
• Example: the widely used Iris dataset
• Two common ways to proceed when labels are unavailable
- Feed the data to an unsupervised ML model
- Crowdsource the labels

Crowdsourcing: Definition, pros and cons
• Crowdsourcing in general is a process wherein a dispersed group of
participants provide a service either as volunteers or for payment
• Advantages
- Cost-effective
- Time-saving
• Disadvantages
- Sparsity
- Low-quality
• The disadvantages can be addressed, but doing so nullifies the advantages of crowdsourcing (catch-22)

How does CrowdInG help?
• CrowdInG (Crowdsourced data through Informative Generative augmentation) uses generative AI to augment crowdsourced data by generating the missing labels
• Its main goal is accurate generated labels that
- reflect the ground truth
- stay true to the distribution of crowdsourced labels
• It is based on Generative Adversarial Networks (GANs)
- Generator
- Discriminator

CrowdInG framework continued…
• S = {(x_n, y_n)}, n = 1, ..., N
- x_n: feature vector of instance n
- y_n: annotation vector of instance n from R annotators (with missing values)
- e_r: feature vector of the r-th annotator (when available)
- z_n: unobserved ground-truth label
- Goal: a classifier that learns directly from S
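To make the notation concrete, here is a minimal sketch of how the dataset S could be held in memory. The toy values, the use of None for a missing annotation, and the helper names sparsity and observed_pairs are all assumptions for illustration, not taken from the paper.

```python
# Hypothetical toy data: 4 instances, R = 3 annotators, binary labels.
# X[n] is the feature vector x_n; Y[n] is the annotation vector y_n.
# None marks an annotation that annotator r never gave for instance n.
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
Y = [
    [0, None, 0],
    [None, 1, None],
    [1, 1, None],
    [None, None, 0],
]


def sparsity(annotations):
    """Fraction of (instance, annotator) cells with no annotation."""
    total = sum(len(row) for row in annotations)
    missing = sum(1 for row in annotations for a in row if a is None)
    return missing / total


def observed_pairs(annotations):
    """List of (n, r, label) triples for every annotation that was given."""
    return [
        (n, r, a)
        for n, row in enumerate(annotations)
        for r, a in enumerate(row)
        if a is not None
    ]


print(sparsity(Y))             # share of missing annotations
print(len(observed_pairs(Y)))  # number of observed annotations
```

The missing cells are the ones a generator would be asked to fill in.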
• Generative module
- Classifier: given an instance x, outputs its predicted label
- Generator: takes the classifier output, the instance feature vector, and the annotator feature vector, and generates an annotation
• Discriminative module
- Discriminator: determines whether an annotation is authentic or generated
- Auxiliary network: penalizes the generative module based on the classified and generated labels
• The two modules play a minimax game
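As a rough guide to what the minimax game means, the generic conditional GAN objective can be written with the symbols from this slide. This is the textbook form, not necessarily the exact objective of the paper: the auxiliary network term is not shown, and C, G, D, y_{nr} (the r-th entry of y_n) and \hat{y}_{nr} (a generated annotation) are notation assumed here.

```latex
\min_{G,\,C}\ \max_{D}\;
\mathbb{E}_{(x_n,\, e_r,\, y_{nr})\ \text{observed}}
  \big[\log D(y_{nr} \mid x_n, e_r)\big]
+ \mathbb{E}_{(x_n,\, e_r)}
  \big[\log\big(1 - D(\hat{y}_{nr} \mid x_n, e_r)\big)\big],
\qquad
\hat{y}_{nr} = G\big(C(x_n),\, x_n,\, e_r\big)
```

The discriminator D tries to score observed annotations high and generated ones low, while the generator G and classifier C try to produce annotations that D cannot tell apart from observed ones.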

Training and model optimization
• Entropy-based annotation selection
- Annotation sparsity introduces training bias
- Original and generated annotations are sampled in equal numbers
• Two-step update for the generative module
- Generator and classifier are coupled
- The entropy of a classifier’s output is strongly negatively correlated with its accuracy
- Instances with low classification entropy are used to update the generator
- The updated generator is then used to update the classifier on the remaining instances
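The split behind the two-step update can be sketched as follows. The slides do not say how "low entropy" is decided, so the keep_fraction parameter, the function names, and the example probabilities are assumptions for illustration only.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def split_by_entropy(class_probs, keep_fraction=0.5):
    """Split instance indices into (low-entropy, remaining).

    keep_fraction is a hypothetical knob: the share of instances, ranked
    from most to least confident, that goes to the generator update.
    """
    order = sorted(range(len(class_probs)),
                   key=lambda i: entropy(class_probs[i]))
    k = max(1, int(len(order) * keep_fraction))
    return order[:k], order[k:]


# Hypothetical classifier outputs for four instances (two classes).
probs = [[0.5, 0.5], [0.95, 0.05], [0.7, 0.3], [0.99, 0.01]]
low, rest = split_by_entropy(probs)
print(low)   # indices used to update the generator (step 1)
print(rest)  # indices used to update the classifier (step 2)
```

Ranking by entropy relies on the observation above that confident (low-entropy) predictions tend to be the accurate ones.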

Experiments
• For evaluation, three real-world datasets were used, and a subset of low-quality annotators was selected
• The results of CrowdInG were compared with state-of-the-art baselines using the same classifier design
• CrowdInG outperforms models designed for complex confusions

Experiments continued…
• To study the utility of augmented annotations, observed annotations were gradually removed and performance was measured
• Although removing annotations made the data very sparse, CrowdInG still performed consistently well
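The removal step of such an experiment could look like the sketch below. The slides do not describe the exact protocol, so uniform random removal, the fixed seed, and the names remove_annotations and count_observed are assumptions; a real protocol might also ensure that every instance keeps at least one annotation.

```python
import random


def count_observed(annotations):
    """Number of annotations that are present (not None)."""
    return sum(1 for row in annotations for a in row if a is not None)


def remove_annotations(annotations, fraction, seed=0):
    """Return a copy with the given fraction of observed annotations removed.

    Removal is uniform at random over observed cells (an assumption);
    the input is left unchanged.
    """
    rng = random.Random(seed)
    observed = [
        (n, r)
        for n, row in enumerate(annotations)
        for r, a in enumerate(row)
        if a is not None
    ]
    to_drop = rng.sample(observed, int(len(observed) * fraction))
    masked = [list(row) for row in annotations]
    for n, r in to_drop:
        masked[n][r] = None
    return masked


# Hypothetical annotation matrix: 4 instances, 3 annotators.
Y = [[0, None, 0], [None, 1, None], [1, 1, None], [None, None, 0]]
for fraction in (0.0, 0.25, 0.5):
    print(fraction, count_observed(remove_annotations(Y, fraction)))
```

Training the same model on each masked copy and recording test accuracy would trace how performance changes as sparsity grows.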

Conclusion
• Data sparsity is a huge challenge
• CrowdInG demonstrates its effectiveness and provides a potential way forward in low-budget crowdsourcing
• Future potential
- Annotator education based on annotator-specific confusions
- Task assignment based on instance-specific confusions

References
Reference paper
https://arxiv.org/pdf/2107.10449.pdf
Title slide image source
https://www.gep.com/blog/mind/crowdsourcing-marketing
