2. How to learn with unreliable labels?
Morgan Lefranc - Ridge-i
3. What this presentation is NOT
● Exhaustive
● Definitive
● Detailed
● Perfectly accurate
● Beautiful
● Well prepared
4. Table of Contents
1. Incomplete supervision
a. Active learning
b. Semi-supervised learning
2. Inexact supervision
a. Class Activation Maps
b. Multiple Instance Learning
3. Inaccurate supervision
a. Crowd-sourcing techniques
b. Confident learning
5. Incomplete supervision
A small subset of the data is labeled, while the remaining majority of
the data is unlabeled
6. How to deal with incomplete
supervision?
Human supervision available
ACTIVE LEARNING
● A human oracle can be queried to
request annotation for specific
samples
● Need to find good samples so that
good performance can be achieved
with minimal amount of data
Human supervision not available
SEMI-SUPERVISED LEARNING
● Exploit the available labels to infer
labels for the unlabeled samples
7. Active learning: how to select samples?
● Uncertainty sampling
● Query by committee
Images by presenter
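A minimal sketch of the least-confidence variant of uncertainty sampling: rank unlabeled samples by the confidence of the model's top prediction and query the oracle on the least confident ones. The function name and toy data are illustrative, not from the presentation.

```python
import numpy as np

def uncertainty_sampling(probs, n_queries):
    """Select the samples the model is least sure about.

    probs: (n_samples, n_classes) predicted probabilities from the
    current model on the unlabeled pool. Returns the indices of the
    n_queries samples with the lowest top-class confidence.
    """
    confidence = probs.max(axis=1)            # confidence of the predicted class
    return np.argsort(confidence)[:n_queries]  # least confident first

# Toy pool of three unlabeled samples, two classes.
probs = np.array([[0.95, 0.05],   # very confident -> no need to ask
                  [0.55, 0.45],   # near the decision boundary -> query
                  [0.80, 0.20]])
queried = uncertainty_sampling(probs, n_queries=1)
```

Query-by-committee follows the same pattern, except the score is the disagreement among several models instead of a single model's confidence.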
8. Examples of semi-supervised learning
● Low-density separation methods
● Disagreement-based methods (e.g. co-training)
Images by presenter
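One round of self-training (pseudo-labeling), a simple semi-supervised scheme in the same family as the methods above: train on the labeled subset, then adopt confident predictions on unlabeled samples as new labels. The nearest-centroid stand-in classifier and all names below are illustrative assumptions.

```python
import numpy as np

def fit_centroids(X, y):
    """Tiny stand-in classifier: one centroid per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_proba(model, X):
    """Softmax over negative squared distances to each centroid."""
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

def pseudo_label_round(X_lab, y_lab, X_unlab, threshold=0.9):
    """Adopt unlabeled points whose top prediction clears `threshold`."""
    probs = predict_proba(fit_centroids(X_lab, y_lab), X_unlab)
    confident = probs.max(axis=1) >= threshold
    X_new = np.vstack([X_lab, X_unlab[confident]])
    y_new = np.concatenate([y_lab, probs.argmax(axis=1)[confident]])
    return X_new, y_new

# Two labeled points, three unlabeled; the ambiguous middle point stays unlabeled.
X_lab, y_lab = np.array([[0.0], [10.0]]), np.array([0, 1])
X_unlab = np.array([[0.5], [9.5], [5.0]])
X_new, y_new = pseudo_label_round(X_lab, y_lab, X_unlab)
```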
9. Inexact supervision
Each data sample has a label, but the supervision is not as fine-grained as
required for the task
11. CAM for Object Detection
If we know an image contains an object, we can use the CAM of this class to
propose bounding boxes
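A sketch of the idea: a CAM is the classifier-weighted sum of the last convolutional feature maps, and a box can be proposed around the high-activation region. The thresholding heuristic below is one common choice, not the only one; all names are illustrative.

```python
import numpy as np

def class_activation_map(features, weights, cls):
    """CAM for class `cls`: weighted sum of the last conv feature maps.

    features: (K, H, W) feature maps from the last conv layer.
    weights:  (n_classes, K) weights of the linear classifier that
              follows global average pooling (as in CAM networks).
    """
    return np.tensordot(weights[cls], features, axes=1)  # (H, W)

def cam_to_box(cam, rel_thresh=0.5):
    """Propose a box: the tight rectangle around pixels whose activation
    exceeds a fraction of the CAM's maximum."""
    ys, xs = np.where(cam >= rel_thresh * cam.max())
    return xs.min(), ys.min(), xs.max(), ys.max()  # x0, y0, x1, y1

# Toy 4x4 feature map with a hot 2x2 region in the middle.
features = np.zeros((1, 4, 4))
features[0, 1:3, 1:3] = 1.0
weights = np.array([[1.0]])
box = cam_to_box(class_activation_map(features, weights, cls=0))
```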
12. Multiple Instance Learning (MIL)
● Each bag of instances is annotated. The goal is to predict labels for
individual instances.
● Individual instance predictions inside a bag are aggregated and
compared with the bag label, and the errors are back-propagated.
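The aggregation step above can be sketched with the standard MIL assumption (a bag is positive iff at least one instance is positive), which makes max-pooling a natural aggregator. The loss and names below are an illustrative sketch, not the presenter's exact formulation.

```python
import numpy as np

def mil_bag_loss(instance_scores, bag_label):
    """Bag-level binary cross-entropy under the max-pooling MIL assumption.

    instance_scores: per-instance positive-class probabilities in [0, 1].
    bag_label: 0 or 1 for the whole bag. During training, the gradient of
    the max flows back only to the highest-scoring instance.
    """
    bag_score = np.clip(np.max(instance_scores), 1e-7, 1 - 1e-7)
    return -(bag_label * np.log(bag_score)
             + (1 - bag_label) * np.log(1 - bag_score))

# A positive bag: only the second instance looks positive, and the
# bag loss is driven by that instance's score (0.9).
loss = mil_bag_loss(np.array([0.1, 0.9, 0.2]), bag_label=1)
```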
13. Example of MIL for Semantic
Segmentation from bounding-box labels
MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation, Jiajun Wu, Yibiao Zhao, Jun-Yan
Zhu, Siwei Luo, and Zhuowen Tu
15. Crowd-sourcing techniques
● Can provide a lot of low-quality labels. What to do with them?
● Most common technique: ask several workers to carry out the same annotation
task, and average the results.
● More advanced: Track the performance of each worker, use Bayesian inference
techniques to keep an estimate of their reliability, and give reliable workers more
weight on the decision. → similar to active learning / semi-supervised learning
workflows.
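The two aggregation strategies above can be sketched in a few lines: plain majority voting, and a weighted vote where each worker's ballot is scaled by an estimated reliability (however that estimate is maintained, e.g. by the Bayesian tracking mentioned above). Function names and the toy data are illustrative.

```python
from collections import Counter

def majority_vote(labels):
    """Most common label among the workers' answers (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels, reliabilities):
    """Sum each worker's reliability weight onto their chosen label."""
    scores = {}
    for label, w in zip(labels, reliabilities):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# Three workers disagree; the single highly reliable worker outvotes
# the two unreliable ones under the weighted scheme.
labels = ["cat", "cat", "dog"]
plain = majority_vote(labels)                          # "cat"
weighted = weighted_vote(labels, [0.3, 0.3, 0.9])      # "dog"
```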
16. Confident learning - cleanlab
Objective: Find and remove noisy labels in a
dataset.
1. Train a model on the noisily labeled dataset.
2. Obtain out-of-sample predicted
probabilities for every sample (e.g. via
cross-validation).
3. Count the samples whose predicted
confidence for a class other than their given
label exceeds a per-class threshold.
4. Use these counts to estimate the noise
in the labels, rank the least reliable ones, and
prune them out.
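The counting step can be sketched as below, loosely following the confident-learning paper: a per-class threshold is the mean predicted probability of that class over samples labeled with it, and a sample is flagged when it clears another class's threshold. This is a simplified illustration, not cleanlab's actual implementation.

```python
import numpy as np

def count_label_issues(probs, given_labels):
    """Flag samples whose given label is likely wrong.

    probs: (n, n_classes) out-of-sample predicted probabilities.
    given_labels: the (possibly noisy) dataset labels.
    """
    n, k = probs.shape
    # Threshold t_j = mean p(j | x) over samples currently labeled j.
    thresholds = np.array([probs[given_labels == j, j].mean() for j in range(k)])
    flagged = []
    for i in range(n):
        for j in range(k):
            if j != given_labels[i] and probs[i, j] >= thresholds[j]:
                flagged.append(i)  # confidently belongs to another class
                break
    return flagged

# Toy example: the last sample is labeled 1, but the model confidently
# predicts class 0 for it.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]])
labels = np.array([0, 0, 1, 1])
issues = count_label_issues(probs, labels)
```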
17. CleanLab output: noisy labels from ImageNet (more than 100,000 in total).
Blue: multi-label images, green: ontological issue, red: label error.
18. References
1. A brief introduction to weakly supervised learning
2. How to Use Inaccurate Data for Machine Learning with Weakly Supervised Learning
3. Confident Learning: Estimating Uncertainty in Dataset Labels