2. How to learn with unreliable labels?
Morgan Lefranc - Ridge-i
3. What this presentation is NOT
● Exhaustive
● Definitive
● Detailed
● Perfectly accurate
● Beautiful
● Well prepared
4. Table of Contents
1. Incomplete supervision
a. Active learning
b. Semi-supervised learning
2. Inexact supervision
a. Class Activation Maps
b. Multiple Instance Learning
3. Inaccurate supervision
a. Crowd-sourcing techniques
b. Confident learning
5. Incomplete supervision
A small subset of the data is labeled, while the remaining majority of
the data is unlabeled
6. How to deal with incomplete
supervision?
Human supervision available
ACTIVE LEARNING
● A human oracle can be queried to
request annotation for specific
samples
● Need to find good samples so that
good performance can be achieved
with minimal amount of data
Human supervision not available
SEMI-SUPERVISED LEARNING
● Exploit the available labels to infer
labels for the unlabeled samples
7. Active learning: how to select samples?
● Uncertainty sampling
● Query by committee
Images by presenter
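A minimal sketch of the least-confidence variant of uncertainty sampling: rank unlabeled samples by the confidence of the model's top prediction and query the oracle on the least confident ones. The function name and toy data are illustrative, not from the presentation.

```python
import numpy as np

def uncertainty_sampling(probs, n_queries):
    """Select the samples the model is least sure about.

    probs: (n_samples, n_classes) predicted probabilities from the
    current model on the unlabeled pool. Returns the indices of the
    n_queries samples with the lowest top-class confidence.
    """
    confidence = probs.max(axis=1)            # confidence of the predicted class
    return np.argsort(confidence)[:n_queries]  # least confident first

# Toy pool of three unlabeled samples, two classes.
probs = np.array([[0.95, 0.05],   # very confident -> no need to ask
                  [0.55, 0.45],   # near the decision boundary -> query
                  [0.80, 0.20]])
queried = uncertainty_sampling(probs, n_queries=1)
```

Query-by-committee follows the same pattern, except the score is the disagreement among several models instead of a single model's confidence.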
8. Examples of semi-supervised learning
● Low-density separation methods
● Disagreement-based methods (e.g. co-training)
Images by presenter
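One round of self-training (pseudo-labeling), a simple semi-supervised scheme in the same family as the methods above: train on the labeled subset, then adopt confident predictions on unlabeled samples as new labels. The nearest-centroid stand-in classifier and all names below are illustrative assumptions.

```python
import numpy as np

def fit_centroids(X, y):
    """Tiny stand-in classifier: one centroid per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_proba(model, X):
    """Softmax over negative squared distances to each centroid."""
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

def pseudo_label_round(X_lab, y_lab, X_unlab, threshold=0.9):
    """Adopt unlabeled points whose top prediction clears `threshold`."""
    probs = predict_proba(fit_centroids(X_lab, y_lab), X_unlab)
    confident = probs.max(axis=1) >= threshold
    X_new = np.vstack([X_lab, X_unlab[confident]])
    y_new = np.concatenate([y_lab, probs.argmax(axis=1)[confident]])
    return X_new, y_new

# Two labeled points, three unlabeled; the ambiguous middle point stays unlabeled.
X_lab, y_lab = np.array([[0.0], [10.0]]), np.array([0, 1])
X_unlab = np.array([[0.5], [9.5], [5.0]])
X_new, y_new = pseudo_label_round(X_lab, y_lab, X_unlab)
```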
9. Inexact supervision
Each data sample has a label, but the supervision is not as fine-grained as
required for the task
11. CAM for Object Detection
If we know an image contains an object, we can use the CAM of this class to
propose bounding boxes
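A sketch of the idea: a CAM is the classifier-weighted sum of the last convolutional feature maps, and a box can be proposed around the high-activation region. The thresholding heuristic below is one common choice, not the only one; all names are illustrative.

```python
import numpy as np

def class_activation_map(features, weights, cls):
    """CAM for class `cls`: weighted sum of the last conv feature maps.

    features: (K, H, W) feature maps from the last conv layer.
    weights:  (n_classes, K) weights of the linear classifier that
              follows global average pooling (as in CAM networks).
    """
    return np.tensordot(weights[cls], features, axes=1)  # (H, W)

def cam_to_box(cam, rel_thresh=0.5):
    """Propose a box: the tight rectangle around pixels whose activation
    exceeds a fraction of the CAM's maximum."""
    ys, xs = np.where(cam >= rel_thresh * cam.max())
    return xs.min(), ys.min(), xs.max(), ys.max()  # x0, y0, x1, y1

# Toy 4x4 feature map with a hot 2x2 region in the middle.
features = np.zeros((1, 4, 4))
features[0, 1:3, 1:3] = 1.0
weights = np.array([[1.0]])
box = cam_to_box(class_activation_map(features, weights, cls=0))
```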
12. Multiple Instance Learning (MIL)
● Each bag of instances is annotated. The goal is to predict labels for
individual instances.
● Individual instance predictions inside a bag are aggregated and
compared with the bag label, and the errors are back-propagated.
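The aggregation step above can be sketched with the standard MIL assumption (a bag is positive iff at least one instance is positive), which makes max-pooling a natural aggregator. The loss and names below are an illustrative sketch, not the presenter's exact formulation.

```python
import numpy as np

def mil_bag_loss(instance_scores, bag_label):
    """Bag-level binary cross-entropy under the max-pooling MIL assumption.

    instance_scores: per-instance positive-class probabilities in [0, 1].
    bag_label: 0 or 1 for the whole bag. During training, the gradient of
    the max flows back only to the highest-scoring instance.
    """
    bag_score = np.clip(np.max(instance_scores), 1e-7, 1 - 1e-7)
    return -(bag_label * np.log(bag_score)
             + (1 - bag_label) * np.log(1 - bag_score))

# A positive bag: only the second instance looks positive, and the
# bag loss is driven by that instance's score (0.9).
loss = mil_bag_loss(np.array([0.1, 0.9, 0.2]), bag_label=1)
```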
13. Example of MIL for Semantic
Segmentation from bounding-box labels
MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation, Jiajun Wu, Yibiao Zhao, Jun-Yan
Zhu, Siwei Luo, and Zhuowen Tu
15. Crowd-sourcing techniques
● Can provide a lot of low-quality labels. What to do with them?
● Most common technique: ask several workers to carry out the same annotation
task, and average the results.
● More advanced: Track the performance of each worker, use Bayesian inference
techniques to keep an estimate of their reliability, and give reliable workers more
weight on the decision. → similar to active learning / semi-supervised learning
workflows.
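The two aggregation strategies above can be sketched in a few lines: plain majority voting, and a weighted vote where each worker's ballot is scaled by an estimated reliability (however that estimate is maintained, e.g. by the Bayesian tracking mentioned above). Function names and the toy data are illustrative.

```python
from collections import Counter

def majority_vote(labels):
    """Most common label among the workers' answers (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels, reliabilities):
    """Sum each worker's reliability weight onto their chosen label."""
    scores = {}
    for label, w in zip(labels, reliabilities):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# Three workers disagree; the single highly reliable worker outvotes
# the two unreliable ones under the weighted scheme.
labels = ["cat", "cat", "dog"]
plain = majority_vote(labels)                          # "cat"
weighted = weighted_vote(labels, [0.3, 0.3, 0.9])      # "dog"
```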
16. Confident learning - cleanlab
Objective: Find and remove noisy labels in a
dataset.
1. Train a model on the noisily labeled dataset.
2. Obtain out-of-sample predicted
probabilities for every sample (e.g. via
cross-validation).
3. Count the samples whose predicted
confidence for a class other than their given
label exceeds a per-class threshold.
4. Use these counts to estimate the noise
in the labels, rank the least reliable ones, and
prune them out.
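The counting step can be sketched as below, loosely following the confident-learning paper: a per-class threshold is the mean predicted probability of that class over samples labeled with it, and a sample is flagged when it clears another class's threshold. This is a simplified illustration, not cleanlab's actual implementation.

```python
import numpy as np

def count_label_issues(probs, given_labels):
    """Flag samples whose given label is likely wrong.

    probs: (n, n_classes) out-of-sample predicted probabilities.
    given_labels: the (possibly noisy) dataset labels.
    """
    n, k = probs.shape
    # Threshold t_j = mean p(j | x) over samples currently labeled j.
    thresholds = np.array([probs[given_labels == j, j].mean() for j in range(k)])
    flagged = []
    for i in range(n):
        for j in range(k):
            if j != given_labels[i] and probs[i, j] >= thresholds[j]:
                flagged.append(i)  # confidently belongs to another class
                break
    return flagged

# Toy example: the last sample is labeled 1, but the model confidently
# predicts class 0 for it.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]])
labels = np.array([0, 0, 1, 1])
issues = count_label_issues(probs, labels)
```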
17. CleanLab output: noisy labels from ImageNet (more than 100,000 in total).
Blue: multi-label images, green: ontological issue, red: label error.
18. References
1. A brief introduction to weakly supervised learning
2. How to Use Inaccurate Data for Machine Learning with Weakly Supervised Learning
3. Confident Learning: Estimating Uncertainty in Dataset Labels