Study Meeting Presentation: How to learn with non-reliable labels?

Author: Morgan Lefranc (Ridge-i)
Date: 2021/07/28

What this presentation is NOT
● Exhaustive
● Definitive
● Detailed
● Perfectly accurate
● Beautiful
● Well prepared
Table of Contents
1. Incomplete supervision
a. Active learning
b. Semi-supervised learning
2. Inexact supervision
a. Class-Activation Map
b. Multiple Instance learning
3. Inaccurate supervision
a. Crowd-sourcing techniques
b. Confident learning
Incomplete supervision
A small subset of the data is labeled, while the remaining majority of
the data is unlabeled
How to deal with incomplete supervision?
Human supervision available
ACTIVE LEARNING
● A human oracle can be queried to
request annotation for specific
samples
● The challenge is to select the most
informative samples, so that good
performance is reached with a minimal
amount of labeled data
Human supervision not available
SEMI-SUPERVISED LEARNING
● Exploit the available labels to infer
labels for the unlabeled samples
Active learning: how to select samples?
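● Uncertainty sampling: query the samples on which the current model is least certain
● Query by committee: train several models and query the samples on which they disagree most
(Figures omitted; images by presenter)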
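Both selection strategies can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's implementation: the function names, the use of entropy as the uncertainty measure, and the toy model outputs are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_sampling(pool_probs, k):
    """Indices of the k pool samples whose predicted distribution has the highest entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:k]


def query_by_committee(committee_votes, k):
    """Indices of the k pool samples on which the committee's votes are most split."""
    def vote_entropy(votes):
        counts = Counter(votes)
        return entropy([c / len(votes) for c in counts.values()])
    ranked = sorted(range(len(committee_votes)),
                    key=lambda i: vote_entropy(committee_votes[i]), reverse=True)
    return ranked[:k]


# Hypothetical model outputs for three unlabeled samples (two classes).
pool_probs = [[0.98, 0.02], [0.55, 0.45], [0.80, 0.20]]
to_label = uncertainty_sampling(pool_probs, k=1)

# Hypothetical votes of a three-model committee on the same samples.
committee_votes = [["cat", "cat", "cat"], ["cat", "dog", "dog"], ["dog", "dog", "dog"]]
to_label_qbc = query_by_committee(committee_votes, k=1)

print(to_label, to_label_qbc)
```

In both cases the selected indices are the samples one would send to the human oracle next; the model is then retrained and the loop repeats.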
Examples of semi-supervised learning
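● Low-density separation: place the decision boundary in regions where few samples lie
● Disagreement-based methods (e.g. co-training): several learners label unlabeled samples for one another
(Figures omitted; images by presenter)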
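As a rough sketch of the disagreement-based family, co-training can be illustrated with two views of each sample and one very simple classifier per view. Everything below is an assumption made for illustration: the nearest-centroid classifier, the margin used as a confidence score, the function names, and the toy data.

```python
def fit_centroids(values, labels):
    """Nearest-centroid 'classifier' on one scalar feature: mean value per class."""
    sums, counts = {}, {}
    for v, y in zip(values, labels):
        sums[y] = sums.get(y, 0.0) + v
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, v):
    """Return (label, margin); margin is the gap between the two closest centroids."""
    dists = sorted((abs(v - c), y) for y, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def co_training(labeled, unlabeled, rounds=3):
    """labeled: list of ((view_a, view_b), label); unlabeled: list of (view_a, view_b)."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not unlabeled:
                return labeled
            centroids = fit_centroids([x[view] for x, _ in labeled],
                                      [y for _, y in labeled])
            # The classifier of this view pseudo-labels the sample it is most
            # confident about and adds it to the shared labeled pool.
            best = max(unlabeled, key=lambda x: predict(centroids, x[view])[1])
            label, _ = predict(centroids, best[view])
            unlabeled.remove(best)
            labeled.append((best, label))
    return labeled


# Toy data: each sample has two scalar views of the same object.
labeled = [((0.0, 0.1), "neg"), ((1.0, 0.9), "pos")]
unlabeled = [(0.2, 0.5), (0.5, 0.95), (0.9, 0.6)]
result = dict(co_training(labeled, unlabeled))
print(result)
```

The sample `(0.5, 0.95)` is ambiguous in the first view but clear in the second, so it is the second classifier that labels it; this exchange between views is the point of the method.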
Inexact supervision
Each data sample has a label, but the supervision is not as fine-grained as
required for the task
Class Activation Map (CAM)
CAM for Object Detection
If we know an image contains an object, we can use the CAM of this class to
propose bounding boxes
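A minimal sketch of this idea, assuming the usual CAM formulation (a weighted sum of the last convolutional feature maps, using the classifier weights of the target class) and a simple relative threshold to turn the map into a box. The function names, the threshold value, and the tiny feature maps are hypothetical; in practice the map would first be upsampled to the image resolution.

```python
def class_activation_map(feature_maps, class_weights):
    """CAM(y, x) = sum over k of w_k * f_k(y, x), for K feature maps of equal size."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(w * fmap[y][x] for w, fmap in zip(class_weights, feature_maps))
             for x in range(width)]
            for y in range(height)]


def propose_box(cam, rel_threshold=0.5):
    """Tightest box (x_min, y_min, x_max, y_max) around cells >= rel_threshold * peak.

    Assumes the peak activation is positive.
    """
    peak = max(max(row) for row in cam)
    hot = [(x, y) for y, row in enumerate(cam)
           for x, v in enumerate(row) if v >= rel_threshold * peak]
    xs, ys = [x for x, _ in hot], [y for _, y in hot]
    return min(xs), min(ys), max(xs), max(ys)


# Two hypothetical 3x4 feature maps and the class weights attached to them.
feature_maps = [
    [[0, 0, 0, 0], [0, 2, 2, 0], [0, 1.5, 2, 0]],
    [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]],
]
class_weights = [1.0, 0.5]

cam = class_activation_map(feature_maps, class_weights)
box = propose_box(cam)
print(box)
```

The weak activation in the top-left corner falls below the threshold, so the proposed box covers only the strongly activated region.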
Multiple Instance Learning (MIL)
● Labels are given per bag of instances, not per instance. The goal is to
predict labels for the individual instances.
● Individual instance predictions inside a bag are aggregated and
compared with the bag label. Errors are back-propagated.
Example of MIL for Semantic Segmentation from bounding-box (BB) labels
Source: Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Siwei Luo, and Zhuowen Tu, "MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation"
Inaccurate supervision
The supervision information is not always the ground truth
Crowd-sourcing techniques
● Crowd-sourcing can provide a large number of low-quality labels. What to do with them?
● Most common technique: ask several workers to carry out the same annotation task,
then average the results (or take a majority vote).
● More advanced: track the performance of each worker, use Bayesian inference
techniques to maintain an estimate of each worker's reliability, and give reliable
workers more weight in the decision. → Similar to active learning / semi-supervised
learning workflows.
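The two aggregation schemes above can be sketched as follows. This is a deliberately simplified, one-pass illustration: the Beta(1, 1) prior, the use of the majority vote as a stand-in for the truth when scoring workers, the log-odds weighting, and the worker and item names are all assumptions made for the example.

```python
import math
from collections import defaultdict


def majority_vote(votes):
    """votes: {worker: label} for one item; returns the most frequent label."""
    counts = defaultdict(int)
    for label in votes.values():
        counts[label] += 1
    return max(counts, key=counts.get)


def worker_reliability(annotations, consensus):
    """Posterior mean accuracy per worker under a Beta(1, 1) prior."""
    agree, total = defaultdict(int), defaultdict(int)
    for item, votes in annotations.items():
        for worker, label in votes.items():
            total[worker] += 1
            agree[worker] += int(label == consensus[item])
    return {w: (agree[w] + 1) / (total[w] + 2) for w in total}


def weighted_vote(votes, reliability):
    """Weight each vote by the log-odds that the worker is correct."""
    scores = defaultdict(float)
    for worker, label in votes.items():
        r = reliability[worker]
        scores[label] += math.log(r / (1 - r))
    return max(scores, key=scores.get)


# Hypothetical annotations: item -> {worker: label}.
annotations = {
    "img1": {"ann": "cat", "bob": "cat", "eve": "dog"},
    "img2": {"ann": "dog", "bob": "dog", "eve": "cat"},
    "img3": {"ann": "cat", "bob": "cat", "eve": "dog"},
    "img4": {"ann": "dog", "eve": "cat"},
}

consensus = {item: majority_vote(votes) for item, votes in annotations.items()}
reliability = worker_reliability(annotations, consensus)
final_img4 = weighted_vote(annotations["img4"], reliability)
print(reliability, final_img4)
```

On `img4` the two workers disagree, so a plain vote is a tie; the reliability estimates break it in favor of the worker who agreed with the consensus elsewhere. A fuller treatment would iterate between estimating labels and reliabilities.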
Confident learning - cleanlab
Objective: Find and remove noisy labels in a
dataset.
1. Train a model on the noisily labeled dataset.
2. Run the model on held-out portions of that same
dataset (e.g. via cross-validation) and get the
predicted class probabilities.
3. For each given label, count the samples for
which the confidence in a different class exceeds
that class's threshold.
4. Use these counts to estimate the noise in the
labels, rank the samples from least to most
reliable, and prune the least reliable ones.
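The counting and ranking steps can be sketched in plain Python. This is a simplified illustration of the procedure, not the cleanlab implementation: the function names are hypothetical, the per-class threshold is assumed to be the mean predicted probability of a class over the samples carrying that label, and the predicted probabilities are made-up values that would normally come from out-of-sample predictions.

```python
def class_thresholds(labels, pred_probs, num_classes):
    """t_j = mean predicted probability of class j over the samples labeled j."""
    sums, counts = [0.0] * num_classes, [0] * num_classes
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    return [s / c if c else 1.0 for s, c in zip(sums, counts)]


def find_label_issues(labels, pred_probs, num_classes):
    """Return (confident_joint, indices of suspected label errors, least reliable first)."""
    thresholds = class_thresholds(labels, pred_probs, num_classes)
    confident_joint = [[0] * num_classes for _ in range(num_classes)]
    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        candidates = [j for j in range(num_classes) if p[j] >= thresholds[j]]
        if not candidates:
            continue  # the model is not confident about any class for this sample
        guess = max(candidates, key=lambda j: p[j])
        confident_joint[y][guess] += 1  # rows: given label, columns: confident class
        if guess != y:
            issues.append(i)
    # Rank by the probability assigned to the given label, lowest first.
    issues.sort(key=lambda i: pred_probs[i][labels[i]])
    return confident_joint, issues


# Hypothetical noisy labels and out-of-sample predicted probabilities.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8],
              [0.1, 0.9], [0.3, 0.7], [0.85, 0.15]]

confident_joint, issues = find_label_issues(labels, pred_probs, num_classes=2)
print(confident_joint, issues)
```

The off-diagonal entries of `confident_joint` estimate how often each label is confused with another class, and `issues` lists the candidates for pruning.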
CleanLab output: noisy labels from ImageNet (more than 100,000 in total).
Blue: multi-label images, green: ontological issue, red: label error.
References
1. A brief introduction to weakly supervised learning
2. How to Use Inaccurate Data for Machine Learning with Weakly Supervised Learning
3. Confident Learning: Estimating Uncertainty in Dataset Labels
Appendix: Confident learning equations
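For reference, the two central definitions of confident learning, as given in the Confident Learning paper listed in the references, can be written as below. The notation is an assumption about what fits this deck: $\tilde{y}$ is the given (noisy) label, $y^*$ the latent true label, $X_{\tilde{y}=i}$ the set of samples labeled $i$, and $\hat{p}$ the out-of-sample predicted probability of a model with parameters $\theta$.

```latex
% Per-class threshold: mean self-confidence over the samples labeled j
t_j = \frac{1}{|X_{\tilde{y}=j}|} \sum_{x \in X_{\tilde{y}=j}} \hat{p}(\tilde{y}=j;\, x, \theta)

% Confident joint: samples labeled i that confidently belong to class j
C_{\tilde{y}=i,\, y^*=j} = \left| \left\{ x \in X_{\tilde{y}=i} \;:\;
    \hat{p}(\tilde{y}=j;\, x, \theta) \ge t_j,\;
    j = \operatorname*{arg\,max}_{l \,:\, \hat{p}(\tilde{y}=l;\, x, \theta) \ge t_l}
        \hat{p}(\tilde{y}=l;\, x, \theta)
\right\} \right|
```

The off-diagonal entries $C_{\tilde{y}=i,\, y^*=j}$ with $i \neq j$ are the counts that the confident learning procedure uses to estimate label noise.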
