Joint optimization framework for learning with noisy labels

Cheng-You Lu

Problem
► Many large-scale datasets are collected from websites, but they tend to contain inaccurate labels, which are termed noisy labels.
Example (image not shown):
Noisy label : dog
Clean label : horse

Goal
► A joint optimization framework that learns the DNN parameters and estimates the true labels.
► Then, train a usual image classification network on these estimated labels.

Label
► Hard-label space: $H = \{y : y \in \{0, 1\}^c,\ \mathbf{1}^\top y = 1\}$
Ex : $y^\top = [0, 1, 0]$ with $c = 3$
► Soft-label space: $S = \{y : y \in [0, 1]^c,\ \mathbf{1}^\top y = 1\}$
Ex : $y^\top = [0.2, 0.7, 0.1]$ with $c = 3$
Parameters
c : number of classes
y : label (column vector)
1 : column vector of all ones
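The two label spaces can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function names `is_hard_label` and `is_soft_label` and the tolerance `tol` are illustrative assumptions.

```python
import math

def is_soft_label(y, tol=1e-9):
    """True if every entry lies in [0, 1] and the entries sum to 1."""
    return all(0.0 <= v <= 1.0 for v in y) and math.isclose(sum(y), 1.0, abs_tol=tol)

def is_hard_label(y, tol=1e-9):
    """True if every entry is exactly 0 or 1 and the entries sum to 1 (one-hot)."""
    return all(v in (0, 1) for v in y) and math.isclose(sum(y), 1.0, abs_tol=tol)

hard = [0, 1, 0]          # example from the slide, c = 3
soft = [0.2, 0.7, 0.1]    # example from the slide, c = 3
print(is_hard_label(hard), is_soft_label(hard))  # a hard label is also a valid soft label
print(is_hard_label(soft), is_soft_label(soft))
```

Every hard label also satisfies the soft-label constraints, so the hard-label space is a subset of the soft-label space.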

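The concept of joint optimization framework
Algorithm 1 Alternating Optimization
for t ← 1 to num_epochs do
    update $\theta^{(t+1)}$ by SGD on $L(\theta^{(t)}, Y^{(t)} | X)$
    update $Y^{(t+1)}$ from the network prediction: $y_{ij} = 1$ if $j = \arg\max_{j'} s_{j'}(\theta^{(t+1)}, x_i)$ and $0$ otherwise (hard-label)
    or $y_i = s(\theta^{(t+1)}, x_i)$ (soft-label)
end for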
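The label-update step of Algorithm 1 can be sketched as follows. This is an illustration only: it assumes the hard-label update takes the one-hot arg max of the network prediction and the soft-label update copies the prediction, and the function names are hypothetical. The SGD step on θ is omitted.

```python
def hard_label_update(s):
    """Return a one-hot vector with a 1 at the arg max of the prediction s."""
    k = max(range(len(s)), key=lambda j: s[j])
    return [1 if j == k else 0 for j in range(len(s))]

def soft_label_update(s):
    """Return the prediction itself as the new (soft) label."""
    return list(s)

def update_labels(predictions, mode="soft"):
    """Update every label in the training set from the current predictions."""
    update = hard_label_update if mode == "hard" else soft_label_update
    return [update(s) for s in predictions]

# Hypothetical predictions for two training images, c = 3 classes.
predictions = [[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]
print(update_labels(predictions, mode="hard"))
print(update_labels(predictions, mode="soft"))
```

In a full training loop, this function would be called once per epoch, after the parameter update.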

Loss – Joint Optimization Framework
►Loss function
► $L(\theta, Y | X) = L_c(\theta, Y | X) + \alpha L_p(\theta | X) + \beta L_e(\theta | X)$
►Optimization
► $\arg\min_{\theta, Y} L(\theta, Y | X)$
►Parameters
► Y : label
► X : Image
► θ : parameters of network
► α : hyperparameter
► β : hyperparameter

Loss – Joint Optimization Framework
►First term
► $L_c(\theta, Y | X) = \frac{1}{n} \sum_{i=1}^{n} D_{KL}(y_i \,\|\, s(\theta, x_i))$
►Parameters
► Y : label
► X : image
► θ : parameters of network
► s : prediction of network
► n : train set size
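As a rough illustration of the first term, the sketch below computes the mean KL divergence between labels and predictions. The function names are illustrative, and the code assumes every prediction entry is strictly positive (as with a softmax output); terms with $y_{ij} = 0$ contribute zero by convention.

```python
import math

def kl_divergence(y, s):
    """D_KL(y || s); entries with y_j = 0 contribute 0 by convention."""
    return sum(yj * math.log(yj / sj) for yj, sj in zip(y, s) if yj > 0)

def classification_loss(labels, predictions):
    """L_c: mean KL divergence over the n training samples."""
    n = len(labels)
    return sum(kl_divergence(y, s) for y, s in zip(labels, predictions)) / n

labels = [[0, 1, 0], [0.2, 0.7, 0.1]]
predictions = [[0.2, 0.7, 0.1], [0.2, 0.7, 0.1]]
print(classification_loss(labels, predictions))
```

The second sample has a label equal to its prediction, so it contributes nothing to the loss; only the first sample contributes.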

Loss function – usual image
classification network
►Loss function
► $L(\theta | X, Y) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{c} y_{ij}^{GT} \log s_j(\theta, x_i)$
►Optimization
► $\arg\min_{\theta} L(\theta | X, Y)$
►Parameters
► L : cross entropy between the probability distributions y and s
► n : train set size
► c : number of classes
► Y : labels (ground truth)
► s : prediction of the network
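For comparison with the joint framework, the usual cross-entropy loss can be sketched in a few lines. The function name is illustrative, and predictions are assumed to be strictly positive wherever the label is non-zero.

```python
import math

def cross_entropy(labels, predictions):
    """Mean cross entropy between ground-truth labels and network predictions."""
    n = len(labels)
    total = 0.0
    for y, s in zip(labels, predictions):
        # Only classes with non-zero label mass contribute to the sum.
        total -= sum(yj * math.log(sj) for yj, sj in zip(y, s) if yj > 0)
    return total / n

# One-hot ground truth for class 1 and a prediction that puts 0.7 on it.
print(cross_entropy([[0, 1, 0]], [[0.2, 0.7, 0.1]]))  # equals -log(0.7)
```

For a one-hot label the entropy of y is zero, so this value coincides with the KL divergence between the label and the prediction.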

Loss – Joint Optimization Framework
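► Second term
► $L_p(\theta | X) = \sum_{j=1}^{c} p_j \log \frac{p_j}{\bar{s}_j(\theta, X)}$
► $\bar{s}(\theta, X) = \frac{1}{n} \sum_{i=1}^{n} s(\theta, x_i) \approx \frac{1}{|\mathcal{B}|} \sum_{x \in \mathcal{B}} s(\theta, x)$
►Parameters
► p : prior probability distribution (distribution of classes among all training data)
► X : images
► s : prediction of the network
► $\bar{s}$ : mean prediction over the training data
► θ : parameters of the network
► c : number of classes
► n : train set size
► $\mathcal{B}$ : mini-batch; $|\mathcal{B}|$ is the batch size (distinct from the hyperparameter β)
Ex:
In CIFAR-10, p is [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], because every class has the same number of images.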
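A small sketch of the second term, using the mini-batch approximation of the mean prediction. The function names and the toy two-class batches are assumptions made for illustration; the prior is assumed uniform, as in the CIFAR-10 example.

```python
import math

def mean_prediction(batch_predictions):
    """Approximate s-bar by averaging predictions over one mini-batch."""
    n = len(batch_predictions)
    c = len(batch_predictions[0])
    return [sum(s[j] for s in batch_predictions) / n for j in range(c)]

def prior_loss(p, batch_predictions):
    """L_p: KL divergence between the prior p and the mean prediction."""
    s_bar = mean_prediction(batch_predictions)
    return sum(pj * math.log(pj / sj) for pj, sj in zip(p, s_bar) if pj > 0)

p = [0.5, 0.5]                          # uniform prior over two classes
balanced = [[0.9, 0.1], [0.1, 0.9]]     # predictions spread over both classes
collapsed = [[0.9, 0.1], [0.9, 0.1]]    # every prediction favours class 0
print(prior_loss(p, balanced))          # zero: the mean prediction matches the prior
print(prior_loss(p, collapsed))         # positive: the mean prediction is skewed
```

The term penalizes a network that assigns most samples to one class, which is the failure mode it is meant to prevent.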

Loss – Joint Optimization Framework
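►Third term
► $L_e(\theta | X) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{c} s_j(\theta, x_i) \log s_j(\theta, x_i)$
► Ex: minimizing $L_e$ concentrates each prediction on a single class
► Epoch t : s = [0.2, 0.8]
► Epoch t+1 : s = [0.1, 0.9]
►Parameters
► X : images
► s : prediction of the network
► θ : parameters of the network
► c : number of classes
► n : train set size
►Full loss
► $L(\theta, Y | X) = L_c(\theta, Y | X) + \alpha L_p(\theta | X) + \beta L_e(\theta | X)$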
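The effect of the third term on the two-class example above can be verified numerically. This is a minimal sketch with illustrative function names; entropies are in nats.

```python
import math

def entropy(s):
    """Shannon entropy of one prediction; zero entries contribute 0."""
    return -sum(sj * math.log(sj) for sj in s if sj > 0)

def entropy_loss(predictions):
    """L_e: mean entropy of the predictions over the n training samples."""
    return sum(entropy(s) for s in predictions) / len(predictions)

print(entropy([0.2, 0.8]))  # about 0.500 (epoch t)
print(entropy([0.1, 0.9]))  # about 0.325 (epoch t+1)
```

The entropy drops from epoch t to epoch t+1, which is the direction in which minimizing this term pushes each prediction.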

Other strategy – large learning rate
►Experiment
► Test accuracy remains high at the end of training when the learning rate is high.
►Parameters
► X-axis : epoch
► Y-axis : test accuracy
► r : noise rate
► lr : learning rate

Experiment on SN-CIFAR10
best : the scores at the epoch where the validation accuracy is optimal
last : the scores at the end of training
Test accuracy : performance on the test set
Recovery accuracy : performance on the train set
Noisy labels : $y_i = \begin{cases} y_i^{GT} & \text{with probability } 1 - r \\ \text{random one-hot vector} & \text{with probability } r \end{cases}$
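The symmetric-noise construction can be sketched as below. Labels are stored as class indices rather than one-hot vectors, the random class is assumed to be drawn uniformly from all c classes (so it may coincide with the true class), and the function name and `seed` parameter are illustrative.

```python
import random

def add_symmetric_noise(labels, num_classes, r, seed=0):
    """With probability r, replace each label by a uniformly random class index."""
    rng = random.Random(seed)  # fixed seed so the corruption is reproducible
    noisy = []
    for y in labels:
        if rng.random() < r:
            noisy.append(rng.randrange(num_classes))
        else:
            noisy.append(y)
    return noisy

clean = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(add_symmetric_noise(clean, num_classes=10, r=0.5))
```

Because the replacement class can equal the original one, the fraction of labels that actually change is somewhat below r.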

1. Joint Optimization Framework for Learning with Noisy Labels. Authors: Daiki Tanaka, Daiki Ikami, Toshihiko Yamasaki, Kiyoharu Aizawa. Published: CVPR 2018

14. Experiment on Clothing1M dataset

Editor's Notes

The paper I want to present is ‘Joint Optimization Framework for Learning with Noisy Labels’. The first author is Daiki Tanaka. The paper was published at CVPR 2018.
Deep neural networks have achieved strong performance on image classification. However, many datasets are collected from websites and therefore tend to contain noisy labels. These noisy labels degrade the performance of the network.
Hence, the authors propose a joint optimization framework for image classification. This framework estimates the true labels for the classification network.
Before we start, note that there are two kinds of labels for image classification. For a hard label, each value in y is either 0 or 1, and the values sum to 1. For a soft label, each value in y lies between 0 and 1, and the values sum to 1.
X is the image, Y is the label, CNN is the convolutional neural network for image classification, L is the loss function, and s is the network's probability prediction, which can serve as a soft label or be converted to a one-hot hard label. This framework differs from the usual image classification framework in two ways: the loss function and the labels. The authors do not treat the labels as fixed, because they are noisy. Instead, the labels are updated alternately with the network parameters in every epoch.
Let’s look at the algorithm first. I will explain the loss function later. The algorithm is simple. In each epoch, the network parameters are updated by the optimizer, and then the labels are updated from the network's predictions.
Lc is the KL divergence between the label and the network prediction. When y is fixed, minimizing the KL divergence is the same as minimizing the cross entropy, because the two differ only by the entropy of y, which does not depend on theta. Therefore, this term is the same as the loss function of a usual image classification network.
In a usual image classification network, we simply use the cross entropy between the label and the network prediction, and look for the parameters theta that minimize this loss function.
The second term is the KL divergence between the prior probability distribution p and the mean probability s bar. S bar is the mean prediction over the training data; in the implementation, it is approximated over each mini-batch. However, this approximation cannot handle a large number of classes or extremely imbalanced classes. This term makes the network's predictions follow the distribution p.
The final term is the entropy of the network's predictions. This term is needed in the training loss when soft labels are used. If alpha and beta are zero and the labels are updated as soft labels, both theta and the labels get stuck in local optima and learning does not proceed. To overcome this problem, this term concentrates the probability distribution of each soft label on a single class.
In the experiment, test accuracy remains high at the end of training when the learning rate is high.
Symmetric-noise CIFAR-10 (SN-CIFAR10) is based on the CIFAR-10 dataset; with probability r, the label of an image is replaced by a random one. ‘Best’ is the score at the epoch where the validation accuracy is optimal. ‘Last’ is the score at the end of training. Their method reaches the state of the art on CIFAR-10. They also evaluate their method on AN-CIFAR10 and PL-CIFAR, where it performs well.
They use the Clothing1M dataset to examine the performance of their method in a real-world setting. The images in this dataset are crawled from online shops, and the labels are generated from the text surrounding the images on the websites. The accuracy of the noisy labels is 61.54%. Their method achieves comparable performance on the Clothing1M dataset.
