2. Problem
Many large-scale datasets are collected
from websites; however, they tend to contain
inaccurate labels, which are termed noisy
labels.
[Example image of a horse]
Noisy label : dog
Clean label : horse
3. Goal
A joint optimization framework that learns the
DNN parameters and estimates the true labels.
A usual image classification network is then
trained on these estimated labels.
4. Label
Hard-label space H = {y : y ∈ {0, 1}^c, 1^T y = 1}
Ex : y^T = [0, 1, 0] with c = 3
Soft-label space S = {y : y ∈ [0, 1]^c, 1^T y = 1}
Ex : y^T = [0.2, 0.7, 0.1] with c = 3
Parameters
c : number of classes
y : label (column vector)
1 : column vector of all ones
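As a quick illustration (a minimal sketch with made-up values), the two label spaces differ only in whether the entries are binary:

```python
import numpy as np

c = 3  # number of classes

# Hard label: entries are 0 or 1 and they sum to 1 (a one-hot vector).
y_hard = np.array([0.0, 1.0, 0.0])

# Soft label: entries lie in [0, 1] and they sum to 1.
y_soft = np.array([0.2, 0.7, 0.1])

assert y_hard.sum() == 1.0
assert np.isclose(y_soft.sum(), 1.0) and ((0 <= y_soft) & (y_soft <= 1)).all()
```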
6. The concept of joint optimization
framework
Algorithm 1 Alternating Optimization
for t ← 1 to num_epochs do
  update θ(t+1) by SGD on L(θ(t), Y(t) | X)
  update Y(t+1) by the hard-label or soft-label update rule
end for
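As a rough illustration, here is a minimal PyTorch-style sketch of Algorithm 1. The model, optimizer, loader, and the soft-label update shown here are assumptions on my part; the paper also gives a hard-label rule, and in practice it averages predictions over past epochs.

```python
import torch
import torch.nn.functional as F

def alternating_optimization(model, optimizer, loader, labels, num_epochs):
    """Sketch of Algorithm 1: alternate updates of theta and of the labels Y.

    labels: (n, c) tensor of current soft labels, indexed by sample id.
    loader: assumed to yield (images, sample_indices) mini-batches.
    """
    for epoch in range(num_epochs):
        for images, idx in loader:
            # Update theta^(t+1) by SGD on L(theta^(t), Y^(t) | X).
            optimizer.zero_grad()
            logits = model(images)
            # Only the L_c part of the loss is shown here; the full loss
            # adds alpha * L_p + beta * L_e (see the later terms).
            loss = F.kl_div(F.log_softmax(logits, dim=1),
                            labels[idx], reduction="batchmean")
            loss.backward()
            optimizer.step()

            # Update Y^(t+1) by the soft-label rule: y_i <- s(theta, x_i).
            with torch.no_grad():
                labels[idx] = F.softmax(logits.detach(), dim=1)
    return model, labels
```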
7. Loss – Joint Optimization Framework
►Loss function
► L(θ, Y|X) = Lc(θ, Y|X) + α Lp(θ|X) + β Le(θ|X)
►Optimization
► arg min_{θ,Y} L(θ, Y|X)
►Parameters
► Y : label
► X : Image
► θ : parameters of network
► α : hyperparameter
► β : hyperparameter
8. Loss – Joint Optimization Framework
►First term
► Lc(θ, Y|X) = (1/n) Σ_{i=1}^{n} D_KL( y_i || s(θ, x_i) )
►Parameters
► Y : label
► X : image
► θ : parameters of network
► s : prediction of network
► n : train set size
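A minimal sketch of this term in PyTorch (logits and y are assumed mini-batch tensors):

```python
import torch.nn.functional as F

def lc_term(logits, y):
    """L_c = (1/n) * sum_i D_KL( y_i || s(theta, x_i) ).

    logits: (n, c) raw network outputs; y: (n, c) current labels.
    """
    log_s = F.log_softmax(logits, dim=1)  # log s(theta, x_i)
    # kl_div expects log-probabilities as input and probabilities as target;
    # "batchmean" sums over classes and averages over the n samples.
    return F.kl_div(log_s, y, reduction="batchmean")
```

When y_i is a fixed one-hot vector its entropy is zero, so minimizing this KL divergence is the same as minimizing the usual cross entropy on the next slide.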
9. Loss function – usual image
classification network
►Loss function
► L = −(1/n) Σ_{i=1}^{n} Σ_{j=1}^{c} y_{ij}^{GT} log s_j(θ, x_i)
►Optimization
► arg min_θ L(θ|X, Y)
►Parameters
► L : cross entropy between probability distribution y and s
► n : train set size
► c : number of classes
► Y : label (ground truth)
► s : prediction of network
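The equivalence between this cross entropy and the KL term with fixed one-hot labels can be checked numerically (a toy sketch with made-up logits):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)             # 4 samples, c = 3 classes (toy values)
targets = torch.tensor([0, 2, 1, 1])   # ground-truth class indices

# L = -(1/n) * sum_i sum_j y_ij^GT * log s_j(theta, x_i)
ce = F.cross_entropy(logits, targets)

# Same value via D_KL against one-hot labels (a one-hot label has zero entropy).
one_hot = F.one_hot(targets, num_classes=3).float()
kl = F.kl_div(F.log_softmax(logits, dim=1), one_hot, reduction="batchmean")
assert torch.allclose(ce, kl)
```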
10. Loss – Joint Optimization Framework
► Second term
► Lp = Σ_{j=1}^{c} p_j log( p_j / s̄_j(θ, X) )
► s̄(θ, X) = (1/n) Σ_{i=1}^{n} s(θ, x_i) ≈ (1/|B|) Σ_{x∈B} s(θ, x)
►Parameters
► p : prior probability distribution (distribution of
classes among all training data)
► X : image
► s : prediction of network
► θ : parameter of network
► c : number of classes
► n : train set size
► B : mini-batch (|B| : mini-batch size)
Ex:
In CIFAR-10, p will be [0.1, 0.1, 0.1, 0.1,
0.1, 0.1, 0.1, 0.1, 0.1, 0.1], because each
class has the same number of images in
CIFAR-10.
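A minimal sketch of the prior term under these definitions; the uniform prior below is the CIFAR-10 case, and logits are assumed mini-batch outputs:

```python
import torch
import torch.nn.functional as F

def lp_term(logits, p):
    """L_p = sum_j p_j * log( p_j / s_bar_j ), with s_bar approximated per batch.

    logits: (|B|, c) network outputs for one mini-batch; p: (c,) prior.
    """
    s_bar = F.softmax(logits, dim=1).mean(dim=0)  # mean prediction over the batch
    return torch.sum(p * torch.log(p / s_bar))

# CIFAR-10 prior: every class holds 1/10 of the training images.
p = torch.full((10,), 0.1)
```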
11. Loss – Joint Optimization Framework
►Third term
► Le = −(1/n) Σ_{i=1}^{n} Σ_{j=1}^{c} s_j(θ, x_i) log s_j(θ, x_i)
► Ex:
► Epoch t : s = [0.2,0.8]
► Epoch t+1 : s = [0.1,0.9]
►Parameters
► X : image
► s : prediction of network
► θ : parameter of network
► c : number of classes
► n : train set size
L(θ, Y|X) = Lc(θ, Y|X) + α Lp(θ|X) + β Le(θ|X)
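Putting the pieces together, a minimal sketch of the entropy term and of the full objective on one mini-batch; alpha, beta, and p are the quantities defined above:

```python
import torch
import torch.nn.functional as F

def le_term(logits):
    """L_e = -(1/n) * sum_i sum_j s_j(theta, x_i) * log s_j(theta, x_i)."""
    s = F.softmax(logits, dim=1)
    log_s = F.log_softmax(logits, dim=1)  # numerically safer than s.log()
    return -(s * log_s).sum(dim=1).mean()

def joint_loss(logits, y, p, alpha, beta):
    """L(theta, Y|X) = L_c + alpha * L_p + beta * L_e."""
    lc = F.kl_div(F.log_softmax(logits, dim=1), y, reduction="batchmean")
    lp = torch.sum(p * torch.log(p / F.softmax(logits, dim=1).mean(dim=0)))
    return lc + alpha * lp + beta * le_term(logits)
```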
12. Other strategy – large learning rate
►Experiment
► Test accuracy remains high at the end of training when the learning rate is high.
►Parameters
► X-axis : epoch
► Y-axis : test accuracy
► r : noise rate
► lr : learning rate
13. Experiment on SN-CIFAR10
best : the scores of the epoch where the validation
accuracy is optimal
last : the scores at the end of training
Test accuracy : Performance on test set
Recovery accuracy : Performance on the train set
y_i = y_i^GT with probability 1 − r,
      a random one-hot vector with probability r
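A minimal sketch of this symmetric-noise corruption (my reading of the definition; the function and variable names are made up):

```python
import numpy as np

def add_symmetric_noise(labels, num_classes, r, seed=0):
    """Replace each label, with probability r, by a uniformly random class."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < r  # which samples get corrupted
    noisy[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
    return noisy

# Toy example with CIFAR-10-style labels and noise rate r = 0.1.
clean = np.arange(10).repeat(5)
noisy = add_symmetric_noise(clean, num_classes=10, r=0.1)
```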
The paper I want to present is ‘Joint Optimization Framework for Learning with Noisy Labels’.
The first author is Daiki Tanaka.
This paper was published at CVPR 2018.
Deep neural networks have achieved significant performance on image classification.
However, many datasets are collected from websites.
Therefore, they tend to contain noisy labels.
These noisy labels decrease the performance of the network.
Hence, the authors propose a joint optimization framework for image classification.
This framework estimates true labels for the classification network.
Before starting, note that there are two kinds of labels for image classification.
For a hard label, each value in y is either 1 or 0, and their sum must be 1.
For a soft label, each value in y is between 0 and 1, and their sum must be 1.
X is the image, Y is the label, CNN is the convolutional neural network for image classification, L is the loss function,
and s is the probability prediction of the network, a probability vector over the c classes that can be regarded as a soft label.
This framework differs from the usual image classification framework in two respects:
the loss function and the labels.
The authors oppose treating the labels as fixed, because they are noisy.
Therefore, the labels are alternately updated at each epoch.
Let’s look at the algorithm first; I will explain the loss function later.
The algorithm is simple.
In each epoch, they update the parameters of the network with the optimizer,
then update the labels using the predictions of the network.
Lc is the KL divergence between the label and the prediction of the network.
When y is fixed, minimizing the KL divergence is the same as minimizing the cross entropy.
Therefore, this term is the same as the loss function of a usual image classification network.
In a usual image classification network, we just use the cross entropy between the label and the prediction of the network,
and try to find parameters θ that minimize the loss function.
The second term is the KL divergence between the prior probability distribution p and the mean prediction s̄.
s̄ is the mean prediction over the training data; in the implementation, it is approximated over each mini-batch.
However, this approximation cannot handle a very large number of classes or extremely imbalanced classes.
This term makes the predictions of the network follow the distribution p.
The final term is the entropy of the prediction of the network.
This term is required in the training loss when soft labels are used.
If α and β are zero and we update the labels with the soft-label rule,
both θ and the labels get stuck in a local optimum and the learning process does not proceed.
To overcome this problem, this term concentrates the probability distribution of each soft label on a single class.
In the experiments, test accuracy remains high at the end of training when the learning rate is high.
Symmetric-noise CIFAR-10 (SN-CIFAR10) is based on the CIFAR-10 dataset; each image's label is replaced with a random one with probability r.
Best is the score at the epoch where the validation accuracy is optimal.
Last is the score at the end of training.
Their method reaches the state of the art on CIFAR-10.
They also evaluate their method on AN-CIFAR10 and PL-CIFAR, where the performance is also good.
They use the Clothing1M dataset to examine the performance of their method in a real setting.
The images of this dataset are crawled from online shops, and the labels are generated from the surrounding texts of the images on the websites.
The accuracy of the noisy labels is 61.54%.
Their method achieves comparable performance on the Clothing1M dataset.