Progressive Identification of True Labels for
Partial-Label Learning
Presenter: 송헌
Fundamental Team: 김동희, 김지연, 김창연, 이근배, 이재윤
Lv, Jiaqi, et al. ICML. 2020.
Problem setting
In partial-label learning (PLL), each training instance is associated with
a set of candidate labels, among which exactly one is true.
The goal of PLL is to reduce the annotation overhead of
finding the exact label among ambiguous candidates.
Related works
Most prior works are coupled to specific optimization algorithms,
which makes them difficult to apply to DNNs.
D2CNN* is the only work that used DNNs with stochastic optimizers.
However, it restricted the networks to some specific architectures.
Complementary-label learning** uses a class that an example does not belong to.
Hence, it can be considered as an extreme PLL case with 𝑐 − 1 candidate labels.
*Yao, et al. "Deep discriminative CNN with temporal ensembling for ambiguously-labeled image classification." AAAI. 2020.
**Ishida, Takashi, et al. "Complementary-label learning for arbitrary losses and models." ICML. 2019.
Contributions
In the paper,
The authors theoretically establish a classifier-consistent risk estimator for PLL.
They show the classifier learned from partially labeled data converges to
the optimal one learned from ordinarily labeled data.
The authors also propose a model-, loss-, optimizer-agnostic method for PLL.
Ordinary Multi-class Classification
Let 𝒳 ⊆ ℝ!
be the instance space and 𝒴 = 1,2, … , 𝑐 be the label space.
Let 𝑝 𝑥, 𝑦 be the underlying joint density of random variables 𝑋, 𝑌 ∈ 𝒳×𝒴.
The goal is to learn a classifier 𝒈: 𝒳 → ℝ" that minimizes the estimator of risk:
ℛ 𝒈 = 𝔼 #,% ~' (,) ℓ 𝒈 𝑋 , 𝑒%
where 𝒆𝒴
= 𝒆+
: 𝑖 ∈ 𝒴 denotes the standard canonical vector.
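As a concrete illustration (mine, not the authors'), a minimal PyTorch sketch of the empirical counterpart of this risk with cross-entropy as $\ell$, so the one-hot target $\boldsymbol{e}_Y$ is handled implicitly; all names are hypothetical:

```python
import torch
import torch.nn.functional as F

def empirical_risk(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Empirical ordinary risk: mean of l(g(x_i), e_{y_i}) with l = cross-entropy.

    logits: (n, c) raw scores g(x_i); y: (n,) true labels in {0, ..., c-1}.
    """
    # cross_entropy applies log-softmax internally and targets the one-hot e_y,
    # so passing integer labels is equivalent to passing the canonical vectors.
    return F.cross_entropy(logits, y)
```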
Partial-Label Learning
Let the candidate label set $S$ be a subset of $\mathcal{Y}$ that contains the true label $Y$; that is, $S$ takes values in the power set of $\mathcal{Y}$.
Therefore, we need to train a classifier with partially labeled examples $(X, S)$.
The PLL risk is defined over $p(x, s)$:
$$\mathcal{R}_{\mathrm{PLL}}(\boldsymbol{g}) = \mathbb{E}_{(X,S) \sim p(x,s)}\big[\ell_{\mathrm{PLL}}(\boldsymbol{g}(X), S)\big]$$
where $\ell_{\mathrm{PLL}}: \mathbb{R}^c \times \mathcal{P}(\mathcal{Y}) \to \mathbb{R}$.
Classifier-Consistent Risk Estimator
To make $\mathcal{R}_{\mathrm{PLL}}(\boldsymbol{g})$ estimable, an intuitive way is through a surrogate loss.
The authors assume that only the true label contributes to retrieving the classifier.
For that, they define the PLL loss as the minimal loss over the candidate label set:
$$\ell_{\mathrm{PLL}}(\boldsymbol{g}(X), S) = \min_{i \in S} \ell(\boldsymbol{g}(X), \boldsymbol{e}_i)$$
This leads to a new risk estimator:
$$\mathcal{R}_{\mathrm{PLL}}(\boldsymbol{g}) = \mathbb{E}_{(X,S) \sim p(x,s)}\Big[\min_{i \in S} \ell(\boldsymbol{g}(X), \boldsymbol{e}_i)\Big]$$
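A minimal sketch of this min-over-candidates loss (my illustration, assuming candidate sets are encoded as a boolean mask and $\ell$ is cross-entropy; names are hypothetical):

```python
import torch
import torch.nn.functional as F

def pll_min_loss(logits: torch.Tensor, cand_mask: torch.Tensor) -> torch.Tensor:
    """l_PLL(g(x_i), S_i) = min_{j in S_i} l(g(x_i), e_j), averaged over i.

    logits: (n, c) scores g(x_i); cand_mask: (n, c) bool, True where j in S_i.
    """
    per_label_ce = -F.log_softmax(logits, dim=1)        # CE against each e_j
    per_label_ce = per_label_ce.masked_fill(~cand_mask, float("inf"))
    return per_label_ce.min(dim=1).values.mean()        # min over S_i, then mean
```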
Lemmas
The ambiguity degree is defined as
$$\gamma = \sup_{(x,y) \sim p(x,y),\; s \sim p(s \mid x,y),\; \bar{y} \in \mathcal{Y},\; \bar{y} \neq y} \Pr\big(\bar{y} \in S\big)$$
$\gamma$ is the maximum probability that a negative label $\bar{y}$ co-occurs with the true label $Y$ in the candidate set.
The small ambiguity degree condition ($\gamma < 1$) implies that, except for the true label,
no other label is included in the candidate label set with probability 1.
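To make this concrete, a small sketch (my own, not from the paper) that estimates $\gamma$ from sampled candidate sets; as a simplification it conditions on the class $y$ rather than on each instance $x$, so it is only a coarse empirical proxy:

```python
import numpy as np

def ambiguity_degree(y: np.ndarray, cand_mask: np.ndarray) -> float:
    """Coarse empirical gamma: max over (true class, negative label j) of
    the frequency with which j appears in candidate sets of that class.

    y: (n,) true labels; cand_mask: (n, c) boolean candidate-set indicators.
    """
    gamma = 0.0
    for cls in range(cand_mask.shape[1]):
        rows = cand_mask[y == cls]        # candidate sets whose true label is cls
        if len(rows) == 0:
            continue
        freq = rows.mean(axis=0)          # Pr(j in S | Y = cls), per label j
        freq[cls] = 0.0                   # exclude the true label itself
        gamma = max(gamma, float(freq.max()))
    return gamma
```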
Moreover, if $\ell$ is the CE or MSE loss, the ordinary optimal classifier $\boldsymbol{g}^*$ satisfies
$$g_i^*(X) = p(Y = i \mid X).$$
Connection
Under the deterministic scenario,
if the small ambiguity degree condition is satisfied
and the CE or MSE loss is used, then
the PLL optimal classifier $\boldsymbol{g}^*_{\mathrm{PLL}}$ of $\mathcal{R}_{\mathrm{PLL}}(\boldsymbol{g})$ is equivalent to
the ordinary optimal classifier $\boldsymbol{g}^*$ of $\mathcal{R}(\boldsymbol{g})$:
$$\boldsymbol{g}^*_{\mathrm{PLL}} = \boldsymbol{g}^*$$
Estimation Error Bound
Let $\widehat{\mathcal{R}}_{\mathrm{PLL}}$ be the empirical counterpart of $\mathcal{R}_{\mathrm{PLL}}$, and let $\hat{g}_{\mathrm{PLL}} = \operatorname{argmin}_{\boldsymbol{g}} \widehat{\mathcal{R}}_{\mathrm{PLL}}(\boldsymbol{g})$ be the
empirical risk minimizer. Suppose $\mathcal{G}_y$ is a class of real functions, and let
$\mathfrak{R}_n(\mathcal{G}_y)$ denote its Rademacher complexity over $p(x)$ with sample size $n$.
Then, for any $\delta > 0$, with probability at least $1 - \delta$,
$$\mathcal{R}_{\mathrm{PLL}}(\hat{g}_{\mathrm{PLL}}) - \mathcal{R}_{\mathrm{PLL}}(\boldsymbol{g}^*_{\mathrm{PLL}}) \le 4\sqrt{2}\, c\, L_\ell \sum_{y=1}^{c} \mathfrak{R}_n(\mathcal{G}_y) + 2M \sqrt{\frac{\log(2/\delta)}{2n}}$$
where $L_\ell$ is the Lipschitz constant of $\ell$ and $M$ is an upper bound on the loss.
Therefore, $\mathcal{R}_{\mathrm{PLL}}(\hat{g}_{\mathrm{PLL}}) \to \mathcal{R}_{\mathrm{PLL}}(\boldsymbol{g}^*_{\mathrm{PLL}})$ as the number of training examples $n \to \infty$.
Proposed Method
However, the min operator in $\ell_{\mathrm{PLL}}(\boldsymbol{g}(X), S)$ makes optimization difficult:
if a wrong label $i$ is selected at the beginning,
the optimization will keep focusing on that wrong label until the end.
The authors first require that $\ell$ decomposes over the labels:
$$\ell(\boldsymbol{g}(X), \boldsymbol{e}_Y) = \sum_{i=1}^{c} \ell\big(g_i(X), e_i^Y\big)$$
Then, they relax the min operator with dynamic weights:
$$\widehat{\mathcal{R}}_{\mathrm{PLL}} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{c} w_{i,j}\, \ell\big(g_j(x_i), e_j^{s_i}\big)$$
where $e_j^{s_i}$ is the $j$-th coordinate of $\boldsymbol{e}_{s_i}$ and $\boldsymbol{e}_{s_i} = \sum_{k \in s_i} \boldsymbol{e}_k$.
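A minimal sketch of this weighted empirical risk (my illustration, using the decomposed cross-entropy coordinates; since $w_{i,j} = 0$ outside $s_i$, the candidate mask is already folded into the weights; names are hypothetical):

```python
import torch
import torch.nn.functional as F

def weighted_pll_risk(logits: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """(1/n) sum_i sum_j w_ij * l(g_j(x_i), e_j^{s_i}) with l = CE coordinate.

    logits: (n, c) scores g(x_i); w: (n, c) dynamic weights, zero outside s_i.
    """
    log_probs = F.log_softmax(logits, dim=1)
    # For cross-entropy, the j-th coordinate loss is -e_j^{s_i} * log g_j(x_i);
    # w is already zero for j not in s_i, so no explicit mask is needed.
    return -(w * log_probs).sum(dim=1).mean()
```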
Proposed Method
Ideally, the weight on the true label is exactly 1 and all other weights are 0.
Since the weights are latent, the minimizer of $\widehat{\mathcal{R}}_{\mathrm{PLL}}$ cannot be found directly.
Inspired by the EM algorithm, the authors put more weight on the more probable labels:
$$w_{i,j} = \begin{cases} \dfrac{g_j(x_i)}{\sum_{k \in s_i} g_k(x_i)}, & j \in s_i \\[4pt] 0, & \text{otherwise} \end{cases}$$
If the small ambiguity degree condition is satisfied, models tend to memorize the
true labels in the initial epochs, which guides the model towards a discriminative
classifier that gives relatively low losses to the more probable true labels.
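A sketch of this progressive weight update (my illustration, assuming softmax outputs for $\boldsymbol{g}$ and detaching the weights from the computation graph, a common but here assumed choice):

```python
import torch

def update_weights(logits: torch.Tensor, cand_mask: torch.Tensor) -> torch.Tensor:
    """w_ij = g_j(x_i) / sum_{k in s_i} g_k(x_i) for j in s_i, else 0."""
    with torch.no_grad():                              # weights act as targets
        probs = torch.softmax(logits, dim=1) * cand_mask
        w = probs / probs.sum(dim=1, keepdim=True).clamp_min(1e-12)
    return w
```

Each training epoch would then alternate `update_weights` with a gradient step on the weighted risk above, merging the E- and M-steps as described on the next slide.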
Proposed Method
While they follow the EM algorithm,
they merge the E-step and the M-step:
the weights can be updated at any epoch,
so local convergence within
each epoch is not necessary.
This lets them avoid the overfitting
issues of EM methods.
Datasets
The authors used widely used benchmark datasets,
MNIST, Fashion-MNIST, Kuzushiji-MNIST, and CIFAR10
and five small datasets from UCI:
Yeast, Texture, Dermatology, Synthetic Control, and 20Newsgroups.
They randomly flipped each negative label into the candidate set with probability $q$ (a sketch of this generation follows below).
Moreover, they used real-world partial-label datasets,
Lost, Birdsong, MSRCv2, Soccer Player, and Yahoo! News.
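A sketch of the candidate-set generation just described (my reconstruction of the uniform flipping protocol; names are hypothetical):

```python
import numpy as np

def make_candidate_sets(y: np.ndarray, num_classes: int, q: float, seed=None) -> np.ndarray:
    """Flip each negative label into the candidate set independently w.p. q."""
    rng = np.random.default_rng(seed)
    cand = rng.random((len(y), num_classes)) < q   # negatives enter w.p. q
    cand[np.arange(len(y)), y] = True              # the true label is always kept
    return cand
```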
Baselines
They compared the proposed method (PRODEN) with:
• PRODEN-itera: update the label weights only every 100 epochs
• PRODEN-sudden: set $w_{i,j} = 1$ if $j = \operatorname{argmax}_{k \in s_i} g_k(x_i)$ and $0$ otherwise
• PRODEN-naïve: never update the weights; keep them uniform
• PN-oracle: train a model with the ordinary (true) labels
• PN-decomp: decompose one instance with multiple candidate labels
into many instances, each with a single label
• D2CNN: a DNN-based PLL method
• GA: a DNN-based complementary-label learning (CLL) method
Results on Benchmark Datasets
When $q = 0.1$, PRODEN is always the best method and is comparable to PN-oracle.
The performance of PRODEN-itera deteriorates drastically with complex models
because of overfitting.
Results on Benchmark Datasets
When $q = 0.7$, PRODEN is still comparable to PN-oracle,
and it consistently outperforms D2CNN and GA.
Analysis on the Ambiguity Degree
They also gradually increase $q$ from 0.5 to 0.9 to control $\gamma$ ($\gamma \to q$ as $n \to \infty$).
PRODEN tends to be less affected as the ambiguity increases.
Results on Real-world Datasets
They compare the proposed method with classical PLL methods
(SURE, CLPL, ECOC, PLSVM, PLkNN, and IPAL),
which can hardly be implemented with DNNs, on real-world and small-scale datasets.
Results on Small-scale Datasets