"Meta-Learned Confidence for Few-shot Learning" was presented at CVPR in 2020.
1. Meta-Learned Confidence for Few-shot Learning
Seong Min Kye1, Hae Beom Lee1, Hoirin Kim1, Sung Ju Hwang1,2
1KAIST, 2AITRICS, South Korea
Computer Vision and Pattern Recognition (CVPR 2020)
These slides were made by Minha Kim (kimminha@g.skku.edu)
2. Introduction
Few-shot learning is an important challenge under data scarcity.
When there is both data scarcity and a lot of unlabeled data, popular approaches are:
a) leveraging a nearest neighbor graph
b) using predicted soft or hard labels on unlabeled samples to update the class prototype.
However, the model confidence may be unreliable, which may lead to incorrect predictions.
The authors introduce a novel confidence-based transductive inference scheme for metric-based meta-learning models.
4. Meta Learning?
Support Set Query Set
Task 1
Task 2
Meta Learning
Meta Test
the data similarity between the support set and the query set
allows the model to derive learning patterns
5. Overview - Model and data perturbation
< Data perturbation >
1. apply a horizontal flip to the support set
2. apply a horizontal flip
+ shifting
+ RandAugment
+ CutOut
to the query set
Data perturbation achieves the same effect
as regularization without an explicit consistency loss
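The asymmetric augmentation above (light on the support set, heavy on the query set) can be sketched in a few lines of numpy. This is a minimal illustration with assumed shapes (5-way episode, 32x32 RGB images); the paper's full query chain additionally uses RandAugment and CutOut, which are omitted here for brevity.

```python
import numpy as np

def flip_horizontal(img):
    # Horizontal flip: reverse the width axis of an H x W x C image.
    return img[:, ::-1, :]

def shift(img, dy=2, dx=2):
    # Simple shift perturbation: roll pixels by (dy, dx) with wrap-around.
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

# Hypothetical 5-way episode; shapes and counts are assumptions.
support = np.random.rand(5, 32, 32, 3)   # support set: flip only
query = np.random.rand(15, 32, 32, 3)    # query set: heavier augmentation chain

support_aug = np.stack([flip_horizontal(x) for x in support])
query_aug = np.stack([shift(flip_horizontal(x)) for x in query])
```

Because the two sets receive different augmentations, matching a query to its class prototype implicitly enforces consistency across perturbations without adding an explicit consistency loss term.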
6. < Model >
1. a sub-network generated by dropping a block (perturbation)
2. the full network (no perturbation)
so the meta-learned confidence can better account for
uncertainties at unseen tasks.
8. Overview
2. Confidence score using soft k-means
Notation: embedding function; D: network with a layer dropped; A: image with a horizontal flip applied
9. 3. Updating the prototype
Notation: embedding function; D: network with a layer dropped; A: image with a horizontal flip applied
13. Conclusion
• We proposed to tackle these issues by meta-learning confidence scores, such that the prototypes updated with the meta-learned scores optimize transductive inference performance.
• We proposed to meta-learn the parameters of the length-scaling function, such that the proper distance metric for the confidence scores can be automatically determined.
• To enhance the quality of the confidence scores, we suggest a consistency regularization over both data and embeddings.
• We validated our transductive inference model on four benchmark datasets and obtained state-of-the-art performance on both transductive and semi-supervised few-shot classification tasks.
Before I explain the details, here is an introduction and some background.
Few-shot learning, the problem of learning under data scarcity, is an important challenge in deep
learning, as a large number of training instances may not be available in many real-world settings.
When there is not only data scarcity but also a lot of unlabeled data, the usual approaches to this problem
are the nearest neighbor graph method or using predicted soft or hard labels on the unlabeled samples.
2. When there is a lot of unlabeled data together with data scarcity,
popular approaches include leveraging a nearest neighbor graph or using predicted soft or hard labels on unlabeled samples to update the class prototype.
3. A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples, or a confidence-weighted average of all the query samples. However, a caveat here is that the model confidence may be unreliable, which may lead to incorrect predictions.
Aim:
To tackle this issue, we propose to meta-learn the confidence for each query sample, assigning optimal weights to unlabeled queries such that they improve the model's transductive inference performance on unseen tasks.
Before explaining this paper,
I'll explain the difference between transductive inference and inductive inference.
Then I'll explain meta-learning.
Transduction, or transductive inference, infers particular test data directly from particular observed data.
In contrast, inductive inference predicts test data from a model trained on a lot of training data; you can think of the classifiers we generally know.
This distinction matters when the model's prediction cannot be obtained by any inductive model.
For example, training a classifier on both dogs and cats so that a later test set can be well predicted is inductive inference.
In other words, if the training data is too small to be learned from, or much of it is unlabeled, you can use transductive inference.
So transductive inference is one form of semi-supervised learning.
In meta-learning, within each task, the data similarity between the support set and the query set allows the model to derive learning patterns on its own and improve generalization performance.
In this work, they use this unlabeled data itself as the query set for meta-learning,
and they propose a novel confidence-based transductive inference scheme for metric-based meta-learning models.
This is the overview of Meta-Learned Confidence for Few-shot Learning.
First, they use both data perturbation and model perturbation.
To further enhance the reliability of the learned confidence,
they introduce various types of model and data perturbations during meta-learning.
First, they apply various augmentations to disjoint sets rather than to the same instance, which achieves the same effect as regularization without an explicit consistency loss.
They also consider two confidence scores, one from the full network and the other from a sub-network generated by dropping a block.
These approaches allow the meta-learned confidence to better account for uncertainties at unseen tasks.
Then, after building embedding functions for the data with and without perturbation,
the Euclidean distance metric is computed.
The distance metric is defined as the Euclidean distance with normalization and either instance-wise metric scaling g_i or pair-wise metric scaling g_p.
Both scalings are used in semi-supervised learning for correctly assigning confidence scores to unlabeled data.
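A minimal numpy sketch of this normalized, scaled distance, assuming a single query embedding and a matrix of class prototypes. In the paper the length scale g is produced by a meta-learned scaling network (instance-wise g_i or pair-wise g_p); here it is a fixed scalar purely for illustration.

```python
import numpy as np

def scaled_distance(z, prototypes, g=1.0):
    # Normalize the query embedding and each prototype to unit length,
    # then compute squared Euclidean distance divided by length scale g.
    # z: (D,) query embedding; prototypes: (C, D) class prototypes.
    z = z / np.linalg.norm(z)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return np.sum((z - p) ** 2, axis=1) / g
```

A smaller g stretches distances apart, sharpening the confidence distribution that is computed from them; meta-learning g lets the model pick the right sharpness per task.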
Then we calculate the confidence score and the prototype.
This step modifies the prototype clusters using the support set and the unlabeled query set, using soft k-means clustering, which is differentiable.
Equation 2 is the confidence score obtained from soft k-means.
In other words, it is the probability of a query belonging to each class c.
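The soft k-means confidence described here amounts to a softmax over negative (scaled) distances to the class prototypes. A minimal sketch, with the function name chosen for illustration:

```python
import numpy as np

def confidence_scores(dists):
    # Soft k-means assignment: the probability of a query belonging to
    # each class c is a softmax over negative distances to the prototypes.
    logits = -dists
    logits = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(logits)
    return e / e.sum()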
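```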
Finally,
using the embeddings with and without perturbation and the confidence score from Equation 2,
we obtain the updated prototype from Equation 3.
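The prototype update described here takes, per class, the weighted mean of the labeled support embeddings (weight 1) and all query embeddings weighted by their soft k-means confidence for that class. A minimal sketch; the function and argument names are illustrative, not the paper's:

```python
import numpy as np

def update_prototypes(support_emb, support_y, query_emb, conf, n_classes):
    # Transductive prototype update: each class prototype becomes the
    # confidence-weighted mean of its support embeddings (weight 1) and
    # all query embeddings (weight = confidence for that class).
    protos = np.zeros((n_classes, support_emb.shape[1]))
    for c in range(n_classes):
        mask = support_y == c
        num = support_emb[mask].sum(axis=0) \
            + (conf[:, c:c + 1] * query_emb).sum(axis=0)
        den = mask.sum() + conf[:, c].sum()
        protos[c] = num / den
    return protos
```

If the confidence on a query is near zero for a class, that query barely moves the class prototype, which is exactly why unreliable confidence scores are worth meta-learning.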
Algorithm 1 shows what I've explained so far.
In summary:
This table shows few-shot classification performance on miniImageNet and tieredImageNet,
both of which are used for meta-learning.
As you can see, the top rows of this table show the accuracy of MCI and the existing inductive inference methods for few-shot classification.
The bottom rows of this table show the accuracy of MCT and the transductive inference methods.
First, MCI (Meta Confidence Induction) is defined as the proposed metric with consistency regularization only.
The second result is about transductive inference:
MCT (Meta Confidence Transduction) performs transductive inference with the meta-learned confidence.
Both achieve new state-of-the-art results on all the datasets, with particularly good performance on one-shot classification.
Also, performance was highest when both data perturbation and model perturbation were applied.