FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Sungchul Kim
Contents
1. 용어 정리
2. Semi-supervised learning
3. FixMatch
4. Related work
5. Experiments
Terminology
• Supervised Learning & Unsupervised Learning
• Weakly-supervised Learning
• Predict strong labels (e.g. segmentation masks) from weak labels (e.g. dog, cat, …)!
• FickleNet, … (not following this line of work closely…)
• Semi-supervised Learning
• Improve performance with the help of unlabeled data when labeled data is scarce!
• UDA, MixMatch, ReMixMatch, FixMatch, …
• Self-supervised Learning
• Use unlabeled data to learn pretrained weights that capture the data's intrinsic features!
• Solving jigsaw puzzles, CPC, BERT, GPT, MoCo, SimCLR, BYOL, …
Semi-supervised learning
• Goal
• Improve performance with the help of unlabeled data when labeled data is scarce!
https://steemit.com/kr/@jiwoopapa/realistic-evaluation-of-semi-supervised-learning-algorithms
Semi-supervised learning
[Timeline figure, 2017-2020: families of SSL methods (Entropy Minimization (2005), Consistency Regularization, Generic Regularization, Self-Training) and representative algorithms: Π-Model, Mean Teacher, VAT, (MixUp), MixMatch, ReMixMatch, UDA, FixMatch, Pseudo Labeling (2013), Noisy Student]
Semi-supervised learning
• Entropy Minimization
• Used to increase the confidence of the model's predictions (softmax outputs)
• Usually implemented via a softmax temperature
• Setting the temperature below 1 lowers the entropy (a sketch follows the links below)
https://papers.nips.cc/paper/2740-semi-supervised-learning-by-entropy-minimization.pdf
http://dsba.korea.ac.kr/seminar/?mod=document&uid=68
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
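A minimal NumPy sketch (mine, not from any of the papers) of the effect: dividing the logits by a temperature below 1 sharpens the softmax and lowers its entropy. The logits are made up for illustration.

import numpy as np

def softmax(logits, T=1.0):
    # Dividing logits by a temperature T < 1 sharpens the distribution.
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return float(-np.sum(p * np.log(p + 1e-12)))

logits = [2.0, 1.0, 0.5]              # made-up logits
for T in (1.0, 0.5, 0.25):
    p = softmax(logits, T)
    print(f"T={T}: p={np.round(p, 3)}, H={entropy(p):.3f}")
# The printed entropy H shrinks as T decreases.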
Semi-supervised learning
• Consistency Regularization
1. Use the model to predict the class distribution of the unlabeled data
2. Add noise to the unlabeled data (data augmentation)
3. Train the model using the predicted distribution as the target label for the augmented data (a sketch follows the link below)
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
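A minimal sketch of these three steps, assuming a hypothetical model function that maps a batch of inputs to softmax probabilities and a hypothetical augment function that adds noise:

import numpy as np

def consistency_loss(model, augment, u):
    # 1. Predict the class distribution of the unlabeled data.
    q = model(u)                          # shape (n, num_classes)
    # 2. Add noise to the unlabeled data (data augmentation).
    u_aug = augment(u)
    # 3. Cross-entropy of the augmented-data prediction against the
    #    original prediction, treated as a fixed target.
    p = model(u_aug)
    return float(-np.mean(np.sum(q * np.log(p + 1e-12), axis=1)))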
Semi-supervised learning
• Unsupervised Data Augmentation (UDA, 2019)
https://arxiv.org/abs/1904.12848
Semi-supervised learning
• MixMatch (NeurIPS 2019)
https://arxiv.org/abs/1905.02249
http://dsba.korea.ac.kr/seminar/?mod=document&uid=68
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
Semi-supervised learning
• ReMixMatch (ICLR 2020)
https://arxiv.org/abs/1911.09785
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
Semi-supervised learning
• FixMatch (NeurIPS 2020)
https://arxiv.org/abs/2001.07685
FixMatch
• Background
• Consistency regularization
• Pseudo-labeling
• Use the model to obtain artificial labels for unlabeled data
• Take the argmax to form a hard label and keep it (a sketch follows the link below)

$\frac{1}{\mu B}\sum_{b=1}^{\mu B} \mathbf{1}(\max(q_b) \ge \tau)\, \mathrm{H}(\hat{q}_b, q_b)$

• $q_b = p_m(y \mid u_b)$, $\hat{q}_b = \operatorname{argmax}(q_b)$
• $\tau$ : threshold hyperparameter
• Producing hard labels is closely related to entropy minimization
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
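A NumPy sketch (mine, not the authors') of the masked pseudo-labeling objective above: hard labels from the argmax, counted only where the model's confidence clears the threshold τ.

import numpy as np

def pseudo_label_loss(q, tau=0.95):
    # q: model predictions p_m(y|u_b) for a batch, shape (mu*B, C).
    q_hat = q.argmax(axis=1)                  # hard labels via argmax
    mask = q.max(axis=1) >= tau               # 1(max(q_b) >= tau)
    # Cross-entropy of each prediction against its own hard label.
    ce = -np.log(q[np.arange(len(q)), q_hat] + 1e-12)
    return float(np.sum(ce * mask) / len(q))  # average over the mu*B examples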
FixMatch
• FixMatch
• Notation
• $\mathcal{X} = \{(x_b, p_b) : b \in (1, \ldots, B)\}$ : a batch of $B$ labeled examples
• $x_b, p_b$ : the training examples and their one-hot labels
• $\mathcal{U} = \{u_b : b \in (1, \ldots, \mu B)\}$ : a batch of $\mu B$ unlabeled examples
• $\mu$ : a hyperparameter that determines the relative sizes of $\mathcal{X}$ and $\mathcal{U}$
• $p_m(y \mid x)$ : the predicted class distribution produced by the model for input $x$
• $\mathcal{A}(\cdot), \alpha(\cdot)$ : strong and weak augmentation
FixMatch
• FixMatch
• Two cross-entropy loss terms
• Supervised loss $l_s$ applied to the labeled data:

$l_s = \frac{1}{B}\sum_{b=1}^{B} \mathrm{H}(p_b, p_m(y \mid \alpha(x_b)))$

• Unsupervised loss $l_u$
• Compute an artificial label for each example and use it in a standard cross-entropy loss
• $q_b = p_m(y \mid \alpha(u_b))$ : the model's predicted class distribution for the weakly-augmented unlabeled image
• $\hat{q}_b = \operatorname{argmax}(q_b)$ : used as the pseudo-label; cross-entropy against the prediction on the strongly-augmented image (a sketch follows below)

$l_u = \frac{1}{\mu B}\sum_{b=1}^{\mu B} \mathbf{1}(\max(q_b) \ge \tau)\, \mathrm{H}(\hat{q}_b, p_m(y \mid \mathcal{A}(u_b)))$
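Putting the two terms together, a minimal NumPy sketch of the objective; model, weak_aug, and strong_aug are hypothetical stand-ins for $p_m$, $\alpha(\cdot)$, and $\mathcal{A}(\cdot)$.

import numpy as np

def fixmatch_loss(model, weak_aug, strong_aug, x, p, u, tau=0.95, lam_u=1.0):
    # Supervised term l_s: cross-entropy on weakly augmented labeled data;
    # p holds the one-hot labels, shape (B, C).
    pred_x = model(weak_aug(x))
    l_s = -np.mean(np.sum(p * np.log(pred_x + 1e-12), axis=1))

    # Unsupervised term l_u: pseudo-label from the weakly augmented view,
    # cross-entropy against the strongly augmented view, masked by tau.
    q = model(weak_aug(u))                    # q_b = p_m(y | alpha(u_b))
    q_hat = q.argmax(axis=1)                  # pseudo-labels
    mask = q.max(axis=1) >= tau               # 1(max(q_b) >= tau)
    pred_u = model(strong_aug(u))             # p_m(y | A(u_b))
    ce = -np.log(pred_u[np.arange(len(u)), q_hat] + 1e-12)
    l_u = np.sum(ce * mask) / len(u)

    return float(l_s + lam_u * l_u)           # minimize l_s + lambda_u * l_u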
FixMatch
• FixMatch
• Minimize $l_s + \lambda_u l_u$!
• Ramping up $\lambda_u$ during training is a common trend in recent SSL algorithms, but it is unnecessary in FixMatch
• Early in training, $\max(q_b)$ is usually below $\tau$; as training progresses the model becomes more confident and $\max(q_b) > \tau$ occurs more often
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
FixMatch
• Augmentation
• Two kinds of augmentation
• Weak
• Standard flip-and-shift augmentation
• Random horizontal flip with 50% probability
• Random translation by up to 12.5% vertically and horizontally (a sketch follows below)
• Strong
• AutoAugment
• RandAugment
• CTAugment (Control Theory Augment, from ReMixMatch)
• + Cutout
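A NumPy sketch of the weak augmentation, assuming images of shape (H, W, C); for 32x32 CIFAR inputs, 12.5% of a side is 4 pixels.

import numpy as np

def weak_augment(img, rng=np.random.default_rng()):
    # Random horizontal flip with 50% probability.
    if rng.random() < 0.5:
        img = img[:, ::-1]
    # Random translation by up to 12.5%, implemented as zero-padding
    # followed by a random crop back to the original size.
    h, w = img.shape[:2]
    dy, dx = int(0.125 * h), int(0.125 * w)   # 4 px for 32x32 inputs
    padded = np.pad(img, ((dy, dy), (dx, dx), (0, 0)))
    y = rng.integers(0, 2 * dy + 1)
    x = rng.integers(0, 2 * dx + 1)
    return padded[y:y + h, x:x + w]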
FixMatch
• Additional important factors
• Semi-supervised performance can depend substantially on factors other than the algorithm itself, such as the amount of regularization (especially when labeled data is scarce!)
• Regularization : simple weight decay
• Optimizer : standard SGD with momentum
• Learning rate schedule : cosine learning rate decay, $\eta \cos\left(\frac{7\pi k}{16K}\right)$ (a sketch follows below)
• Exponential moving average of model parameters
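A short sketch of the cosine schedule and the parameter EMA; the EMA decay of 0.999 is an assumed illustrative value, not taken from the slides.

import numpy as np

def cosine_lr(eta, k, K):
    # Learning rate at step k of K total steps: eta * cos(7*pi*k / (16*K)).
    # Decays from eta down to about 0.195 * eta, never reaching zero.
    return eta * np.cos(7 * np.pi * k / (16 * K))

def ema_update(ema_params, params, decay=0.999):
    # Exponential moving average of model parameters (dicts of arrays),
    # typically used for evaluation.
    for name, w in params.items():
        ema_params[name] = decay * ema_params[name] + (1 - decay) * w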
Related work
Experiments
• Dataset
• CIFAR-10
• CIFAR-100
• SVHN
• STL-10
• 96x96 pixels, 10 classes
• 5,000 labeled images
• 100,000 unlabeled images
• ImageNet
CIFAR-10 : https://www.cs.toronto.edu/~kriz/cifar.html
SVHN (Street View House Number) : http://ufldl.stanford.edu/housenumbers/
STL-10 : https://cs.stanford.edu/~acoates/stl10/
Experiments
• CIFAR-10, CIFAR-100, SVHN
• Wide ResNet-28-2
• 5 different “folds” of labeled data
• Lower performance than ReMixMatch on CIFAR-100
• Distribution Alignment appears to be the main reason
→ FixMatch + Distribution Alignment : 40.14% error rate with 400 labeled examples!
Experiments
• STL-10
• Wide ResNet-37-2
• Five of the predefined folds of 1,000 labeled images each
Experiments
• ImageNet
• ResNet-50 & RandAugment
• Like UDA, 10% labeled data + the remainder as unlabeled data
• Top-1 error rate
• FixMatch : 28.54±0.52% (2.68% lower than UDA)
• Top-5 error rate : 10.87±0.28%
• S4L : supervised → pseudo-label re-training (semi) → supervised fine-tuning (self)
→ FixMatch performs better than S4L at its pseudo-label re-training stage
→ Once S4L completes its full pipeline, the two reach roughly similar performance
https://arxiv.org/abs/1905.03670
Experiments
• Barely Supervised Learning
• Run on CIFAR-10 with only one labeled example per class
• 4 datasets + 4 training runs per dataset
• 48.58~85.32% test accuracy (median of 64.28%)
• All 4 models trained on the first dataset reach 61~67% accuracy
• All 4 models trained on the second dataset reach 68~75% accuracy
• Could choosing low-quality examples make it harder for the model to learn certain classes?
• Built 8 training datasets ordered by prototypicality
• Sorted the data from most to least representative using several CIFAR-10 models
• Split it into 8 buckets and built one one-labeled-example-per-class dataset from each bucket
→ the most representative examples go in the first bucket, the outliers in the last
• The most prototypical examples reach 78% accuracy, the middle of the distribution 65%, and the outliers only 10%
Experiments
• Barely Supervised Learning
Experiments
• Ablation Study
• 250 label split from CIFAR-10 + CTAugment
• Sharpening and Thresholding
• Experiments on the temperature $T$ and the confidence threshold $\tau$
• Confidence threshold $\tau = 0.95$
• Once the threshold $\tau$ is applied, the temperature $T$ has little effect
Experiments
• Ablation Study
• 250 label split from CIFAR-10 + CTAugment
• Augmentation Strategy
• Replacing weak augmentation with strong augmentation for label guessing (pseudo-labels)
• Training diverges early → pseudo-labels must be generated from weakly augmented data
• Applying weak augmentation to the prediction the model is trained on (in place of strong)
• Reaches 45% accuracy, but training is unstable and gradually collapses to 12%
→ strong data augmentation is necessary for the model's prediction
• Similar results have also been observed in supervised learning
Experiments
• Ablation Study
• 250 label split from CIFAR-10 + CTAugment
• Ratio of Unlabeled Data
• Performance comparison across ratios of unlabeled data
• Using more unlabeled data improves performance → consistent with UDA
• Scaling the learning rate $\eta$ linearly with the batch size helps (especially when $\eta$ is small!); see the one-line sketch below
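A one-line sketch of that linear scaling rule; the reference batch size of 64 is an assumed illustrative value, not from the slides.

def scaled_lr(base_lr, batch_size, base_batch=64):
    # Scale the learning rate linearly with the batch size.
    return base_lr * batch_size / base_batch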
Experiments
• Ablation Study
• 250 label split from CIFAR-10 + CTAugment
• Optimizer and Learning Rate Schedule
• Optimizers and hyperparameters that earlier SSL studies rarely experimented with
• SGD with momentum of 0.9 works best (4.84% error)
• 5.19% error without momentum
• The Nesterov variant of momentum is not required to get below 5% error
• Adam performs poorly
• Cosine learning rate decay is the common choice, but linear learning rate decay also works reasonably well in FixMatch
• With cosine learning rate decay, choosing an appropriate decay rate matters
• No decay costs 0.86% accuracy
Experiments
• Ablation Study
• 250 label split from CIFAR-10 + CTAugment
• Weight Decay
• Weight decay is especially important when labeled data is scarce
• Choosing a value larger or smaller than the optimum can cost more than 10% in performance
Thank You