Adaptive Consistency Regularization for
Semi-Supervised Transfer Learning
Abuduweili et al. (CVPR 2021)
Dongmin Choi
Yonsei University Translational Artificial Intelligence Lab
Introduction
Semi-Supervised Learning (SSL)
• Effectively leveraging both labeled and unlabeled data
• Three main approaches:
1) consistency-based regularization
2) entropy minimization
3) pseudo-labeling
→ the latter two are sketched in code below
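A minimal sketch of the latter two approaches (PyTorch assumed; function names are hypothetical), operating on a batch of unlabeled logits:

```python
import torch
import torch.nn.functional as F

def entropy_minimization_loss(logits_u: torch.Tensor) -> torch.Tensor:
    # 2) entropy minimization: push unlabeled predictions toward low entropy
    p = F.softmax(logits_u, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()

def pseudo_label_loss(logits_u: torch.Tensor) -> torch.Tensor:
    # 3) pseudo-labeling: the model's own argmax prediction becomes the target
    targets = logits_u.argmax(dim=1)
    return F.cross_entropy(logits_u, targets)
```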
Introduction
Transfer Learning
• Powerful pre-trained models offer:
1) excellent transferability
2) strong generalization capacity
• Zhou et al.
1) the benefits of SSL are smaller when training starts from a pre-trained model
2) combining SSL and transfer learning can mitigate the domain gap
[Zhou et al., When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets, arXiv 2018]
Introduction
A Semi-Supervised Transfer Learning Framework
• Extends consistency regularization in SSL to the inductive transfer learning setting
• Two essential components:
1) Adaptive Knowledge Consistency (AKC)
- transfer knowledge from the pre-trained model
2) Adaptive Representation Consistency (ARC)
- utilize unlabeled examples to adjust the representation
Related Work
Domain Adaptation
• Tackles the sample selection bias between the training and test data
• Generates domain-invariant representations over the training set
• (Content to be added)
Related Work
Semi-Supervised Learning
• Consistency-based regularization
- Hypothesis: the decision boundary should not pass through high-density areas
→ two nearby inputs are expected to have the same label (see the sketch below)
[Engelen et al., A Survey on Semi-Supervised Learning, Machine Learning 2020]
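A minimal sketch of the consistency idea, assuming a PyTorch classifier `model` and a stochastic `augment` function (both hypothetical): two perturbed views of the same input should yield the same prediction.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x_u, augment):
    # Two stochastic views of the same unlabeled batch
    p1 = F.softmax(model(augment(x_u)), dim=1)
    with torch.no_grad():                # the second view serves as the target
        p2 = F.softmax(model(augment(x_u)), dim=1)
    return F.mse_loss(p1, p2)
```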
Related Work
Semi-Supervised Learning
• Π-model
[Laine & Aila, Temporal Ensembling for Semi-Supervised Learning, ICLR 2017]
- Targets from a single stochastic network evaluation can be noisy
- Temporal ensembling instead builds targets from prior network evaluations
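A minimal sketch of the temporal-ensembling variant from the same paper (shapes and the alpha value here are assumptions): targets are an exponential moving average of predictions from earlier epochs, which smooths the noise of a single evaluation.

```python
import torch

class TemporalEnsemble:
    """EMA of per-sample predictions across epochs (Laine & Aila style)."""
    def __init__(self, n_samples: int, n_classes: int, alpha: float = 0.6):
        self.alpha = alpha
        self.Z = torch.zeros(n_samples, n_classes)  # accumulated predictions

    def update(self, idx, preds, epoch):
        # Blend this epoch's predictions into the running ensemble
        self.Z[idx] = self.alpha * self.Z[idx] + (1 - self.alpha) * preds.detach()
        # Startup bias correction, as in the original paper
        return self.Z[idx] / (1 - self.alpha ** (epoch + 1))
```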
Related Work
Semi-Supervised Learning
• Mean Teacher
[Tarvainen & Valpola, Mean Teachers Are Better Role Models: Weight-Averaged Consistency Targets Improve Semi-Supervised Deep Learning Results, NIPS 2017]
Averages model weights instead of label predictions
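A minimal sketch of that weight averaging (PyTorch modules assumed; the decay value is illustrative): the teacher's parameters track an exponential moving average of the student's.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               decay: float = 0.999):
    # teacher <- decay * teacher + (1 - decay) * student, per parameter
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1 - decay)
```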
Related Work
Semi-Supervised Learning
• FixMatch
[Sohn et al., FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence, NeurIPS 2020]
Consistency regularization + Pseudo labeling
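A minimal sketch of the FixMatch unlabeled loss (`weak_aug` and `strong_aug` are hypothetical augmentation functions; tau = 0.95 matches the paper's default threshold): a confident pseudo label from the weakly augmented view supervises the strongly augmented view.

```python
import torch
import torch.nn.functional as F

def fixmatch_loss(model, x_u, weak_aug, strong_aug, tau: float = 0.95):
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(x_u)), dim=1)
        conf, pseudo = probs.max(dim=1)             # pseudo labels + confidence
    mask = (conf >= tau).float()                    # keep confident samples only
    loss = F.cross_entropy(model(strong_aug(x_u)), pseudo, reduction="none")
    return (mask * loss).mean()
```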
The Proposed Framework
[Framework diagram: labeled target data $D_t^l$ and unlabeled target data $D_t^u$; pre-trained networks $F_{\theta_0}$, $G_{\theta_0}$ and their fine-tuned counterparts $F_\theta$, $G_\theta$]

$$\theta^*, \phi^* = \arg\min_{\theta,\phi} \sum_{i=1}^{n} L_{\mathrm{CE}}(\theta, \phi; x_l^i) + R(\theta)$$
The Proposed Framework
1. Adaptive Knowledge Consistency (AKC)

$$R_K = \frac{1}{B_l + B_u} \sum_{x_i \in L \cup U} w_K^i \,\mathrm{KL}\!\left(F_{\theta_0}(x_i), F_\theta(x_i)\right)$$

Sample importance: $w_K^i = \mathbf{I}\!\left(H(p_s^i) \le \epsilon_K\right)$
- Entropy function: $H(p_s^i) = -\sum_{j=1}^{C_s} p_{s,j}^i \log p_{s,j}^i$
- $\mathbf{I}$: a hard entropy-gate function (maps the computed entropy to a binary sample importance)
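A minimal sketch of $R_K$ (PyTorch assumed; `eps_k` is an illustrative threshold, not the paper's value): the per-sample KL term is gated by the entropy of the pre-trained model's prediction.

```python
import torch
import torch.nn.functional as F

def akc_loss(logits_src, logits_tgt, eps_k: float = 0.5):
    # p_s: pre-trained-model prediction on the (labeled + unlabeled) batch
    p_src = F.softmax(logits_src, dim=1)
    entropy = -(p_src * torch.log(p_src + 1e-8)).sum(dim=1)   # H(p_s^i)
    w = (entropy <= eps_k).float()                            # hard gate I(.)
    # Per-sample KL(F_theta0(x) || F_theta(x))
    kl = F.kl_div(F.log_softmax(logits_tgt, dim=1), p_src,
                  reduction="none").sum(dim=1)
    return (w * kl).mean()                                    # mean over B_l + B_u
```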
The Proposed Framework
2. Adaptive Representation Consistency (ARC)
• Uses Maximum Mean Discrepancy (MMD) to measure the distance between the feature distributions of labeled and unlabeled samples
(Let's skip the details!)
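Still, a minimal sketch of an MMD estimate with an RBF kernel between labeled features `z_l` and unlabeled features `z_u` may help; the kernel choice and bandwidth are assumptions, not necessarily the paper's exact setup.

```python
import torch

def mmd_rbf(z_l: torch.Tensor, z_u: torch.Tensor,
            sigma: float = 1.0) -> torch.Tensor:
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2                 # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    # Biased MMD^2 estimate between the two feature distributions
    return k(z_l, z_l).mean() + k(z_u, z_u).mean() - 2 * k(z_l, z_u).mean()
```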
The Proposed Framework
Summary of the Framework
$$L(\theta,\phi) = \frac{1}{n}\sum_{i=1}^{n} L_{\mathrm{CE}}(\theta,\phi;x_l^i) + \lambda_S L_S(x_u^i) + \lambda_K R_K(x_l^i, x_u^i) + \lambda_R L_R(x_l^i, x_u^i)$$

The four terms: (1) supervised cross-entropy loss, (2) semi-supervised consistency loss $L_S$, (3) AKC regularization $R_K$, (4) ARC regularization $L_R$
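How the four terms combine, as a minimal sketch (the loss inputs are the hypothetical helpers sketched in the previous sections; the lambda weights are placeholders, not the paper's settings):

```python
def total_loss(l_ce, l_s, r_k, l_r,
               lambda_s: float = 1.0, lambda_k: float = 1.0,
               lambda_r: float = 1.0):
    # (1) supervised CE + (2) SSL loss + (3) AKC + (4) ARC
    return l_ce + lambda_s * l_s + lambda_k * r_k + lambda_r * l_r
```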
Experiments
Results on CUB-200-2011
Experiments
Results on MURA
Experiments
Results on CIFAR-10
Experiments
Results on CIFAR-10
Experiments
The actual sample selection ratio in ARC and AKC
- stays near 0.9, i.e., hard samples are excluded
Experiments
In Fully Supervised Transfer Learning
Conclusion
Two regularization methods: AKC and ARC
• Competitive with state-of-the-art (SOTA) SSL methods
• Best performance among several baseline methods on various transfer learning benchmarks
• Applicable to more general transfer learning and (semi-)supervised learning frameworks
Thank you

Review: Adaptive Consistency Regularization for Semi-Supervised Transfer Learning