Meta-Learning for Representation Change in Task-specific Updates
Defense for the Master's Thesis
2020. 12. 14
Hyungjun Yoo (Advisor : Se-Young Yun)
Graduate School of Knowledge Service Engineering, OSI Lab
Table of Contents
Purpose of the Research
Leverage representation change by freezing the head and updating only the body during task-specific adaptation, for efficient meta-learning.
1. Introduction : What is Meta-Learning?
2. Problem Setting : Few-shot Classification, MAML, ANIL
3. BOIL Algorithm : BOIL, Domain-agnostic adaptation
4. Representation Change in BOIL : Cosine similarity, CKA, Empirical Analysis, Disconnection trick (ResNet)
5. Conclusion : Research contributions
What is Meta-learning?
Introduction
• Deep neural networks need large labeled datasets and require long training times.
• Humans, in contrast, can learn from only a few samples and can reuse previous knowledge to learn new tasks.
(Figure: examples of large networks, e.g., AlexNet, Residual Net, Inception V3.)
What is Meta-learning?
Introduction
• Meta-Learning (learning to learn) : tries to make DNNs mimic human intelligence
• The model is able to learn from only a few examples in each task
• A model that learns to learn from previous similar tasks can quickly learn a new task
• Why learning to learn?
• The ability to effectively reuse data from other tasks
• The potential to replace manual engineering of architectures, hyperparameters, etc.
• The ability to quickly adapt to unexpected scenarios (inevitable failures, long tails)
• Problem Domain
• Few-shot classification
Main Approaches of Meta-learning
Introduction

             Metric Based               Model Based                 Optimization Based
Key Idea     Metric learning            RNN, external memory        Gradient descent
Key Papers   Matching Net               MANN                        MAML
             (Vinyals et al., 2016)     (Santoro et al., 2016)      (Finn et al., 2017)
             Prototypical Net           Meta Network                Reptile
             (Snell et al., 2017)       (Munkhdalai & Yu, 2017)     (Nichol et al., 2018)
Strength     Simple;                    Applicable to any           Lends well to OOD tasks;
             entirely feedforward       baseline model              model agnostic
Weakness     Hard to generalize to      Excessive computation       Often data-inefficient;
             varying example sizes;     and parameters              training instability;
             restricted domain                                      second-order optimization
Few-shot learning (n-way k-shot classification)
Problem Setting
(Figure: example tasks $\tau_1$ and $\tau_2$, each consisting of a support set $D^{spt}_{\tau_i}$ and a query set $D^{qry}_{\tau_i}$.)
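To make the episode structure concrete, the following is a minimal sketch of n-way k-shot task sampling. The `dataset` layout (a dict mapping each class label to its examples) and all names are illustrative assumptions, not the thesis code.

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15):
    """Sample one n-way k-shot task: a support set and a query set.

    `dataset` is assumed to be a mapping {class_label: [examples]};
    this helper and its signature are illustrative only.
    """
    classes = random.sample(list(dataset.keys()), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        examples = random.sample(dataset[cls], k_shot + n_query)
        # Labels are re-indexed per episode (0..n_way-1), as in standard few-shot setups.
        support += [(x, episode_label) for x in examples[:k_shot]]
        query += [(x, episode_label) for x in examples[k_shot:]]
    return support, query
```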
Meta-learning Framework (MAML)
Problem Setting
• Model-Agnostic Meta-Learning (MAML, Finn et al., 2017)
• MAML uses $b$ (meta-batch size) tasks in a single iteration, and each iteration consists of two steps, the inner loop and the outer loop.
• Two steps of parameter update
• Inner loop : starting from the meta-initialization $\theta_0$, update task-specifically using the support set:
  $\theta_{\tau_i} = \theta_0 - \alpha \nabla_{\theta_0} L_{\tau_i}(f_{\theta_0}, D^{spt}_{\tau_i})$
• Outer loop : update the meta-initialization with the average of the task-specific losses on the query sets:
  $\theta_0 \leftarrow \theta_0 - \beta \nabla_{\theta_0} \sum_{\tau_i \sim p(T)}^{\tau_b} L_{\tau_i}(f_{\theta_{\tau_i}}, D^{qry}_{\tau_i})$
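As a concrete illustration, here is a minimal first-order PyTorch sketch of one MAML meta-iteration with a single inner step. The thesis and Finn et al. (2017) use the full second-order update that differentiates through the inner loop; the function and tensor names, and the default learning rates (taken from the 4conv column of the table later in this deck), are assumptions.

```python
import torch

def maml_step(model, tasks, alpha=0.5, beta=0.001):
    """One first-order MAML meta-iteration over a meta-batch of tasks.

    Each task is (support_x, support_y, query_x, query_y);
    `model` is any torch.nn.Module producing class logits.
    """
    loss_fn = torch.nn.functional.cross_entropy
    theta0 = [p.detach().clone() for p in model.parameters()]
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]

    for sx, sy, qx, qy in tasks:
        # Inner loop: one task-specific step from the meta-initialization theta0.
        inner_loss = loss_fn(model(sx), sy)
        grads = torch.autograd.grad(inner_loss, list(model.parameters()))
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                p -= alpha * g
        # Outer loop: query-set gradient evaluated at the adapted parameters.
        outer_loss = loss_fn(model(qx), qy)
        grads = torch.autograd.grad(outer_loss, list(model.parameters()))
        for mg, g in zip(meta_grads, grads):
            mg += g
        # Restore theta0 before adapting to the next task.
        with torch.no_grad():
            for p, p0 in zip(model.parameters(), theta0):
                p.copy_(p0)

    # Meta-update: theta0 <- theta0 - beta * (sum of query-set gradients).
    with torch.no_grad():
        for p, mg in zip(model.parameters(), meta_grads):
            p -= beta * mg
```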
Representation Reuse and Change
Problem Setting
• Rapid Learning or Feature Reuse? (Raghu et al., 2020)
• Divide the network into two parts : $\theta = (\theta^{ext}, \theta^{cls})$
  • body ($\theta^{ext}$, extractor, conv layers)
  • head ($\theta^{cls}$, classifier, fully connected layer)
• Representation Change / Representation Reuse
  • Rapid learning : representations from the body change significantly after the inner update.
    → Representation change
  • Feature reuse : representations from the body change only negligibly after the inner update and are reused.
    → Representation reuse
• The dominant factor in MAML's effectiveness is representation reuse.
  • The meta-trained model's body can already extract good representations before the inner update, and the representations barely change after the task-specific update (inner loop).
• This observation suggests a more computationally efficient update rule.
• ANIL : head-only update in the inner loop
  • Inner loop updates :
    $\theta^{ext}_{\tau_i} = \theta^{ext}_0$ (no update in the inner loop)
    $\theta^{cls}_{\tau_i} = \theta^{cls}_0 - \alpha_h \nabla_{\theta_0} L_{\tau_i}(f_{\theta_0}, D^{spt}_{\tau_i})$
  • Outer loop updates :
    $\theta_0 \leftarrow \theta_0 - \beta \nabla_{\theta_0} \sum_{\tau_i \sim p(T)} L_{\tau_i}(f_{\theta_{\tau_i}}, D^{qry}_{\tau_i})$
BOIL (Body Only update in Inner Loop) Algorithm
BOIL Algorithm
• The reverse of ANIL : in the inner loop, the classifier (head) is fixed and only the body is updated.
• Inner loop updates :
  $\theta^{ext}_{\tau_i} = \theta^{ext}_0 - \alpha_b \nabla_{\theta_0} L_{\tau_i}(f_{\theta_0}, D^{spt}_{\tau_i})$
  $\theta^{cls}_{\tau_i} = \theta^{cls}_0$ (no update)
• Outer loop updates :
  $\theta_0 \leftarrow \theta_0 - \beta \nabla_{\theta_0} \sum_{\tau_i \sim p(T)} L_{\tau_i}(f_{\theta_{\tau_i}}, D^{qry}_{\tau_i})$
• Learning rates for each algorithm:

         4conv network            ResNet-12
         MAML   ANIL   BOIL      MAML    ANIL    BOIL
  α_b    0.5    0.0    0.5       0.3     0.0     0.3
  α_h    0.5    0.5    0.0       0.3     0.3     0.0
  β_b    0.001  0.001  0.001     0.0006  0.0006  0.0006
  β_h    0.001  0.001  0.001     0.0006  0.0006  0.0006
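The learning-rate table above reduces all three algorithms to one mechanism: separate inner learning rates for the body and the head, where a zero rate freezes that part. A hedged PyTorch sketch of this inner adaptation, assuming the model exposes a `head`-prefixed parameter group (the name is an assumption, not fixed by the thesis):

```python
import torch

def inner_adapt(model, support_x, support_y, alpha_body, alpha_head, n_steps=1):
    """Task-specific adaptation with per-part inner learning rates.

    MAML: alpha_body = alpha_head > 0; ANIL: alpha_body = 0;
    BOIL: alpha_head = 0 (head frozen, body only).
    """
    for _ in range(n_steps):
        loss = torch.nn.functional.cross_entropy(model(support_x), support_y)
        grads = torch.autograd.grad(loss, list(model.parameters()))
        with torch.no_grad():
            for (name, p), g in zip(model.named_parameters(), grads):
                lr = alpha_head if name.startswith("head") else alpha_body
                p -= lr * g  # a zero learning rate freezes that part
```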
BOIL (Body Only update in Inner Loop) Algorithm
BOIL Algorithm
• Representation change in BOIL through task-specific adaptation (inner-loop update)
• Difference in task-specific (inner) updates between MAML/ANIL and BOIL
  • (a) MAML / ANIL : mainly update the head, with a negligible change in the body (extractor); hence, the representations in the feature space are almost identical before and after the inner loop.
  • (b) BOIL : updates only the body, without changing the head, during the inner updates; the representations in the feature space change significantly to fit the fixed decision boundaries.
(Figure: decision boundaries and representations before and after the inner loop for (a) MAML/ANIL and (b) BOIL.)
Necessity of Representation Change : Domain-agnostic Adaptation
BOIL Algorithm
• The goal of meta-learning : the ability to adapt even to environments where the source and target are significantly different.
• When there are no strong similarities between the source and target domains, reusing representations that are good for the source domain can yield imperfect representations for the target domain.
• Therefore, the ability to adapt well to other target domains, i.e., the ability to update task-specifically in response to unseen tasks and to change the representation during the inner loop (representation change), is necessary.
(Figure: (a) same-domain adaptation, source: mini-ImageNet training classes, target: mini-ImageNet test classes; (b) cross-domain (domain-agnostic) adaptation, source: mini-ImageNet training classes, target: CUB (birds only) test classes.)
Superiority of BOIL in Domain-agnostic Adaptation
BOIL Algorithm
• We divide the datasets into a General domain (mini-ImageNet, tiered-ImageNet) and a Specific domain (Cars, CUB) based on class granularity.
• Adaptation types : [General → General], [General → Specific], [Specific → General], [Specific → Specific] (to make the evaluation realistic and more difficult).
• In all settings, BOIL outperforms MAML/ANIL by a large margin via representation change.
Cosine Similarity of Representations in BOIL
Representation Change in BOIL
• Cosine similarity : we compute the cosine similarity between the representations, after every convolution module, of a query set with 5 classes and 15 samples per class from mini-ImageNet.
• The orange line is the average cosine similarity between samples of the same class, and the blue line is the average cosine similarity between samples of different classes.
• MAML/ANIL :
  • The patterns show no noticeable difference before and after the update.
  → The effectiveness of MAML/ANIL leans heavily on the meta-initialized body, not on the task-specific adaptation. (representation reuse)
• BOIL :
  • Before adaptation, BOIL's meta-initialized body cannot distinguish the classes.
  • After adaptation, the similarity between different classes decreases rapidly at conv4.
  → The body learns to distinguish the classes through adaptation. (representation change)
(Figure: cosine similarity before and after adaptation for (a) MAML, (b) ANIL, (c) BOIL.)
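For reference, here is a small PyTorch sketch of the class-wise cosine-similarity statistic described above (same-class pairs vs. different-class pairs). Flattening the conv activations into vectors and all names are assumptions, not the exact thesis code.

```python
import torch
import torch.nn.functional as F

def classwise_cosine(features, labels):
    """Average pairwise cosine similarity within and across classes.

    `features` is (N, D), the flattened activations of one conv module;
    `labels` is (N,).
    """
    sim = F.cosine_similarity(features.unsqueeze(1), features.unsqueeze(0), dim=-1)
    same = labels.unsqueeze(1) == labels.unsqueeze(0)
    off_diag = ~torch.eye(len(labels), dtype=torch.bool)  # drop self-similarity
    same_cls = sim[same & off_diag].mean()   # orange line: same-class pairs
    diff_cls = sim[~same].mean()             # blue line: different-class pairs
    return same_cls, diff_cls
```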
CKA of Representations in BOIL
Representation Change in BOIL
• CKA : we compute CKA values between the representations of the query set before and after adaptation. A CKA close to 1 means the two representations are almost identical.
• MAML/ANIL : CKA shows that MAML/ANIL do not change the representations in the body.
• BOIL : BOIL changes the representation of the last conv layer. This indicates that BOIL rapidly learns the task through representation change.
(Figure: layer-wise CKA before vs. after adaptation on (a) mini-ImageNet and (b) Cars.)
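Linear CKA (Kornblith et al., 2019) can be computed as below. This is a generic sketch for two activation matrices of the same examples, e.g., before and after the inner-loop adaptation, not the exact evaluation code of the thesis.

```python
import torch

def linear_cka(x, y):
    """Linear CKA between two representations `x` (N, D1) and `y` (N, D2)."""
    x = x - x.mean(dim=0, keepdim=True)  # center each feature dimension
    y = y - y.mean(dim=0, keepdim=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = torch.norm(y.t() @ x, p="fro") ** 2
    norm_x = torch.norm(x.t() @ x, p="fro")
    norm_y = torch.norm(y.t() @ y, p="fro")
    return cross / (norm_x * norm_y)
```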
Empirical Analysis of BOIL
Representation Change in BOIL
• NIL-testing : during meta-testing, we build class prototypes from the support set and classify the query set by similarity to these prototypes, in order to measure the quality of the representations produced by the body.
• With the head : before adaptation, all algorithms, on both the same domain and the cross domain, are unable to distinguish the classes (chance-level accuracy, 20%). After adaptation, BOIL far outperforms the other algorithms. This means that the representation change of BOIL is more effective than the representation reuse of MAML/ANIL.
• Without the head : MAML and ANIL already generate representations sufficient for classification, and adaptation makes little or no difference. BOIL shows a steep performance improvement through adaptation on both the same domain and the cross domain. This implies that the body of BOIL can be task-specifically updated.
(Figure: (a) toy example of NIL-testing (3-way 5-shot).)
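A minimal sketch of the NIL-testing protocol as described above, assuming (as in prototype-based methods) that a class prototype is the mean of that class's support-set features; `body` and all names are assumptions, not the thesis implementation.

```python
import torch
import torch.nn.functional as F

def nil_test(body, support_x, support_y, query_x, n_way):
    """Label queries by cosine similarity to support-set class prototypes."""
    with torch.no_grad():
        s_feat = body(support_x)                       # (n_way * k_shot, D)
        q_feat = body(query_x)                         # (n_query, D)
        # Class prototype = mean feature of that class's support samples.
        protos = torch.stack([s_feat[support_y == c].mean(dim=0)
                              for c in range(n_way)])  # (n_way, D)
        sims = F.cosine_similarity(q_feat.unsqueeze(1), protos.unsqueeze(0), dim=-1)
        return sims.argmax(dim=1)                      # predicted class per query
```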
Ablation Study of Learning Layers
Representation Change in BOIL
• Ablation study : we train multiple consecutive layers in the inner loop, with and without the head (see the sketch after this list).
• We consistently observe that learning with the head is far from the best accuracy. → Freezing the head is crucial.
• We also find several settings that skip the lower-level layers in the inner loop and perform slightly better than BOIL. We believe each pair of network architecture and data set has its own best layer combination; when it is feasible to search for that combination with large computing power, BOIL can be further improved.
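For clarity, the following sketch enumerates the ablation grid implied above: every run of consecutive conv layers, each tried with and without the head. The layer names are hypothetical; each setting can then drive an inner-loop update like the `inner_adapt` sketch earlier, with non-selected parts frozen.

```python
LAYERS = ["conv1", "conv2", "conv3", "conv4"]  # hypothetical layer names of the 4conv network

def ablation_settings():
    """Enumerate inner-loop settings: every consecutive layer range, with/without head."""
    settings = []
    for i in range(len(LAYERS)):
        for j in range(i, len(LAYERS)):
            block = LAYERS[i:j + 1]
            settings.append(block + ["head"])  # update these layers and the head
            settings.append(block)             # update these layers only (head frozen)
    return settings
```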
BOIL in a Larger Network (Residual Network)
Representation Change in BOIL
• We explore BOIL's applicability to a deeper network with a wiring structure, ResNet-12. In general, deeper networks use feature-wiring structures, e.g., skip connections, to facilitate feature propagation.
• Disconnection trick : we propose a simple trick, disconnecting the last skip connection, to reinforce the representation change in the high-level body (see the sketch below).
(Figure: the last residual block (a) with and (b) without the last skip connection (disconnection trick).)
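A minimal sketch of the disconnection trick on a simplified residual block. Since the conclusion describes the trick as removing the back-propagation path of the last skip connection, one plausible reading is to keep the skip in the forward pass but detach it; the block structure here is simplified relative to ResNet-12 and all names are assumptions.

```python
import torch.nn as nn

class LastResBlock(nn.Module):
    """Simplified residual block illustrating the disconnection trick.

    skip_mode="full": standard skip connection.
    skip_mode="forward_only": skip kept in the forward pass but detached,
    so no gradient flows back through it (one reading of the trick).
    skip_mode="none": skip connection removed entirely.
    """
    def __init__(self, channels, skip_mode="full"):
        super().__init__()
        self.skip_mode = skip_mode
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.LeakyReLU()

    def forward(self, x):
        out = self.conv(x)
        if self.skip_mode == "full":
            out = out + x
        elif self.skip_mode == "forward_only":
            out = out + x.detach()  # remove the back-propagation path only
        return self.act(out)
```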
Representation Change via the Disconnection Trick
Representation Change in BOIL
(Figure: representation change in the last block of ResNet-12, BOIL with vs. without the last skip connection.)
* LSC : last skip connection
Main Contributions of the Research
Conclusion
• We emphasize the necessity of representation change for meta-learning through cross-domain adaptation experiments.
• We propose a simple but effective meta-learning algorithm that updates only the Body (extractor) of the model in the Inner Loop (BOIL). We empirically show that BOIL improves performance on all benchmark data sets, and that the improvement is particularly pronounced on fine-grained data sets and in cross-domain adaptation.
• We demonstrate, using cosine similarity and CKA, that BOIL enjoys representation reuse in the low-/mid-level body and representation change in the high-level body, and we empirically analyze the effectiveness of BOIL's body through an ablation study on the learning layers.
• For ResNet architectures, we propose a disconnection trick that removes the back-propagation path of the last skip connection. The disconnection trick strengthens representation change in the high-level body.
Thank you.
Reference
• Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 1126–1135. JMLR.org, 2017.
• Aniruddh Raghu, Maithra Raghu, Samy Bengio, and Oriol Vinyals. Rapid learning or feature reuse? Towards understanding the effectiveness of MAML. In International Conference on Learning Representations, 2020.
• Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pp. 3630–3638, 2016.
• Boris Oreshkin, Pau Rodríguez López, and Alexandre Lacoste. TADAM: Task dependent adaptive metric for improved few-shot learning. In Advances in Neural Information Processing Systems, pp. 721–731, 2018.
• Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. arXiv preprint arXiv:1905.00414, 2019.
• Leland McInnes, John Healy, Nathaniel Saul, and Lukas Großberger. UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29), 2018.
Model Implementations and Datasets
Appendix
• Model Implementations
• 4conv network (Vinyals et al., 2016) : 4 conv modules [3×3 conv layer (64 filters), BN, ReLU, 2×2 max-pool]; a sketch follows below.
• ResNet-12 (Oreshkin et al., 2018) : 4 residual blocks [3 conv modules, BN, Leaky ReLU]
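A minimal PyTorch sketch of the 4conv backbone described above, with a linear head sized for 84×84 inputs. The `body`/`head` attribute names and the head construction are assumptions for illustration.

```python
import torch.nn as nn

def conv_module(in_ch, out_ch=64):
    """One module of the 4conv network: 3x3 conv (64 filters), BN, ReLU, 2x2 max-pool."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

class FourConv(nn.Module):
    """Sketch of the 4conv few-shot backbone (Vinyals et al., 2016) with a linear head."""
    def __init__(self, n_way=5, in_ch=3):
        super().__init__()
        self.body = nn.Sequential(*[conv_module(in_ch if i == 0 else 64) for i in range(4)])
        # For 84x84 inputs, four 2x2 max-pools leave a 5x5 feature map (64 channels).
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64 * 5 * 5, n_way))

    def forward(self, x):
        return self.head(self.body(x))
```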
• Datasets
Datasets                    miniImageNet               tieredImageNet             Cars                     CUB
Source                      Russakovsky et al. (2015)  Russakovsky et al. (2015)  Krause et al. (2013)     Welinder et al. (2010)
Image size                  84×84                      84×84                      84×84                    84×84
Fineness                    Coarse-grained             Coarse-grained             Fine-grained             Fine-grained
# meta-training classes     64                         351                        98                       100
# meta-validation classes   16                         97                         49                       50
# meta-testing classes      20                         160                        49                       50
Split setting               Vinyals et al. (2016)      Ren et al. (2018)          Tseng et al. (2020)      Hilliard et al. (2018)

Datasets                    FC100                      CIFAR-FS                   VGG-Flower                    Aircraft
Source                      Krizhevsky et al. (2009)   Krizhevsky et al. (2009)   Nilsback & Zisserman (2008)   Maji et al. (2013)
Image size                  32×32                      32×32                      32×32                         32×32
Fineness                    Coarse-grained             Coarse-grained             Fine-grained                  Fine-grained
# meta-training classes     60                         64                         71                            70
# meta-validation classes   20                         16                         16                            15
# meta-testing classes      20                         20                         15                            15
Split setting               Bertinetto et al. (2018)   Oreshkin et al. (2018)     Na et al. (2019)              Na et al. (2019)
Visualization with UMAP
Appendix
• UMAP visualization results on the benchmark data sets (training domain → test domain)
(Figures: mini-ImageNet → mini-ImageNet, CUB → Cars, tieredImageNet → miniImageNet.)

Representation Change via the Disconnection Trick
Appendix
(a) Cosine similarity in block4 (the last block) of ResNet-12, BOIL with the last skip connection (LSC).
(b) Cosine similarity in block4 (the last block) of ResNet-12, BOIL without the last skip connection (LSC).
More Related Content

Similar to BOIL: Towards Representation Change for Few-shot Learning

Preliminary Exam Slides
Preliminary Exam SlidesPreliminary Exam Slides
Preliminary Exam SlidesDebasmit Das
 
NTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer LearningNTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer LearningSean Yu
 
Transfer Learning: Breve introducción a modelos pre-entrenados.
Transfer Learning: Breve introducción a modelos pre-entrenados.Transfer Learning: Breve introducción a modelos pre-entrenados.
Transfer Learning: Breve introducción a modelos pre-entrenados.Fernando Constantino
 
PhD Defense Slides
PhD Defense SlidesPhD Defense Slides
PhD Defense SlidesDebasmit Das
 
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image SynthesisLarge Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image SynthesisSeunghyun Hwang
 
Naver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNaver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNAVER Engineering
 
Review : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-trainingReview : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-trainingDongmin Choi
 
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...Antonio Tejero de Pablos
 
End-to-end deep auto-encoder for segmenting a moving object with limited tra...
End-to-end deep auto-encoder for segmenting a moving object  with limited tra...End-to-end deep auto-encoder for segmenting a moving object  with limited tra...
End-to-end deep auto-encoder for segmenting a moving object with limited tra...IJECEIAES
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...IEEEFINALYEARSTUDENTPROJECT
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...IEEEMEMTECHSTUDENTSPROJECTS
 
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEEFINALYEARSTUDENTPROJECTS
 
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in R
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in RGentle Introduction: Bayesian Modelling and Probabilistic Programming in R
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in RMarco Wirthlin
 
Comparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural NetworksComparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural NetworksVincenzo Lomonaco
 
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...MLAI2
 
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingSangwoo Mo
 
A hybrid constructive algorithm incorporating teaching-learning based optimiz...
A hybrid constructive algorithm incorporating teaching-learning based optimiz...A hybrid constructive algorithm incorporating teaching-learning based optimiz...
A hybrid constructive algorithm incorporating teaching-learning based optimiz...IJECEIAES
 

Similar to BOIL: Towards Representation Change for Few-shot Learning (20)

Preliminary Exam Slides
Preliminary Exam SlidesPreliminary Exam Slides
Preliminary Exam Slides
 
NTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer LearningNTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer Learning
 
Transfer Learning: Breve introducción a modelos pre-entrenados.
Transfer Learning: Breve introducción a modelos pre-entrenados.Transfer Learning: Breve introducción a modelos pre-entrenados.
Transfer Learning: Breve introducción a modelos pre-entrenados.
 
PhD Defense Slides
PhD Defense SlidesPhD Defense Slides
PhD Defense Slides
 
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image SynthesisLarge Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image Synthesis
 
Naver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNaver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltc
 
Review : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-trainingReview : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-training
 
Tutorial inns2019 full
Tutorial inns2019 fullTutorial inns2019 full
Tutorial inns2019 full
 
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
 
End-to-end deep auto-encoder for segmenting a moving object with limited tra...
End-to-end deep auto-encoder for segmenting a moving object  with limited tra...End-to-end deep auto-encoder for segmenting a moving object  with limited tra...
End-to-end deep auto-encoder for segmenting a moving object with limited tra...
 
Conv xg
Conv xgConv xg
Conv xg
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
 
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in R
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in RGentle Introduction: Bayesian Modelling and Probabilistic Programming in R
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in R
 
Comparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural NetworksComparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural Networks
 
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
 
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
A hybrid constructive algorithm incorporating teaching-learning based optimiz...
A hybrid constructive algorithm incorporating teaching-learning based optimiz...A hybrid constructive algorithm incorporating teaching-learning based optimiz...
A hybrid constructive algorithm incorporating teaching-learning based optimiz...
 

Recently uploaded

DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 

Recently uploaded (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 

BOIL: Towards Representation Change for Few-shot Learning

  • 1. Meta-Learning for the representation change in task-specific update Defence for the Master’s Thesis 2020. 12. 14 Hyungjun Yoo (Advisor : Se-Young Yun) Graduate School of Knowledge Service Engineering, OSI Lab
  • 2. Table of Contents Purpose of the Research Leverage representation change through freezing head and updating body only in task-specific adaptation for efficient Meta-learning 1. Introduction : What is Meta-Learning? 2. Problem Setting : Few-shot Classification, MAML, ANIL 3. BOIL Algorithm : BOIL, Domain-agnostic adaptation 4. Representation Change in BOIL : Cosine similarity, CKA, Empirical Analysis, Disconnection trick (ResNet) 5. Conclusion : Research contributions
  • 3. Hyungjun Yoo Defense for the Master’s Thesis 3 / 46 14th December 2020 • Deep Neural Network needs a large labeled dataset and requires long training time. What is Meta-learning? Introduction AlexNet Residual Net Inception V3
  • 4. Hyungjun Yoo Defense for the Master’s Thesis 4 / 46 14th December 2020 • Deep Neural Network needs a large labeled dataset and requires long training time. • But human can learn with only few samples, and also, human can use previous knowledge to learn new task. What is Meta-learning? Introduction AlexNet Residual Net Inception V3
  • 5. Hyungjun Yoo Defense for the Master’s Thesis 5 / 46 14th December 2020 What is Meta-learning? Introduction • Meta-Learning (Learning to learn) : tries to make DNN mimic human intelligence • Model is able to learn with few shot examples in each task • Model which learns to learn from the previous similar tasks can quickly learns new task • Why learning to learn? • Advantages to effectively reuse data on other tasks • Applicability replace manual engineering of architecture, hyperparameters, etc. • Learning property to quickly adapt to unexpected scenarios (inevitable failures, long tail) • Problem Domains • Few-shot classification
  • 6. Hyungjun Yoo Defense for the Master’s Thesis 6 / 46 14th December 2020 Main Approaches of Meta-learning Introduction Metric Based Model Based Optimization Based Key Idea Metric learning RNN, external memory Gradient Descent Key Papers Matching Net (Vinyals et al., 2016) Prototypical Net (Snell et al., 2017) MANN (Santoro et al., 2016) Meta Network (Munkhdalai & Yu, 2017) MAML (Finn et al., 2017) Reptile (Nichol et al., 2018) Strength Simple Entirely feedforward Applicable to any baseline model Lends well to OOD tasks Model Agnostic Weakness Hard to generalize to varying example size Restricted domain Excessive computation and parameters Often data-inefficient Training instability Second-order optimization
  • 7. Hyungjun Yoo Defense for the Master’s Thesis 7 / 46 14th December 2020 Few-shot learning (𝒏-way 𝒌-shot classification) Problem Setting Query Set, 𝐷𝜏1 𝑞𝑟𝑦 Support Set, 𝐷𝜏1 𝑠𝑝𝑡 Query Set, 𝐷𝜏2 𝑞𝑟𝑦 Support Set, 𝐷𝜏2 𝑠𝑝𝑡 , 𝝉𝟏 , 𝝉𝟐 . , 𝝉𝟏 . Query Set, 𝐷𝜏1 𝑞𝑟𝑦 Support Set, 𝐷𝜏1 𝑠𝑝𝑡
  • 8. Hyungjun Yoo Defense for the Master’s Thesis 8 / 46 14th December 2020 Meta-learning Framework (MAML) • Model-Agnostic Meta-Learning (MAML, Finn et al., 2017) • MAML uses 𝑏(meta-batch size) tasks for a single iteration, and each iteration consists of two steps, inner loop and outer loop. • Two steps of parameter update • Inner loop : from meta-initialization(𝜃0), task-specifically updates using support set Problem Setting inner loop : 𝜃𝜏𝑖 = 𝜃0 − 𝜕∇𝜃0 𝐿𝜏𝑖 , 𝑓𝜃0 , 𝐷𝜏𝑖 𝑠𝑝𝑡
  • 9. Hyungjun Yoo Defense for the Master’s Thesis 9 / 46 14th December 2020 Meta-learning Framework (MAML) • Model-Agnostic Meta-Learning (MAML, Finn et al., 2017) • MAML uses 𝑏(meta-batch size) tasks for a single iteration, and each iteration consists of two steps, inner loop and outer loop. • Two steps of parameter update • Inner loop : from meta-initialization(𝜃0), task-specifically updates using support set • Outer loop : updates meta-initialization with average of the task-specific losses using query set Problem Setting inner loop : outer loop : 𝜃𝜏𝑖 = 𝜃0 − 𝜕∇𝜃0 𝐿𝜏𝑖 , 𝑓𝜃0 , 𝐷𝜏𝑖 𝑠𝑝𝑡 𝜃0 = 𝜃0 − 𝛽∇𝜃0 ෍ 𝜏𝑖~𝑝(𝑇) 𝜏𝑏 𝐿𝜏𝑖 , 𝑓𝜃𝜏𝑖 , 𝐷𝜏𝑖 𝑞𝑟𝑦
  • 10. Hyungjun Yoo Defense for the Master’s Thesis 10 / 46 14th December 2020 Representation Reuse and Change • Rapid Learning or Feature Reuse? (Raghu et al., 2020) Problem Setting • Divide the network with two parts : 𝜃 = 𝜃𝑒𝑥𝑡 , 𝜃𝑐𝑙𝑠 • body (𝜃𝑒𝑥𝑡 , extractor, conv layers) • head (𝜃𝑐𝑙𝑠 , classifier, fully connected layer)
  • 11. Hyungjun Yoo Defense for the Master’s Thesis 11 / 46 14th December 2020 Representation Reuse and Change • Rapid Learning or Feature Reuse? (Raghu et al., 2020) Problem Setting • Divide the network with two parts : 𝜃 = 𝜃𝑒𝑥𝑡 , 𝜃𝑐𝑙𝑠 • Representation Change / Representation Reuse • Rapid Learning : Representations from body are significantly changed after inner update. → Representation Change • Feature Reuse : Representations from body are negligibly changed after inner update and reused. → Representation reuse
  • 12. Hyungjun Yoo Defense for the Master’s Thesis 12 / 46 14th December 2020 Representation Reuse and Change • Rapid Learning or Feature Reuse? (Raghu et al., 2020) Problem Setting • Divide the network with two parts : 𝜃 = 𝜃𝑒𝑥𝑡 , 𝜃𝑐𝑙𝑠 • Representation Change / Representation Reuse • The dominant factor of MAML’s effectiveness is representation reuse. • Meta-trained model’s body is already able to extract good representations before inner update, and representations are not changed after task- specific update (inner loop).
  • 13. Hyungjun Yoo Defense for the Master’s Thesis 13 / 46 14th December 2020 Representation Reuse and Change • Rapid Learning or Feature Reuse? (Raghu et al., 2020) Problem Setting • Divide the network with two parts : 𝜃 = 𝜃𝑒𝑥𝑡 , 𝜃𝑐𝑙𝑠 • Representation Change / Representation Reuse • The dominant factor of MAML’s effectiveness is representation reuse. • Suggest more computationally efficient update rule • ANIL : Head only update in inner loop • Inner loop updates : 𝜃𝜏𝑖 𝑒𝑥𝑡 = 𝜃0 𝑒𝑥𝑡 (No update in inner loop) 𝜃𝜏𝑖 𝑐𝑙𝑠 = 𝜃0 𝑐𝑙𝑠 − 𝜕ℎ∇𝜃0 𝐿𝜏𝑖 , 𝑓𝜃0 , 𝐷𝜏𝑖 𝑠𝑝𝑡 • Outer loop updates : 𝜃0 = 𝜃0 − 𝛽∇𝜃0 σ𝜏𝑖~𝑝(𝑇) 𝐿𝜏𝑖 , 𝑓𝜃𝜏𝑖 , 𝐷𝜏𝑖 𝑞𝑟𝑦
  • 14. Hyungjun Yoo Defense for the Master’s Thesis 14 / 46 14th December 2020 • The reverse version of ANIL : In inner loop, classifier is fixed, and body is only updated BOIL(Body Only update in Inner Loop) Algorithm BOIL Algorithm • Inner loop updates : 𝜃𝜏𝑖 𝑒𝑥𝑡 = 𝜃0 𝑒𝑥𝑡 − 𝜕𝑏∇𝜃0 𝐿𝜏𝑖 , 𝑓𝜃0 , 𝐷𝜏𝑖 𝑠𝑝𝑡 𝜃𝜏𝑖 𝑐𝑙𝑠 = 𝜃0 𝑐𝑙𝑠 (No update) • Outer loop updates : 𝜃0 = 𝜃0 − 𝛽∇𝜃0 σ𝜏𝑖~𝑝(𝑇) 𝐿𝜏𝑖 , 𝑓𝜃𝜏𝑖 , 𝐷𝜏𝑖 𝑞𝑟𝑦 • The learning rates according to the algorithms 4conv network ResNet-12 MAML ANIL BOIL MAML ANIL BOIL 𝛼𝑏 0.5 0.0 0.5 0.3 0.0 0.3 𝛼ℎ 0.5 0.5 0.0 0.3 0.3 0.0 𝛽𝑏 0.001 0.001 0.001 0.0006 0.0006 0.0006 𝛽ℎ 0.001 0.001 0.001 0.0006 0.0006 0.0006
  • 15. Hyungjun Yoo Defense for the Master’s Thesis 15 / 46 14th December 2020 • Representation change in BOIL through task-specific update • Difference in task-specific (inner) updates between MAML/ANIL and BOIL • (a) MAML / ANIL : mainly updates the head with a negligible change in body (extractor). hence, representations on the feature space are almost identical. BOIL(Body Only update in Inner Loop) Algorithm BOIL Algorithm (a) MAML/ANIL. Decision boundaries Representations Inner loop
  • 16. Hyungjun Yoo Defense for the Master’s Thesis 16 / 46 14th December 2020 • Representation change in BOIL through task-specific adaptation (inner loop update) • Difference in task-specific (inner) updates between MAML/ANIL and BOIL • (a) MAML / ANIL : mainly updates the head with a negligible change in body (extractor). hence, representations on the feature space are almost identical. • (b) BOIL : updates only the body without changing the head through inner updates. Representations on the feature space change significantly following the fixed decision boundaries. BOIL(Body Only update in Inner Loop) Algorithm BOIL Algorithm (a) MAML/ANIL. Decision boundaries Representations Inner loop (b) BOIL. Inner loop
  • 17. Hyungjun Yoo Defense for the Master’s Thesis 17 / 46 14th December 2020 • The goal of meta-learning : Ability to adapt to environment where the source and target are even very different Necessity of Representation Change : Domain-agnostic adaptation BOIL Algorithm Source Domain : mini-ImageNet (training classes) Target Domain : mini-ImageNet (test classes) (a) Same Domain Adaptation (b) Cross Domain Adaptation (Domain-Agnostic) Source Domain : mini-ImageNet (training classes) Target Domain : CUB (Bird only) (test classes)
  • 18. Hyungjun Yoo Defense for the Master’s Thesis 18 / 46 14th December 2020 • The goal of meta-learning : Ability to adapt to environment where the source and target are significantly different • When there are no strong similarities between the source and target domains, representation reuse using good representations for the source domain could be imperfect representations for the target domain. Necessity of Representation Change : Domain-agnostic adaptation BOIL Algorithm Source Domain : mini-ImageNet (training classes) Target Domain : mini-ImageNet (test classes) (a) Same Domain Adaptation (b) Cross Domain Adaptation (Domain-Agnostic) Source Domain : mini-ImageNet (training classes) Target Domain : CUB (Bird only) (test classes)
  • 19. Hyungjun Yoo Defense for the Master’s Thesis 19 / 46 14th December 2020 • The goal of meta-learning : Ability to adapt to environment where the source and target are significantly different • When there are no strong similarities between the source and target domains, representation reuse using good representations for the source domain could be imperfect representations for the target domain. • Therefore, the ability to adapt well to other target domain, i.e. the ability to task-specifically update in response to unseen tasks and change the representation(=representation change) during inner loop, is necessary. Source Domain : mini-ImageNet (training classes) Target Domain : mini-ImageNet (test classes) (a) Same Domain Adaptation (b) Cross Domain Adaptation (Domain-Agnostic) Source Domain : mini-ImageNet (training classes) Target Domain : CUB (Bird only) (test classes) Necessity of Representation Change : Domain-agnostic adaptation BOIL Algorithm
  • 20. Hyungjun Yoo Defense for the Master’s Thesis 20 / 46 14th December 2020 Superiority of BOIL in Domain-agnostic Adaptation • We divide the types of datasets into General (mini-ImageNet, tiered-ImageNet) domain and Specific (Cars, CUB) domain based on fineness. BOIL Algorithm
  • 21. Hyungjun Yoo Defense for the Master’s Thesis 21 / 46 14th December 2020 Superiority of BOIL in Domain-agnostic Adaptation • We divide the types of datasets into General (mini-ImageNet, tiered-ImageNet) domain and Specific (Cars, CUB) domain based on fineness. • Adaptation types : [General → General], [General → Specific], [Specific → General], [Specific → Specific] (in order to make the situation realistic and make it more difficult) BOIL Algorithm
  • 22. Hyungjun Yoo Defense for the Master’s Thesis 22 / 46 14th December 2020 Superiority of BOIL in Domain-agnostic Adaptation • We divide the types of datasets into General (mini-ImageNet, tiered-ImageNet) domain and Specific (Cars, CUB) domain based on fineness. • Adaptation types : [General → General], [General → Specific], [Specific → General], [Specific → Specific] (in order to make the situation realistic and make it more difficult) • With all settings, BOIL overwhelms MAML/ANIL via representation change. BOIL Algorithm
  • 23. Hyungjun Yoo Defense for the Master’s Thesis 23 / 46 14th December 2020 Cosine Similarity of Representations of BOIL Representation Change in BOIL • Cosine Similarity : We calculate Cosine Similarity between the representations of a query set including 5 classes and 15 samples per class from mini-ImageNet after every convolution module. • The orange line represents the average of the cosine similarity between the samples having the same class, and the blue line represents the average of the cosine similarity between the samples having different classes. (a) MAML (b) ANIL
  • 24. Hyungjun Yoo Defense for the Master’s Thesis 24 / 46 14th December 2020 Cosine Similarity of Representations of BOIL Representation Change in BOIL • MAML/ANIL : • Their patterns do not show any noticeable difference before or after update. → The effectiveness of MAML/ANIL heavily leans on the meta-initialized body, not the task-specific adaptation. (representation reuse) (a) MAML (b) ANIL
  • 25. Hyungjun Yoo Defense for the Master’s Thesis 25 / 46 14th December 2020 Cosine Similarity of Representations of BOIL Representation Change in BOIL • BOIL : • Before adaptation, BOIL’s meta-initialized body cannot distinguish the classes. • After adaptation, the similarity of the different classes rapidly decrease on conv4. → The body can distinguish the classes through adaptation. (representation change) (c) BOIL
  • 26. Hyungjun Yoo Defense for the Master’s Thesis 26 / 46 14th December 2020 CKA of Representations of BOIL • CKA : We calculate CKA values of representations of query set before and after adaptation. When the CKA between two representations is close to 1, the representations are almost identical. Representation Change in BOIL (a) On mini-ImageNet dataset. (b) On Cars dataset.
  • 27. Hyungjun Yoo Defense for the Master’s Thesis 27 / 46 14th December 2020 CKA of Representations of BOIL • MAML/ANIL : CKA shows that the MAML/ANIL algorithms do not change the representation in the body. • BOIL : BOIL changes the representation of the last conv layer. This result indicates that the BOIL algorithm rapidly learns task through representation change. Representation Change in BOIL (a) On mini-ImageNet dataset. (b) On Cars dataset.
• 28. Empirical Analysis of BOIL — Representation Change in BOIL
• NIL-testing: in meta-testing, we build class prototypes from the support set and classify the query set by similarity to the prototypes, in order to measure how effective the representations produced by the body are.
(a) Toy example of NIL-testing (3-way 5-shot).
• 32. Empirical Analysis of BOIL — Representation Change in BOIL
• With the head: before adaptation, no algorithm can distinguish the classes on either the same- or the cross-domain (20%, i.e. chance level for 5-way). After adaptation, BOIL clearly outperforms the other algorithms, which means that BOIL's representation change is more effective than the representation reuse of MAML/ANIL.
• Without the head: MAML and ANIL already generate representations sufficient for classification, and adaptation makes little or no difference. BOIL shows a steep performance improvement through adaptation on both the same- and the cross-domain, which implies that the body of BOIL can be task-specifically updated (a sketch of NIL-testing follows below).
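A minimal sketch of NIL-testing as described above, assuming the support/query features have already been extracted by the (adapted or unadapted) body and flattened per sample; names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def nil_accuracy(sup_feats, sup_labels, qry_feats, qry_labels, n_way=5):
    """Classify queries by cosine similarity to class prototypes; no head."""
    protos = torch.stack([sup_feats[sup_labels == c].mean(dim=0)
                          for c in range(n_way)])            # (n_way, d)
    sim = F.normalize(qry_feats, dim=1) @ F.normalize(protos, dim=1).t()
    pred = sim.argmax(dim=1)                                 # nearest prototype
    return (pred == qry_labels).float().mean().item()
```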
• 35. Ablation Study of Learning Layers — Representation Change in BOIL
• Ablation study: we train multiple consecutive layers in the inner loop, with and without the head.
• We consistently observe that learning with the head is far from the best accuracy. → Freezing the head is crucial.
• We also find several settings that skip the lower-level layers in the inner loop and perform slightly better than BOIL. We believe each pair of neural network architecture and dataset has its own best layer combination; given enough computing power to search for that combination, BOIL could be improved further (a sketch follows below).
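The layer ablation can be expressed as a per-layer inner learning-rate table; the table below is a hypothetical example of one such setting (skipping conv1), not the thesis configuration.

```python
# Hypothetical per-layer inner-loop learning rates for a 4-conv network:
# 0.0 freezes a layer during task-specific adaptation.
inner_lrs = {
    "conv1": 0.0,   # skipping a low-level layer sometimes helps slightly
    "conv2": 0.5,
    "conv3": 0.5,
    "conv4": 0.5,
    "head":  0.0,   # learning with the head was consistently worse
}

def per_layer_update(named_params, grads, lrs=inner_lrs):
    # Assumes parameter names like "conv1.weight", "head.bias".
    return [p - lrs[name.split(".")[0]] * g
            for (name, p), g in zip(named_params, grads)]
```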
• 37. BOIL on a Larger Network (Residual Network) — Representation Change in BOIL
• We explore BOIL's applicability to a deeper network with a wiring structure, ResNet-12. Deeper networks generally use feature-wiring structures, e.g. skip connections, to facilitate feature propagation.
• Disconnection trick: we propose a simple trick that disconnects the last skip connection in order to reinforce representation change in the high level of the body (a sketch follows below).
(a) With the last skip connection. (b) Without the last skip connection (disconnection trick).
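A sketch of the disconnection trick on the last residual block; the module structure is an assumption, but the key point, dropping the identity path so gradients from the head reach the high-level body only through the conv path, matches the slide.

```python
import torch.nn as nn

class LastResBlock(nn.Module):
    """Final block of ResNet-12 with an optional (last) skip connection."""
    def __init__(self, convs: nn.Module, shortcut: nn.Module, use_lsc: bool):
        super().__init__()
        self.convs, self.shortcut, self.use_lsc = convs, shortcut, use_lsc

    def forward(self, x):
        out = self.convs(x)
        if self.use_lsc:                  # (a) ordinary residual block
            out = out + self.shortcut(x)
        return out                        # (b) disconnection trick: use_lsc=False
```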
• 38. Representation Change via the Disconnection Trick — Representation Change in BOIL
* LSC: last skip connection
• 42. Main Contributions of the Research — Conclusion
• We emphasize the necessity of representation change for meta-learning through cross-domain adaptation experiments.
• We propose a simple but effective meta-learning algorithm that updates the Body (extractor) of the model Only in the Inner Loop (BOIL). We empirically show that BOIL improves performance on all benchmark datasets, and that the improvement is particularly noticeable on fine-grained datasets and in cross-domain adaptation.
• Using cosine similarity and CKA, we demonstrate that BOIL enjoys representation reuse in the low-/mid-level body and representation change in the high-level body, and we empirically analyze the effectiveness of BOIL's body through an ablation study on the learning layers.
• For ResNet architectures, we propose a disconnection trick that removes the back-propagation path through the last skip connection, strengthening representation change in the high-level body.
• 44. References
• Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, pp. 1126–1135. JMLR.org, 2017.
• Aniruddh Raghu, Maithra Raghu, Samy Bengio, and Oriol Vinyals. Rapid learning or feature reuse? Towards understanding the effectiveness of MAML. In International Conference on Learning Representations, 2020.
• Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pp. 3630–3638, 2016.
• Boris Oreshkin, Pau Rodríguez López, and Alexandre Lacoste. TADAM: Task dependent adaptive metric for improved few-shot learning. In Advances in Neural Information Processing Systems, pp. 721–731, 2018.
• Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. arXiv preprint arXiv:1905.00414, 2019.
• Leland McInnes, John Healy, Nathaniel Saul, and Lukas Großberger. UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29), 2018.
• 45. Appendix: Model Implementations and Datasets
• Model implementations (a sketch follows below)
• 4-conv network (Vinyals et al., 2016): 4 conv modules [3×3 conv layer (64 filters), BN, ReLU, 2×2 max-pool].
• ResNet-12 (Oreshkin et al., 2018): 4 residual blocks [3 conv modules, BN, Leaky ReLU].
• Datasets

| Dataset        | Source                      | Image size | Fineness     | # meta-training classes | # meta-validation classes | # meta-testing classes | Split setting            |
| miniImageNet   | Russakovsky et al. (2015)   | 84×84      | Coarse-grain | 64                      | 16                        | 20                     | Vinyals et al. (2016)    |
| tieredImageNet | Russakovsky et al. (2015)   | 84×84      | Coarse-grain | 351                     | 97                        | 160                    | Ren et al. (2018)        |
| Cars           | Krause et al. (2013)        | 84×84      | Fine-grain   | 98                      | 49                        | 49                     | Tseng et al. (2020)      |
| CUB            | Welinder et al. (2010)      | 84×84      | Fine-grain   | 100                     | 50                        | 50                     | Hilliard et al. (2018)   |
| FC100          | Krizhevsky et al. (2009)    | 32×32      | Coarse-grain | 60                      | 20                        | 20                     | Bertinetto et al. (2018) |
| CIFAR-FS       | Krizhevsky et al. (2009)    | 32×32      | Coarse-grain | 64                      | 16                        | 20                     | Oreshkin et al. (2018)   |
| VGG-Flower     | Nilsback & Zisserman (2008) | 32×32      | Fine-grain   | 71                      | 16                        | 15                     | Na et al. (2019)         |
| Aircraft       | Maji et al. (2013)          | 32×32      | Fine-grain   | 70                      | 15                        | 15                     | Na et al. (2019)         |
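A sketch of the 4-conv backbone following the module description above (3×3 conv with 64 filters, BN, ReLU, 2×2 max-pool); the linear-head dimensions assume 84×84 inputs and are illustrative.

```python
import torch.nn as nn

def conv_module(in_ch, out_ch=64):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class ConvNet4(nn.Module):
    def __init__(self, n_way=5):
        super().__init__()
        self.body = nn.Sequential(conv_module(3), conv_module(64),
                                  conv_module(64), conv_module(64))
        # 84x84 input -> 5x5x64 feature map after four 2x2 max-pools
        self.head = nn.Linear(64 * 5 * 5, n_way)

    def forward(self, x):
        return self.head(self.body(x).flatten(1))
```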
• 46. Appendix: Visualization with UMAP
• UMAP visualization of the benchmark datasets (training domain → test domain): mini-ImageNet → mini-ImageNet, CUB → Cars, tieredImageNet → miniImageNet.
• 47. Appendix: Representation Change via the Disconnection Trick
(a) Cosine similarity in block4 (the last block) of ResNet-12, BOIL with LSC.
(b) Cosine similarity in block4 (the last block) of ResNet-12, BOIL without LSC.