Learning to Balance: Bayesian Meta-Learning
for Imbalanced and Out-of-distribution Tasks
Hae Beom Lee¹*, Hayeon Lee¹*, Donghyun Na²*,
Saehoon Kim³, Minseop Park³, Eunho Yang¹³, Sung Ju Hwang¹³
KAIST¹, TmaxData², AITRICS³
Few-shot Learning
Humans can generalize even with a single observation of a class.
[Lake et al. 11] One shot Learning of Simple Visual Concepts, CogSci 2011
(Figure: a single observation of a novel class, and query examples that a human can still classify correctly.)
Few-shot Learning
On the other hand, deep neural networks require a large number of training instances to generalize well, and overfit when given only a few.
How can we learn a model that generalizes well even with few training instances?
(Figure: a deep neural network fails at the same few-shot learning problem that a human solves from a single observation.)
[Lake et al. 11] One shot Learning of Simple Visual Concepts, CogSci 2011
Meta-Learning for Few-shot Classification
Humans generalize well because they never learn from scratch.
→ Learn a model that can generalize over a task distribution!
(Figure: meta-knowledge learned across meta-training tasks, each with its own training/test split, transfers to meta-test tasks.)
[Ravi and Larochelle. 17] Optimization as a Model for Few-shot Learning, ICLR 2017
Model-Agnostic Meta-Learning
Model-Agnostic Meta-Learning (MAML) aims to find an initial model parameter that can rapidly adapt to any task with only a few gradient steps.
[Finn et al. 17] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, ICML 2017
(Figure: from the initial model parameter, a few gradient steps on each task's data D1, D2, D3 yield task-specific parameters, and likewise for a novel task D*.)
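To make this concrete, here is a minimal sketch of the MAML inner loop in PyTorch; the dict-of-tensors parameter handling and the `loss_fn` signature are illustrative assumptions, not the authors' implementation.

```python
import torch

def maml_inner_loop(theta, loss_fn, D_train, alpha=0.01, steps=5):
    """MAML inner loop (sketch): adapt the shared initialization `theta`
    to one task with a few gradient steps on its training split D_train.
    theta: dict name -> tensor (requires_grad=True);
    loss_fn(params, data) -> scalar training loss."""
    phi = dict(theta)  # task-specific parameters start at the shared init
    for _ in range(steps):
        loss = loss_fn(phi, D_train)
        # create_graph=True keeps the inner steps differentiable, so the
        # outer (meta) loss on the task's test split can backprop into theta.
        grads = torch.autograd.grad(loss, list(phi.values()), create_graph=True)
        phi = {name: p - alpha * g for (name, p), g in zip(phi.items(), grads)}
    return phi  # task-specific parameters for this task
```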
Challenge: Realistic Task Distribution
While existing work on meta-learning assumes balanced task distributions, in realistic settings we need to account for data imbalances as well as distributional shift:
1. Class imbalance: the number of instances per class varies.
2. Task imbalance: the number of instances and classes per task varies.
3. Distributional shift: meta-test tasks may come from a different dataset (e.g., meta-train on CIFAR, meta-test on SVHN).
(Figure: mismatch between the artificial balanced setting and realistic task distributions.)
Learning to Balance: Class Imbalance
In a class-imbalanced task (e.g., many "tiger" but few "lion" training instances), the head class dominates the gradient direction of the target learning process from the meta-knowledge (initial model parameter).
→ Class-specific gradient scaling rebalances the gradient direction between head and tail classes, so the tail class is also classified correctly at test time (a minimal code sketch follows).
(Figure: imbalanced vs. balanced gradient directions during task-specific learning.)
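As a minimal illustration of this idea (with `omega` standing in for the inferred class weights, an assumption rather than the paper's exact parameterization):

```python
import torch

def class_balanced_loss(per_class_losses, omega):
    """Class-specific gradient scaling (sketch): rescale each class's training
    loss by a weight omega_c before differentiating, so head classes no longer
    dominate the gradient direction of the inner update."""
    return (omega * per_class_losses).sum()

# Illustrative use: upweight the tail class ("lion") relative to the head class.
per_class_losses = torch.tensor([0.4, 2.1], requires_grad=True)  # [head, tail]
omega = torch.softmax(torch.tensor([0.0, 1.5]), dim=0)           # favors the tail
loss = class_balanced_loss(per_class_losses, omega)              # balanced loss
```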
Learning to Balance: Task Imbalance
Tasks also vary in size. A small task (e.g., a handful of tiger/lion instances) should resort to the meta-knowledge, while a large task can utilize its own task information.
→ A task-dependent learning-rate multiplier (one per layer) decides how far task-specific learning moves away from the initial parameter.
(Figure: small vs. large tasks adapting from the meta-knowledge (initial parameter).)
Learning to Balance: Distributional Shift
A meta-test task may be out-of-distribution, e.g., vehicles (car, truck) when meta-training covered animals (tiger, lion).
→ Initial-parameter modulation (one multiplier per channel, applied to both weights and biases) adapts the meta-learned initialization to out-of-distribution tasks; a shape-level sketch follows.
(Figure: in-distribution vs. out-of-distribution tasks in the target learning process.)
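A shape-level sketch of per-channel modulation for one convolutional layer (all shapes are illustrative):

```python
import torch

conv_weight = torch.randn(64, 32, 3, 3)  # meta-learned init: (out_ch, in_ch, kH, kW)
conv_bias   = torch.randn(64)

z_weight = torch.rand(64, 1, 1, 1)       # one multiplier per output channel
z_bias   = torch.rand(64)

theta0_weight = conv_weight * z_weight   # broadcast over in_ch, kH, kW
theta0_bias   = conv_bias * z_bias       # modulated initialization for this task
```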
Learning to Balance
Putting the three balancing variables together, the target learning process from the meta-knowledge (initial model parameter) becomes

$$\boldsymbol{\theta}^{*} \;=\; \boldsymbol{\theta} * \boldsymbol{z}^{\tau} \;-\; \boldsymbol{\gamma}^{\tau} \circ \boldsymbol{\alpha} \circ \sum_{c=1}^{C} \omega_{c}^{\tau}\, \nabla_{\boldsymbol{\theta}} \mathcal{L}_{c}^{\mathrm{tr}}$$

where $\boldsymbol{z}^{\tau}$ modulates the initial parameter (in-dist. vs. out-of-dist.), $\boldsymbol{\gamma}^{\tau}$ rescales the learning rate $\boldsymbol{\alpha}$ per layer (small vs. large tasks), and $\omega_{c}^{\tau}$ rescales each class's gradient (head vs. tail classes).
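A hedged end-to-end sketch of one such inner step (single gradient step, fixed samples of the balancing variables; the dict-based parameters and `per_class_loss_fn` are assumptions):

```python
import torch

def taml_inner_step(theta, z, gamma, alpha, omega, per_class_loss_fn):
    """One step of theta* = theta * z - gamma o alpha o sum_c omega_c grad L_c^tr.
    theta, z, alpha: dicts name -> tensor (z broadcasts per channel);
    gamma: dict name -> scalar tensor (one multiplier per layer);
    omega: (C,) tensor of class weights;
    per_class_loss_fn(params) -> (C,) tensor of per-class training losses."""
    theta0 = {k: p * z[k] for k, p in theta.items()}  # modulated initialization
    losses = per_class_loss_fn(theta0)                # L_c^tr for c = 1..C
    loss = (omega * losses).sum()                     # class-balanced training loss
    grads = torch.autograd.grad(loss, list(theta0.values()), create_graph=True)
    return {k: theta0[k] - gamma[k] * alpha[k] * g    # per-layer gamma rescales alpha
            for (k, _), g in zip(theta0.items(), grads)}
```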
Bayesian TAML
Generative Process
A Bayesian framework [1][2]:
• allows robust inference on the latent variables;
• in MAML, yields an ensemble of diverse task-specific predictors [3].
(Figure: graphical model of the generative process over the train and test splits.)
[1] Finn et al., Probabilistic Model-Agnostic Meta-Learning, NeurIPS 2018
[2] Gordon et al., Meta-Learning Probabilistic Inference For Prediction, ICLR 2019
[3] Lakshminarayanan et al., Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, NIPS 2017
Variational Inference
We cannot access the test labels at meta-testing time.
→ The variational distribution should not depend on the test set; it is conditioned only on the training dataset [1].
(Figure: generative process vs. inference; the variational distribution conditions only on the train split.)
[1] Ravi and Beatson, Amortized Bayesian Meta-Learning, ICLR 2019
Meta-training and Meta-testing
Final meta-training objective: the evidence lower bound (ELBO), which decomposes into an expected log-likelihood term and a regularization (KL) term.
Meta-testing: predict with a Monte-Carlo (MC) approximation, averaging over S = 10 samples of the latent variables.
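A sketch of the meta-test procedure (all helper names are placeholders for the paper's components, not its actual API):

```python
import torch

def mc_predict(x_test, sample_latents, adapt, forward, S=10):
    """Meta-test prediction via Monte-Carlo approximation (sketch): average the
    predictive distribution over S samples of the balancing variables from
    q(. | D^tr).  sample_latents() -> (z, gamma, omega);
    adapt(latents) -> adapted task-specific parameters;
    forward(params, x) -> class logits."""
    probs = 0.0
    for _ in range(S):
        latents = sample_latents()     # (z, gamma, omega) ~ q(. | D^tr)
        theta_star = adapt(latents)    # inner-loop adaptation with these samples
        probs = probs + forward(theta_star, x_test).softmax(dim=-1)
    return probs / S                   # averaged class probabilities
```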
Inference Network
How do we build the inference network?
→ It should be able to recognize the imbalance and distributional shift in the training dataset.
Statistics pooling extracts various statistics from the training set:
• mean → captures distributional shift;
• variance → captures the diversity of the set elements;
• cardinality → captures the size of the set.
Inference Network
Hierarchical dataset encoding (sketch below):
• Class encoding pools the instance-wise statistics (mean, variance, cardinality) within each class.
• Task encoding pools the class-wise statistics within each task.
(Figure: a shared encoder (four 3x3 conv layers + FC) feeds two levels of statistics pooling over the training set.)
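A minimal sketch of the two-level pooling (the feature dimensions and raw-count cardinality below are assumptions):

```python
import torch

def stats_pool(h):
    """Pool a set of feature vectors h: (N, d) into a fixed-size code of
    mean, variance, and cardinality statistics, of length 2d + 1."""
    mean = h.mean(dim=0)
    var = h.var(dim=0, unbiased=False)
    card = h.new_tensor([float(h.size(0))])  # set size (a log-count also works)
    return torch.cat([mean, var, card])

def encode_task(class_features):
    """Hierarchical dataset encoding (sketch): pool instance-wise statistics
    within each class, then pool the class codes within the task.
    class_features: list of (N_c, d) tensors, one per class."""
    class_codes = torch.stack([stats_pool(h) for h in class_features])  # (C, 2d+1)
    return stats_pool(class_codes)  # task-level code

# Example: a 3-class task with 5, 2, and 20 instances of 16-dim features.
task_code = encode_task([torch.randn(5, 16), torch.randn(2, 16), torch.randn(20, 16)])
```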
Experimental Setup
Meta-training with imbalance (1-50 shot): both task imbalance (small vs. large tasks) and class imbalance (small vs. large classes).
Meta-testing with imbalance and distributional shift:
• In-distribution: CIFAR-FS, mini-ImageNet
• Out-of-distribution: SVHN, CUB
(Figure: sampled tasks with varying numbers of classes and instances per class.)
Realistic Any-shot Classification
Bayesian TAML outperforms the baselines, especially on out-of-distribution (OOD) tasks (+5.6% over the best baseline on CUB).

Meta-training           CIFAR-FS           mini-ImageNet
Meta-test               CIFAR-FS  SVHN     m.-ImgNet  CUB
                        (ID)      (OOD)    (ID)       (OOD)
MAML                    71.55     45.17    66.64      65.77
Meta-SGD                72.71     46.45    69.95      65.94
MT-net                  72.30     49.17    67.63      66.09
Prototypical Networks   73.24     42.91    69.11      60.80
Proto-MAML              71.80     40.16    68.96      61.77
Bayesian TAML           75.15     51.87    71.46      71.71
Multi-Dataset Experiment
Meta-training with imbalance: Aircraft, VGG-Flower, QuickDraw.
Meta-testing with imbalance: Aircraft, VGG-Flower, QuickDraw (in-distribution), plus Traffic Signs and Fashion-MNIST (out-of-distribution).
Multi-Dataset Experiment
Bayesian TAML also outperforms the baselines in this challenging heterogeneous task distribution (+8.5% over the best baseline on Traffic Signs).

Meta-training: Aircraft, QuickDraw, VGG-Flower

Meta-test               Aircraft  QuickDraw  VGG-Flower  Traffic Signs  FMNIST
                        (ID)      (ID)       (ID)        (OOD)          (OOD)
MAML                    48.60     69.02      60.38       51.96          63.10
Meta-SGD                49.71     70.26      59.41       52.07          62.71
MT-net                  51.68     68.78      64.20       56.36          62.86
Prototypical Networks   50.63     72.31      65.52       49.93          64.26
Proto-MAML              51.15     69.84      65.24       53.93          63.72
Bayesian TAML           54.43     72.03      67.72       64.81          68.94
$\boldsymbol{z}^{\tau}$ for Distributional Shift
$\boldsymbol{z}$-TAML: Meta-SGD + $\boldsymbol{z}^{\tau}$

Classification performance (%):
Meta-training     CIFAR-FS   mini-ImageNet
Meta-test         SVHN       CUB
MAML              45.17      65.77
Meta-SGD          46.45      65.94
Bayesian z-TAML   52.29      69.11

(Figure: t-SNE visualization of $\mathbb{E}[\boldsymbol{z}^{\tau}]$, with the initial parameter and large tasks marked.)
$\boldsymbol{\omega}^{\tau}$ for Class Imbalance
$\boldsymbol{\omega}$-TAML: Meta-SGD + $\boldsymbol{\omega}^{\tau}$

Classification performance (%) on CIFAR-FS:
Degree of class imbalance   None    Medium   High
MAML                        73.60   71.15    67.43
Meta-SGD                    73.25   72.68    71.61
Bayesian ω-TAML             73.44   73.20    72.86
$\boldsymbol{\gamma}^{\tau}$ for Task Imbalance
$\boldsymbol{\gamma}$-TAML: Meta-SGD + $\boldsymbol{\gamma}^{\tau}$
(Figures: task size vs. accuracy, and task size vs. $\mathbb{E}[\boldsymbol{\gamma}^{\tau}]$.)
Effectiveness of Bayesian Methods
We further find that the Bayesian framework is very effective for solving out-of-distribution tasks.
• MAML + Bayesian → an ensemble, which appears effective for OOD tasks.
• The Bayesian framework also amplifies the effect of the balancing variables.

Classification performance (%):
Meta-training        CIFAR-FS       CIFAR-FS
Meta-test            CIFAR-FS       SVHN
MAML                 70.19          41.81
Meta-SGD             72.71          46.45
Deterministic TAML   73.82          46.78
Bayesian TAML        75.15 (+1.3)   51.87 (+5.1)
Effectiveness of Hierarchical Statistics Pooling
Finally, we evaluate the effectiveness of the hierarchical dataset encoding.
The results suggest that the set cardinality and variance used in the hierarchical encoding are more informative than simple mean pooling.

Classification performance (%) on CIFAR-FS:
                   Hierarchical encoding
                   ×        ✓
Mean               73.84    73.69
Mean + N           73.17    74.88
Mean + Var. + N    73.93    75.15
Summary
• Existing work on meta-learning considers artificial settings that assume the same number of instances per task and class. In realistic scenarios, however, we need to handle task/class imbalances and distributional shift.
• To this end, we propose to learn to balance the effect of task-specific learning by introducing three balancing variables.
• The Bayesian framework appears essential for solving OOD tasks, and it also amplifies the effect of the balancing variables.
• The hierarchical set encoding effectively captures class-level and task-level imbalances, as well as distributional shifts.