Learning to Balance: Bayesian Meta-Learning
for Imbalanced and Out-of-distribution Tasks
Hae Beom Lee¹*, Hayeon Lee¹*, Donghyun Na²*,
Saehoon Kim³, Minseop Park³, Eunho Yang¹³, Sung Ju Hwang¹³
KAIST¹, TmaxData², AITRICS³
Few-shot Learning
Humans can generalize even with a single observation of a class.
[Lake et al. 11] One shot Learning of Simple Visual Concepts, CogSci 2011
(Figure: a single observation of a novel class, and query examples that a human can still classify correctly.)
Few-shot Learning
On the other hand, deep neural networks require a large number of training instances to generalize well, and overfit when given only a few.
How can we learn a model that generalizes well even with few training instances?
(Figure: a deep neural network fails at the same few-shot learning problem that a human solves from a single observation.)
[Lake et al. 11] One shot Learning of Simple Visual Concepts, CogSci 2011
Meta-Learning for Few-shot Classification
Humans generalize well because they never learn from scratch.
→ Learn a model that can generalize over a task distribution!
(Figure: meta-knowledge learned across meta-training tasks, each with its own training/test split, transfers to meta-test tasks.)
[Ravi and Larochelle. 17] Optimization as a Model for Few-shot Learning, ICLR 2017
Model-Agnostic Meta-Learning
Model-Agnostic Meta-Learning (MAML) aims to find an initial model parameter that can rapidly adapt to any task with only a few gradient steps.
[Finn et al. 17] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, ICML 2017
(Figure: from the initial model parameter, a few gradient steps on each task's data D1, D2, D3 yield task-specific parameters, and likewise for a novel task D*.)
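To make this concrete, here is a minimal sketch of the MAML inner loop in PyTorch; the dict-of-tensors parameter handling and the `loss_fn` signature are illustrative assumptions, not the authors' implementation.

```python
import torch

def maml_inner_loop(theta, loss_fn, D_train, alpha=0.01, steps=5):
    """MAML inner loop (sketch): adapt the shared initialization `theta`
    to one task with a few gradient steps on its training split D_train.
    theta: dict name -> tensor (requires_grad=True);
    loss_fn(params, data) -> scalar training loss."""
    phi = dict(theta)  # task-specific parameters start at the shared init
    for _ in range(steps):
        loss = loss_fn(phi, D_train)
        # create_graph=True keeps the inner steps differentiable, so the
        # outer (meta) loss on the task's test split can backprop into theta.
        grads = torch.autograd.grad(loss, list(phi.values()), create_graph=True)
        phi = {name: p - alpha * g for (name, p), g in zip(phi.items(), grads)}
    return phi  # task-specific parameters for this task
```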
Challenge: Realistic Task Distribution
While existing work on meta-learning assumes balanced task distributions, in realistic settings we need to account for data imbalances as well as distributional shift:
1. Class imbalance: the number of instances per class varies.
2. Task imbalance: the number of instances and classes per task varies.
3. Distributional shift: meta-test tasks may come from a different dataset (e.g., meta-train on CIFAR, meta-test on SVHN).
(Figure: mismatch between the artificial balanced setting and realistic task distributions.)
Learning to Balance: Class Imbalance
In a class-imbalanced task (e.g., many "tiger" but few "lion" training instances), the head class dominates the gradient direction of the target learning process from the meta-knowledge (initial model parameter).
→ Class-specific gradient scaling rebalances the gradient direction between head and tail classes, so the tail class is also classified correctly at test time (a minimal code sketch follows).
(Figure: imbalanced vs. balanced gradient directions during task-specific learning.)
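As a minimal illustration of this idea (with `omega` standing in for the inferred class weights, an assumption rather than the paper's exact parameterization):

```python
import torch

def class_balanced_loss(per_class_losses, omega):
    """Class-specific gradient scaling (sketch): rescale each class's training
    loss by a weight omega_c before differentiating, so head classes no longer
    dominate the gradient direction of the inner update."""
    return (omega * per_class_losses).sum()

# Illustrative use: upweight the tail class ("lion") relative to the head class.
per_class_losses = torch.tensor([0.4, 2.1], requires_grad=True)  # [head, tail]
omega = torch.softmax(torch.tensor([0.0, 1.5]), dim=0)           # favors the tail
loss = class_balanced_loss(per_class_losses, omega)              # balanced loss
```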
Learning to Balance: Task Imbalance
Tasks also vary in size. A small task (e.g., a handful of tiger/lion instances) should resort to the meta-knowledge, while a large task can utilize its own task information.
→ A task-dependent learning-rate multiplier (one per layer) decides how far task-specific learning moves away from the initial parameter.
(Figure: small vs. large tasks adapting from the meta-knowledge (initial parameter).)
Learning to Balance: Distributional Shift
A meta-test task may be out-of-distribution, e.g., vehicles (car, truck) when meta-training covered animals (tiger, lion).
→ Initial-parameter modulation (one multiplier per channel, applied to both weights and biases) adapts the meta-learned initialization to out-of-distribution tasks; a shape-level sketch follows.
(Figure: in-distribution vs. out-of-distribution tasks in the target learning process.)
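A shape-level sketch of per-channel modulation for one convolutional layer (all shapes are illustrative):

```python
import torch

conv_weight = torch.randn(64, 32, 3, 3)  # meta-learned init: (out_ch, in_ch, kH, kW)
conv_bias   = torch.randn(64)

z_weight = torch.rand(64, 1, 1, 1)       # one multiplier per output channel
z_bias   = torch.rand(64)

theta0_weight = conv_weight * z_weight   # broadcast over in_ch, kH, kW
theta0_bias   = conv_bias * z_bias       # modulated initialization for this task
```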
Learning to Balance
Putting the three balancing variables together, the target learning process from the meta-knowledge (initial model parameter) becomes

$$\boldsymbol{\theta}^{*} \;=\; \boldsymbol{\theta} * \boldsymbol{z}^{\tau} \;-\; \boldsymbol{\gamma}^{\tau} \circ \boldsymbol{\alpha} \circ \sum_{c=1}^{C} \omega_{c}^{\tau}\, \nabla_{\boldsymbol{\theta}} \mathcal{L}_{c}^{\mathrm{tr}}$$

where $\boldsymbol{z}^{\tau}$ modulates the initial parameter (in-dist. vs. out-of-dist.), $\boldsymbol{\gamma}^{\tau}$ rescales the learning rate $\boldsymbol{\alpha}$ per layer (small vs. large tasks), and $\omega_{c}^{\tau}$ rescales each class's gradient (head vs. tail classes).
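A hedged end-to-end sketch of one such inner step (single gradient step, fixed samples of the balancing variables; the dict-based parameters and `per_class_loss_fn` are assumptions):

```python
import torch

def taml_inner_step(theta, z, gamma, alpha, omega, per_class_loss_fn):
    """One step of theta* = theta * z - gamma o alpha o sum_c omega_c grad L_c^tr.
    theta, z, alpha: dicts name -> tensor (z broadcasts per channel);
    gamma: dict name -> scalar tensor (one multiplier per layer);
    omega: (C,) tensor of class weights;
    per_class_loss_fn(params) -> (C,) tensor of per-class training losses."""
    theta0 = {k: p * z[k] for k, p in theta.items()}  # modulated initialization
    losses = per_class_loss_fn(theta0)                # L_c^tr for c = 1..C
    loss = (omega * losses).sum()                     # class-balanced training loss
    grads = torch.autograd.grad(loss, list(theta0.values()), create_graph=True)
    return {k: theta0[k] - gamma[k] * alpha[k] * g    # per-layer gamma rescales alpha
            for (k, _), g in zip(theta0.items(), grads)}
```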
Bayesian TAML
Generative Process
A Bayesian framework [1][2]:
• allows robust inference on the latent variables;
• in MAML, yields an ensemble of diverse task-specific predictors [3].
(Figure: graphical model of the generative process over the train and test splits.)
[1] Finn et al., Probabilistic Model-Agnostic Meta-Learning, NeurIPS 2018
[2] Gordon et al., Meta-Learning Probabilistic Inference For Prediction, ICLR 2019
[3] Lakshminarayanan et al., Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, NIPS 2017
Variational Inference
We cannot access the test labels at meta-testing time.
→ The variational distribution should not depend on the test set; it is conditioned only on the training dataset [1].
(Figure: generative process vs. inference; the variational distribution conditions only on the train split.)
[1] Ravi and Beatson, Amortized Bayesian Meta-Learning, ICLR 2019
Meta-training and Meta-testing
Final meta-training objective: the evidence lower bound (ELBO), which decomposes into an expected log-likelihood term and a regularization (KL) term.
Meta-testing: predict with a Monte-Carlo (MC) approximation, averaging over S = 10 samples of the latent variables.
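A sketch of the meta-test procedure (all helper names are placeholders for the paper's components, not its actual API):

```python
import torch

def mc_predict(x_test, sample_latents, adapt, forward, S=10):
    """Meta-test prediction via Monte-Carlo approximation (sketch): average the
    predictive distribution over S samples of the balancing variables from
    q(. | D^tr).  sample_latents() -> (z, gamma, omega);
    adapt(latents) -> adapted task-specific parameters;
    forward(params, x) -> class logits."""
    probs = 0.0
    for _ in range(S):
        latents = sample_latents()     # (z, gamma, omega) ~ q(. | D^tr)
        theta_star = adapt(latents)    # inner-loop adaptation with these samples
        probs = probs + forward(theta_star, x_test).softmax(dim=-1)
    return probs / S                   # averaged class probabilities
```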
Inference Network
How do we build the inference network?
→ It should be able to recognize the imbalance and distributional shift in the training dataset.
Statistics pooling extracts various statistics from the training set:
• mean → captures distributional shift;
• variance → captures the diversity of the set elements;
• cardinality → captures the size of the set.
Inference Network
Hierarchical dataset encoding (sketch below):
• Class encoding pools the instance-wise statistics (mean, variance, cardinality) within each class.
• Task encoding pools the class-wise statistics within each task.
(Figure: a shared encoder (four 3x3 conv layers + FC) feeds two levels of statistics pooling over the training set.)
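A minimal sketch of the two-level pooling (the feature dimensions and raw-count cardinality below are assumptions):

```python
import torch

def stats_pool(h):
    """Pool a set of feature vectors h: (N, d) into a fixed-size code of
    mean, variance, and cardinality statistics, of length 2d + 1."""
    mean = h.mean(dim=0)
    var = h.var(dim=0, unbiased=False)
    card = h.new_tensor([float(h.size(0))])  # set size (a log-count also works)
    return torch.cat([mean, var, card])

def encode_task(class_features):
    """Hierarchical dataset encoding (sketch): pool instance-wise statistics
    within each class, then pool the class codes within the task.
    class_features: list of (N_c, d) tensors, one per class."""
    class_codes = torch.stack([stats_pool(h) for h in class_features])  # (C, 2d+1)
    return stats_pool(class_codes)  # task-level code

# Example: a 3-class task with 5, 2, and 20 instances of 16-dim features.
task_code = encode_task([torch.randn(5, 16), torch.randn(2, 16), torch.randn(20, 16)])
```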
Experimental Setup
Meta-training with imbalance (1-50 shot): both task imbalance (small vs. large tasks) and class imbalance (small vs. large classes).
Meta-testing with imbalance and distributional shift:
• In-distribution: CIFAR-FS, mini-ImageNet
• Out-of-distribution: SVHN, CUB
(Figure: sampled tasks with varying numbers of classes and instances per class.)
Realistic Any-shot Classification
Bayesian TAML outperforms the baselines, especially on out-of-distribution (OOD) tasks (+5.6% over the best baseline on CUB).

Meta-training           CIFAR-FS           mini-ImageNet
Meta-test               CIFAR-FS  SVHN     m.-ImgNet  CUB
                        (ID)      (OOD)    (ID)       (OOD)
MAML                    71.55     45.17    66.64      65.77
Meta-SGD                72.71     46.45    69.95      65.94
MT-net                  72.30     49.17    67.63      66.09
Prototypical Networks   73.24     42.91    69.11      60.80
Proto-MAML              71.80     40.16    68.96      61.77
Bayesian TAML           75.15     51.87    71.46      71.71
Multi-Dataset Experiment
Meta-training with imbalance: Aircraft, VGG-Flower, QuickDraw.
Meta-testing with imbalance: Aircraft, VGG-Flower, QuickDraw (in-distribution), plus Traffic Signs and Fashion-MNIST (out-of-distribution).
Multi-Dataset Experiment
Bayesian TAML also outperforms the baselines in this challenging heterogeneous task distribution (+8.5% over the best baseline on Traffic Signs).

Meta-training: Aircraft, QuickDraw, VGG-Flower

Meta-test               Aircraft  QuickDraw  VGG-Flower  Traffic Signs  FMNIST
                        (ID)      (ID)       (ID)        (OOD)          (OOD)
MAML                    48.60     69.02      60.38       51.96          63.10
Meta-SGD                49.71     70.26      59.41       52.07          62.71
MT-net                  51.68     68.78      64.20       56.36          62.86
Prototypical Networks   50.63     72.31      65.52       49.93          64.26
Proto-MAML              51.15     69.84      65.24       53.93          63.72
Bayesian TAML           54.43     72.03      67.72       64.81          68.94
$\boldsymbol{z}^{\tau}$ for Distributional Shift
$\boldsymbol{z}$-TAML: Meta-SGD + $\boldsymbol{z}^{\tau}$

Classification performance (%):
Meta-training     CIFAR-FS   mini-ImageNet
Meta-test         SVHN       CUB
MAML              45.17      65.77
Meta-SGD          46.45      65.94
Bayesian z-TAML   52.29      69.11

(Figure: t-SNE visualization of $\mathbb{E}[\boldsymbol{z}^{\tau}]$, with the initial parameter and large tasks marked.)
$\boldsymbol{\omega}^{\tau}$ for Class Imbalance
$\boldsymbol{\omega}$-TAML: Meta-SGD + $\boldsymbol{\omega}^{\tau}$

Classification performance (%) on CIFAR-FS:
Degree of class imbalance   None    Medium   High
MAML                        73.60   71.15    67.43
Meta-SGD                    73.25   72.68    71.61
Bayesian ω-TAML             73.44   73.20    72.86
$\boldsymbol{\gamma}^{\tau}$ for Task Imbalance
$\boldsymbol{\gamma}$-TAML: Meta-SGD + $\boldsymbol{\gamma}^{\tau}$
(Figures: task size vs. accuracy, and task size vs. $\mathbb{E}[\boldsymbol{\gamma}^{\tau}]$.)
Effectiveness of Bayesian Methods
We further find that the Bayesian framework is very effective for solving out-of-distribution tasks.
• MAML + Bayesian → an ensemble, which appears effective for OOD tasks.
• The Bayesian framework also amplifies the effect of the balancing variables.

Classification performance (%):
Meta-training        CIFAR-FS       CIFAR-FS
Meta-test            CIFAR-FS       SVHN
MAML                 70.19          41.81
Meta-SGD             72.71          46.45
Deterministic TAML   73.82          46.78
Bayesian TAML        75.15 (+1.3)   51.87 (+5.1)
Effectiveness of Hierarchical Statistics Pooling
Finally, we evaluate the effectiveness of the hierarchical dataset encoding.
The results suggest that the set cardinality and variance used in the hierarchical encoding are more informative than simple mean pooling.

Classification performance (%) on CIFAR-FS:
                   Hierarchical encoding
                   ×        ✓
Mean               73.84    73.69
Mean + N           73.17    74.88
Mean + Var. + N    73.93    75.15
Summary
• Existing work on meta-learning considers artificial settings that assume the same number of instances per task and class. In realistic scenarios, however, we need to handle task/class imbalances and distributional shift.
• To this end, we propose to learn to balance the effect of task-specific learning by introducing three balancing variables.
• The Bayesian framework appears essential for solving OOD tasks, and it also amplifies the effect of the balancing variables.
• The hierarchical set encoding effectively captures class-level and task-level imbalances, as well as distributional shifts.