Meta Dropout:
Learning to Perturb Latent Features for Generalization
Hae Beom Lee¹, Taewook Nam¹, Eunho Yang¹², Sung Ju Hwang¹²
KAIST¹, AITRICS²
Few-shot Learning
Humans can generalize even with a single observation of a class.
[Lake et al. 11] One shot Learning of Simple Visual Concepts, CogSci 2011
[Figure: a single observation of a new class, and the query examples a human can still classify correctly.]
Few-shot Learning
On the other hand, deep neural networks require a large number of training instances
to generalize well, and overfit when only a few training instances are available.
[Figure: the same observation and query examples, now given to a deep neural network in the few-shot learning setting.]
How can we learn a model that generalizes well even with few training instances?
Learning to Perturb Latent Features
Lack of data results in poor estimation of the decision boundary.
[Figure: a decision boundary estimated from a few training examples, together with the test examples; each training example is surrounded by an input-dependent noise distribution 𝑝_𝜙(𝒛|𝒙).]
What if we learn to perturb the latent features in order to explain the test examples?
→ But the test examples are not observable in the standard learning framework.
Then how can we learn 𝝓?
Meta-Learning for Few-shot Classification
Meta-learning: learn a model that can generalize over a task distribution!
[Figure: meta-training tasks and a meta-test task, each split into a training set and a test set; the meta-knowledge 𝑝_𝜙(𝒛|𝒙) learned during meta-training is transferred to the meta-test task.]
[Ravi and Larochelle 17] Optimization as a Model for Few-shot Learning, ICLR 2017
Model-Agnostic Meta-Learning (MAML)
Model-Agnostic Meta-Learning (MAML) aims to find a good initial model parameter that can rapidly adapt to any task with only a few gradient steps.
[Finn et al. 17] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, ICML 2017
[Figure: from the initial model parameter, the task gradients ∇ℒ1, ∇ℒ2, ∇ℒ3 lead to the task-specific parameters.]
[Figure: the same meta-learned initialization also adapts to a novel task; a few gradient steps along ∇ℒ∗ yield its task-specific parameter, just as ∇ℒ1, ∇ℒ2, ∇ℒ3 did for the meta-training tasks.]
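To make this concrete, the MAML bi-level optimization can be sketched as follows (a standard formulation in our own notation, not copied from the slides; it assumes a single inner-gradient step of size α per task τ):

\theta_\tau = \theta - \alpha \, \nabla_\theta \, \mathcal{L}^{\mathrm{train}}_\tau(\theta)   % inner step: adapt the shared initialization to task \tau
\min_\theta \ \sum_\tau \mathcal{L}^{\mathrm{test}}_\tau(\theta_\tau)                          % outer step: meta-update \theta so that the adapted parameters generalize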
Model-Agnostic Meta-Learning (MAML)
[Figure: from the initial parameter 𝜃, the inner gradient ∇_𝜃 ℒtrain on the few training examples leads to the task-adapted parameter 𝜃*_MAML and its decision boundary.]
Except for sharing the initial model parameter 𝜃, the MAML inner gradient ∇_𝜃 ℒtrain does not involve any knowledge beyond 𝐷train, which may result in suboptimal decision boundaries at the end of task adaptation.
MAML + Meta Dropout
[Figure: each training example is now perturbed with the input-dependent noise 𝑝_𝜙(𝒛|𝒙); the inner gradient of the expected loss, ∇_𝜃 𝔼_{𝑝_𝜙(𝒁|𝑿)}[ℒtrain], leads to the task-adapted parameter 𝜃*_MetaDrop instead of 𝜃*_MAML.]
In Meta-dropout, we introduce an input-dependent noise distribution and compute the gradient of the expected loss over that noise distribution. This improves the final decision boundary of the model at the end of task adaptation.
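As a rough illustration of how the gradient of the expected loss can be computed in practice, the sketch below approximates the expectation over 𝑝_𝜙(𝒛|𝒙) with a few Monte Carlo samples. The interfaces (main_net taking a noise argument, noise_net, n_samples) are illustrative assumptions rather than the authors' implementation, and in the actual architecture the noise is generated layer-wise rather than in a single call.

import torch
import torch.nn.functional as F

def meta_dropout_inner_step(main_net, noise_net, x_train, y_train,
                            inner_lr=0.1, n_samples=4):
    # Monte Carlo estimate of the expected training loss over p_phi(z|x):
    # sample the input-dependent noise a few times and average the loss.
    losses = []
    for _ in range(n_samples):
        z = noise_net(x_train)                   # one sample z ~ p_phi(z|x)
        logits = main_net(x_train, noise=z)      # forward pass with perturbed features
        losses.append(F.cross_entropy(logits, y_train))
    expected_loss = torch.stack(losses).mean()

    # Gradient of the expected loss w.r.t. the shared initialization theta.
    theta = list(main_net.parameters())
    grads = torch.autograd.grad(expected_loss, theta, create_graph=True)

    # Single inner-gradient step: theta_tau = theta - alpha * grad.
    theta_tau = [p - inner_lr * g for p, g in zip(theta, grads)]
    return theta_tau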
Model Architecture
[Figure: the main model is a 4-conv net; alongside each Conv and FC layer, a noise branch with parameters 𝜙 generates the multiplicative noise 𝒛 applied to the features before the inner step ∇_𝜃 𝔼_{𝑝_𝜙(𝒁|𝑿)}[ℒtrain].]
We let each lower layer generate the noise for the layer above it, and the multiplicative noise takes the form of a softplus transformation of a Gaussian distribution.
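A minimal sketch of what such a noise layer could look like (the module name, the 1x1-conv parameterization of the Gaussian mean, and the unit-variance assumption are our illustrative choices, not the authors' exact implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiplicativeSoftplusNoise(nn.Module):
    # Input-dependent multiplicative noise: z = softplus(mu(h) + eps), eps ~ N(0, I),
    # generated from the lower-layer features h and multiplied onto them.
    def __init__(self, channels):
        super().__init__()
        self.mu = nn.Conv2d(channels, channels, kernel_size=1)  # noise parameters phi

    def forward(self, h, sample=True):
        eps = torch.randn_like(h) if sample else torch.zeros_like(h)
        z = F.softplus(self.mu(h) + eps)   # softplus-transformed Gaussian sample
        return h * z                       # multiplicative perturbation of the features

Setting sample=False gives a deterministic pass, which is one way to realize the "no perturbation" evaluation mentioned on the next slide.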
Learning Objective
Meta-learning → maximize the performance on the test examples of each task.
[Equation: the test log-likelihood is evaluated with no perturbation, while the inner-gradient step marginalizes over the noise on the training examples.]
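Written out, the objective can be sketched as below (our reconstruction from the slide annotations, not verbatim from the paper; it assumes a single inner step of size α per task τ):

\max_{\theta,\,\phi}\ \sum_\tau \log p\big(Y^{\mathrm{test}}_\tau \mid X^{\mathrm{test}}_\tau;\ \theta_\tau\big)   % test log-likelihood, evaluated with no perturbation
\text{where}\quad \theta_\tau = \theta - \alpha\,\nabla_\theta\, \mathbb{E}_{p_\phi(Z \mid X^{\mathrm{train}}_\tau)}\big[-\log p\big(Y^{\mathrm{train}}_\tau \mid Z;\ \theta\big)\big]   % inner-gradient step, marginalizing over the noise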
Generalization Performance
[Figure: decision boundaries adapted from two training examples (Train 1, Train 2) and the corresponding test examples, for MAML and Meta-dropout.]

miniImageNet 5-way   MAML     Meta-dropout
1-shot               49.58%   51.93%
5-shot               64.55%   67.42%
Visualization of Stochastic Features
[Figure: original images alongside two stochastic feature channels (Stochastic Channel 1 and Stochastic Channel 2).]
Comparison against Existing Regularizers

Models                               Omniglot 1-shot   Omniglot 5-shot   miniImageNet 1-shot   miniImageNet 5-shot
No Perturbation                      95.23             98.38             49.58                 64.55
Manifold Mixup                       89.78             97.86             48.62                 63.86
Variational Information Bottleneck   94.98             98.85             48.12                 64.78
Information Dropout                  94.49             98.65             50.36                 65.91
Meta-dropout                         96.63             99.04             51.93                 67.42

Meta-dropout outperforms existing regularizers such as Manifold Mixup and the information-theoretic regularizers.
Adversarial Robustness
Meta-dropout improves both clean and adversarial accuracies.
[Figure: clean and adversarial accuracies of MAML and Meta-dropout under an 𝐿∞-norm attack, Omniglot 20-way 1-shot.]
Adversarial Robustness
The defense of Meta-dropout also generalizes across different attacks.
[Figure: accuracies under 𝐿∞-, 𝐿1-, and 𝐿2-norm attacks, Omniglot 20-way 1-shot.]
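For context, an 𝐿∞-norm attack perturbs every input dimension by at most ε. The sketch below shows the single-step FGSM variant purely as an illustration; the specific attack methods and budgets evaluated in the experiments may differ.

import torch
import torch.nn.functional as F

def fgsm_linf_attack(model, x, y, epsilon=0.1):
    # Single-step L-infinity-bounded attack: move each pixel by +/- epsilon
    # in the direction that increases the classification loss.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()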
Summary
• In this work, we showed that we can learn to perturb latent features in an input-dependent manner in order to improve generalization.
• The meta-learning framework enables effective learning of the perturbation function.
• Meta-dropout outperforms existing regularizers in the meta-learning setting.
• Meta-dropout improves both clean and adversarial accuracies under various types of attacks.
