Chang-Hoon Jeong
Biointelligence lab., Artificial Intelligence Institute
Seoul National University
Deep Meta Learning
Learning-to-Learn in Neural Networks
2
• Meta Learning: Overview
• Meta Supervised Learning
○ Model-based methods
○ Optimization-based Inference
○ Non-parametric approach
• Meta Reinforcement Learning
○ Recurrent policies methods
○ Optimization-based Inference
○ POMDPs approach
Agenda
Meta Learning: Overview
4
• Large, diverse datasets and large computation → broad generalization
Deep Learning: SOTA (State-Of-The-Art)
GPT-3
Agent57
Meta Pseudo Labels
Pham, Hieu, et al. "Meta pseudo labels." 2020.
Brown, Tom B., et al. "Language models are few-shot learners." 2020.
Badia, Adrià Puigdomènech, et al. "Agent57: Outperforming the atari human benchmark." 2020.
5
Difference between Machine and Human
A machine is a specialist; a human is a generalist
6
Problem Statement
What if your data has a long tail?
What if you want a general-purpose AI system in the real world?
CS330: Deep Multi-Task and Meta-Learning
9
Few-Shot Learning
Can you classify the image using just 3 datapoints?
By Van Gogh or Cezanne?
(episodic) N-way K-shot learning
In this case, 2-way 3-shot learning
How did you accomplish this?
By leveraging prior experience!
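To make the episodic setup concrete, below is a minimal sketch of how an N-way K-shot episode could be sampled from a labeled image collection; the `images_by_class` mapping and the query-set size are illustrative assumptions, not something specified on the slides.

```python
import random

def sample_episode(images_by_class, n_way=2, k_shot=3, n_query=5):
    """Sample one episodic N-way K-shot task: a support set of N*K labeled
    examples to learn from, and a query set to evaluate on.
    images_by_class: {class_name: [image, ...]} (illustrative)."""
    classes = random.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in examples[:k_shot]]
        query += [(img, label) for img in examples[k_shot:]]
    return support, query

# The painter example above is a 2-way 3-shot episode:
# support, query = sample_episode(paintings_by_artist, n_way=2, k_shot=3)
```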
10
• Difference between multi-task learning and meta learning
What is Meta Learning?
11
• Meta Learning in Computer Science
○ Meta-learning is a subfield of machine learning in which automatic learning
algorithms are applied to metadata about machine learning experiments:
learning-to-learn
• How can we understand meta-learning?
○ Mechanistic view
- Useful for implementing the algorithm
- The neural network is trained on a meta-dataset, which itself consists of many datasets
(one per task)
○ Probabilistic view
- Useful for understanding the algorithm
- Meta-learning extracts prior information from previous tasks; learning a new task
uses this prior and a small training set to infer the most likely posterior parameters
What is Meta Learning?
12
• (Conventional) Machine Learning
○ (Supervised Learning) Given a training dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$,
we can train a predictive model $\hat{y} = f_\theta(x)$ parameterized by $\theta$ by solving
$\theta^* = \arg\min_\theta \mathcal{L}(\mathcal{D}; \theta, \omega)$
- $\mathcal{L}$ is a loss function that measures the match between the true labels and
those predicted by $f_\theta(\cdot)$
- $\omega$ is the condition that makes explicit the dependence of this solution on
factors such as the choice of optimizer (e.g. Adam, SGD, ...), the function class
for $f$ (e.g. CNN, RNN, ...), or the initialization (e.g. Xavier, He, ...)
Formalizing the Meta Learning
* Conventionally, $\omega$ is pre-specified. Can we learn the learning algorithm $\omega$ itself,
rather than assuming it is pre-specified and fixed?
13
• What is a task?
○ A task: $\mathcal{T}_i \triangleq \{p_i(x), p_i(y \mid x), \mathcal{L}_i\}$ (a data-generating distribution and a loss)
• Different tasks can vary based on:
• different objects
• different people
• different objectives
• different lighting conditions
• different languages
• …
Formalizing the Meta Learning
CS330: Deep Multi-Task and Meta-Learning, Tutorial #2: few-shot learning and meta-learning (by W. Zi)
14
• The bad news
oDifferent tasks need to share some structure
o If this doesn’t hold, you are better off using single-task learning
• The good news
o There are many tasks with shared structure!
The Structure for Meta Learning
CS330: Deep Multi-Task and Meta-Learning
15
• Meta Learning: Task distribution view
○ $\omega$ specifies ‘how-to-learn’ and is often evaluated in terms of performance
over a distribution of tasks $p(\mathcal{T})$ (meta-knowledge, inductive bias)
○ Learning ‘how-to-learn’ is done by solving
$\omega^* = \arg\min_\omega \mathbb{E}_{\mathcal{T} \sim p(\mathcal{T})}\left[\mathcal{L}(\mathcal{D}; \omega)\right]$
How to solve Meta Learning?
How can we solve this problem in practice?
16
• The learning process: Two-level optimization
○ Inner-level:
- A new task is presented
- An agent tries to quickly learn the associated concepts from the training
observations
- This quick adaptation is facilitated by knowledge that it has accumulated
across earlier tasks at the outer-level
○ Outer-level:
- Whereas the inner-level concerns a single task, the outer-level concerns
a multitude of tasks
- In the outer-level, the adaptation results from the inner-level tasks of one
cycle are integrated to update the meta-knowledge
How to solve Meta Learning?
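As a toy illustration of this two-level loop (not any particular published algorithm), the following sketch meta-learns an initialization for one inner gradient step on scalar quadratic tasks; the task distribution and step sizes are made-up choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, meta_lr = 0.5, 0.1   # inner- and outer-level step sizes (illustrative)

def adapt(w0, target):
    # Inner level: one gradient step on the task loss 0.5*(w - target)^2,
    # starting from the meta-learned initialization w0.
    return w0 - alpha * (w0 - target)

w0 = 0.0
for _ in range(500):
    target = rng.normal(3.0, 0.3)          # outer level: sample a new task
    w_adapted = adapt(w0, target)
    # Gradient of the post-adaptation loss w.r.t. w0 (chain rule through adapt):
    meta_grad = (w_adapted - target) * (1.0 - alpha)
    w0 -= meta_lr * meta_grad

print(w0)  # converges near 3.0, the center of the task distribution
```

The outer level converges toward the initialization from which one inner step works best on average over tasks, which is exactly the accumulated knowledge the inner level exploits.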
17
• Meta-dataset: the dataset of datasets
Introduction to Meta-dataset
Ravi, Sachin, and Hugo Larochelle. "Optimization as a model for few-shot learning." 2016.
(figure: each row is a task built from a subset of classes; meta-test tasks use held-out classes)
20
• Meta-training (Supervised Learning)
Meta Learning: Overview
Ravi, Sachin, and Hugo Larochelle. "Optimization as a model for few-shot learning." 2016.
Query set
Support set
Learning ‘how-to-learn’
(meta-learning)
Methods
• Model-based methods
• Optimization-based Inference
• Non-parametric approach
22
• Meta-testing (Supervised Learning) → Few-Shot Learning
Meta Learning: Overview
Ravi, Sachin, and Hugo Larochelle. "Optimization as a model for few-shot learning." 2016.
Query set
Support set
Fast adaptation
with small data
Finally, we can evaluate the accuracy of
our meta-learner by the performance of the adapted model
on the test split (query set) of each target task
Meta Supervised Learning
24
Benchmark Datasets in Meta Supervised Learning
Dataset Description
Omniglot (Lake et al., 2011)
This dataset presents an image recognition task. Each image corresponds to one out of 1,623 characters from 50
different alphabets. Every character was drawn by 20 people. Note that in this case, the characters are the
classes/labels.
ImageNet (Deng et al., 2009)
This is the largest image classification dataset, containing more than 20K classes and over 14 million colored images.
MiniImageNet is a mini variant of the large ImageNet dataset (Deng et al., 2009) for image classification, proposed by
Vinyals et al. (2016) to reduce the engineering efforts to run experiments. The mini dataset contains 60,000 colored
images of size 84 × 84. There are 100 classes in total, each with 600 examples. TieredImageNet
(Ren et al., 2018) is another variation of the large ImageNet dataset. It is similar to miniImageNet, but contains a
hierarchical structure. That is, there are 34 classes, each with its own sub-classes.
CIFAR-10 and CIFAR-100 (Krizhevsky, 2009)
Two other image recognition datasets. Each one contains 60K RGB images of size 32×32. CIFAR-10 and CIFAR-100
contain 10 and 100 classes respectively, with a uniform number of examples per class (6,000 and 600 respectively).
Every class in CIFAR-100 also has a super-class, of which there are 20 in the full data set. Many variants of the
CIFAR datasets can be sampled, giving rise to e.g. CIFAR-FS (Bertinetto et al., 2019) and FC-100 (Oreshkin et al.,
2018).
CUB-200-2011 (Wah et al., 2011)
The CUB-200-2011 dataset contains roughly 12K RGB images of birds from 200 species. Every image has some
labeled attributes (e.g. crown color, tail shape).
MNIST (LeCun et al., 2010)
MNIST presents a hand-written digit recognition task, containing ten classes (for digits 0 through 9). In total, the
dataset is split into a 60K train and 10K test gray scale images of hand-written digits.
Meta-Dataset (Triantafillou et al., 2020)
This dataset comprises several other datasets such as Omniglot (Lake et al., 2011), CUB-200 (Wah et al., 2011),
ImageNet (Deng et al., 2009), and more (Triantafillou et al., 2020). An episode is then constructed by sampling a
dataset (e.g. Omniglot), selecting a subset of labels to create train and test splits as before. In this way, broader
generalization is enforced since the tasks are more distant from each other.
Meta-world (Yu et al., 2019)
A meta reinforcement learning dataset, containing 50 robotic manipulation tasks (control a robot arm to achieve some
pre-defined goal, e.g. unlocking a door, or playing soccer). It was specifically designed to cover a broad range of
tasks, such that meaningful generalization can be measured (Yu et al., 2019).
25
• Model-based methods (SL O, RL O)
○ Santoro, Adam, et al. "Meta-learning with memory-augmented neural networks." International conference on
machine learning. 2016.
○ Mishra, Nikhil, et al. "A simple neural attentive meta-learner." arXiv preprint arXiv:1707.03141 (2017).
○ Munkhdalai, Tsendsuren, and Hong Yu. "Meta networks." Proceedings of machine learning research 70
(2017): 2554.
• Optimization-based Inference (SL O, RL O)
○ Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep
networks." arXiv preprint arXiv:1703.03400 (2017).
○ Nichol, Alex, Joshua Achiam, and John Schulman. "On first-order meta-learning algorithms." arXiv preprint
arXiv:1803.02999 (2018).
○ Rusu, Andrei A., et al. "Meta-learning with latent embedding optimization." arXiv preprint
arXiv:1807.05960 (2018).
• Non-parametric approach (SL O, RL X)
○ Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. "Siamese neural networks for one-shot
image recognition." ICML deep learning workshop. Vol. 2. 2015.
○ Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information
processing systems. 2016.
○ Snell, Jake, Kevin Swersky, and Richard Zemel. "Prototypical networks for few-shot
learning." Advances in neural information processing systems. 2017.
Meta Learning Algorithms
27
• General approach: Train a recurrent network to represent $y^{ts} = f_\theta(\mathcal{D}^{tr}, x^{ts})$
Model-based methods
CS330: Deep Multi-Task and Meta-Learning
Train with standard supervised learning
support set query set
29
• General approach: Train a recurrent network to represent $y^{ts} = f_\theta(\mathcal{D}^{tr}, x^{ts})$
Model-based methods
CS330: Deep Multi-Task and Meta-Learning
1. Sample task $\mathcal{T}_i$ (or a mini-batch of tasks)
2. Sample disjoint datasets $\mathcal{D}_i^{tr}$, $\mathcal{D}_i^{test}$ from $\mathcal{D}_i$
3. Compute the query loss $\mathcal{L}\left(f_\theta(\mathcal{D}_i^{tr}, x^{ts}),\, y^{ts}\right)$ on $\mathcal{D}_i^{test}$
4. Update $\theta$ using $\nabla_\theta \mathcal{L}$
support set query set
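A minimal sketch of such a black-box meta-learner, assuming an LSTM backbone and illustrative dimensions (the specific architectures used by MANN or SNAIL differ):

```python
import torch
import torch.nn as nn

class BlackBoxMetaLearner(nn.Module):
    """Sketch: an LSTM reads the support set as a sequence of (x, y) pairs,
    then predicts labels for query inputs from its task-conditioned state.
    All dimensions are illustrative assumptions."""
    def __init__(self, x_dim=16, n_classes=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(x_dim + n_classes, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)
        self.n_classes = n_classes

    def forward(self, support_x, support_y, query_x):
        # Condition on the support set by feeding (x_i, onehot(y_i)) pairs.
        y_onehot = torch.nn.functional.one_hot(support_y, self.n_classes).float()
        context = torch.cat([support_x, y_onehot], dim=-1)
        _, (h, c) = self.lstm(context.unsqueeze(0))
        # Read out query predictions (labels unknown, so zeros in the y slot).
        q = torch.cat([query_x, torch.zeros(len(query_x), self.n_classes)], dim=-1)
        out, _ = self.lstm(q.unsqueeze(0), (h, c))
        return self.head(out.squeeze(0))  # logits; train with cross-entropy
```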
30
• One challenge: outputting all parameters is not scalable
○ No need to output all parameters of the neural net, only sufficient statistics
(Santoro et al., MANN, 2016), (Mishra et al., SNAIL, 2017)
Model-based methods
For example, in multi-task learning,
condition via concatenation or multiplication
Low-dimensional vector
Represents contextual task information
CS330: Deep Multi-Task and Meta-Learning
31
• Memory-Augmented Neural Networks
Model-based methods: Variants
Graves, Alex, et al. "Neural turing machines." arXiv, 2014,
Santoro, Adam, et al. "Meta-learning with memory-augmented neural networks." ICML 2016.
32
• A Simple Neural AttentIve Meta-Learner (SNAIL)
Model-based methods: Variants
Mishra, Nikhil, et al. "A simple neural attentive meta-learner." ICLR 2018
33
• Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv
preprint arXiv:1410.5401 (2014).
• Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv
preprint arXiv:1410.3916 (2014).
• Santoro, Adam, et al. "Meta-learning with memory-augmented neural
networks." International conference on machine learning. 2016.
• Mishra, Nikhil, et al. "A simple neural attentive meta-learner." arXiv preprint
arXiv:1707.03141 (2017).
• Munkhdalai, Tsendsuren, and Hong Yu. "Meta networks." Proceedings of
machine learning research 70 (2017): 2554.
• Garnelo, Marta, et al. "Conditional neural processes." arXiv preprint
arXiv:1807.01613 (2018).
Model-based methods: Further Readings
34
• Can we utilize pre-trained prior knowledge?
• One successful form of prior knowledge: initialization for fine-tuning
• What about Transfer Learning?
Optimization-based Inference
Can we do better?
35
• Model-Agnostic Meta Learning
Optimization-based Inference
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
37
• MAML for Supervised Learning (Meta-training)
Model-Agnostic Meta Learning
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
38
• MAML for Supervised Learning (Meta-training)
Model-Agnostic Meta Learning
meta-initialization (outer loop)
adaptation (inner loop)
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
41
• MAML for Supervised Learning (Meta-training)
Model-Agnostic Meta Learning
1st for loop
2nd for loop
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
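The two for-loops above can be sketched in PyTorch roughly as follows, using the sinusoid-regression toy task from the MAML paper; the network size, learning rates, and batch size here are illustrative choices, not the paper's exact hyperparameters.

```python
import torch
import torch.nn.functional as F

def forward(params, x):
    # A tiny MLP applied functionally, so we can differentiate
    # through the inner-loop update.
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1 + b1) @ w2 + b2

def inner_adapt(params, x_s, y_s, alpha=0.01):
    # Inner loop (2nd for loop): one gradient step on the support set.
    loss = F.mse_loss(forward(params, x_s), y_s)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return [p - alpha * g for p, g in zip(params, grads)]

params = [torch.nn.Parameter(t) for t in
          (0.1 * torch.randn(1, 40), torch.zeros(40),
           0.1 * torch.randn(40, 1), torch.zeros(1))]
meta_opt = torch.optim.Adam(params, lr=1e-3)

for it in range(10000):            # outer loop (1st for loop)
    meta_opt.zero_grad()
    for _ in range(4):             # a mini-batch of sinusoid tasks
        amp, phase = torch.rand(1) * 4.9 + 0.1, torch.rand(1) * 3.14
        x_s, x_q = torch.rand(5, 1) * 10 - 5, torch.rand(10, 1) * 10 - 5
        y_s, y_q = amp * torch.sin(x_s + phase), amp * torch.sin(x_q + phase)
        adapted = inner_adapt(params, x_s, y_s)
        # Query loss after adaptation; backward() accumulates the
        # (second-order) meta-gradient into the shared initialization.
        (F.mse_loss(forward(adapted, x_q), y_q) / 4).backward()
    meta_opt.step()
```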
46
• MAML for Supervised Learning (Meta-testing)
Model-Agnostic Meta Learning
Few-shot adaptation
Fine-Tuning (Adaptation)
Just 1 or few gradient steps
Evaluation: N-way k-shot Accuracy
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
47
• Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast
adaptation of deep networks." arXiv preprint arXiv:1703.03400 (2017).
• Ravi, Sachin, and Hugo Larochelle. "Optimization as a model for few-shot learning." (2016).
• Nichol, Alex, Joshua Achiam, and John Schulman. "On first-order meta-learning
algorithms." arXiv preprint arXiv:1803.02999 (2018).
• Rusu, Andrei A., et al. "Meta-learning with latent embedding optimization." arXiv preprint
arXiv:1807.05960 (2018).
• Antoniou, Antreas, Harrison Edwards, and Amos Storkey. "How to train your MAML." arXiv
preprint arXiv:1810.09502 (2018).
• Lee, Yoonho, and Seungjin Choi. "Gradient-based meta-learning with learned layerwise metric
and subspace." arXiv preprint arXiv:1801.05558 (2018).
• Fallah, Alireza, Aryan Mokhtari, and Asuman Ozdaglar. "On the convergence theory of
gradient-based model-agnostic meta-learning algorithms." International Conference on
Artificial Intelligence and Statistics. 2020.
• Song, Xingyou, et al. "Es-maml: Simple hessian-free meta learning." arXiv preprint
arXiv:1910.01215 (2019).
• Triantafillou, Eleni, et al. "Meta-dataset: A dataset of datasets for learning to learn from few
examples." arXiv preprint arXiv:1903.03096 (2019).
Optimization-based Inference: Further Readings
49
• KNN(K-Nearest Neighbor) Algorithm
oLazy, Non-parametric Learner
Non-parametric approach
Red or Blue?
1-Nearest Neighbor
→ Blue
3-Nearest Neighbor
→ Red
5-Nearest Neighbor
→ Red
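For reference, a minimal NumPy sketch of the K-nearest-neighbor vote illustrated above (the toy 2D points are made up):

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Classify a query point by majority vote among its k nearest
    training points (Euclidean distance). No parameters are learned:
    the 'training' data is simply stored (a lazy learner)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# 2D toy data: 0 = red, 1 = blue
x = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 1, 1, 1])
print(knn_predict(x, y, np.array([0.55, 0.55]), k=1))  # the vote can flip with k
```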
50
• In low-data regimes (e.g. 2D data), non-parametric methods are
simple and work well!
• What about high dimensional data?
o During meta-training
→ Still want to be parametric
o During meta-testing
→ Few-Shot learning ↔ low data regime
Non-parametric approach
Can we train parametric meta-learners that produce effective
non-parametric learners?
Finn, Chelsea, Sergey Levine. “Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning." ICML Tutorial 2019
51
• At meta-test time, compare the test image (query) with the training images
(support set)
o Which space do we use?
o Which distance metric do we use?
Non-parametric approach
Finn, Chelsea, Sergey Levine. “Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning." ICML Tutorial 2019
52
• Siamese Neural Networks
Non-parametric approach: Variants
meta-training
meta-testing
Koch, Gregory, et al. "Siamese neural networks for one-shot image recognition." ICML deep learning workshop, 2015.
53
• Siamese Neural Networks: A simple example
Non-parametric approach: Variants
54
• Siamese Neural Networks: A simple example
Non-parametric approach: Variants
Label
1
55
• Siamese Neural Networks: A simple example
Non-parametric approach: Variants
Label
0
56
• Siamese Neural Networks
Non-parametric approach: Variants
Koch, Gregory, et al. "Siamese neural networks for one-shot image recognition." ICML deep learning workshop, 2015.
Same class: label 1
Otherwise: label 0
Siamese twin is not depicted, but joins immediately after the 4096 unit
fully-connected layer where the L1 component-wise distance between
vectors is computed
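A small sketch of this verification-style setup; the convolutional encoder here is a stand-in with made-up dimensions, not the exact architecture from Koch et al.:

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    """Sketch: twin encoders share weights, and a sigmoid over the
    componentwise L1 distance scores whether two images belong to the
    same class (target 1) or not (target 0)."""
    def __init__(self, emb=64):
        super().__init__()
        self.encoder = nn.Sequential(          # illustrative tiny encoder
            nn.Conv2d(1, 32, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(32 * 16, emb),
        )
        self.out = nn.Linear(emb, 1)

    def forward(self, x1, x2):
        e1, e2 = self.encoder(x1), self.encoder(x2)
        return torch.sigmoid(self.out(torch.abs(e1 - e2)))  # P(same class)

# Training: loss = nn.BCELoss()(net(x1, x2), same_label)
# One-shot test: classify a query as the support image with the highest score.
```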
58
• Siamese Neural Networks
Non-parametric approach: Variants
Koch, Gregory, et al. "Siamese neural networks for one-shot image recognition." ICML deep learning workshop, 2015.
Test Accuracy on Omniglot Dataset
meta-training: 2-way classification
meta-testing: N-way classification
“A simple machine learning principle:
test and train conditions must match”
(Vinyals et al., Matching Networks for One-Shot Learning)
59
• Matching Networks for One-Shot Learning
Non-parametric approach: Variants
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems. 2016.
bidirectional LSTM
Convolutional encoder
Cosine distance
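The matching-networks readout can be sketched as attention over the support set; the helper below assumes the embeddings have already been produced by the (learned) encoders, and omits the bidirectional-LSTM contextualization of the full model:

```python
import torch
import torch.nn.functional as F

def matching_net_predict(f_query, g_support, support_labels, n_classes):
    """Sketch: a query embedding attends over support embeddings with a
    cosine-similarity softmax, and the prediction is the attention-weighted
    sum of the support labels."""
    sims = F.cosine_similarity(f_query.unsqueeze(0), g_support, dim=-1)
    attention = torch.softmax(sims, dim=0)            # a(x_hat, x_i)
    y_onehot = F.one_hot(support_labels, n_classes).float()
    return attention @ y_onehot                       # P(y_hat | x_hat, S)
```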
60
• Prototypical Networks
oCan we aggregate class information to create a prototypical embedding?
Non-parametric approach: Variants
$d$: Euclidean distance
Snell, Jake, et al., "Prototypical networks for few-shot learning." Advances in neural information processing systems. 2017.
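A minimal sketch of the prototype computation and classification rule, assuming embeddings already produced by some encoder $f_\phi$:

```python
import torch

def prototypical_logits(support_emb, support_labels, query_emb, n_classes):
    """Sketch of prototypical networks: each class prototype is the mean
    of its support embeddings, and queries are classified by a softmax
    over negative squared Euclidean distances to the prototypes."""
    prototypes = torch.stack([
        support_emb[support_labels == k].mean(dim=0) for k in range(n_classes)
    ])                                           # c_k = mean of class-k embeddings
    dists = torch.cdist(query_emb, prototypes)   # (n_query, n_classes)
    return -dists ** 2                           # logits for cross-entropy
```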
61
• Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. "Siamese neural
networks for one-shot image recognition." ICML deep learning workshop. Vol. 2.
2015.
• Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in
neural information processing systems. 2016.
• Snell, Jake, Kevin Swersky, and Richard Zemel. "Prototypical networks for few-
shot learning." Advances in neural information processing systems. 2017.
• Sung, Flood, et al. "Learning to compare: Relation network for few-shot
learning." Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. 2018.
• Allen, Kelsey R., et al. "Infinite mixture prototypes for few-shot learning." arXiv
preprint arXiv:1902.04552 (2019).
• Garcia, Victor, and Joan Bruna. "Few-shot learning with graph neural
networks." arXiv preprint arXiv:1711.04043 (2017).
Non-parametric approach: Further Readings
Meta Reinforcement Learning
63
• Reinforcement Learning
Introduction to Reinforcement Learning
Observation 𝑆𝑡
Action 𝐴𝑡
Reward 𝑅𝑡
Agent
The World
Environment
• At each time step 𝑡 the Agent
- Executes action 𝐴𝑡
- Receives observation 𝑆𝑡
- Receives scalar reward 𝑅𝑡
• The environment
- Receives action 𝐴𝑡
- Emits observation 𝑆𝑡+1
- Emits scalar reward 𝑅𝑡+1
• 𝑡 increments at env.step()
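This interaction loop looks roughly as follows in code, written against the classic Gym API (newer gym/gymnasium versions return extra values from reset() and step()):

```python
import gym  # classic Gym API assumed

env = gym.make("CartPole-v1")
state = env.reset()                      # S_0
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()   # a learned policy pi(a|s) goes here
    # env.step() advances t: the environment receives A_t and
    # emits S_{t+1} and the scalar reward R_{t+1}.
    state, reward, done, info = env.step(action)
    total_reward += reward
```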
67
• Reinforcement Learning
Introduction to Reinforcement Learning
Environment
State 𝑆𝑡
Action 𝐴𝑡
Reward 𝑅𝑡+1
Next State 𝑆𝑡+1
Agent
69
• Markov Decision Processes (MDPs)
○ Mathematical formulation of the RL problem
• An MDP is defined by the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$
○ $\mathcal{S}$: state set
○ $\mathcal{A}$: action set
○ $\mathcal{P}$: transition probabilities $\mathcal{P}(s' \mid s, a)$
○ $\mathcal{R}$: reward function
○ $\gamma$: discount factor
• (Stochastic) Policies
○ Probability of the agent's action: $\pi(a \mid s) = P(A_t = a \mid S_t = s)$
Introduction to Reinforcement Learning
Goal: Find the optimal policy $\pi^* = \arg\max_\pi \mathbb{E}_\pi\left[\sum_t \gamma^t R_{t+1}\right]$
Value function: $V^\pi(s) = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \mid S_t = s\right]$
70
• What’s wrong with Reinforcement Learning?
Problem Statement
DQN
OpenAI Five
AlphaStar
AlphaGo
71
• What’s wrong with Reinforcement Learning?
Problem Statement
MDP 1 MDP 2 MDP 3
Can we meta-learn reinforcement learning algorithms that are much more efficient?
“Hula Beach”, “Never grow up”, “The Sled” – by artist Matt Spangler, mattspangler.com
72
• What’s wrong with Reinforcement Learning?
Problem Statement
https://bit.ly/bandit-meta-rl-ex
Reinforcement Learning
Meta Reinforcement Learning
73
• General Framework for Meta Reinforcement Learning
Meta Reinforcement Learning
Env 1 (MDP 1)
Env 2 (MDP 2)
Env 3 (MDP 3)
...
meta-training (10 envs)
Meta RL Algorithm
Env 11 (MDP 11)
meta-testing (with small interactions)
Fast meta-learned agent
74
• Recurrent policies methods
○ Duan, Yan, et al. "RL²: Fast reinforcement learning via slow reinforcement learning." arXiv preprint
arXiv:1611.02779 (2016).
○ Mishra, Nikhil, et al. "A simple neural attentive meta-learner." arXiv preprint arXiv:1707.03141 (2017).
• Optimization-based Inference
○ Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of
deep networks." arXiv preprint arXiv:1703.03400 (2017).
○ Nichol, Alex, Joshua Achiam, and John Schulman. "On first-order meta-learning algorithms." arXiv
preprint arXiv:1803.02999 (2018).
• POMDPs approach
○ Rakelly, Kate, et al. "Efficient off-policy meta-reinforcement learning via probabilistic context
variables." International conference on machine learning. 2019.
○ Humplik, Jan, et al. "Meta reinforcement learning as task inference." arXiv preprint
arXiv:1905.06424 (2019).
Meta Reinforcement Learning Algorithms
76
• General approach: Just train a recurrent network!
Recurrent policies methods
RNN hidden state is not reset between episodes!
Finn, Chelsea, Sergey Levine. “Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning." ICML Tutorial 2019
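A minimal sketch of such a recurrent policy; the GRU backbone and the input layout (state, previous action, previous reward, done flag) follow the general recipe, with illustrative dimensions:

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Sketch of an RL2-style policy: a GRU whose input at each step is
    (state, previous one-hot action, previous reward, done flag). The hidden
    state is carried across episode boundaries within a trial, so the agent
    can accumulate information about the current MDP; it is reset only when
    a new trial (new MDP) begins."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.gru = nn.GRUCell(obs_dim + n_actions + 2, hidden)
        self.pi = nn.Linear(hidden, n_actions)

    def step(self, obs, prev_action, prev_reward, prev_done, h):
        # Batched tensors assumed: obs (B, obs_dim), prev_action (B, n_actions),
        # prev_reward and prev_done (B, 1), h (B, hidden).
        x = torch.cat([obs, prev_action, prev_reward, prev_done], dim=-1)
        h = self.gru(x, h)                    # h summarizes the trial so far
        return torch.distributions.Categorical(logits=self.pi(h)), h

# Within a trial: keep h across episode boundaries.
# Between trials (new MDP): h = torch.zeros(batch, hidden).
```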
77
• RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
Recurrent policies methods: Variants
Duan, Yan, et al. "RL²: Fast reinforcement learning via slow reinforcement learning." arXiv preprint arXiv:1611.02779 (2016).
(Vanilla) Reinforcement Learning: Optimize episode by episode
RL Algorithm: TRPO+GAE
78
• RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
Recurrent policies methods: Variants
Duan, Yan, et al. "RL²: Fast reinforcement learning via slow reinforcement learning." arXiv preprint arXiv:1611.02779 (2016).
Meta Reinforcement Learning: Optimize trial by trial
preserved initialization
: Sample environment
: 𝑘’th episode in environment
RL Algorithm: TRPO+GAE
79
• A Simple Neural AttentIve Meta-Learner (SNAIL)
Recurrent policies methods: Variants
Mishra, Nikhil, et al. "A simple neural attentive meta-learner." ICLR 2018 & https://www.programmersought.com/article/1715257628/
Like RL², but:
replace the LSTM with dilated temporal convolutions
(like WaveNet) + attention
dilated temporal convolution
80
• Botvinick, Matthew, et al. "Reinforcement learning, fast and slow." Trends in
Cognitive Sciences 23.5 (2019): 408-422.
• Wang, Jane X., et al. "Learning to reinforcement learn." arXiv preprint
arXiv:1611.05763 (2016).
• Duan, Yan, et al. "RL²: Fast reinforcement learning via slow reinforcement
learning." arXiv preprint arXiv:1611.02779 (2016).
• Mishra, Nikhil, et al. "A simple neural attentive meta-learner." arXiv
preprint arXiv:1707.03141 (2017).
• Ritter, Samuel, et al. "Been there, done that: Meta-learning with episodic
recall." arXiv preprint arXiv:1805.09692 (2018).
Recurrent policies methods: Further Readings
81
• Model-Agnostic Meta Learning
○ Can also be applied to Reinforcement Learning!
Optimization-based Inference
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
83
• Model-Agnostic Meta Learning
○ Can also be applied to Reinforcement Learning!
Optimization-based Inference
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
meta-initialization (outer loop)
adaptation (inner loop)
86
• Model-Agnostic Meta Learning
○ Can also be applied to Reinforcement Learning!
Optimization-based Inference
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
Roll out
90
• Model-Agnostic Meta Learning
○ Can also be applied to Reinforcement Learning!
Optimization-based Inference
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
Few-shot adaptation
Fine-Tuning (Adaptation)
Just 1 or a few gradient steps
Evaluation: Cumulative Rewards
91
• MAML-RL Demo (meta-test)
Optimization-based Inference
https://sites.google.com/view/maml
93
• Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning
for fast adaptation of deep networks." arXiv preprint arXiv:1703.03400 (2017).
• Nichol, Alex, Joshua Achiam, and John Schulman. "On first-order meta-learning
algorithms." arXiv preprint arXiv:1803.02999 (2018).
• Xu, Zhongwen, Hado P. van Hasselt, and David Silver. "Meta-gradient
reinforcement learning." Advances in neural information processing systems.
2018.
• Houthooft, Rein, et al. "Evolved policy gradients." Advances in Neural
Information Processing Systems. 2018.
• Salimans, Tim, et al. "Evolution strategies as a scalable alternative to
reinforcement learning." arXiv preprint arXiv:1703.03864 (2017).
• Rothfuss, Jonas, et al. "ProMP: Proximal meta-policy search." arXiv preprint
arXiv:1810.06784 (2018).
• Fernando, Chrisantha, et al. "Meta-learning by the baldwin effect." Proceedings
of the Genetic and Evolutionary Computation Conference Companion. 2018.
Optimization-based Inference: Further Readings
94
• What are Partially Observable Markov Decision Processes (POMDPs)?
• A POMDP is an MDP with hidden states (a hidden Markov model with
actions)
• A POMDP is a tuple $(\mathcal{S}, \mathcal{A}, \mathcal{O}, \mathcal{P}, \mathcal{R}, \mathcal{Z}, \gamma)$
○ $\mathcal{S}$ is a finite set of states
○ $\mathcal{A}$ is a finite set of actions
○ $\mathcal{O}$ is a finite set of observations
○ $\mathcal{P}$ is a state transition probability matrix: $\mathcal{P}(s' \mid s, a)$
○ $\mathcal{R}$ is a reward function: $\mathcal{R}(s, a)$
○ $\mathcal{Z}$ is an observation function: $\mathcal{Z}(o \mid s', a)$
○ $\gamma$ is a discount factor: $\gamma \in [0, 1)$
• A history is a sequence of actions, observations, and rewards
POMDPs approach
95
• A belief state $b(s) = P(S_t = s \mid H_t)$ is a probability distribution over states,
conditioned on the history
POMDPs approach
(figure: environment and the corresponding belief state)
Example observation:
• Number of adjacent walls
• A noisy version: wrong value with probability 0.1
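For a discrete POMDP, the belief state can be maintained by exact Bayesian filtering; a minimal sketch with assumed array shapes:

```python
import numpy as np

def belief_update(belief, T, Z, action, observation):
    """One step of exact Bayesian filtering for a discrete POMDP:
    b'(s') is proportional to Z[a, s', o] * sum_s T[a, s, s'] * b(s).
    T[a, s, s'] is the transition model and Z[a, s', o] the observation
    model (the array shapes are illustrative assumptions)."""
    predicted = belief @ T[action]                      # predict step
    new_belief = predicted * Z[action, :, observation]  # correct with o
    return new_belief / new_belief.sum()                # renormalize
```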
98
• Meta Reinforcement Learning as POMDPs
• POMDP for unobserved state: A simple example
POMDPs approach
Motivated by CS330: Deep Multi-Task and Meta-Learning
A simple Gridworld (only 𝑠3 gives positive reward)
Where am I?
Initial State → Action: 𝑎 = “left”, 𝑠 = 𝑠1, 𝑟 = 0
(figure: belief over 𝑠1, 𝑠2, 𝑠3 before and after the action)
101
• Meta Reinforcement Learning as POMDPs
• POMDP for unobserved task: A simple example
POMDPs approach
Motivated by CS330: Deep Multi-Task and Meta-Learning
A simple Gridworld (each task gives positive reward in a different state)
What task am I in?
Initial State → Action: 𝑎 = “left”, 𝑠 = 𝑠1, 𝑟 = 0
(figure: belief over MDP1, MDP2, MDP3 before and after the action)
102
• Meta Reinforcement Learning as Partially Observable RL
○ Control-inference duality
○ The latent task variable $z$ → everything needed to solve the task
• So, what does that mean?
○ Learning a task = inferring $z$
○ $z$ encapsulates the information the policy must have to solve the current task!
• General recipe
1. Sample $z \sim \hat{p}(z \mid \text{history})$: some approximate posterior (e.g. variational inference)
2. Act according to $\pi_\theta(a \mid s, z)$ to collect more data: act as though $z$ was correct
POMDPs approach
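A sketch of this recipe as a Thompson-sampling-style loop; `posterior` and `policy` are hypothetical interfaces standing in for whatever approximate inference and task-conditioned policy a concrete method uses:

```python
def posterior_sampling_episode(env, posterior, policy):
    """Sketch of the recipe above: sample a task hypothesis z from the
    (approximate) posterior given experience so far, act as though z were
    correct, and update the posterior with the new data.
    env/posterior/policy are illustrative interfaces, not a specific API."""
    z = posterior.sample()                  # 1. sample z ~ p_hat(z | history)
    state, done = env.reset(), False
    while not done:
        action = policy.act(state, z)       # 2. act as though z were correct
        next_state, reward, done, _ = env.step(action)  # Gym-style step
        posterior.observe(state, action, reward, next_state)
        state = next_state
```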
103
• Key Idea: How can we estimate Task Belief States?
1. Supervised Learning (if available)
- Can we use task labels during meta-training (not in the adaptation phase)?
2. Variational approach
- 𝑝(𝑧|𝑐) is intractable, can we use variational inference in meta RL?
POMDPs approach: Variants
104
• An agent consists of two modules:
○ The optimal meta-learner only needs to make decisions based on the current
state and the current task belief
1. The policy module: a policy conditioned on the current state and (an approximate
representation of) the posterior task belief
2. The belief module: learns to output an approximate representation of that belief
POMDPs approach: Auxiliary Supervised Learning
But how can we estimate the task belief?
Humplik, Jan, et al. "Meta reinforcement learning as task inference." arXiv 2019
105
• Some examples of task labels
○ Multi-armed bandit
- A vector of arm probabilities
○ Semi-circle
- The angle on the semi-circle (e.g. max = 2𝜋)
○ Cheetah velocity
- The target velocity
POMDPs approach: Auxiliary Supervised Learning
Humplik, Jan, et al. "Meta reinforcement learning as task inference." arXiv 2019
106
• Meta Reinforcement Learning as Task Inference
POMDPs approach: Auxiliary Supervised Learning
Humplik, Jan, et al. "Meta reinforcement learning as task inference." arXiv 2019
A. A baseline LSTM agent
B. A belief network agent
C. Auxiliary head agent
• LSTMs and the information bottleneck (IB) are optional
• RL algorithms
- SVG(0) (Heess, Nicolas, et al., 2015) as the off-policy algorithm
- PPO (Schulman, John, et al., 2017) as the on-policy algorithm
107
• Can we achieve both meta-training and adaptation efficiency?
- Yes! We can build an off-policy meta-RL algorithm that disentangles task
inference and control
○ Meta-training efficiency: off-policy RL algorithm (e.g. SAC)
○ Adaptation efficiency: probabilistic encoder to estimate the task belief
POMDPs approach: Variational approach
Rakelly, Kate, et al. "Efficient off-policy meta-reinforcement learning via probabilistic context variables." ICML 2019.
108
• Objective of variational approach
POMDPs approach: Variational approach
Rakelly, Kate, et al. "Efficient off-policy meta-reinforcement learning via probabilistic context variables." ICML 2019.
We don’t need to know the order of transitions
in order to identify the MDP (Markov property)
Use a permutation-invariant encoder for
simplicity and speed (no need for an RNN)
Regularization (information bottleneck)
Likelihood (Bellman error)
Reparameterization trick
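Putting these pieces together, the resulting variational objective can be sketched (paraphrasing the PEARL formulation, with $c^{\mathcal{T}}$ denoting the context collected in task $\mathcal{T}$) as

$$\mathbb{E}_{\mathcal{T}}\left[\mathbb{E}_{z \sim q_\phi(z \mid c^{\mathcal{T}})}\left[R(\mathcal{T}, z) + \beta\, D_{\mathrm{KL}}\!\left(q_\phi(z \mid c^{\mathcal{T}})\,\|\,p(z)\right)\right]\right],$$

where $R(\mathcal{T}, z)$ is the likelihood term (instantiated as a Bellman error), the KL term is the information-bottleneck regularizer against a prior $p(z)$, and the permutation-invariant encoder factorizes over context transitions as $q_\phi(z \mid c_{1:N}) \propto \prod_{n=1}^{N} \Psi_\phi(z \mid c_n)$, trained with the reparameterization trick.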
109
• PEARL (Probabilistic Embeddings for Actor-critic RL)
POMDPs approach: Variational approach
Rakelly, Kate, et al. "Efficient off-policy meta-reinforcement learning via probabilistic context variables." ICML 2019.
110
• Osband, Ian, Daniel Russo, and Benjamin Van Roy. "(More) efficient
reinforcement learning via posterior sampling." Advances in Neural Information
Processing Systems. 2013.
• Rakelly, Kate, et al. "Efficient off-policy meta-reinforcement learning via
probabilistic context variables." International conference on machine learning.
2019.
• Humplik, Jan, et al. "Meta reinforcement learning as task inference." arXiv
preprint arXiv:1905.06424 (2019).
• Zintgraf, Luisa, et al. "Variational Task Embeddings for Fast Adaption in Deep
Reinforcement Learning." International Conference on Learning Representations
Workshop on Structure & Priors in Reinforcement Learning. 2019.
• Zintgraf, Luisa, et al. "VariBAD: A Very Good Method for Bayes-Adaptive Deep
RL via Meta-Learning." arXiv preprint arXiv:1910.08348 (2019).
POMDPs approach: Further Readings
Thank you for your attention!

Deep Meta Learning

  • 1.
    Chang-Hoon Jeong Biointelligence lab.,Artificial Intelligence Institute Seoul National University Deep Meta Learning Learning-to-Learn in Neural Networks
  • 2.
    2 • Meta Learning:Overview • Meta Supervised Learning ○Model-based methods ○Optimization-based Inference ○Non-parametric approach • Meta Reinforcement Learning o Recurrent policies methods o Optimization-based Inference oPOMDPs approach Agenda
  • 3.
  • 4.
    4 • Large, diversedataset, large computation → Broad Generalization Deep Learning: SOTA(State-Of-The-Art) GPT-3 Agent57 Meta Pseudo Labels Pham, Hieu, et al. "Meta pseudo labels." 2020. Brown, Tom B., et al. "Language models are few-shot learners." 2020. Badia, Adrià Puigdomènech, et al. "Agent57: Outperforming the atari human benchmark." 2020.
  • 5.
    5 Difference between Machineand Human Machine is a specialist Human is a generalist
  • 6.
    6 Problem Statement What ifyour data has a long tail? What if you want a general-purpose AI system in the real world? CS330: Deep Multi-Task and Meta-Learning
  • 7.
    7 Few-Shot Learning Can youclassify the image using just 3 datapoints? By Gogh or Cezanne?
  • 8.
    8 Few-Shot Learning Can youclassify the image using just 3 datapoints? By Gogh or Cezanne? (episodic) N-way K-shot learning In this case, 2-way 3-shot learning
  • 9.
    9 Few-Shot Learning Can youclassify the image using just 3 datapoints? By Gogh or Cezanne? (episodic) N-way K-shot learning In this case, 2-way 3-shot learning How did you accomplish this? By leveraging prior experience!
  • 10.
    10 • Difference betweenmulti-task learning and meta learning What is the Meta Learning?
  • 11.
    11 • Meta Learningin Computer Science oMeta-learning is a subfield of machine learning where automatic learning algorithms are applied on metadata about machine learning experiments: learning-to-learn • How can we understand the meta-learning? ○Mechanistic view - For implementation the algorithm - Training the neural network uses a meta-dataset, which itself consists of many datasets (each for a task) ○Probabilistic view - For making it easier to understand the algorithm - Extract prior information, and learning a new task uses this prior and small training set to infer most likely posterior parameters What is the Meta Learning?
  • 12.
    12 • (Conventional) MachineLearning o(Supervised Learning) Given a training dataset , we can train a model parameterized by , by solving, - is a loss function that measures the match between true labels and those predicted by - is the condition to make explicit the dependence of this solution on factors such as choice of optimizer(e.g. Adam, SGD, …) or function class for (e.g. CNN, RNN, …), or initialization parameters(e.g. Xavier, He, …) Formalizing the Meta Learning * is pre-specified, can we learn the learning algorithm itself, rather than assuming it is pre-specified and fixed?
  • 13.
    13 • What isa task? oA task: • Different tasks can vary based on: • different objects • different people • different objectives • different lighting conditions • different languages • … Formalizing the Meta Learning CS330: Deep Multi-Task and Meta-Learning, Tutorial #2: few-shot learning and meta-learning(by W. Zi)
  • 14.
    14 • The badnews oDifferent tasks need to share some structure o If this doesn’t hold, you are better off using single-task learning • The good news o There are many tasks with shared structure! The Structure for Meta Learning CS330: Deep Multi-Task and Meta-Learning transfer! ?
  • 15.
    15 • Meta Learning:Task distribution view o specifies ‘how-to-learn’ and is often evaluated in terms of performance over a distribution of tasks (meta-knowledge, inductive bias) oLearning ‘how-to-learn’ solving by, How to solve Meta Learning? How can we solve this problem in practice?
  • 16.
    16 • The learningprocess: Two-level optimization ○Inner-level; - A new task is presented - An agent tries to quickly learn the associated concepts from the training observations - This quick adaptation is facilitated by knowledge that it has accumulated across earlier tasks at the outer-level ○Outer-level; - Whereas the inner-level concerns a single task, the outer-level concerns a multitude of tasks - In the outer-level, the tasks of the inner-level of one cycle are integrated How to solve Meta Learning?
  • 17.
    17 • Meta-dataset: thedataset of datasets Introduction to Meta-dataset Ravi, Sachin, and Hugo Larochelle. "Optimization as a model for few-shot learning." 2016. task Held-out classes
  • 18.
    18 • Meta-training (SupervisedLearning) Meta Learning: Overview Ravi, Sachin, and Hugo Larochelle. "Optimization as a model for few-shot learning." 2016. Query set Support set
  • 19.
    19 • Meta-training (SupervisedLearning) Meta Learning: Overview Ravi, Sachin, and Hugo Larochelle. "Optimization as a model for few-shot learning." 2016. Query set Support set Learning ‘how-to-learn’ (meta-learning)
  • 20.
    20 • Meta-training (SupervisedLearning) Meta Learning: Overview Ravi, Sachin, and Hugo Larochelle. "Optimization as a model for few-shot learning." 2016. Query set Support set Learning ‘how-to-learn’ (meta-learning) Methods • Model-based methods • Optimization-based Inference • Non-parametric approach
  • 21.
    21 • Meta-testing (SupervisedLearning) → Few-Shot Learning Meta Learning: Overview Ravi, Sachin, and Hugo Larochelle. "Optimization as a model for few-shot learning." 2016. Query set Support set Fast adaptation with small data
  • 22.
    22 • Meta-testing (SupervisedLearning) → Few-Shot Learning Meta Learning: Overview Ravi, Sachin, and Hugo Larochelle. "Optimization as a model for few-shot learning." 2016. Query set Support set Fast adaptation with small data Finally, we can evaluate the accuracy of our meta-learner by the performance of on the test split of each target task
  • 23.
  • 24.
    24 Benchmark Datasets inMeta Supervised Learning Dataset Description Omniglot (Lake et al., 2011) This dataset presents an image recognition task. Each image corresponds to one out of 1,623 characters from 50 different alphabets. Every character was drawn by 20 people. Note that in this case, the characters are the classes/labels. ImageNet (Deng et al., 2009) This is the largest image classification dataset, containing more than 20K classes and over 14 million colored images. MiniImageNet is a mini variant of the large ImageNet dataset (Deng et al., 2009) for image classification, proposed by Vinyals et al. (2016) to reduce the engineering efforts to run experiments. The mini dataset contains 60,000 colored images of size 84 × 84. There are a total of 100 classes present, each accorded by 600 examples. TieredImageNet (Ren et al., 2018) is another variation of the large ImageNet dataset. It is similar to miniImageNet, but contains a hierarchical structure. That is, there are 34 classes, each with its own sub-classes. CIFAR-10 and CIFAR-100 (Krizhevsky, 2009) Two other image recognition datasets. Each one contains 60K RGB images of size 32×32. CIFAR-10 and CIFAR100 contain 10 and 100 classes respectively, with a uniform number of examples per class (6,000 and 600 respectively). Every class in CIFAR-100 also has a super-class, of which there are 20 in the full data set. Many variants of the CIFAR datasets can be sampled, giving rise to e.g. CIFAR-FS (Bertinetto et al., 2019) and FC-100 (Oreshkin et al., 2018). CUB-200-2011 (Wah et al., 2011) The CUB-200-2011 dataset contains roughly 12K RGB images of birds from 200 species. Every image has some labeled attributes(e.g. crown color, tail shape). MNIST (LeCun et al., 2010) MNIST presents a hand-written digit recognition task, containing ten classes (for digits 0 through 9). In total, the dataset is split into a 60K train and 10K test gray scale images of hand-written digits. Meta-Dataset (Triantafillou et al., 2020) This dataset comprises several other datasets such as Omniglot (Lake et al., 2011), CUB-200 (Wah et al., 2011), ImageNet (Deng et al., 2009), and more (Triantafillou et al., 2020). An episode is then constructed by sampling a dataset (e.g. Omniglot), selecting a subset of labels to create train and test splits as before. In this way, broader generalization is enforced since the tasks are more distant from each other. Meta-world (Yu et al., 2019) A meta reinforcement learning dataset, containing 50 robotic manipulation tasks (control a robot arm to achieve some pre-defined goal, e.g. unlocking a door, or playing soccer). It was specifically designed to cover a broad range of tasks, such that meaningful generalization can be measures (Yu et al., 2019).
  • 25.
    25 • Model-based methods(SL O, RL O) ○ Santoro, Adam, et al. "Meta-learning with memory-augmented neural networks." International conference on machine learning. 2016. ○ Mishra, Nikhil, et al. "A simple neural attentive meta-learner." arXiv preprint arXiv:1707.03141 (2017). ○ Munkhdalai, Tsendsuren, and Hong Yu. "Meta networks." Proceedings of machine learning research 70 (2017): 2554. • Optimization-based Inference (SL O, RL O) ○ Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." arXiv preprint arXiv:1703.03400 (2017). ○ Nichol, Alex, Joshua Achiam, and John Schulman. "On first-order meta-learning algorithms." arXiv preprint arXiv:1803.02999 (2018). ○ Rusu, Andrei A., et al. "Meta-learning with latent embedding optimization." arXiv preprint arXiv:1807.05960 (2018). • Non-parametric approach (SL O, RL X) ○ Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. "Siamese neural networks for one-shot image recognition." ICML deep learning workshop. Vol. 2. 2015. ○ Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems. 2016. ○ Snell, Jake, Kevin Swersky, and Richard Zemel. "Prototypical networks for few-shot learning." Advances in neural information processing systems. 2017. Meta Learning Algorithms
  • 26.
    26 • Model-based methods(SL O, RL O) ○ Santoro, Adam, et al. "Meta-learning with memory-augmented neural networks." International conference on machine learning. 2016. ○ Mishra, Nikhil, et al. "A simple neural attentive meta-learner." arXiv preprint arXiv:1707.03141 (2017). ○ Munkhdalai, Tsendsuren, and Hong Yu. "Meta networks." Proceedings of machine learning research 70 (2017): 2554. • Optimization-based Inference (SL O, RL O) ○ Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." arXiv preprint arXiv:1703.03400 (2017). ○ Nichol, Alex, Joshua Achiam, and John Schulman. "On first-order meta-learning algorithms." arXiv preprint arXiv:1803.02999 (2018). ○ Rusu, Andrei A., et al. "Meta-learning with latent embedding optimization." arXiv preprint arXiv:1807.05960 (2018). • Non-parametric approach (SL O, RL X) ○ Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. "Siamese neural networks for one-shot image recognition." ICML deep learning workshop. Vol. 2. 2015. ○ Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems. 2016. ○ Snell, Jake, Kevin Swersky, and Richard Zemel. "Prototypical networks for few-shot learning." Advances in neural information processing systems. 2017. Meta Learning Algorithms
  • 27.
    27 • General approach:Train a recurrent networks to represent Model-based methods CS330: Deep Multi-Task and Meta-Learning Train with standard supervised learning support set query set
  • 28.
    28 • General approach:Train a recurrent networks to represent Model-based methods CS330: Deep Multi-Task and Meta-Learning 1. Sample task (or mini-batch of tasks) 2. Sample disjoint datasets from
  • 29.
    29 • General approach:Train a recurrent networks to represent Model-based methods CS330: Deep Multi-Task and Meta-Learning 1. Sample task (or mini-batch of tasks) 2. Sample disjoint datasets from 3. Compute 4. Update using support set query set
  • 30.
    30 • One ofChallenge: Output parameters does not scalable oNo need to output all parameters of neural net, only sufficient statistics (Santoro et al., MANN, 2016), (Mishra et al., SNAIL, 2017) Model-based methods For example, in multi-task learning, concatenate or multiplication Low-dimensional vector Represents contextual task information CS330: Deep Multi-Task and Meta-Learning
  • 31.
    31 • Memory-Augmented NeuralNetworks Model-based methods: Variants Graves, Alex, et al. "Neural turing machines." arXiv, 2014, Santoro, Adam, et al. "Meta-learning with memory-augmented neural networks." ICML 2016.
  • 32.
    32 • A SimpleNeural AttentIve Meta-Learner(SNAIL) Model-based methods: Variants Mishra, Nikhil, et al. "A simple neural attentive meta-learner." ICLR 2018
  • 33.
    33 • Graves, Alex,Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv p reprint arXiv:1410.5401 (2014). • Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). • Santoro, Adam, et al. "Meta-learning with memory-augmented neural networks." International conference on machine learning. 2016. • Mishra, Nikhil, et al. "A simple neural attentive meta-learner." arXiv preprint arXiv:1707.03141 (2017). • Munkhdalai, Tsendsuren, and Hong Yu. "Meta networks." Proceedings of machine learning research 70 (2017): 2554. • Garnelo, Marta, et al. "Conditional neural processes." arXiv preprint arXiv:1807.01613 (2018). Model-based methods: Further Readings
  • 34.
    34 • Can weutilize the pre-trained prior knowledge? • One successful form of prior knowledge: initialization for fine-tuning • What about Transfer Learning? Optimization-based Inference Can we do better?
  • 35.
    35 • Model-Agnostic MetaLearning Optimization-based Inference Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
  • 36.
    36 • Model-Agnostic MetaLearning Optimization-based Inference Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
  • 37.
    37 • MAML forSupervised Learning(Meta training) Model-Agnostic Meta Learning Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
  • 38.
    38 • MAML forSupervised Learning(Meta training) Model-Agnostic Meta Learning meta-initialization(outer-loop) adaptation(inner-loop) Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
  • 39.
    39 • MAML forSupervised Learning(Meta training) Model-Agnostic Meta Learning Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
  • 40.
    40 • MAML forSupervised Learning(Meta training) Model-Agnostic Meta Learning
  • 41.
    41 • MAML forSupervised Learning(Meta training) Model-Agnostic Meta Learning 1st for loop 2nd for loop Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
  • 42.
    42 • MAML forSupervised Learning(Meta training) Model-Agnostic Meta Learning 1st for loop 2nd for loop Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
  • 43.
    43 • MAML forSupervised Learning(Meta training) Model-Agnostic Meta Learning 1st for loop 2nd for loop Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
  • 44.
    44 • MAML forSupervised Learning(Meta training) Model-Agnostic Meta Learning Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
  • 45.
    45 • MAML forSupervised Learning(Meta testing) Model-Agnostic Meta Learning Few-shot adaptation Fine Tuning(Adaptation) Just 1 or few gradient steps Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
  • 46.
    46 • MAML forSupervised Learning(Meta testing) Model-Agnostic Meta Learning Few-shot adaptation Fine Tuning(Adaptation) Just 1 or few gradient steps Evaluation: N-way k-shot Accuracy Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017
  • 47.
    47 • Finn, Chelsea,Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." arXiv preprint arXiv:1703.03400 (2017). • Ravi, Sachin, and Hugo Larochelle. "Optimization as a model for few-shot learning." (2016). • Nichol, Alex, Joshua Achiam, and John Schulman. "On first-order meta-learning algorithms." arXiv preprint arXiv:1803.02999 (2018). • Rusu, Andrei A., et al. "Meta-learning with latent embedding optimization." arXiv preprint arXiv:1807.05960 (2018). • Antoniou, Antreas, Harrison Edwards, and Amos Storkey. "How to train your MAML." arXiv preprint arXiv:1810.09502 (2018). • Lee, Yoonho, and Seungjin Choi. "Gradient-based meta-learning with learned layerwise metric and subspace." arXiv preprint arXiv:1801.05558 (2018). • Fallah, Alireza, Aryan Mokhtari, and Asuman Ozdaglar. "On the convergence theory of gradient- based model-agnostic meta-learning algorithms." International Conference on Artificial Intelligence and Statistics. 2020. • Song, Xingyou, et al. "Es-maml: Simple hessian-free meta learning." arXiv preprint arXiv:1910.01215 (2019). • Triantafillou, Eleni, et al. "Meta-dataset: A dataset of datasets for learning to learn from few examples." arXiv preprint arXiv:1903.03096 (2019). Optimization-based Inference: Further Readings
  • 48.
    48 • KNN(K-Nearest Neighbor)Algorithm oLazy, Non-parametric Learner Non-parametric approach Red or Blue?
  • 49.
    49 • KNN(K-Nearest Neighbor)Algorithm oLazy, Non-parametric Learner Non-parametric approach Red or Blue? 1-Nearest Neighbor → Blue 3-Nearset Neighbor → Red 5-Nearset Neighbor → Red
  • 50.
    50 • In lowdata regimes (e.g. 2D data), non-parametric methods are simple, works well! • What about high dimensional data? o During meta-training → Still want to be parametric o During meta-testing → Few-Shot learning ↔ low data regime Non-parametric approach Can we train parametric meta-learners that produce effective non-parametric learners? Finn, Chelsea, Sergey Levine. “Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning." ICML Tutorial 2019
  • 51.
    51 • At meta-testtime, compare test image(query) with training images (support set) o Which space do we use? o Which distance metric do we use? Non-parametric approach Finn, Chelsea, Sergey Levine. “Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning." ICML Tutorial 2019
  • 52.
    52 • Siamese NeuralNetworks Non-parametric approach: Variants meta-training meta-testing Koch, Gregory, et al. "Siamese neural networks for one-shot image recognition." ICML deep learning workshop, 2015.
  • 53.
    53 • Siamese NeuralNetworks: A simple example Non-parametric approach: Variants
  • 54.
    54 • Siamese NeuralNetworks: A simple example Non-parametric approach: Variants Label 1
  • 55.
    55 • Siamese NeuralNetworks: A simple example Non-parametric approach: Variants Label 0
  • 56.
    56 • Siamese NeuralNetworks Non-parametric approach: Variants Koch, Gregory, et al. "Siamese neural networks for one-shot image recognition." ICML deep learning workshop, 2015. Same class: Otherwise: Siamese twin is not depicted, but joins immediately after the 4096 unit fully-connected layer where the L1 component-wise distance between vectors is computed
57
Non-parametric approach: Variants
• Siamese Neural Networks
 ○ [Table: test accuracy on the Omniglot dataset (results not extracted)]
Koch, Gregory, et al. "Siamese neural networks for one-shot image recognition." ICML Deep Learning Workshop, 2015.
58
Non-parametric approach: Variants
• Siamese Neural Networks
 ○ [Table: test accuracy on the Omniglot dataset (results not extracted)]
 ○ Mismatch: meta-training uses 2-way classification, but meta-testing uses N-way classification
"A simple machine learning principle: test and train conditions must match" (Vinyals et al., Matching Networks for One-Shot Learning)
Koch, Gregory, et al. "Siamese neural networks for one-shot image recognition." ICML Deep Learning Workshop, 2015.
59
Non-parametric approach: Variants
• Matching Networks for One-Shot Learning
 ○ Convolutional encoder for the images
 ○ Bidirectional LSTM to embed the support set
 ○ Cosine distance to compare query and support embeddings
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in Neural Information Processing Systems. 2016.
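The prediction rule reduces to attention over the support set with a cosine kernel; a sketch under that reading, assuming the embeddings have already been produced by the encoders above.

import torch
import torch.nn.functional as F

def matching_net_predict(f_query, g_support, y_support, n_way):
    """Label a query as an attention-weighted sum of one-hot support labels.
    f_query: (D,) embedded query; g_support: (M, D) embedded support set."""
    sims = F.cosine_similarity(f_query.unsqueeze(0), g_support, dim=1)  # cosine kernel
    attn = F.softmax(sims, dim=0)                                       # attention over support
    onehot = F.one_hot(y_support, n_way).float()                        # (M, n_way)
    return attn @ onehot                                                # class probabilities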
60
Non-parametric approach: Variants
• Prototypical Networks
 ○ Can we aggregate class information to create a prototypical embedding?
 ○ Prototype = mean embedding of a class's support examples: c_k = (1 / |S_k|) Σ_{(x_i, y_i) ∈ S_k} f_φ(x_i)
 ○ Classify by (squared) Euclidean distance to each prototype: p_φ(y = k | x) ∝ exp(−d(f_φ(x), c_k))
Snell, Jake, et al. "Prototypical networks for few-shot learning." Advances in Neural Information Processing Systems. 2017.
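These two equations translate almost directly into code; a sketch assuming precomputed embeddings.

import torch

def proto_predict(f_query, f_support, y_support, n_way):
    """Prototype = class mean embedding; classify by squared Euclidean distance."""
    protos = torch.stack([f_support[y_support == k].mean(dim=0)   # c_k for each class
                          for k in range(n_way)])
    d2 = ((f_query.unsqueeze(0) - protos) ** 2).sum(dim=1)        # squared distances
    return torch.softmax(-d2, dim=0)                              # p(y = k | x)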
61
Non-parametric approach: Further Readings
• Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. "Siamese neural networks for one-shot image recognition." ICML Deep Learning Workshop. Vol. 2. 2015.
• Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in Neural Information Processing Systems. 2016.
• Snell, Jake, Kevin Swersky, and Richard Zemel. "Prototypical networks for few-shot learning." Advances in Neural Information Processing Systems. 2017.
• Sung, Flood, et al. "Learning to compare: Relation network for few-shot learning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
• Allen, Kelsey R., et al. "Infinite mixture prototypes for few-shot learning." arXiv preprint arXiv:1902.04552 (2019).
• Garcia, Victor, and Joan Bruna. "Few-shot learning with graph neural networks." arXiv preprint arXiv:1711.04043 (2017).
63
Introduction to Reinforcement Learning
• Reinforcement Learning: an Agent interacts with the Environment (the world)
• At each time step 𝑡, the agent
 - Executes action 𝐴𝑡
 - Receives observation 𝑆𝑡
 - Receives scalar reward 𝑅𝑡
• The environment
 - Receives action 𝐴𝑡
 - Emits observation 𝑆𝑡+1
 - Emits scalar reward 𝑅𝑡+1
• 𝑡 increments at env.step()
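A minimal interaction loop in the classic OpenAI Gym API; the environment name and the random placeholder policy are illustrative.

import gym

env = gym.make("CartPole-v1")
obs = env.reset()                                # initial observation S_0
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()           # placeholder policy chooses A_t
    obs, reward, done, info = env.step(action)   # env.step() advances t, emits S_{t+1}, R_{t+1}
    total_reward += reward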
67
Introduction to Reinforcement Learning
• The interaction loop: the Agent observes state 𝑆𝑡 and takes action 𝐴𝑡; the Environment returns reward 𝑅𝑡+1 and next state 𝑆𝑡+1
68
Introduction to Reinforcement Learning
• Markov Decision Processes (MDPs)
 ○ Mathematical formulation of the RL problem
• An MDP is defined by a tuple (S, A, P, R, γ)
 ○ S: state set
 ○ A: action set
 ○ P: transition probability, P(s′ | s, a)
 ○ R: reward function, R(s, a)
 ○ γ: discount factor
69
Introduction to Reinforcement Learning
• Markov Decision Processes (MDPs)
 ○ Mathematical formulation of the RL problem, defined by (S, A, P, R, γ) as above
• (Stochastic) Policies
 ○ Probability of the agent's action: π(a | s) = P[𝐴𝑡 = a | 𝑆𝑡 = s]
 ○ Value function: V_π(s) = E_π[Σ_k γ^k 𝑅𝑡+k+1 | 𝑆𝑡 = s]
• Goal: find the optimal policy π* that maximizes the expected discounted return
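For a small tabular MDP, the optimal value function and a greedy optimal policy can be computed directly with value iteration; a numpy sketch, where the transition tensor P and reward matrix R are hypothetical inputs.

import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """P[s, a, s'] = transition probability, R[s, a] = expected reward.
    Returns the optimal value function V* and a greedy (optimal) policy."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * (P @ V)    # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.max(axis=1)      # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new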
70
Problem Statement
• What's wrong with Reinforcement Learning?
 ○ DQN, OpenAI Five, AlphaStar, AlphaGo
71
Problem Statement
• What's wrong with Reinforcement Learning?
 ○ Each new environment (MDP 1, MDP 2, MDP 3) is learned from scratch
 ○ Can we meta-learn reinforcement learning algorithms that are much more efficient?
"Hula Beach", "Never grow up", "The Sled" by artist Matt Spangler, mattspangler.com
72
Problem Statement
• What's wrong with Reinforcement Learning?
 ○ Bandit example comparing Reinforcement Learning vs. Meta Reinforcement Learning: https://bit.ly/bandit-meta-rl-ex
73
Meta Reinforcement Learning
• General Framework for Meta Reinforcement Learning
 ○ meta-training: a Meta RL Algorithm is trained across Env 1 (MDP 1), Env 2 (MDP 2), Env 3 (MDP 3), … (e.g. 10 environments)
 ○ meta-testing: the fast, meta-learned agent adapts to a held-out Env 11 (MDP 11) with only a small number of interactions
74
Meta Reinforcement Learning Algorithms
• Recurrent policies methods
 ○ Duan, Yan, et al. "RL²: Fast reinforcement learning via slow reinforcement learning." arXiv preprint arXiv:1611.02779 (2016).
 ○ Mishra, Nikhil, et al. "A simple neural attentive meta-learner." arXiv preprint arXiv:1707.03141 (2017).
• Optimization-based Inference
 ○ Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." arXiv preprint arXiv:1703.03400 (2017).
 ○ Nichol, Alex, Joshua Achiam, and John Schulman. "On first-order meta-learning algorithms." arXiv preprint arXiv:1803.02999 (2018).
• POMDPs approach
 ○ Rakelly, Kate, et al. "Efficient off-policy meta-reinforcement learning via probabilistic context variables." International Conference on Machine Learning. 2019.
 ○ Humplik, Jan, et al. "Meta reinforcement learning as task inference." arXiv preprint arXiv:1905.06424 (2019).
76
Recurrent policies methods
• General approach: just train a recurrent network!
 ○ The RNN hidden state is not reset between episodes!
Finn, Chelsea, Sergey Levine. "Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning." ICML Tutorial 2019.
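A sketch of such a recurrent policy in PyTorch; the input convention (current observation concatenated with previous action, reward, and done flag) follows the RL²-style setup, and the module names are illustrative.

import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """GRU policy whose hidden state carries experience across episode boundaries."""
    def __init__(self, in_dim, n_actions, hid=128):
        super().__init__()
        self.rnn = nn.GRUCell(in_dim, hid)   # input: (s_t, a_{t-1}, r_{t-1}, done flag)
        self.pi = nn.Linear(hid, n_actions)

    def act(self, x, h):
        h = self.rnn(x, h)                   # h is NOT reset when an episode ends;
        logits = self.pi(h)                  # reset it only when a new task is sampled
        a = torch.distributions.Categorical(logits=logits).sample()
        return a, h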
77
Recurrent policies methods: Variants
• RL²: Fast Reinforcement Learning via Slow Reinforcement Learning
 ○ (Vanilla) Reinforcement Learning: optimize episode by episode
 ○ RL algorithm: TRPO + GAE
Duan, Yan, et al. "RL²: Fast reinforcement learning via slow reinforcement learning." arXiv preprint arXiv:1611.02779 (2016).
78
Recurrent policies methods: Variants
• RL²: Fast Reinforcement Learning via Slow Reinforcement Learning
 ○ Meta Reinforcement Learning: optimize trial by trial
 ○ Each trial: sample an environment (MDP), then run the 𝑘 episodes of the trial in that environment
 ○ The RNN hidden state is preserved across the episodes within a trial and reset only when a new environment is sampled
 ○ RL algorithm: TRPO + GAE
Duan, Yan, et al. "RL²: Fast reinforcement learning via slow reinforcement learning." arXiv preprint arXiv:1611.02779 (2016).
79
Recurrent policies methods: Variants
• A Simple Neural AttentIve meta-Learner (SNAIL)
 ○ Like RL², but replaces the LSTM with dilated temporal convolutions (as in WaveNet) plus attention
Mishra, Nikhil, et al. "A simple neural attentive meta-learner." ICLR 2018. Figure: https://www.programmersought.com/article/1715257628/
80
Recurrent policies methods: Further Readings
• Botvinick, Matthew, et al. "Reinforcement learning, fast and slow." Trends in Cognitive Sciences 23.5 (2019): 408-422.
• Wang, Jane X., et al. "Learning to reinforcement learn." arXiv preprint arXiv:1611.05763 (2016).
• Duan, Yan, et al. "RL²: Fast reinforcement learning via slow reinforcement learning." arXiv preprint arXiv:1611.02779 (2016).
• Mishra, Nikhil, et al. "A simple neural attentive meta-learner." arXiv preprint arXiv:1707.03141 (2017).
• Ritter, Samuel, et al. "Been there, done that: Meta-learning with episodic recall." arXiv preprint arXiv:1805.09692 (2018).
83
Optimization-based Inference
• Model-Agnostic Meta Learning
 ○ Can also be applied to Reinforcement Learning!
 ○ Meta-initialization (outer loop): learn an initialization of the policy parameters across tasks
 ○ Adaptation (inner loop): roll out the current policy to collect trajectories, then take policy-gradient steps on them
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017.
90
Optimization-based Inference
• Model-Agnostic Meta Learning (meta-testing)
 ○ Few-shot adaptation: fine-tuning (adaptation) with just 1 or a few gradient steps
 ○ Evaluation: cumulative rewards
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." ICML 2017.
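Putting the loops together, hedged pseudocode for one meta-update; sample_tasks, rollout, and policy_gradient are hypothetical helpers, and the outer update shown is the first-order approximation rather than the full second-order MAML gradient.

def maml_rl_step(theta, alpha, beta, sample_tasks, rollout, policy_gradient):
    """One outer-loop update of MAML-RL (first-order sketch, hypothetical helpers)."""
    meta_grad = 0
    for task in sample_tasks():
        traj_pre = rollout(theta, task)                              # roll out meta-parameters
        theta_i = theta - alpha * policy_gradient(theta, traj_pre)   # inner-loop adaptation
        traj_post = rollout(theta_i, task)                           # roll out adapted policy
        meta_grad += policy_gradient(theta_i, traj_post)             # first-order approximation
    return theta - beta * meta_grad                                  # outer-loop meta-update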
91
Optimization-based Inference
• MAML-RL Demo (meta-test): https://sites.google.com/view/maml
93
Optimization-based Inference: Further Readings
• Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." arXiv preprint arXiv:1703.03400 (2017).
• Nichol, Alex, Joshua Achiam, and John Schulman. "On first-order meta-learning algorithms." arXiv preprint arXiv:1803.02999 (2018).
• Xu, Zhongwen, Hado P. van Hasselt, and David Silver. "Meta-gradient reinforcement learning." Advances in Neural Information Processing Systems. 2018.
• Houthooft, Rein, et al. "Evolved policy gradients." Advances in Neural Information Processing Systems. 2018.
• Salimans, Tim, et al. "Evolution strategies as a scalable alternative to reinforcement learning." arXiv preprint arXiv:1703.03864 (2017).
• Rothfuss, Jonas, et al. "ProMP: Proximal meta-policy search." arXiv preprint arXiv:1810.06784 (2018).
• Fernando, Chrisantha, et al. "Meta-learning by the Baldwin effect." Proceedings of the Genetic and Evolutionary Computation Conference Companion. 2018.
94
POMDPs approach
• What is a Partially Observable Markov Decision Process (POMDP)?
• A POMDP is an MDP with hidden states (a hidden Markov model with actions)
• A POMDP is a tuple (S, A, O, P, R, Z, γ)
 ○ S: a finite set of states
 ○ A: a finite set of actions
 ○ O: a finite set of observations
 ○ P: a state transition probability matrix, P(s′ | s, a)
 ○ R: a reward function, R(s, a)
 ○ Z: an observation function, Z(o | s′, a)
 ○ γ: a discount factor, γ ∈ [0, 1]
• A history 𝐻𝑡 is a sequence of actions, observations, and rewards
95
POMDPs approach
• A belief state b(h) is a probability distribution over states, conditioned on the history h
• Example observation in a gridworld environment
 ○ The number of adjacent walls
 ○ A noisy version? The observed value is wrong with probability 0.1
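Belief states are maintained with a Bayes filter; a numpy sketch, where the transition matrices P[a] and observation matrices Z[a] are hypothetical inputs.

import numpy as np

def belief_update(b, a, o, P, Z):
    """Bayes filter: b'(s') ∝ Z[a][s', o] * sum_s P[a][s, s'] * b(s).
    b: (S,) current belief; P[a]: (S, S') transitions; Z[a]: (S', O) observation probs."""
    predicted = b @ P[a]             # predict: push the belief through the transition model
    b_new = Z[a][:, o] * predicted   # correct: weight by the observation likelihood
    return b_new / b_new.sum()       # normalize to a probability distribution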
96
POMDPs approach
• Meta Reinforcement Learning as POMDPs
• POMDP for an unobserved state: a simple example
 ○ A simple gridworld with states 𝑠1, 𝑠2, 𝑠3 (only 𝑠3 gives positive reward)
 ○ From the initial state, the agent cannot tell which cell it is in: "Where am I?"
Motivated by CS330: Deep Multi-Task and Meta-Learning
97
POMDPs approach
• POMDP for an unobserved state: a simple example (continued)
 ○ The agent acts: 𝑎 = "left", 𝑠 = 𝑠1, 𝑟 = 0
 ○ This transition is evidence about the agent's position, so its belief over {𝑠1, 𝑠2, 𝑠3} is updated
Motivated by CS330: Deep Multi-Task and Meta-Learning
99
POMDPs approach
• POMDP for an unobserved task: a simple example
 ○ The same gridworld, but now with three tasks (MDP1, MDP2, MDP3), each giving positive reward in a different state
 ○ From the initial state, the agent cannot tell which task it is in: "What task am I in?"
Motivated by CS330: Deep Multi-Task and Meta-Learning
100
POMDPs approach
• POMDP for an unobserved task: a simple example (continued)
 ○ The agent acts: 𝑎 = "left", 𝑠 = 𝑠1, 𝑟 = 0
 ○ The observed reward is evidence about the task, so the agent's belief over {MDP1, MDP2, MDP3} is updated
Motivated by CS330: Deep Multi-Task and Meta-Learning
102
POMDPs approach
• Meta Reinforcement Learning as Partially Observable RL
 ○ A control-inference duality problem
 ○ A latent task variable z encapsulates everything needed to solve the task
• So, what does that mean?
 ○ Learning a task = inferring z from experience
 ○ A policy conditioned on the encapsulated information z must solve the current task!
• General recipe
 1. Sample z from some approximate posterior over tasks (e.g. via variational inference)
 2. Act according to π(a | s, z), as though z were correct, to collect more data
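A hedged sketch of this recipe as a posterior-sampling control loop; posterior, policy, and env are hypothetical stand-ins for a learned task-inference model, a z-conditioned policy, and a classic Gym-style environment.

def posterior_sampling_episode(posterior, policy, env, context):
    """One episode of the general recipe (hypothetical components)."""
    z = posterior.sample(context)          # 1. sample a task hypothesis z ~ q(z | context)
    obs, done = env.reset(), False
    while not done:                        # 2. act as though z were correct
        action = policy.act(obs, z)
        next_obs, reward, done, _ = env.step(action)
        context.append((obs, action, reward, next_obs))  # more data for the next inference
        obs = next_obs
    return context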
103
POMDPs approach: Variants
• Key Idea: how can we estimate task belief states?
 1. Supervised learning (if available)
  - Can we use task labels during meta-training (not in the adaptation phase)?
 2. Variational approach
  - 𝑝(𝑧|𝑐) is intractable; can we use variational inference in meta RL?
104
POMDPs approach: Auxiliary Supervised Learning
• An agent consists of two modules
 ○ The optimal meta-learner only needs to make decisions based on the current state and the current belief
 1. The policy module: π(a | s, b), dependent on the current state and (an approximate representation of) the task posterior b
 2. The belief module: learns to output that approximate representation b
• But how can we estimate the task belief?
Humplik, Jan, et al. "Meta reinforcement learning as task inference." arXiv 2019.
105
POMDPs approach: Auxiliary Supervised Learning
• Some examples of task labels
 ○ Multi-armed bandit: a vector of arm probabilities
 ○ Semi-circle: the angle of the semi-circle (e.g. max = 2𝜋)
 ○ Cheetah velocity: the target velocity
Humplik, Jan, et al. "Meta reinforcement learning as task inference." arXiv 2019.
106
POMDPs approach: Auxiliary Supervised Learning
• Meta Reinforcement Learning as Task Inference: agent architectures
 A. A baseline LSTM agent
 B. A belief network agent
 C. An auxiliary-head agent
• LSTMs and the information bottleneck (IB) are optional
• RL algorithms
 - SVG(0) (Heess et al., 2015) as the off-policy algorithm
 - PPO (Schulman et al., 2017) as the on-policy algorithm
Humplik, Jan, et al. "Meta reinforcement learning as task inference." arXiv 2019.
107
POMDPs approach: Variational approach
• Can we achieve both meta-training and adaptation efficiency?
 ○ Yes! We can build an off-policy meta-RL algorithm that disentangles task inference and control
 ○ Meta-training efficiency: an off-policy RL algorithm (e.g. SAC)
 ○ Adaptation efficiency: a probabilistic encoder to estimate the task belief
Rakelly, Kate, et al. "Efficient off-policy meta-reinforcement learning via probabilistic context variables." ICML 2019.
108
POMDPs approach: Variational approach
• Objective of the variational approach
 ○ E_T [ E_{z ∼ q_φ(z | c^T)} [ R(T, z) + β D_KL( q_φ(z | c^T) ‖ p(z) ) ] ]
  - Likelihood term R(T, z): here, a Bellman-error (critic) objective
  - KL term: regularization toward the prior p(z) (an information bottleneck)
  - Trained end-to-end with the reparameterization trick
 ○ By the Markov property, we don't need to know the order of transitions to identify the MDP
  - So use a permutation-invariant encoder over the context, for simplicity and speed (no need for an RNN): q_φ(z | c_{1:N}) ∝ Π_n Ψ_φ(z | c_n)
Rakelly, Kate, et al. "Efficient off-policy meta-reinforcement learning via probabilistic context variables." ICML 2019.
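The permutation-invariant encoder reduces to a product of per-transition Gaussian factors: precisions add, and means are precision-weighted. A sketch of that combination step, mirroring (but not copied from) the authors' released code.

import torch

def product_of_gaussians(mus, vars_):
    """Combine per-transition factors Ψ(z | c_n) into one Gaussian q(z | c_{1:N}).
    The result does not depend on the order of transitions (permutation-invariant).
    mus, vars_: (N, D) per-transition means and variances from the encoder network."""
    precisions = 1.0 / torch.clamp(vars_, min=1e-7)
    var = 1.0 / precisions.sum(dim=0)            # combined variance, shape (D,)
    mu = var * (mus * precisions).sum(dim=0)     # precision-weighted mean, shape (D,)
    return mu, var

# Sampling with the reparameterization trick keeps the encoder differentiable:
# mu, var = product_of_gaussians(mus, vars_); z = mu + var.sqrt() * torch.randn_like(mu)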
109
POMDPs approach: Variational approach
• PEARL (Probabilistic Embeddings for Actor-critic meta-RL)
Rakelly, Kate, et al. "Efficient off-policy meta-reinforcement learning via probabilistic context variables." ICML 2019.
110
POMDPs approach: Further Readings
• Osband, Ian, Daniel Russo, and Benjamin Van Roy. "(More) efficient reinforcement learning via posterior sampling." Advances in Neural Information Processing Systems. 2013.
• Rakelly, Kate, et al. "Efficient off-policy meta-reinforcement learning via probabilistic context variables." International Conference on Machine Learning. 2019.
• Humplik, Jan, et al. "Meta reinforcement learning as task inference." arXiv preprint arXiv:1905.06424 (2019).
• Zintgraf, Luisa, et al. "Variational Task Embeddings for Fast Adaption in Deep Reinforcement Learning." International Conference on Learning Representations Workshop on Structure & Priors in Reinforcement Learning. 2019.
• Zintgraf, Luisa, et al. "VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning." arXiv preprint arXiv:1910.08348 (2019).
Thank you for your attention!