PR-153: SNAIL: A Simple Neural Attentive Meta-Learner

A Simple Neural AttentIve Meta-Learner
PR-153
Mar 31, 2019
Taekmin Kim

Machine Learning vs. Human
● Machine Learning
○ Learns by fitting many data points
■ Supervised/Unsupervised Learning
■ Reinforcement Learning
● Human
○ Fast adaptation with prior knowledge
■ Few-shot learning
■ Generalization across tasks

Meta-Learner?
● Related Work
○ Learning to Reinforcement Learn
○ RL^2
○ MAML
○ Auto-Meta
● Example: Multi-armed bandit problem
https://blog.floydhub.com/meta-rl/

Meta-RL
● Goal: Generalization across tasks
● Notations
○ T: Task distribution e.g., driving, multi-armed bandit problems
○ T_i: Specific task sampled from T, e.g., driving a Sonata, driving a Porsche, ...
○ x_t: state
○ a_t: action
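The notation above can be made concrete with a small sketch. It assumes the task distribution T is a family of Bernoulli multi-armed bandits, so each task T_i is one vector of arm probabilities. The names sample_task, rollout, and random_policy are hypothetical and chosen only for illustration. A bandit has no real state x_t, so the policy conditions on the history of (a_t, r_t) pairs instead.

```python
import random

def sample_task(num_arms, rng):
    """Draw one task T_i from T: a bandit with its own hidden arm probabilities."""
    return [rng.random() for _ in range(num_arms)]

def random_policy(history, num_arms, rng):
    """Placeholder policy that ignores the history; a meta-learner would use it."""
    return rng.randrange(num_arms)

def rollout(arm_probs, policy, horizon, rng):
    """Interact with task T_i for `horizon` steps and record (a_t, r_t) pairs."""
    history = []
    for _ in range(horizon):
        action = policy(history, len(arm_probs), rng)
        reward = 1 if rng.random() < arm_probs[action] else 0
        history.append((action, reward))
    return history

rng = random.Random(0)
tasks = [sample_task(5, rng) for _ in range(3)]             # three tasks T_i ~ T
trajectories = [rollout(t, random_policy, 10, rng) for t in tasks]
```

Generalization across tasks then means doing well on a freshly sampled T_i using only the history collected inside that task.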

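RNN-based Meta-RL (Agent)
● Sequence-to-sequence problem
○ The agent refers to its past experience
● Drawbacks:
○ Temporally-linear dependency: information reaches step t only through the hidden state, passed along one step at a time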
https://blog.floydhub.com/meta-rl/

Motivation
● Temporal (Causal) Convolution
○ Output at step t depends only on the current and previous steps
● Soft Attention
○ Output is a weighted sum over past steps
https://www.slideshare.net/ThomasHjeldeThoresen/temporal-convolutional-networks-dethroning-rnns-for-sequence-modelling
https://medium.com/syncedreview/memory-attention-sequences-8522f531dd43

Temporal Convolution
[Figure: Vanilla 1D Convolution vs. Temporal Convolution (TC)]
[Figure: Vanilla 1D TCs vs. (exponentially) Dilated 1D TCs]
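A minimal sketch of a temporal (causal) convolution with dilation follows. It works on a single channel, pads the past with zeros, and uses hypothetical function names; it is meant to illustrate the idea, not to reproduce any particular library's layer.

```python
def causal_conv1d(x, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with x[t] = 0 for t < 0.

    Only the current and earlier inputs contribute, so y[t] never sees the future.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of past steps (including the current one) seen by a stack of layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

x = [1, 2, 3, 4, 5]
vanilla = causal_conv1d(x, [1, 1], dilation=1)   # each output mixes x[t] and x[t-1]
dilated = causal_conv1d(x, [1, 1], dilation=2)   # each output mixes x[t] and x[t-2]
```

With kernel size 2, stacking dilations 1, 2, 4, 8 gives receptive_field(2, [1, 2, 4, 8]) = 16 steps from only four layers, which is the point of exponential dilation.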

Attention Is All You Need (2017)
https://mchromiak.github.io/articles/2017/Sep/12/Transformer-Attention-is-all-you-need/#.XJ6U6-szZ0c
https://medium.com/@hyponymous/paper-summary-attention-is-all-you-need-22c2c7a5e06
Q: Hidden State of Decoder
K: Hidden State of Encoder
V: Hidden State of Encoder, combined using the (normalized) attention weights
PR-049: https://www.youtube.com/watch?v=6zGgVIlStXs
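The weighted sum behind soft attention can be sketched in plain Python. This assumes scaled dot-product attention as in Attention Is All You Need, plus a causal restriction so that step t only attends to steps up to t, which is the setting relevant to SNAIL. The function names and the list-of-lists layout are illustrative assumptions.

```python
import math

def softmax(scores):
    """Normalize scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def causal_attention(queries, keys, values):
    """For each step t, return the weighted sum of values[0..t]."""
    d_k = len(keys[0])
    outputs = []
    for t, q in enumerate(queries):
        # Scores against the current and earlier steps only (causal mask).
        scores = [dot(q, keys[s]) / math.sqrt(d_k) for s in range(t + 1)]
        weights = softmax(scores)
        dim = len(values[0])
        outputs.append([sum(w * values[s][j] for s, w in enumerate(weights))
                        for j in range(dim)])
    return outputs

# Identical keys give uniform weights, so each output is a running mean of values.
demo = causal_attention([[1.0, 0.0]] * 3, [[0.0, 0.0]] * 3, [[1.0], [3.0], [5.0]])
```

Unlike a convolution, the weights here depend on the content of the query and keys, so the model can pick out one specific past step regardless of how far back it is.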


Simple Neural AttentIve Learner
Building Blocks
● DenseBlock
● TCBlock
● AttentionBlock
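How the three building blocks compose can be sketched by tracking channel counts only. The sketch assumes, following the pseudocode in the SNAIL paper as I read it, that a DenseBlock concatenates D gated-activation features to its input, that a TCBlock stacks DenseBlocks with dilations 2^1 up to 2^ceil(log2 T) for sequence length T, and that an AttentionBlock concatenates its read vector to its input. All function names and the example sizes are hypothetical.

```python
import math

def gated_activation(xf, xg):
    """Gate used inside a DenseBlock (assumed): tanh(xf) * sigmoid(xg)."""
    return math.tanh(xf) * (1.0 / (1.0 + math.exp(-xg)))

def tc_block_dilations(seq_len):
    """Dilations 2^1 .. 2^ceil(log2(seq_len)), one per DenseBlock."""
    num_layers = (seq_len - 1).bit_length()   # integer ceil(log2(seq_len))
    return [2 ** i for i in range(1, num_layers + 1)]

def dense_block_channels(in_channels, filters):
    """A DenseBlock concatenates `filters` new features to its input."""
    return in_channels + filters

def tc_block_channels(in_channels, seq_len, filters):
    """A TCBlock is a stack of DenseBlocks, so channels grow once per dilation."""
    channels = in_channels
    for _ in tc_block_dilations(seq_len):
        channels = dense_block_channels(channels, filters)
    return channels

def attention_block_channels(in_channels, value_size):
    """An AttentionBlock concatenates a read vector of size `value_size`."""
    return in_channels + value_size

# Example sizes (assumptions): 10 input channels, sequence length 8.
after_tc = tc_block_channels(10, seq_len=8, filters=16)
after_attention = attention_block_channels(after_tc, value_size=32)
```

Because every block concatenates rather than replaces, later blocks still see the raw inputs alongside the features computed so far.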

Attention Block
Query: Hidden State of Decoder
Key: Hidden State of Encoder
Value: Hidden State of Encoder, combined using the (normalized) attention weights


Experiments
● Supervised Learning
○ Few-Shot Learning (Image Classification)
■ n-way: n classes per task, drawn from the dataset
■ m-shot: m labeled examples per class
● Reinforcement Learning
○ Multi-Armed Bandits
○ Tabular MDPs
○ Continuous Control
○ Visual Navigation
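For the few-shot image classification setting listed above, one n-way, m-shot episode can be sketched as follows. The toy dataset, the sample_episode name, and the choice of a single query example are assumptions for illustration; real experiments would draw images from a dataset such as Omniglot.

```python
import random

def sample_episode(data_by_class, n_way, m_shot, rng):
    """Build one episode: n_way * m_shot labeled examples plus one held-out query."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support = []
    for label, cls in enumerate(classes):
        # Labels are re-assigned per episode, so they carry no meaning across tasks.
        for example in rng.sample(data_by_class[cls], m_shot):
            support.append((example, label))
    rng.shuffle(support)

    query_label = rng.randrange(n_way)
    used = {ex for ex, lab in support if lab == query_label}
    candidates = [ex for ex in data_by_class[classes[query_label]] if ex not in used]
    query = rng.choice(candidates)
    return support, (query, query_label)

# Toy stand-in for an image dataset: strings instead of images.
toy_data = {
    "class_a": ["a0", "a1", "a2"],
    "class_b": ["b0", "b1", "b2"],
    "class_c": ["c0", "c1", "c2"],
    "class_d": ["d0", "d1", "d2"],
}
support, (query, query_label) = sample_episode(toy_data, n_way=3, m_shot=2,
                                               rng=random.Random(1))
```

A sequence model such as SNAIL would read the labeled support pairs in order and then predict the label of the final, unlabeled query.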

Results: Few-shot Learning
MAML
Omniglot

MAML: Optimization-based Meta-RL
https://arxiv.org/abs/1703.03400
PR-094: MAML https://www.youtube.com/watch?v=fxJXXKZb-ik

Results: Multi-armed Bandits
MAML

Results: Tabular MDPs
So Close Yet So Far: TRPO (Woongwon Lee)
https://www.slideshare.net/WoongwonLee/trpo-87165690

Summary
● SNAIL
○ Temporal Convolution
○ Soft Attention
● Meta-RL is promising
● Related Work
○ Learning to Reinforcement Learn
○ RL^2
○ MAML
○ Auto-Meta
● Materials
○ Meta-RL(Chelsea Finn): http://rail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-20.pdf
