Meta Learning Shared Hierarchies
Yoonho Lee
Department of Computer Science and Engineering
Pohang University of Science and Technology
December 19, 2017
Motivation
(Figure taken from Pieter Abbeel's 2017 NIPS RL keynote.)
Hierarchical RL is needed for complex, temporally extended environments, but it requires a good set of sub-policies.
Idea: learn over a set of tasks, letting all agents share some weights. The shared weights should then learn a generally applicable set of sub-policies.
Meta-Learning Shared Hierarchies
Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman
Architecture
(Architecture figures omitted.)
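A minimal sketch of the two-level architecture: a task-specific master policy selects which of the shared sub-policies acts, and it does so on a slower timescale (every N steps). This assumes linear policy "networks" over discrete actions; the names (SubPolicy, MasterPolicy, rollout) and the toy environment are my own illustration, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class SubPolicy:
    """One shared sub-policy: maps an observation to action probabilities."""
    def __init__(self, obs_dim, n_actions):
        self.W = rng.normal(scale=0.1, size=(n_actions, obs_dim))
    def act(self, obs):
        return rng.choice(len(self.W), p=softmax(self.W @ obs))

class MasterPolicy:
    """Task-specific master: picks which sub-policy runs for the next N steps."""
    def __init__(self, obs_dim, n_subs):
        self.W = rng.normal(scale=0.1, size=(n_subs, obs_dim))
    def act(self, obs):
        return rng.choice(len(self.W), p=softmax(self.W @ obs))

def rollout(master, subs, env_step, obs, T=50, N=10):
    """Run one episode of length T; the master re-chooses a sub-policy every N steps."""
    actions = []
    for t in range(T):
        if t % N == 0:           # master acts on the slow timescale
            k = master.act(obs)
        a = subs[k].act(obs)     # the selected sub-policy picks the primitive action
        obs = env_step(obs, a)
        actions.append(a)
    return actions

# Toy usage: a random-walk "environment" with 4-dim observations, 3 actions, 2 sub-policies.
subs = [SubPolicy(4, 3) for _ in range(2)]
master = MasterPolicy(4, len(subs))
acts = rollout(master, subs, lambda o, a: o + 0.01 * rng.normal(size=4),
               np.zeros(4), T=50, N=10)
print(len(acts))  # 50
```

The split is the point: only the master is task-specific, while the sub-policies are shared across the task distribution.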
Algorithm
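As I read the paper's training procedure: a task is sampled, the master policy is re-initialized and trained alone during a warmup period (so the shared sub-policies are later updated under a reasonable master), and then master and sub-policies are updated jointly. A control-flow sketch only, with the policy-gradient updates replaced by counters; all names and iteration counts are placeholders, not the paper's values:

```python
import random
random.seed(0)

def mlsh_train(tasks, n_meta_iters=3, warmup_iters=5, joint_iters=5):
    shared_subs = {"updates": 0}        # stands in for the shared sub-policy weights
    log = []
    for _ in range(n_meta_iters):
        task = random.choice(tasks)     # sample a task from the task set
        master = {"updates": 0}         # master is re-initialized for each task
        for _ in range(warmup_iters):   # warmup: update only the master,
            master["updates"] += 1      # keeping the shared sub-policies fixed
        for _ in range(joint_iters):    # joint phase: update master and
            master["updates"] += 1      # sub-policies together
            shared_subs["updates"] += 1
        log.append((task, master["updates"]))
    return shared_subs, log

subs, log = mlsh_train(["bandit", "maze"])
print(subs["updates"], len(log))  # 15 3
```

The warmup phase matters because the shared weights should only receive gradients once the master is already using the sub-policies sensibly on the new task.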
Experiments
Moving Bandits
Hoped-for decomposition:
High level: a bandit problem
Sub-policies: move toward the (blue, green) targets
T = 50, N = 10
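Under that decomposition, the high level reduces to a bandit problem over sub-policies. A toy epsilon-greedy illustration of that view; the payoffs (0.2 / 0.8) are invented and deterministic here, so this is not the Moving Bandits environment itself:

```python
import random
random.seed(0)

payoff = [0.2, 0.8]              # made-up expected return of committing to each sub-policy
counts, values = [0, 0], [0.0, 0.0]

for arm in (0, 1):               # pull each arm once to initialize the estimates
    counts[arm] += 1
    values[arm] = payoff[arm]

for _ in range(100):
    # explore with probability 0.1, otherwise exploit the best current estimate
    arm = random.randrange(2) if random.random() < 0.1 else values.index(max(values))
    counts[arm] += 1
    values[arm] += (payoff[arm] - values[arm]) / counts[arm]   # incremental mean

print(values)                    # [0.2, 0.8] -- estimates match the true payoffs
print(counts[1] > counts[0])     # the better sub-policy is chosen far more often
```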
(Results figure for Moving Bandits omitted.)
Experiments
Maze Navigation
T = 1000, N = 200
(Results figure for Maze Navigation omitted.)
Experiments
Transfer to an unsolvable environment
T = 2000, N = 200
Discussion
The connection to meta-learning is loose.
Do we need discrete sub-policies?
N is a hyperparameter. Can it instead be learned from the set of tasks?
Thank You
