Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning

Younggyo Seo¹*, Kimin Lee²*, Ignasi Clavera², Thanard Kurutach², Jinwoo Shin¹, and Pieter Abbeel²
¹KAIST, ²UC Berkeley
*Equal contribution
https://sites.google.com/view/trajectory-mcl

Problem: Dynamics Generalization
● Model-based RL suffers from a dynamics generalization problem
[Figure labels: Evaluation, Training, Deployment]

● Multi-modal distribution of transition dynamics

Main Components
● Main idea: explicitly approximate the multi-modal distribution
● Multi-headed dynamics model
Approximates the multi-modal distribution by learning specialized prediction heads

● Multiple choice learning (MCL)
Update only the most accurate prediction head so that each head specializes

● Adaptive planning
For planning, use the prediction head that is most accurate over recent experience
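
The head-selection step in adaptive planning can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_head`, the mean squared error criterion, and the toy "mass" heads are all assumptions made for the example.

```python
import numpy as np

def select_head(heads, states, actions, next_states):
    """Return the index of the head with the lowest prediction error
    over a window of recent transitions.

    heads: list of callables f(states, actions) -> predicted next states
    states, actions, next_states: arrays with a leading time dimension
    """
    errors = [
        np.mean(np.sum((head(states, actions) - next_states) ** 2, axis=-1))
        for head in heads
    ]
    return int(np.argmin(errors))

# Hypothetical toy heads: each one models a different body mass.
def make_head(mass):
    return lambda s, a: s + a / mass

heads = [make_head(1.0), make_head(2.5)]

# Recent experience generated by the heavier (mass 2.5) dynamics.
rng = np.random.default_rng(0)
states = rng.normal(size=(10, 3))
actions = rng.normal(size=(10, 3))
next_states = states + actions / 2.5

best = select_head(heads, states, actions, next_states)
```

Here `best` is the index of the head that a planner would then use to roll out candidate action sequences.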

Trajectory-wise Multiple Choice Learning
● For MCL, each prediction head should receive distinct training samples
[Figure: individual transitions. Which prediction head is most accurate over these transitions?]

[Figure: trajectory segment]
● Trajectory-wise multiple choice learning
Differences in dynamics are captured more distinctly by considering the prediction error over a trajectory segment
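
A winner-take-all loss over trajectory segments might look like the sketch below. It is an illustration under stated assumptions rather than the authors' code: squared error averaged over the segment stands in for whatever prediction loss the model actually uses, and the function name and array shapes are hypothetical. In a real training loop, gradients would flow only through the winning head's error.

```python
import numpy as np

def trajectory_wise_mcl_loss(pred, target):
    """pred: (H, B, M, D) predictions from H heads for B segments of length M.
    target: (B, M, D) true next states.
    Returns (loss, winners): the mean winner-take-all loss and the index
    of the winning head for each segment."""
    # Squared error per head and per segment, averaged over the M steps.
    seg_err = np.mean(np.sum((pred - target[None]) ** 2, axis=-1), axis=-1)  # (H, B)
    # One winner per segment, not per transition.
    winners = np.argmin(seg_err, axis=0)  # (B,)
    loss = np.mean(seg_err[winners, np.arange(seg_err.shape[1])])
    return loss, winners

# Toy data: 2 heads, 2 segments of 3 steps, 1-D state, targets all zero.
pred = np.array([
    [[0.1, 0.1, 0.1], [1.0, 1.0, 1.0]],  # head 0
    [[0.0, 0.5, 0.5], [0.2, 0.2, 0.2]],  # head 1
])[..., None]
target = np.zeros((2, 3, 1))

loss, winners = trajectory_wise_mcl_loss(pred, target)
```

In the toy data, head 1 is the most accurate on the first transition of the first segment, yet head 0 wins that segment as a whole. Selecting per transition would have split the segment between the two heads.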

Context-conditional Multi-headed Dynamics Model
● We also introduce a context encoder for online adaptation to unseen environments
● The context encoder g captures contextual information from past experience
● See [Lee’20] for more information
[Lee’20] Lee, Kimin, Younggyo Seo, Seunghyun Lee, Honglak Lee, and Jinwoo Shin. "Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning." In ICML, 2020.

Analysis on Trajectory-wise MCL
● Specialization leads to superior generalization performance
[Figure: Hopper. Panels: Transitions; Trajectory segment]

Analysis on Adaptive Planning
● Qualitative analysis
○ Manually assign the prediction heads specialized for [Mass: 2.5] to the [Mass: 1.0] environment
[Figure panels: "[Mass: 1.0] with prediction heads specialized for [Mass: 2.5]"; "[Mass: 2.5] with prediction heads specialized for [Mass: 2.5]"]
The agent acts as if it has a heavyweight body!

Comparative Evaluation
● Superior generalization performance on 6 unseen environments

Conclusion
● For dynamics generalization, we introduce
○ Context-conditional multi-headed dynamics model
○ Trajectory-wise multiple choice learning
○ Adaptive planning
Thank you!
