Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning
Younggyo Seo1*, Kimin Lee2*, Ignasi Clavera2, Thanard Kurutach2, Jinwoo Shin1 and Pieter Abbeel2
KAIST1, UC Berkeley2
*Equal Contribution
https://sites.google.com/view/trajectory-mcl
Problem: Dynamics Generalization
● Model-based RL suffers from a dynamics generalization problem
[Figure panels: Evaluation, Training, Deployment]
● Multi-modal distribution of transition dynamics
Main Components
● Main idea: explicitly approximate the multi-modal distribution
● Multi-headed dynamics model
○ Approximates the multi-modal distribution by learning specialized prediction heads
● Multiple choice learning (MCL)
○ Updates only the most accurate prediction head, so that the heads specialize
● Adaptive planning
○ Plans with the prediction head that is most accurate over recent experience
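As a rough illustration of adaptive planning, the sketch below scores every prediction head on a batch of recent transitions and keeps the one with the lowest error. The linear heads, the dimensions, and the names `predict_all_heads` and `select_head` are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def predict_all_heads(heads, states, actions):
    """Return next-state predictions of shape (H, N, state_dim)."""
    inputs = np.concatenate([states, actions], axis=-1)  # (N, state_dim + action_dim)
    return np.stack([inputs @ W for W in heads])         # one linear head per mode (assumed)

def select_head(heads, states, actions, next_states):
    """Pick the head with the lowest mean squared error on recent transitions."""
    preds = predict_all_heads(heads, states, actions)              # (H, N, state_dim)
    errors = ((preds - next_states[None]) ** 2).sum(-1).mean(-1)   # (H,)
    return int(np.argmin(errors)), errors

rng = np.random.default_rng(0)
state_dim, action_dim, num_heads = 3, 2, 4
heads = [rng.normal(size=(state_dim + action_dim, state_dim)) for _ in range(num_heads)]

# Hypothetical recent experience generated by the dynamics of head 2.
states = rng.normal(size=(10, state_dim))
actions = rng.normal(size=(10, action_dim))
next_states = np.concatenate([states, actions], axis=-1) @ heads[2]

best_head, head_errors = select_head(heads, states, actions, next_states)
print(best_head, head_errors)
```

A planner would then roll out candidate action sequences with `heads[best_head]` only, and re-run the selection as new transitions arrive.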
Trajectory-wise Multiple Choice Learning
● For MCL, each prediction head should receive distinct training samples
○ Given individual transitions, which prediction head is most accurate over these transitions?
● Trajectory-wise multiple choice learning
○ Differences in dynamics are captured more distinctly by considering the prediction error over a trajectory segment
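The difference between choosing a winner per transition and per trajectory segment can be sketched as follows. The array layout `(heads, segments, steps)`, the generic per-step error, and the function names are assumptions for illustration; the actual error measure used for training is not specified on the slide.

```python
import numpy as np

def trajectory_wise_mcl(step_errors):
    """step_errors: (H, B, M) error of each head at each step of each segment.

    Errors are summed over the M steps of a segment, and only the best head
    per segment contributes to the loss, so only that head would be updated.
    """
    segment_errors = step_errors.sum(axis=-1)   # (H, B)
    winners = segment_errors.argmin(axis=0)     # (B,) winning head per segment
    loss = segment_errors.min(axis=0).mean()    # mean over segments of the winner's error
    return winners, loss

def transition_wise_winners(step_errors):
    """Winner chosen independently for every single transition: (B, M)."""
    return step_errors.argmin(axis=0)

# Two heads, one segment of four steps (made-up numbers).
step_errors = np.array([
    [[0.1, 0.3, 0.1, 0.1]],   # head 0
    [[0.2, 0.1, 0.4, 0.4]],   # head 1
])

winners, loss = trajectory_wise_mcl(step_errors)
per_step = transition_wise_winners(step_errors)
print(winners, loss, per_step)
```

In this toy case the per-transition choice flips to head 1 on the second step, while the segment-level choice assigns the whole segment to head 0, which is the behavior the slide argues for.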
Context-conditional Multi-headed Dynamics Model
● We also introduce a context encoder for online adaptation to unseen environments
● The context encoder g captures contextual information from past experience
● See [Lee’20] for more information
[Lee’20] Kimin Lee, Younggyo Seo, Seunghyun Lee, Honglak Lee, and Jinwoo Shin. "Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning." In ICML, 2020.
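One way to picture the context-conditional model is a small encoder that maps the last K transitions to a context vector, which is then fed to every prediction head along with the current state and action. Everything concrete below (a single tanh layer for g, linear heads, the sizes, and the names `encode_context` and `predict`) is a hypothetical stand-in, not the architecture from the talk or from [Lee’20].

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, context_dim, K, num_heads = 3, 2, 4, 5, 3

# Hypothetical parameters: one layer for the encoder g, one matrix per head.
W_g = rng.normal(size=(K * (state_dim + action_dim), context_dim)) * 0.1
heads = [
    rng.normal(size=(state_dim + action_dim + context_dim, state_dim)) * 0.1
    for _ in range(num_heads)
]

def encode_context(past_states, past_actions):
    """g: flatten the last K (state, action) pairs and map them to a context vector."""
    past = np.concatenate([past_states, past_actions], axis=-1).reshape(-1)
    return np.tanh(past @ W_g)                      # (context_dim,)

def predict(head_index, state, action, context):
    """Next-state prediction of one head, conditioned on the context vector."""
    x = np.concatenate([state, action, context])    # (state_dim + action_dim + context_dim,)
    return x @ heads[head_index]                    # (state_dim,)

past_states = rng.normal(size=(K, state_dim))
past_actions = rng.normal(size=(K, action_dim))
context = encode_context(past_states, past_actions)
next_state = predict(0, past_states[-1], past_actions[-1], context)
print(context.shape, next_state.shape)
```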
Analysis on Trajectory-wise MCL
● Specialization leads to superior generalization performance
[Figure panels: Transitions, Trajectory segment (Hopper)]
Analysis on Adaptive Planning
● Qualitative analysis
○ Manually assign the prediction heads specialized for [mass: 2.5] to an environment with [mass: 1.0]
[Figure panels: [Mass: 1.0] with prediction heads specialized for [Mass: 2.5]; [Mass: 2.5] with prediction heads specialized for [Mass: 2.5]]
○ The agent acts as if it has a heavyweight body!
Comparative Evaluation
● Superior generalization performance on 6 unseen environments
Conclusion
● For dynamics generalization
○ Context-conditional multi-headed dynamics model
○ Trajectory-wise multiple choice learning
○ Adaptive planning
Thank you!
