Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning (ICML 2020)
Kimin Lee*, Younggyo Seo*, Seunghyun Lee, Honglak Lee, Jinwoo Shin
https://sites.google.com/view/cadm
*Equal contribution
Model-based Reinforcement Learning
● Model-based reinforcement learning (RL)
○ Learning a model of the environment, i.e., its transition dynamics (and reward)
● Advantages
○ Control via planning (a minimal sketch follows below)
○ Sample-efficient learning
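As a concrete illustration of control via planning, here is a minimal random-shooting MPC sketch in Python. It assumes a learned, batch-vectorized dynamics model `model(states, actions) -> next_states` and a known reward function `reward_fn`; all names and hyperparameters here are illustrative assumptions, not the paper's exact planner.

```python
import numpy as np

def plan_action(model, reward_fn, state, action_dim, horizon=20,
                n_candidates=500, low=-1.0, high=1.0):
    """Random-shooting MPC: sample candidate action sequences, roll
    them out through the learned dynamics model, and execute the
    first action of the highest-return sequence."""
    actions = np.random.uniform(low, high,
                                size=(n_candidates, horizon, action_dim))
    states = np.repeat(state[None], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        next_states = model(states, actions[:, t])           # learned dynamics
        returns += reward_fn(states, actions[:, t], next_states)
        states = next_states
    return actions[np.argmax(returns), 0]                    # best first action
```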
Model-based RL works!
● Recent successes of model-based reinforcement learning
○ MuZero [1]
○ Dreamer [2]
[1] Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., ... & Lillicrap, T. Mastering Atari, Go, chess and shogi by planning with a learned model. arXiv, 2019.
[2] Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. Dream to control: Learning behaviors by latent imagination. In ICLR, 2020.
Generalization in Model-based RL
● However, model-based RL does not generalize well to unseen environments [3]
○ The current state alone carries no information about the length!
● For generalization, we need context information from past observations
“Context-awareness!”
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR, 2019.
Context-aware Dynamics Model
● Main idea: separate context learning from next-state inference
● Context learning
○ Introduce a context encoder that outputs a context latent vector from past transitions
● Next-state inference
○ Condition a dynamics model on the context latent vector (a minimal sketch follows below)
● Challenge: how can we encode more meaningful information about the dynamics?
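A minimal PyTorch sketch of the two components, assuming the context is inferred from the K most recent transitions. The architecture sizes, the use of state differences as prediction targets, and all names are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes the K most recent transitions (s, a, s' - s) into a
    context latent vector summarizing the environment dynamics."""
    def __init__(self, state_dim, action_dim, context_dim, k=10, hidden=200):
        super().__init__()
        in_dim = k * (2 * state_dim + action_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, context_dim),
        )

    def forward(self, past_states, past_actions, past_deltas):
        # past_*: (batch, k, dim); flatten the window into one vector.
        x = torch.cat([past_states, past_actions, past_deltas], dim=-1)
        return self.net(x.flatten(1))

class ConditionedDynamicsModel(nn.Module):
    """Predicts the next-state difference s' - s, conditioned on the
    context latent vector."""
    def __init__(self, state_dim, action_dim, context_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action, context):
        return self.net(torch.cat([state, action, context], dim=-1))
```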
Context-aware Dynamics Model
● Loss function for context learning (a minimal sketch follows below)
○ Future-step prediction: make predictions multiple timesteps into the future
○ Backward prediction: predict backward transitions
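A sketch of how the two loss terms might combine, with plain MSE standing in for whatever probabilistic training objective the paper actually uses. `encoder` and `forward_model` follow the sketch above, `backward_model` is an analogous assumed module, and the shapes and horizon are illustrative.

```python
def context_learning_loss(encoder, forward_model, backward_model,
                          past, states, actions, horizon=3):
    """Combined context-learning loss: multi-step forward prediction
    plus backward prediction.

    states:  (batch, horizon + 1, state_dim) ground-truth states
    actions: (batch, horizon, action_dim) executed actions
    """
    context = encoder(*past)
    fwd_loss, s = 0.0, states[:, 0]
    for t in range(horizon):                                # future-step prediction
        s = s + forward_model(s, actions[:, t], context)    # predicted s_{t+1}
        fwd_loss = fwd_loss + ((s - states[:, t + 1]) ** 2).mean()
    # Backward prediction: recover s_t - s_{t+1} from (s_{t+1}, a_t, context).
    bwd = backward_model(states[:, 1], actions[:, 0], context)
    bwd_loss = ((bwd - (states[:, 0] - states[:, 1])) ** 2).mean()
    return fwd_loss / horizon + bwd_loss
```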
Ablation Study
● Effects of the prediction loss
○ Vanilla dynamics model (DM): no context learning
○ Vanilla DM + context learning with one-step forward prediction
○ Vanilla DM + context learning with future-step forward prediction
○ Vanilla DM + context learning with future-step forward & backward prediction
CaDM is Model-agnostic
● Prediction error for Half-Cheetah with varying body masses
○ Results shown for both a vanilla DM and PE-TS [4]
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS, 2018.
Prediction Visualization
● 10 past transitions and 20 future predictions (see the rollout sketch below)
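A sketch of the visualization procedure under the same illustrative interfaces as above: encode the 10 past transitions once, then unroll the conditioned model open-loop for 20 steps along the recorded actions.

```python
import torch

@torch.no_grad()
def open_loop_rollout(encoder, forward_model, past, start_state, actions):
    """Encode the context once from past transitions, then iterate the
    conditioned dynamics model to produce a multi-step prediction."""
    context = encoder(*past)                  # from, e.g., 10 past transitions
    preds, s = [], start_state
    for a in actions:                         # e.g., 20 recorded actions
        s = s + forward_model(s, a, context)  # model outputs s' - s
        preds.append(s)
    return torch.stack(preds)
```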
Context helps Model-free RL too
● Context also improves the generalization of model-free RL methods
● Model-free RL also suffers from poor generalization [6, 7]
● Proximal policy optimization (PPO) [5]
● PPO + CaDM (a minimal sketch follows below)
○ Condition the policy and value networks on the learned context latent vector
[5] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. Proximal policy optimization algorithms. arXiv, 2017.
[6] Packer, C., Gao, K., Kos, J., Krähenbühl, P., Koltun, V., & Song, D. Assessing generalization in deep reinforcement learning. arXiv, 2018.
[7] Cobbe, K., Klimov, O., Hesse, C., Kim, T., & Schulman, J. Quantifying generalization in reinforcement learning. In ICML, 2019.
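A minimal sketch of conditioning PPO's networks on the context latent vector: both heads simply take the state concatenated with the context. The actor-critic layout, layer sizes, and Gaussian policy head are illustrative assumptions; the PPO update itself is unchanged.

```python
import torch
import torch.nn as nn

class ContextConditionedActorCritic(nn.Module):
    """PPO actor-critic whose policy and value networks are
    conditioned on the context latent vector from the encoder."""
    def __init__(self, state_dim, context_dim, action_dim, hidden=64):
        super().__init__()
        in_dim = state_dim + context_dim
        self.policy = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),   # mean of a Gaussian policy
        )
        self.value = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state, context):
        x = torch.cat([state, context], dim=-1)
        dist = torch.distributions.Normal(self.policy(x), self.log_std.exp())
        return dist, self.value(x).squeeze(-1)
```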
Experimental Setup: Environments
● We evaluate the generalization performance in two regimes
○ Moderate
○ Extreme
Model-based RL: HalfCheetah
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR, 2019.
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS, 2018.
Model-free RL: HalfCheetah
[9] Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In ICML, 2019.
[10] Zhou, W., Pinto, L., & Gupta, A. Environment probing interaction policies. In ICLR, 2019.
Conclusion
● For dynamics generalization,
○ We propose a context-aware dynamics model
○ We introduce a novel loss function for context learning
● Code is available at https://github.com/younggyoseo/CaDM
● Project page: https://sites.google.com/view/cadm
Thank you!