Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning (ICML 2020)

Jul. 6, 2020

1. Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning Kimin Lee*, Younggyo Seo*, Seunghyun Lee, Honglak Lee, Jinwoo Shin (*equal contribution) https://sites.google.com/view/cadm
  2. Model-based Reinforcement Learning ● Model-based reinforcement learning (RL) ○ Learning a model of the environment, i.e., transition dynamics (and reward) ● Advantages ○ Control via planning (see the sketch below) ○ Sample-efficient learning
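To make "control via planning" concrete, here is a minimal sketch of random-shooting model-predictive control with a learned dynamics model; `dynamics_fn`, `reward_fn`, and all parameter values are illustrative stand-ins, not the authors' implementation.

```python
# Minimal sketch: random-shooting MPC with a learned dynamics model.
# dynamics_fn and reward_fn are hypothetical, batched callables.
import numpy as np

def plan_action(dynamics_fn, reward_fn, state, action_dim,
                horizon=20, n_candidates=1000):
    """Sample candidate action sequences, roll them out through the
    learned model, and return the first action of the best sequence."""
    actions = np.random.uniform(
        -1.0, 1.0, size=(n_candidates, horizon, action_dim))
    states = np.repeat(state[None], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        next_states = dynamics_fn(states, actions[:, t])  # learned model
        returns += reward_fn(states, actions[:, t], next_states)
        states = next_states
    # Execute only the first action, then replan at the next step (MPC).
    return actions[np.argmax(returns), 0]
```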
  3. Model-based RL works! ● Recent successes of model-based reinforcement learning: MuZero [1] and Dreamer [2] [1] Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., ... & Lillicrap, T. Mastering Atari, Go, chess and shogi by planning with a learned model. arXiv. 2019. [2] Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. Dream to control: Learning behaviors by latent imagination. In ICLR. 2020.
  4–7. Generalization in Model-based RL ● However, model-based RL does not generalize well to unseen environments [3] ○ e.g., the current observation alone carries no information about a changed length! ● For generalization, we need context information from past observations: "Context-awareness!" [3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
  8–12. Context-aware Dynamics Model ● What is context, and how can it help? ● How do we extract context information from past experiences?
  13–16. Context-aware Dynamics Model ● Main idea: separate context learning and next-state inference ● Context learning: introduce a context encoder that outputs a context latent vector ● Next-state inference: condition the dynamics model on the context latent vector ● Challenge: how do we encode more meaningful information about the dynamics? (a minimal sketch of both components follows below)
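A minimal PyTorch sketch of the two components, assuming the context latent is encoded from the past K transitions and the dynamics model simply concatenates that latent to its state-action input; layer sizes and class names are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes the past K transitions (s, a, s') into a context latent z."""
    def __init__(self, state_dim, action_dim, K, context_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(K * (2 * state_dim + action_dim), 128), nn.ReLU(),
            nn.Linear(128, context_dim),
        )

    def forward(self, past_transitions):
        # past_transitions: (batch, K * (2 * state_dim + action_dim))
        return self.net(past_transitions)

class ContextConditionedDynamics(nn.Module):
    """Predicts the next state from (s, a) conditioned on the latent z."""
    def __init__(self, state_dim, action_dim, context_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + context_dim, 256), nn.ReLU(),
            nn.Linear(256, state_dim),
        )

    def forward(self, state, action, z):
        return self.net(torch.cat([state, action, z], dim=-1))
```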
  17–18. Context-aware Dynamics Model ● Loss function for context learning ○ Future-step prediction: make predictions multiple timesteps into the future ○ Backward prediction: predict backward transitions (a sketch of both losses follows below)
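A sketch of how the two context-learning losses could be computed, reusing the illustrative module signatures from the sketch above and using MSE as a stand-in for the paper's prediction losses; the exact objective is an assumption here.

```python
import torch.nn.functional as F

def prediction_losses(forward_model, backward_model, z, states, actions, M):
    """states: (batch, M + 1, state_dim); actions: (batch, M, action_dim)."""
    fwd_loss, bwd_loss = 0.0, 0.0
    s = states[:, 0]
    for i in range(M):
        # Future-step prediction: roll forward on the model's own outputs.
        s = forward_model(s, actions[:, i], z)
        fwd_loss = fwd_loss + F.mse_loss(s, states[:, i + 1])
        # Backward prediction: recover the preceding state.
        s_prev = backward_model(states[:, i + 1], actions[:, i], z)
        bwd_loss = bwd_loss + F.mse_loss(s_prev, states[:, i])
    return fwd_loss / M, bwd_loss / M
```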
  19. Context-aware Dynamics Model ● Final loss function: combines the future-step forward and backward prediction terms (see below) ● Model-agnostic!
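The equation itself was on the slide image; one plausible rendering, assuming negative log-likelihood prediction terms over an M-step horizon, a context encoder g over the past K transitions, and a weight β on the backward term (see the paper for the exact form):

```latex
\mathcal{L} =
\underbrace{\mathbb{E}\Big[-\tfrac{1}{M}\textstyle\sum_{i=0}^{M-1}
  \log f\big(s_{t+i+1} \mid s_{t+i}, a_{t+i}, z_t\big)\Big]}_{\text{future-step forward prediction}}
+ \beta\,
\underbrace{\mathbb{E}\Big[-\tfrac{1}{M}\textstyle\sum_{i=0}^{M-1}
  \log b\big(s_{t+i} \mid s_{t+i+1}, a_{t+i}, z_t\big)\Big]}_{\text{backward prediction}},
\qquad z_t = g\big(\tau_{t-K:t-1}\big)
```

Because the context enters only through z_t, any dynamics model f that can take an extra input can be conditioned this way, which is what "model-agnostic" refers to.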
  20. Ablation Study ● Effects of the prediction loss ○ Vanilla dynamics model (DM): no context learning ○ Vanilla DM + context learning with one-step forward prediction ○ Vanilla DM + context learning with future-step forward prediction ○ Vanilla DM + context learning with future-step forward & backward prediction
  21. CaDM is Model-agnostic ● Prediction error for HalfCheetah with varying body masses, with CaDM applied on top of both a vanilla DM and PE-TS [4] [4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
  22–26. Embedding Analysis ● Contexts from similar environments are clustered together
  27–30. Prediction Visualization ● 10 past transitions and 20 future predictions (a rollout sketch follows below)
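A sketch of how such a rollout could be produced with the components sketched earlier: encode the context from the 10 past transitions, then roll the dynamics model forward for 20 steps on its own predictions (all names refer to the earlier illustrative sketches, not the authors' code):

```python
import torch

@torch.no_grad()
def rollout(encoder, dynamics, past_transitions, start_state, actions):
    """past_transitions: flattened 10 past (s, a, s') tuples;
    actions: (20, action_dim) actions to replay through the model."""
    z = encoder(past_transitions.unsqueeze(0))  # context from the past
    s = start_state.unsqueeze(0)
    predicted = []
    for a in actions:
        s = dynamics(s, a.unsqueeze(0), z)  # feed predictions back in
        predicted.append(s.squeeze(0))
    return torch.stack(predicted)  # (20, state_dim) predicted trajectory
```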
  31–33. Context helps Model-free RL too ● Model-free RL also suffers from poor generalization [6, 7] ● Context also improves the generalization of a model-free RL method: proximal policy optimization (PPO) [5] ● PPO + CaDM ○ Condition the policy and value networks on the learned context latent vector (see the sketch below) [5] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. Proximal policy optimization algorithms. arXiv. 2017. [6] Packer, C., Gao, K., Kos, J., Krähenbühl, P., Koltun, V., & Song, D. Assessing generalization in deep reinforcement learning. arXiv. 2018. [7] Cobbe, K., Klimov, O., Hesse, C., Kim, T., & Schulman, J. Quantifying generalization in reinforcement learning. In ICML. 2019.
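A minimal sketch of that conditioning, assuming a Gaussian PPO policy whose mean head and value head simply receive the context latent concatenated to the observation; hidden sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ContextConditionedActorCritic(nn.Module):
    """PPO policy and value networks conditioned on the context latent z."""
    def __init__(self, obs_dim, action_dim, context_dim, hidden=64):
        super().__init__()
        in_dim = obs_dim + context_dim
        self.policy = nn.Sequential(  # mean of a Gaussian policy
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )
        self.value = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, z):
        x = torch.cat([obs, z], dim=-1)
        return self.policy(x), self.value(x)
```

The PPO update itself is unchanged; only the inputs to the policy and value networks differ.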
  34. Experimental Setup: Environments ● We evaluate the generalization performance in two regimes ○ Moderate ○ Extreme
  35–37. Model-based RL: HalfCheetah ● Generalization results compared with the baselines [3] and [4] [3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019. [4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
  38. Model-free RL: HalfCheetah ● Generalization results compared with the baselines [9] and [10] [9] Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In ICML. 2019. [10] Zhou, W., Pinto, L., & Gupta, A. Environment probing interaction policies. In ICLR. 2019.
  39. Conclusion ● For dynamics generalization, we propose ○ A context-aware dynamics model ○ A novel loss function for context learning ● Code is available at https://github.com/younggyoseo/CaDM ● Project page: https://sites.google.com/view/cadm Thank you!