Context-aware Dynamics Model for Generalization
in Model-Based Reinforcement Learning
Kimin Lee*, Younggyo Seo*, Seunghyun Lee, Honglak Lee, Jinwoo Shin
* Equal Contribution
https://sites.google.com/view/cadm
Model-based Reinforcement Learning
● Model-based reinforcement learning (RL)
○ Learning a model of the environment, i.e., transition dynamics (and reward)
● Advantages
○ Control via planning
○ Sample-efficient learning
Model-based RL works!
● Recent success of model-based reinforcement learning
○ MuZero [1]
○ Dreamer [2]
[1] Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., ... & Lillicrap, T. Mastering Atari, Go, chess and shogi by planning with a learned model. arXiv. 2019.
[2] Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. Dream to control: Learning behaviors by latent imagination. In ICLR. 2020
Generalization in Model-based RL
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
● However, model-based RL does not
generalize well to unseen environments [3]
No information of length!
● For generalization, we need context
information from past observations
“Context-awareness!”
Context-aware Dynamics Model
● What is context & How can it help?
How do we extract context information from past experiences?
Context-aware Dynamics Model
● Main idea: separate context learning and next-state inference
● Context learning
○ Introduce a context encoder that outputs a context latent vector
● Next-state inference
○ Condition a dynamics model on the context latent vector
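The separation described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the two-layer MLPs, the dimensions, the history length `K`, and the choice to predict a state change are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
STATE_DIM, ACTION_DIM, K, CONTEXT_DIM, HIDDEN = 4, 2, 10, 8, 32

def init_mlp(in_dim, hidden, out_dim):
    """Randomly initialised two-layer MLP weights (untrained, illustrative)."""
    return {
        "W1": rng.normal(0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, out_dim)), "b2": np.zeros(out_dim),
    }

def mlp(params, x):
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

# Context learning: K past (state, action) pairs -> context latent vector.
encoder = init_mlp(K * (STATE_DIM + ACTION_DIM), HIDDEN, CONTEXT_DIM)
# Next-state inference: (state, action, context) -> predicted state change.
dynamics = init_mlp(STATE_DIM + ACTION_DIM + CONTEXT_DIM, HIDDEN, STATE_DIM)

def encode_context(past_states, past_actions):
    history = np.concatenate([past_states, past_actions], axis=-1)  # (K, S + A)
    return mlp(encoder, history.reshape(-1))                        # (CONTEXT_DIM,)

def predict_next_state(state, action, context):
    # The dynamics model is conditioned on the context by concatenation.
    model_input = np.concatenate([state, action, context])
    return state + mlp(dynamics, model_input)

past_states = rng.normal(size=(K, STATE_DIM))
past_actions = rng.normal(size=(K, ACTION_DIM))
current_state = rng.normal(size=STATE_DIM)
current_action = rng.normal(size=ACTION_DIM)

context = encode_context(past_states, past_actions)
next_state = predict_next_state(current_state, current_action, context)
```

Because the context enters only as an extra input, the same pattern could in principle wrap other dynamics models without changing their internals.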
Challenge: how to encode more meaningful information of dynamics?
Context-aware Dynamics Model
● Loss function for context learning
● Future-step prediction
○ Make predictions multiple timesteps into the future
● Backward prediction
○ Predict backward transitions
Context-aware Dynamics Model
● Final loss function
● Model-agnostic!
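The slide's formula did not survive extraction, so the sketch below only illustrates one way the two prediction terms could be combined. It is an assumption-laden example, not the authors' loss: it uses mean squared error as a stand-in for whatever prediction objective the models use, and the weight `beta`, the function name `prediction_loss`, and the toy linear dynamics are all hypothetical.

```python
import numpy as np

def prediction_loss(forward_model, backward_model, states, actions, context, beta=0.5):
    """Forward plus weighted backward prediction error over M future steps.

    states:  (M + 1, state_dim) array holding s_t ... s_{t+M}
    actions: (M, action_dim) array holding a_t ... a_{t+M-1}
    context: context latent, held fixed over the whole window
    beta:    weight of the backward term (hypothetical default)
    """
    num_steps = len(actions)
    forward_err, backward_err = 0.0, 0.0
    for i in range(num_steps):
        # Forward: predict s_{i+1} from (s_i, a_i, context).
        pred_next = forward_model(states[i], actions[i], context)
        forward_err += np.mean((pred_next - states[i + 1]) ** 2)
        # Backward: predict s_i from (s_{i+1}, a_i, context).
        pred_prev = backward_model(states[i + 1], actions[i], context)
        backward_err += np.mean((pred_prev - states[i]) ** 2)
    return (forward_err + beta * backward_err) / num_steps

# Toy environment: s_{i+1} = s_i + scale * a_i, where the scalar "context" is the scale.
rng = np.random.default_rng(0)
true_scale = 2.0
actions = rng.normal(size=(5, 3))
states = np.zeros((6, 3))
for i in range(5):
    states[i + 1] = states[i] + true_scale * actions[i]

forward = lambda s, a, c: s + c * a
backward = lambda s_next, a, c: s_next - c * a

loss_right = prediction_loss(forward, backward, states, actions, context=2.0)
loss_wrong = prediction_loss(forward, backward, states, actions, context=1.0)
```

In this toy setup the loss is essentially zero when the context matches the true scale and strictly larger otherwise, which is the signal a context encoder would be trained on.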
Ablation Study
● Effects of prediction loss
○ Vanilla dynamics model (DM): no context learning
○ Vanilla DM + context learning with one-step forward prediction
○ Vanilla DM + context learning with future-step forward prediction
○ Vanilla DM + context learning with future-step forward & backward prediction
CaDM is Model-agnostic
● Prediction error for Half-Cheetah with varying body masses
○ Panels: Vanilla DM and PE-TS [4]
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
Embedding Analysis
● Contexts from similar environments are clustered together
Prediction Visualization
● 10 past transitions and 20 future predictions
Context helps Model-free RL too
[5] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. Proximal policy optimization algorithms. arXiv. 2017.
[6] Packer, C., Gao, K., Kos, J., Krähenbühl, P., Koltun, V., & Song, D. Assessing generalization in deep reinforcement learning. arXiv. 2018.
[7] Cobbe, K., Klimov, O., Hesse, C., Kim, T., & Schulman, J. Quantifying generalization in reinforcement learning. In ICML. 2019.
● Context also improves the generalization of model-free RL methods
● Proximal policy optimization (PPO) [5]
● Model-free RL also suffers from poor generalization [6, 7]
● PPO + CaDM
○ Conditioning policy and value networks on the learned latent vector
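Conditioning the policy and value networks on the latent vector can be pictured as feeding each network the state concatenated with the context. The sketch below shows only that input wiring under assumed sizes and untrained random weights. It contains no PPO update, and concatenation itself is an assumption about how the conditioning is done.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
STATE_DIM, CONTEXT_DIM, ACTION_DIM, HIDDEN = 4, 8, 2, 16

def init_layer(in_dim, out_dim):
    return rng.normal(0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)

# Both networks take the state concatenated with the context latent vector.
policy_layers = [init_layer(STATE_DIM + CONTEXT_DIM, HIDDEN), init_layer(HIDDEN, ACTION_DIM)]
value_layers = [init_layer(STATE_DIM + CONTEXT_DIM, HIDDEN), init_layer(HIDDEN, 1)]

def forward(layers, x):
    (w1, b1), (w2, b2) = layers
    return np.tanh(x @ w1 + b1) @ w2 + b2

def policy_mean(state, context):
    """Mean of a Gaussian policy, conditioned on the context latent."""
    return forward(policy_layers, np.concatenate([state, context]))

def value(state, context):
    """Scalar state-value estimate, conditioned on the context latent."""
    return forward(value_layers, np.concatenate([state, context]))[0]

state = rng.normal(size=STATE_DIM)
context = rng.normal(size=CONTEXT_DIM)
action_mean = policy_mean(state, context)
state_value = value(state, context)
```

With this wiring the same state can map to different actions and value estimates under different contexts, which is what lets a single policy adapt across environments.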
Experimental Setup: Environments
● We evaluate the generalization performance in two regimes
○ Moderate
○ Extreme
Model-based RL: HalfCheetah
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
Model-based RL: HalfCheetah
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
Model-based RL: HalfCheetah
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
Model-free RL: HalfCheetah
[9] Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In ICML. 2019.
[10] Zhou, W., Pinto, L., & Gupta, A. Environment probing interaction policies. In ICLR. 2019.
Conclusion
● For dynamics generalization,
○ We propose a context-aware dynamics model
○ Novel loss function for context learning
● Code is available at https://github.com/younggyoseo/CaDM
https://sites.google.com/view/cadm
Thank you!


Recently uploaded (20)

Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!
 
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 

Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning (ICML 2020)

  • 1. Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning Kimin Lee*, Younggyo Seo*, Seunghyun Lee, Honglak Lee, Jinwoo Shin (*Equal contribution) https://sites.google.com/view/cadm
  • 2. Model-based Reinforcement Learning ● Model-based reinforcement learning (RL) ○ Learning a model of the environment, i.e., transition dynamics (and reward) ● Advantages ○ Control via planning ○ Sample-efficient learning
  • 3. Model-based RL works! ● Recent success of model-based reinforcement learning MuZero [1] Dreamer [2] [1] Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., ... & Lillicrap, T. Mastering Atari, Go, chess and shogi by planning with a learned model. arXiv. 2019. [2] Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. Dream to control: Learning behaviors by latent imagination. In ICLR. 2020.
  • 4. Generalization in Model-based RL ● However, model-based RL does not generalize well to unseen environments [3] [3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
  • 5. Generalization in Model-based RL ● However, model-based RL does not generalize well to unseen environments [3] No information of length!
  • 6. Generalization in Model-based RL ● However, model-based RL does not generalize well to unseen environments [3] ● For generalization, we need context information from past observations
  • 7. Generalization in Model-based RL ● However, model-based RL does not generalize well to unseen environments [3] ● For generalization, we need context information from past observations “Context-awareness!”
  • 8-11. Context-aware Dynamics Model ● What is context & How can it help?
  • 12. Context-aware Dynamics Model ● What is context & How can it help? How do we extract context information from past experiences?
  • 13. Context-aware Dynamics Model ● Main idea: separate context learning and next-state inference
  • 14. Context-aware Dynamics Model ● Main idea: separate context learning and next-state inference ● Context learning Introduce a context encoder that outputs a context latent vector
  • 15. Context-aware Dynamics Model ● Main idea: separate context learning and next-state inference ● Context learning Introduce a context encoder that learns the context latent vector ● Next-state inference Condition a dynamics model on the context latent vector
  • 16. Context-aware Dynamics Model ● Main idea: separate context learning and next-state inference ● Context learning Introduce a context encoder that learns the context latent vector ● Next-state inference Condition a dynamics model on the context latent vector Challenge: how to encode more meaningful information about the dynamics?
  • 17. Context-aware Dynamics Model ● Loss function for context learning ● Future-step prediction Make predictions multiple timesteps into the future
  • 18. Context-aware Dynamics Model ● Loss function for context learning ● Future-step prediction Make predictions multiple timesteps into the future ● Backward prediction Predict backward transitions
  • 19. Context-aware Dynamics Model ● Final loss function ● Model-agnostic!
  • 20. Ablation Study ● Effects of prediction loss ○ Vanilla dynamics model (DM): No context learning ○ Vanilla DM + context learning with one-step forward ○ Vanilla DM + context learning with future-step forward ○ Vanilla DM + context learning with future-step forward & backward
  • 21. CaDM is Model-agnostic ● Prediction error for Half-Cheetah with varying body masses Vanilla DM PE-TS [4] [4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
  • 22-26. Embedding Analysis ● Contexts from similar environments are clustered together
  • 27-30. Prediction Visualization ● 10 past transitions and 20 future predictions
  • 31. Context helps Model-free RL too ● Context also improves the generalization of model-free RL methods ● Proximal policy optimization (PPO) [5] [5] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. Proximal policy optimization algorithms. arXiv. 2017.
  • 32. Context helps Model-free RL too ● Context also improves the generalization of model-free RL methods ● Proximal policy optimization (PPO) [5] ● Model-free RL also suffers from poor generalization [6, 7] [6] Packer, C., Gao, K., Kos, J., Krähenbühl, P., Koltun, V., & Song, D. Assessing generalization in deep reinforcement learning. arXiv. 2018. [7] Cobbe, K., Klimov, O., Hesse, C., Kim, T., & Schulman, J. Quantifying generalization in reinforcement learning. In ICML. 2019.
  • 33. Context helps Model-free RL too ● Context also improves the generalization of model-free RL methods ● Proximal policy optimization (PPO) [5] ● Model-free RL also suffers from poor generalization [6, 7] ● PPO + CaDM ○ Conditioning policy and value networks on learned latent vector
  • 34. Experimental Setup: Environments ● We evaluate the generalization performance in two regimes ○ Moderate ○ Extreme
  • 35-37. Model-based RL: HalfCheetah [3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019. [4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
  • 38. Model-free RL: HalfCheetah [9] Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In ICML. 2019. [10] Zhou, W., Pinto, L., & Gupta, A. Environment probing interaction policies. In ICLR. 2019.
  • 39. Conclusion ● For dynamics generalization, ○ We propose a context-aware dynamics model ○ Novel loss function for context learning ● Code is available at https://github.com/younggyoseo/CaDM ● https://sites.google.com/view/cadm Thank you!
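
The context-learning idea from slides 13-19 (a context encoder, a dynamics model conditioned on the context latent vector, and a loss combining future-step and backward prediction) can be sketched roughly as below. This is a minimal NumPy illustration, not the authors' implementation. The network sizes, the `beta` weight on the backward term, the use of squared error, the choice to predict each future step from the true preceding state, and every function name here are assumptions made for the example. The window sizes reuse the "10 past transitions and 20 future predictions" figures from the visualization slides, which may differ from the training setup. Weights are random and untrained, so only the structure of the loss is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not values from the slides except K and M).
STATE_DIM, ACTION_DIM, CONTEXT_DIM = 4, 2, 3
K_PAST, M_FUTURE = 10, 20


def init_mlp(sizes):
    """Create random weights and zero biases for a small fully connected net."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Apply the net: tanh on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x


# Context encoder: K past (state, action) pairs -> context latent vector.
encoder = init_mlp([K_PAST * (STATE_DIM + ACTION_DIM), 32, CONTEXT_DIM])
# Forward model: (s_t, a_t, c) -> predicted s_{t+1}.
forward_model = init_mlp([STATE_DIM + ACTION_DIM + CONTEXT_DIM, 32, STATE_DIM])
# Backward model: (s_{t+1}, a_t, c) -> predicted s_t.
backward_model = init_mlp([STATE_DIM + ACTION_DIM + CONTEXT_DIM, 32, STATE_DIM])


def encode_context(past_states, past_actions):
    """Flatten the recent history and map it to a context latent vector."""
    history = np.concatenate([past_states, past_actions], axis=-1).reshape(-1)
    return mlp(encoder, history)


def cadm_loss(past_states, past_actions, states, actions, beta=0.5):
    """Forward + beta * backward prediction error over a future window.

    states has shape (M + 1, STATE_DIM); actions has shape (M, ACTION_DIM).
    One context vector, computed from the past window, conditions every step.
    """
    c = encode_context(past_states, past_actions)
    fwd, bwd = 0.0, 0.0
    for i in range(len(actions)):
        s, a, s_next = states[i], actions[i], states[i + 1]
        pred_next = mlp(forward_model, np.concatenate([s, a, c]))
        pred_prev = mlp(backward_model, np.concatenate([s_next, a, c]))
        fwd += np.mean((pred_next - s_next) ** 2)
        bwd += np.mean((pred_prev - s) ** 2)
    return (fwd + beta * bwd) / len(actions)


# Usage on random data standing in for logged transitions.
past_s = rng.normal(size=(K_PAST, STATE_DIM))
past_a = rng.normal(size=(K_PAST, ACTION_DIM))
fut_s = rng.normal(size=(M_FUTURE + 1, STATE_DIM))
fut_a = rng.normal(size=(M_FUTURE, ACTION_DIM))
print("context:", encode_context(past_s, past_a))
print("loss:", cadm_loss(past_s, past_a, fut_s, fut_a))
```

Because the encoder feeds both the forward and the backward term, training on this loss would push the context vector to carry information useful in both directions. Following slide 33, the same vector could also be concatenated to the inputs of PPO's policy and value networks.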