Trajectory-wise Multiple Choice Learning for
Dynamics Generalization in Reinforcement Learning
Younggyo Seo1*, Kimin Lee2*, Ignasi Clavera2, Thanard Kurutach2, Jinwoo Shin1 and Pieter Abbeel2
KAIST1, UC Berkeley2
*Equal contribution
https://sites.google.com/view/trajectory-mcl
Problem: Dynamics Generalization
● Model-based RL suffers from the dynamics generalization problem
[Figure labels: Training, Evaluation, Deployment]
Problem: Dynamics Generalization
● Multi-modal distribution of transition dynamics
Main Components
● Main idea: explicitly approximate the multi-modal distribution
● Multi-headed dynamics model
Approximates multi-modal distribution by
learning specialized prediction heads
● Multiple choice learning (MCL)
Update the most accurate prediction
head for specialization
● Adaptive planning
Use the prediction head that is most accurate
over recent experience for planning
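The two selection rules above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the slides: the 1-D toy dynamics, the scalar head parameter `w` (playing the role of 1/mass), the squared-error loss, and the helper names `mcl_update` and `most_accurate_head`.

```python
def predict(w, state, action):
    """One prediction head: next state = state + w * action (w stands in for 1/mass)."""
    return state + w * action

def mean_error(w, transitions):
    """Mean squared one-step error of a head over (s, a, s_next) tuples."""
    return sum((predict(w, s, a) - s_next) ** 2 for s, a, s_next in transitions) / len(transitions)

def most_accurate_head(weights, transitions):
    """Index of the head with the lowest mean error on the given transitions."""
    return min(range(len(weights)), key=lambda h: mean_error(weights[h], transitions))

def mcl_update(weights, batch, lr=0.1):
    """Multiple choice learning: take a gradient step on the most accurate head only."""
    h = most_accurate_head(weights, batch)
    grad = sum(2 * (predict(weights[h], s, a) - s_next) * a for s, a, s_next in batch) / len(batch)
    updated = list(weights)
    updated[h] -= lr * grad
    return h, updated

def rollout(w_true, actions, state=0.0):
    """Collect (s, a, s_next) tuples from a toy environment with parameter w_true."""
    transitions = []
    for a in actions:
        next_state = state + w_true * a
        transitions.append((state, a, next_state))
        state = next_state
    return transitions

# Training: data from a light body (1/mass = 1.0) updates only the closest head.
weights = [0.9, 0.5]
winner, weights_after = mcl_update(weights, rollout(1.0, [1.0, -0.5, 2.0]))

# Adaptive planning: recent experience from a heavy body (1/mass = 0.4, i.e. mass 2.5)
# selects the other head, which would then be used by the planner.
planning_head = most_accurate_head(weights_after, rollout(0.4, [1.0, 1.0]))
print(winner, planning_head)  # 0 1
```

In this toy run only head 0 is changed by the training step, which is what lets the heads specialize; at deployment the same "lowest error" rule is applied to recent experience rather than to a training batch.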
Trajectory-wise Multiple Choice Learning
● For MCL, each prediction head should receive distinct training samples
[Figure: transitions]
Which prediction head is most accurate over these transitions?
[Figure: trajectory segment]
● Trajectory-wise multiple choice learning
Differences in dynamics are captured more distinctly by considering
the prediction error over a trajectory segment
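A small sketch of why the segment-level choice can be more stable than a per-transition choice. The error values are hypothetical numbers chosen for illustration, not results from the paper, and the function names are assumptions.

```python
def assign_per_transition(errors):
    """errors[t][h] is the error of head h on transition t; pick a winner per transition."""
    return [min(range(len(e)), key=e.__getitem__) for e in errors]

def assign_per_segment(errors):
    """Pick a single winner for the whole segment from the mean error over it."""
    n_heads = len(errors[0])
    mean_err = [sum(e[h] for e in errors) / len(errors) for h in range(n_heads)]
    return min(range(n_heads), key=mean_err.__getitem__)

# Hypothetical errors of two heads on four consecutive transitions of one trajectory.
segment_errors = [
    [0.10, 0.30],
    [0.25, 0.20],  # a noisy transition on which head 1 happens to win
    [0.05, 0.40],
    [0.15, 0.35],
]
print(assign_per_transition(segment_errors))  # [0, 1, 0, 0]
print(assign_per_segment(segment_errors))     # 0
```

With per-transition assignment the noisy transition is handed to head 1, even though all four transitions come from the same environment; averaging over the segment assigns the whole segment to head 0.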
Context-conditional Multi-headed Dynamics Model
● We also introduce a context encoder for online adaptation to unseen environments
● The context encoder g captures contextual information from past experience
● See [Lee’20] for more information
[Lee’20] Lee, Kimin, Younggyo Seo, Seunghyun Lee, Honglak Lee, and Jinwoo Shin. "Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning." In ICML, 2020.
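The data flow of a context-conditional model can be sketched as follows. This is only an illustration of the interface: the hand-written statistic below is a stand-in for the learned encoder g, and the toy dynamics and function names are assumptions rather than the architecture used in the paper.

```python
def context_encoder(past_transitions):
    """Stand-in for g: summarize past (s, a, s_next) tuples into one context value.

    The context here is the average state change per unit action, a hand-written
    statistic used purely for illustration (the real g is a learned network).
    """
    ratios = [(s_next - s) / a for s, a, s_next in past_transitions if a != 0]
    return sum(ratios) / len(ratios)

def context_conditional_predict(state, action, context):
    """Dynamics prediction conditioned on the context as well as (state, action)."""
    return state + context * action

# Past experience from an environment where each unit of action moves the state by 0.5.
past = [(0.0, 1.0, 0.5), (0.5, 2.0, 1.5), (1.5, -1.0, 1.0)]
c = context_encoder(past)
print(c)                                          # 0.5
print(context_conditional_predict(1.0, 4.0, c))   # 3.0
```

The point of the sketch is that the prediction depends on a summary of recent transitions, so the same model can adapt online when the environment changes.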
Analysis on Trajectory-wise MCL
● Specialization leads to superior generalization performance
[Figure (Hopper): panels "Transitions" and "Trajectory segment"]
Analysis on Adaptive Planning
● Qualitative analysis
○ Manually assign prediction heads specialized for [mass: 2.5] to the [mass: 1.0] environment
[Figure: [Mass: 1.0] with prediction heads specialized for [Mass: 2.5]; [Mass: 2.5] with prediction heads specialized for [Mass: 2.5]]
Agent acts as if it has a heavyweight body!
Comparative Evaluation
● Superior generalization performance on 6 unseen environments
Conclusion
● For dynamics generalization
○ Context-conditional multi-headed dynamics model
○ Trajectory-wise multiple choice learning
○ Adaptive planning
Thank you!
