Terry Taewoong Um (terry.t.um@gmail.com)
University of Waterloo
Department of Electrical & Computer Engineering
DEEP REINFORCEMENT LEARNING IN A HANDFUL OF TRIALS USING PROBABILISTIC DYNAMICS MODELS
NIPS 2018
REINFORCEMENT LEARNING IS HOT
(Pictures from Karpathy’s blog)
• Baselines
(https://www.cs.ubc.ca/~gberseth/blog/demystifying-the-many-deep-reinforcement-learning-algorithms.html)
WHAT IS THE PROBLEM?
Gu, Holly et al., “Deep Reinforcement Learning for Robotic
Manipulation with Asynchronous Off-Policy Updates”, 2016.
• RL requires a lot of data
- Rewards in RL carry less direct information than labels in
supervised learning
• RL does not generalize well to new tasks / environments
- Meta learning
• RL was used for robotics before the deep RL era
- RL with Gaussian processes
MODEL-FREE VS. MODEL-BASED
Asymptotic performance: model-free RL > model-based RL
Data efficiency: model-free RL < model-based RL
(MLSS 2017, Jan Peters)
GP MODEL VS NN MODEL
Learning speed: GP model > NN model
For small data: GP model > NN model
Capacity: GP model < NN model
For large data: GP model < NN model
Q) How can we build NN-model-based RL with fewer weaknesses?
In other words, how can we make NN-model-based RL that also
works well with small data?
ICRA 2018
https://sites.google.com/view/mbmf
NN-MODEL-BASED RL
• How can we choose the optimal actions with a learned model?
• What is model predictive control (MPC)?
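To make the MPC idea concrete, here is a minimal sketch of MPC with random shooting. At every step, the sketch samples many random action sequences, rolls each one out through the model, executes only the first action of the best sequence, and then replans. `toy_model` (a 1-D point mass standing in for the learned NN), the quadratic `reward`, and all parameter values are hypothetical choices for illustration, not the slides' setup.

```python
import random

def toy_model(state, action):
    """Hypothetical stand-in for a learned dynamics model: s' = s + 0.1 * a."""
    return state + 0.1 * action

def reward(state, action):
    """Hypothetical reward: keep the state near 0 using small actions."""
    return -(state ** 2) - 0.01 * action ** 2

def random_shooting_mpc(state, model, horizon=10, num_sequences=200, a_max=1.0, rng=random):
    """Return the first action of the best of `num_sequences` random action sequences."""
    best_return, best_first_action = float("-inf"), 0.0
    for _ in range(num_sequences):
        actions = [rng.uniform(-a_max, a_max) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            total += reward(s, a)
            s = model(s, a)  # imagined rollout through the model, not the real system
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action

def run_demo(steps=30, seed=0):
    """Closed loop: plan, execute only the first action, observe, replan."""
    rng = random.Random(seed)
    s = 2.0
    for _ in range(steps):
        a = random_shooting_mpc(s, toy_model, rng=rng)
        s = toy_model(s, a)  # in this toy, the real system happens to equal the model
    return s

if __name__ == "__main__":
    print(f"state after 30 MPC steps: {run_demo():.3f}")
```

Replanning at every step allows the controller to correct for model errors. In this toy setup the real system and the model coincide, so no such errors arise.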
TRAINING
• Training the model
• Choosing the optimal policy
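The model-training step can be sketched as supervised regression on collected transitions (s, a, s'). The sketch predicts the state change Δs = s' − s rather than s' itself and fits it by SGD on squared error. Several parts are assumptions made to keep the example tiny: a linear model stands in for the NN, `true_system` is a hypothetical environment used only to generate data, and the learning rate and epoch count are illustrative.

```python
import random

def true_system(s, a):
    """Hypothetical environment used only to generate data: s' = 0.9 s + 0.5 a."""
    return 0.9 * s + 0.5 * a

def collect_data(n, rng):
    """Collect (s, a, s') transitions with random states and actions."""
    data = []
    for _ in range(n):
        s, a = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        data.append((s, a, true_system(s, a)))
    return data

def train_delta_model(data, lr=0.05, epochs=200, rng=None):
    """Fit delta_hat = w_s*s + w_a*a + b to delta = s' - s by SGD on squared error."""
    rng = rng or random.Random(0)
    w_s = w_a = b = 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for s, a, s_next in data:
            err = (w_s * s + w_a * a + b) - (s_next - s)  # error on the predicted delta
            # gradient of 0.5 * err**2 with respect to each parameter
            w_s -= lr * err * s
            w_a -= lr * err * a
            b -= lr * err
    return w_s, w_a, b

def predict_next(params, s, a):
    """Next-state prediction: s_{t+1} = s_t + predicted delta."""
    w_s, w_a, b = params
    return s + (w_s * s + w_a * a + b)
```

In this toy, the true delta is −0.1·s + 0.5·a, so the fitted parameters should approach (−0.1, 0.5, 0). The trained `predict_next` can then serve as the `model` inside an MPC planner.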
NN-MODEL-BASED RL
Initialize a model-free policy from the MBRL controller
(by imitating MPC), then fine-tune it with MFRL
NIPS 2018
ICML 2018
UNCERTAINTY IN DL
• Two types of uncertainty:
aleatoric (inherent noise in the data) & epistemic (lack of data) uncertainty
UNCERTAINTY IN DL
ALEATORIC: PROBABILISTIC NN (P)
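• Probabilistic NN (P): outputs the parameters of a distribution (e.g., the mean and variance of a Gaussian) over the next state, capturing aleatoric uncertainty
• Deterministic NN (D): outputs a single point prediction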
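The difference between P and D shows up in the training loss. A P-type network outputs a mean and a variance and is trained with the Gaussian negative log-likelihood. A D-type network outputs only a mean and is trained with squared error. The sketch below assumes a diagonal Gaussian. Its function names are illustrative and do not come from any particular library.

```python
import math

def gaussian_nll(mean, var, target):
    """NLL of `target` under a diagonal Gaussian N(mean, diag(var)), summed over dimensions.

    A probabilistic network (P) outputs both `mean` and `var`, so it can
    report a large variance where the data are noisy (aleatoric uncertainty).
    """
    return sum(
        0.5 * (math.log(2.0 * math.pi * v) + (t - m) ** 2 / v)
        for m, v, t in zip(mean, var, target)
    )

def mse(mean, target):
    """Squared-error loss a deterministic network (D) would minimize; no variance output."""
    return sum((t - m) ** 2 for m, t in zip(mean, target)) / len(target)

if __name__ == "__main__":
    residual = 0.5
    for v in (0.05, 0.25, 1.0):
        print(v, round(gaussian_nll([0.0], [v], [residual]), 3))
```

For a fixed residual r, the NLL is minimized at variance r². In the demo, r = 0.5, so the best of the three candidate variances is 0.25. This is how a P model learns to report the noise level of its targets. In practice, networks often predict the log-variance for numerical stability.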
EPISTEMIC: ENSEMBLE (E)
• Ensemble: epistemic uncertainty is estimated from the variance of the predictions across ensemble members, each trained on a bootstrapped copy of the data
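The ensemble bullet can be sketched as follows, assuming each member returns a Gaussian prediction (mean, variance). The average of the members' variances estimates aleatoric uncertainty, and the variance of the members' means estimates epistemic uncertainty. By the law of total variance, the two add up to the total predictive variance. `bootstrap` and `decompose_uncertainty` are hypothetical helper names.

```python
import random
import statistics

def bootstrap(data, rng):
    """Resample the dataset with replacement (one bootstrap set per ensemble member)."""
    return [rng.choice(data) for _ in data]

def decompose_uncertainty(predictions):
    """predictions: list of (mean, variance) pairs, one per ensemble member.

    Returns (aleatoric, epistemic, total) via the law of total variance.
    """
    means = [m for m, _ in predictions]
    aleatoric = statistics.fmean(v for _, v in predictions)  # average predicted noise
    epistemic = statistics.pvariance(means)  # disagreement between members
    return aleatoric, epistemic, aleatoric + epistemic

if __name__ == "__main__":
    agree = [(1.0, 0.2), (1.0, 0.2), (1.0, 0.2)]     # e.g. a state region with plenty of data
    disagree = [(0.0, 0.2), (1.0, 0.2), (2.0, 0.2)]  # e.g. a region the data never covered
    print(decompose_uncertainty(agree))
    print(decompose_uncertainty(disagree))
```

Both cases have the same aleatoric part. Only the members' disagreement raises the epistemic part, which is what the ensemble is meant to expose.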
HOW DO WE USE THESE UNCERTAINTIES?
Nagabandi et al. (ICRA 2018)
• Action selection
- Random shooting → CEM
(CEM samples actions closer to the action samples that yield high reward)
• Computing the expected trajectory reward using recursive state prediction
- The closed form is generally intractable
- Instead, use particle-based state propagation
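The move from random shooting to CEM can be sketched as repeated refitting of a per-time-step Gaussian over action sequences. Each iteration samples a population, keeps the highest-return "elite" sequences, and refits the mean and standard deviation to them. Only the first action of the final mean is executed, as in MPC. The sketch makes several assumptions: `toy_model`, the reward inside `trajectory_return`, the clipping to action bounds, and all population sizes are hypothetical. Refinements a real implementation may use, such as truncated sampling or smoothing of the updates, are omitted.

```python
import random
import statistics

def toy_model(s, a):
    """Hypothetical stand-in for a learned dynamics model."""
    return s + 0.1 * a

def trajectory_return(state, actions, model):
    """Roll an action sequence through the model and sum a hypothetical reward."""
    s, total = state, 0.0
    for a in actions:
        total += -(s ** 2) - 0.01 * a ** 2
        s = model(s, a)
    return total

def cem_plan(state, model, horizon=10, pop_size=100, num_elites=10, iters=5, a_max=1.0, rng=random):
    """Cross-entropy method over action sequences; returns the first planned action."""
    mean = [0.0] * horizon
    std = [a_max] * horizon
    for _ in range(iters):
        # sample a population of sequences and clip each action to the bounds
        population = [
            [max(-a_max, min(a_max, rng.gauss(m, sd))) for m, sd in zip(mean, std)]
            for _ in range(pop_size)
        ]
        population.sort(key=lambda seq: trajectory_return(state, seq, model), reverse=True)
        elites = population[:num_elites]
        # refit the sampling distribution to the elites, one time step at a time
        mean = [statistics.fmean(step) for step in zip(*elites)]
        std = [statistics.pstdev(step) + 1e-6 for step in zip(*elites)]
    return mean[0]

if __name__ == "__main__":
    print(cem_plan(2.0, toy_model, rng=random.Random(0)))
```

Unlike random shooting, later samples concentrate around sequences that already scored well. CEM therefore usually needs fewer samples to find good actions in higher-dimensional action spaces.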
STATE PROPAGATION METHODS
• Expectation (E): deterministic approach
• Moment matching (MM)
• Distribution sampling (DS)
• Trajectory sampling (TS)
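In trajectory sampling, each particle is propagated through one ensemble member and samples its next state from that member's Gaussian. The paper's TS1 and TS∞ variants differ in how often that member is re-drawn: TS1 re-draws it at every time step, and TS∞ keeps it fixed for the whole horizon. The sketch below shows only that difference, under several assumptions. `make_ensemble` builds a hypothetical 1-D probabilistic ensemble whose members differ only in a gain `g`, and the reward is supplied by the caller. The mode strings "TS1"/"TSinf" are illustrative, and the expected return is estimated as the mean of the per-particle returns.

```python
import math
import random

def make_ensemble(gains, noise_var=0.01):
    """Hypothetical probabilistic ensemble: member b predicts N(s + g_b * a, noise_var)."""
    return [lambda s, a, g=g: (s + g * a, noise_var) for g in gains]

def propagate_particles(state, actions, ensemble, reward, num_particles=20, mode="TS1", rng=random):
    """Trajectory sampling over a fixed action sequence.

    mode="TS1":   every particle re-draws its ensemble member at every step.
    mode="TSinf": every particle keeps one member for the whole horizon.
    Returns (per_particle_returns, member_history), where member_history[t][p]
    is the member used by particle p at step t.
    """
    if mode not in ("TS1", "TSinf"):
        raise ValueError(f"unknown mode: {mode}")
    particles = [state] * num_particles
    returns = [0.0] * num_particles
    members = [rng.randrange(len(ensemble)) for _ in range(num_particles)]
    member_history = []
    for t, a in enumerate(actions):
        if mode == "TS1" and t > 0:
            members = [rng.randrange(len(ensemble)) for _ in range(num_particles)]
        member_history.append(list(members))
        for p in range(num_particles):
            returns[p] += reward(particles[p], a)
            mean, var = ensemble[members[p]](particles[p], a)
            particles[p] = rng.gauss(mean, math.sqrt(var))  # sample rather than take the mean
    return returns, member_history

if __name__ == "__main__":
    ens = make_ensemble([0.08, 0.10, 0.12])
    rets, _ = propagate_particles(1.0, [-1.0] * 5, ens, lambda s, a: -(s ** 2), rng=random.Random(0))
    print("estimated expected return:", sum(rets) / len(rets))
```

Because the particles sample from the predicted Gaussians and follow different members, their spread reflects both aleatoric and epistemic uncertainty. A planner such as CEM would score each candidate action sequence by this particle average.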
ALGORITHM SUMMARY
EXPERIMENTS
https://sites.google.com/view/drl-in-a-handful-of-trials/home
CONCLUSION
• The proposed model-based approach (PETS) combines probabilistic NNs, ensemble-based
uncertainty estimation, MPC, and trajectory sampling
• It is more data-efficient than model-free approaches while achieving comparable
performance
• The probabilistic model plays the most important role in achieving
good performance in model-based RL
• [Idea] A state propagation method that considers the kinematics of the body?
