© 2019 The MathWorks, Inc.
How to Train Your Robot with Deep Reinforcement Learning
Lucas García, PhD
Senior Application Engineer
MathWorks
@mathinking
2
Did you know that more neurons get activated in your brain when you walk than when you play a game of chess?
How would you build an AI that could walk?
4
Credit: Tom Buehler / MIT CSAIL
5
Credit: Erico Guizzo/IEEE Spectrum
Thanks to: Aditya Baru, Sebastian Castro, Brian Douglas, John Glass, Carlos Sanchis, Emmanouil Tzorakoleftherakis and others.
7
The goal of control
9
A walking robot – the traditional way
[Diagram: Observations (Camera Data, Sensors) → Feature Extraction → State Estimation → Control System (Balance, Leg & Trunk Trajectories, Motor Control) → Motor Commands]
10
A walking robot – the alternative approach
[Diagram: the Feature Extraction, State Estimation, and Control System blocks are replaced by a single block: Observations (Camera Data, Sensors) → Black Box Controller → Motor Commands]
11
What is Reinforcement Learning?
Reinforcement learning is learning what to do—how to map
situations to actions—so as to maximize a numerical reward signal.
The learner is not told which actions to take, but instead must
discover which actions yield the most reward by trying them.
(Quoted from Sutton and Barto, Reinforcement Learning: An Introduction)
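As a rough illustration of "discovering which actions yield the most reward by trying them", here is a minimal sketch of trial-and-error learning on a toy multi-armed bandit. It is not from the talk: the epsilon-greedy rule, the function name `run_bandit`, the reward means, and the noise model are all assumptions chosen for illustration.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn action values by trial and error on a toy multi-armed bandit.

    Everything here is hypothetical: the reward means, the Gaussian noise,
    and the epsilon-greedy exploration rule are illustration choices.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # running estimate of each action's reward
    counts = [0] * n_actions       # how often each action has been tried

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment returns a noisy scalar reward; the learner is never
        # told which action is the best one.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental sample-average update of the tried action's estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = run_bandit([0.2, 1.0, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

The learner only ever sees the scalar reward of the action it tried, which is the point of the definition quoted on this slide.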
12
Reinforcement Learning Applications
video games
autonomous vehicles
robotics controls
13
Some Reinforcement Learning Terminology
14
Reinforcement Learning Workflow
16
Environment
▪ Everything outside of an agent
$X, Y, Z, \psi, \theta, \phi$
$q_{R1} \dots q_{RN}$
$q_{L1} \dots q_{LN}$
+ derivatives
$F_R, F_L$
$\tau_{R1} \dots \tau_{RN}$
$\tau_{L1} \dots \tau_{LN}$
18
Environment - Simulink
19
Reinforcement Learning Workflow
20
Reward
A function that outputs a scalar number that represents the "goodness" of an agent being in a particular state and taking a particular action.
21
Crafting the Reward
$r_t = -50\,(z - z_0)^2$
$r_t = +25\,\frac{T_s}{T_f}$
$r_t = +v_x$
$r_t = -3y^2$
$r_t = -0.02 \sum_{i=1}^{N} \left( \tau_{Ri}^2 + \tau_{Li}^2 \right)$
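The reward terms on this slide can be sketched as a single Python function. This is only an illustration under several assumptions: that the individual terms are summed into one scalar per time step, that the fraction reads Ts/Tf with Ts a sample time and Tf the episode duration, and that z0 is a reference torso height. The function name `walking_reward` and its argument names are hypothetical.

```python
def walking_reward(vx, y, z, z0, torques_right, torques_left, Ts, Tf):
    """Sum the reward terms listed on the slide into one scalar.

    Assumptions (not stated on the slide): the terms are added together,
    Ts is the sample time, Tf is the episode duration, and z0 is the
    reference torso height.
    """
    forward = vx                          # reward forward velocity
    lateral = -3.0 * y ** 2               # penalize sideways drift
    height = -50.0 * (z - z0) ** 2        # penalize deviation from height z0
    alive = 25.0 * Ts / Tf                # small bonus for every step survived
    effort = -0.02 * sum(                 # penalize joint torque effort
        tr ** 2 + tl ** 2 for tr, tl in zip(torques_right, torques_left)
    )
    return forward + lateral + height + alive + effort


# Hypothetical usage: upright, walking straight at 1 m/s, with zero torque.
r = walking_reward(vx=1.0, y=0.0, z=0.5, z0=0.5,
                   torques_right=[0.0, 0.0], torques_left=[0.0, 0.0],
                   Ts=0.025, Tf=10.0)
```

Keeping each term as a named variable makes it easier to see which behavior a given weight encourages or discourages when tuning.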
23
Reinforcement Learning Workflow
24
The Agent
Policy: a function that maps observations to actions
Reinforcement Learning Algorithm: an optimization method used to find the optimal policy
26
The Policy
Tells the agent which actions to take given the current state
Reward: the instantaneous benefit of being in a state and taking a specific action
Value: the total reward an agent expects to receive from a state onward into the future
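To make the reward/value distinction concrete, the sketch below computes the return (total future reward) observed from each step of one recorded episode. The value of a state is the expectation of this quantity, so a single episode gives only a one-sample estimate. The discount factor `gamma` is an assumption; the slide speaks only of "total reward".

```python
def returns_per_step(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every t.

    `rewards` holds the instantaneous rewards of one episode. `gamma` is an
    assumed discount factor; gamma = 1.0 gives the plain undiscounted sum.
    """
    returns = []
    g = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


episode_rewards = [1.0, 1.0, 1.0]
episode_returns = returns_per_step(episode_rewards, gamma=0.5)
```

With these numbers every step has the same instantaneous reward, yet earlier steps have a larger return because more reward still lies ahead of them.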
27
The Policy
It’s not feasible to try every possible action!
28
The Policy – Actor-Critic
Actor: chooses an action given the current state
Critic: predicts the value of that state and action
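The actor/critic split can be sketched on a toy problem. The code below is a tabular illustration, not the deep-network agent from the talk. It assumes a hypothetical 5-state corridor with a goal at the right end, a softmax actor over per-state preferences, a critic that stores Q(state, action) and is updated with a SARSA-style temporal-difference rule, and an actor update that uses the critic's advantage estimate. All names and hyperparameters are made up for illustration.

```python
import math
import random

N_STATES, GOAL = 5, 4    # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)       # action 0 moves left, action 1 moves right


def step(state, action_index):
    """Toy environment: reward 1 on reaching the goal, 0 otherwise."""
    next_state = min(max(state + ACTIONS[action_index], 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=500, gamma=0.9, actor_lr=0.1, critic_lr=0.2, seed=0):
    rng = random.Random(seed)
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: action preferences
    q = [[0.0, 0.0] for _ in range(N_STATES)]      # critic: Q(state, action)

    for _ in range(episodes):
        state = 0
        # Actor chooses an action given the current state.
        action = rng.choices((0, 1), weights=softmax(prefs[state]))[0]
        for _ in range(100):  # cap the episode length
            next_state, reward, done = step(state, action)
            if done:
                target = reward
            else:
                next_action = rng.choices(
                    (0, 1), weights=softmax(prefs[next_state]))[0]
                target = reward + gamma * q[next_state][next_action]

            # Critic: move Q(state, action) toward the one-step TD target.
            q[state][action] += critic_lr * (target - q[state][action])

            # Actor: raise the preference of actions the critic rates above
            # the policy's average for this state, lower it otherwise.
            probs = softmax(prefs[state])
            baseline = sum(p * v for p, v in zip(probs, q[state]))
            advantage = q[state][action] - baseline
            for a in (0, 1):
                grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
                prefs[state][a] += actor_lr * advantage * grad_log_pi

            if done:
                break
            state, action = next_state, next_action
    return prefs, q


prefs, q = train()
policy_next_to_goal = softmax(prefs[GOAL - 1])
```

In a deep reinforcement learning agent, the two tables would be replaced by neural networks for the actor and the critic, which is what lets the approach scale to continuous observations and actions.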
31
Reinforcement Learning Workflow
32
Training our Deep Reinforcement Learning Agent
Accelerate training by running simulations in parallel on multicore computers, clusters, or the cloud
Train on the GPU when using deep neural networks for actor or critic representations
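The idea of running independent simulations in parallel can be sketched with a worker pool. This is a generic Python illustration, not the MATLAB workflow from the talk. `simulate_episode` is a hypothetical stand-in for one simulation run, and a thread pool is used only to keep the sketch portable. A CPU-bound simulator would normally be fanned out to separate processes or cluster workers instead.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_episode(seed, max_steps=200):
    """Hypothetical stand-in for one simulation run; returns the episode return."""
    rng = random.Random(seed)  # per-episode generator keeps runs independent
    total = 0.0
    for _ in range(max_steps):
        total += rng.uniform(-1.0, 1.0)  # placeholder for a per-step reward
    return total


def collect_returns(seeds, workers=4):
    """Run one episode per seed on a pool of workers and gather the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of `seeds` in its results.
        return list(pool.map(simulate_episode, seeds))


seeds = list(range(8))
parallel_returns = collect_returns(seeds)
serial_returns = [simulate_episode(s) for s in seeds]
```

Because each episode depends only on its own seed, the parallel and serial runs produce the same returns, which is a useful sanity check before scaling out.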
34
Reinforcement Learning Workflow
35
Deploy policy to the target hardware
Automatically generate C/C++ or CUDA code to run the policy on an embedded system
37
Key takeaways
▪ Reinforcement Learning can solve complicated problems
▪ Deep Neural Networks can handle continuous or high-dimensional state and action spaces
▪ MATLAB and Simulink provide a complete workflow for Deep Reinforcement Learning
Can’t wait to play with it? Visit our booth!
Code
github.com/mathworks/msra-walking-robot
Download MATLAB
mathworks.com/matlab-bigth19
38
Credit: DLR / MathWorks
Learn more
What will Your Next AI look like?
