How to train your robot (with Deep Reinforcement Learning)

  1. © 2019 The MathWorks, Inc. How to Train your Robot with Deep Reinforcement Learning. Lucas García, PhD, Senior Application Engineer, MathWorks (@mathinking)
  2. Did you know that more neurons get activated in your brain when you walk than when you play a game of chess?
  3. How would you build an AI that could walk?
  4. Credit: Tom Buehler / MIT CSAIL
  5. Credit: Erico Guizzo/IEEE Spectrum
  6. Lucas García, PhD, Senior Application Engineer, MathWorks (@mathinking). Thanks to: Aditya Baru, Sebastian Castro, Brian Douglas, John Glass, Carlos Sanchis, Emmanouil Tzorakoleftherakis and others.
  7. The goal of control
  8. The goal of control
  9. A walking robot – the traditional way. [Diagram labels: Observations, Camera Data, Sensors, Feature Extraction, State Estimation, Control System, Balance, Leg & Trunk Trajectories, Motor Control, Motor Commands]
  10. A walking robot – the alternative approach. [Diagram labels: Observations, Camera Data, Sensors, Feature Extraction, State Estimation, Control System, Black Box Controller, Motor Commands]
  11. What is Reinforcement Learning? “Reinforcement learning is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them.” (Sutton and Barto, Reinforcement Learning: An Introduction)
  12. Reinforcement Learning Applications: video games, autonomous vehicles, robotics, controls
  13. Some Reinforcement Learning Terminology
  14. Reinforcement Learning Workflow
  15. Reinforcement Learning Workflow
  16. Environment ▪ Everything outside of an agent
  17. Environment ▪ Everything outside of an agent. $X, Y, Z, \psi, \theta, \phi$; $q_{R1} \dots q_{RN}$, $q_{L1} \dots q_{LN}$ + derivatives; $F_R, F_L$; $\tau_{R1} \dots \tau_{RN}$, $\tau_{L1} \dots \tau_{LN}$
  18. Environment - Simulink
  19. Reinforcement Learning Workflow
  20. Reward: A function that outputs a scalar number that represents the "goodness" of an agent being in a particular state and taking a particular action.
  21. Crafting the Reward: $r_t = -50\,(z - z_0)^2$; $r_t = +25\,\frac{T_s}{T_f}$; $r_t = +v_x$; $r_t = -3y^2$; $r_t = -0.02 \sum_{i=1}^{N} \left(\tau_{Ri}^2 + \tau_{Li}^2\right)$
  22. Crafting the Reward
  23. Reinforcement Learning Workflow
  24. The Agent
  25. The Agent. Policy: function that maps observations to actions. Reinforcement Learning Algorithm: optimization method used to find the optimal policy.
  26. The Policy: tells the agent which actions to take given the current state. Reward: the instantaneous benefit of being in a state and taking a specific action. Value: the total reward an agent expects to receive from a state and onwards into the future.
  27. The Policy: It’s not feasible to try every possible action!
  28. The Policy – Actor-Critic. Actor: chooses an action given the current state. Critic: predicts the value of that state and action.
  29. The Policy – Actor-Critic
  30. The Policy – Actor-Critic
  31. Reinforcement Learning Workflow
  32. Training our Deep Reinforcement Learning Agent. Accelerate training by running simulations in parallel on multicore computers, clusters, or the cloud. Train on the GPU when using Deep Neural Networks for Actor or Critic representations.
  33. Training our Deep Reinforcement Learning Agent
  34. Reinforcement Learning Workflow
  35. Deploy policy to the target hardware. Automatically generate C/C++ or CUDA code to run the policy on an embedded system.
  36. Deploy policy to the target hardware
  37. Key takeaways ▪ Reinforcement Learning can solve complicated problems ▪ Deep Neural Networks can handle continuous or high-dimensional state and action spaces ▪ MATLAB and Simulink provide a complete workflow for Deep Reinforcement Learning. Can’t wait to play with it? Visit our booth! Code: github.com/mathworks/msra-walking-robot | Download MATLAB: mathworks.com/matlab-bigth19
  38. Credit: DLR / MathWorks. Learn more
  39. What will Your Next AI look like? Lucas García, PhD, Senior Application Engineer, MathWorks (@mathinking)
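
The reward terms on slide 21 and the reward/value distinction on slide 26 can be illustrated with a small sketch. It is written in Python for illustration only and is not the MATLAB/Simulink implementation from the talk. Several things here are assumptions rather than statements from the slides: the five terms are summed into one scalar per step, the bonus term is read as 25·Ts/Tf with Ts a sample time and Tf an episode duration, and the names `RobotState`, `step_reward`, `discounted_return`, and the discount factor `gamma` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RobotState:
    """Hypothetical container for the quantities named in the slide-21 terms."""
    vx: float                    # forward velocity v_x
    y: float                     # lateral displacement y
    z: float                     # torso height z
    z0: float                    # reference torso height z_0
    tau_right: Sequence[float]   # right-leg joint torques tau_R1..tau_RN
    tau_left: Sequence[float]    # left-leg joint torques tau_L1..tau_LN


def step_reward(s: RobotState, Ts: float, Tf: float) -> float:
    """Scalar reward for one step; summing the five terms is an assumption."""
    forward = s.vx                                  # + v_x
    lateral_penalty = -3.0 * s.y ** 2               # - 3 y^2
    height_penalty = -50.0 * (s.z - s.z0) ** 2      # - 50 (z - z_0)^2
    keep_going_bonus = 25.0 * Ts / Tf               # + 25 Ts/Tf (assumed reading)
    effort_penalty = -0.02 * sum(                   # - 0.02 * sum of squared torques
        r ** 2 + l ** 2 for r, l in zip(s.tau_right, s.tau_left)
    )
    return (forward + lateral_penalty + height_penalty
            + keep_going_bonus + effort_penalty)


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Total discounted reward from the first step onward.

    This is one sampled trajectory's return; a value function is the
    expectation of this quantity. The discount factor is an assumption.
    """
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    upright = RobotState(vx=1.0, y=0.0, z=0.5, z0=0.5,
                         tau_right=[0.0, 0.0], tau_left=[0.0, 0.0])
    r = step_reward(upright, Ts=0.025, Tf=10.0)
    print("instantaneous reward:", r)
    print("return over 100 identical steps:", discounted_return([r] * 100))
```

The split mirrors the slide's vocabulary: `step_reward` is the instantaneous signal the environment hands back, while `discounted_return` accumulates it into the future, which is the quantity a critic tries to predict.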