Model-based Reinforcement Learning with Neural Networks on Hierarchical Dynamic System

These slides were presented at the Workshop on Deep Reinforcement Learning: Frontiers and Challenges at the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), New York, 2016.
Paper: https://www.researchgate.net/publication/305115348

  1. Model-based Reinforcement Learning with Neural Networks on Hierarchical Dynamic System. Akihiko Yamaguchi and Christopher G. Atkeson, Robotics Institute, Carnegie Mellon University. http://akihikoy.net/
  2. (Image credits) http://reflectionsintheword.files.wordpress.com/2012/08/pouring-water-into-glass.jpg, http://schools.graniteschools.org/edtech-canderson/files/2013/01/heinz-ketchup-old-bottle.jpg, http://old.post-gazette.com/images2/20021213hosqueeze_230.jpg, http://img.diytrade.com/cdimg/1352823/17809917/0/1292834033/shampoo_bottle_bodywash_bottle.jpg, http://www.nescafe.com/upload/golden_roast_f_711.png. My pizza demonstration: https://youtu.be/Wgj32blPGiE
  3. https://youtu.be/GjwfbOur3CQ
  4. Pouring: A Manipulation of Deformable Objects. Planning actions / planning parameters of actions = dynamic programming (optimal control, MPC, …). Dynamics are partially unknown → a reinforcement learning problem. RL in pouring: adaptation is not very hard, but generalization is hard. Is a deep NN useful for this problem? (How do we use it in an RL framework?)
  5. Remarks on Reinforcement Learning. It is useful to compare model-free RL vs. model-based RL. Successful robot-learning RL is model-free (direct policy search) [cf. Kober et al. 2013]: good at fine-tuning, lower computation cost at execution, robust to POMDPs. Model-based RL suffers from simulation biases, but offers 1. generalization ability, 2. sharable / reusable models, and 3. adaptability to reward changes; 2 and 3 come from the symbolic (hierarchical) representation. (Figure: an ANN with input, hidden, and output layers used for FK [Magtanong et al. 2012].)
  6. How to deal with simulation biases? Do not learn dx/dt = F(x, u) (with dt small, like xx ms); learn (sub)task-level dynamics instead: parameters → F_grasp → grasp result; parameters → F_flow_ctrl → flow-control result. Use stochastic models (Gaussian in → F → Gaussian out): Stochastic Neural Networks [Yamaguchi, Atkeson, ICRA 2016]. Use stochastic dynamic programming: Stochastic Differential Dynamic Programming [Yamaguchi, Atkeson, Humanoids 2015]. Together this gives model-based RL with neural networks for a hierarchical dynamic system. (A toy sketch of chaining such sub-dynamics is given after this transcript.)
  7. Stochastic Neural Networks: propagation of a probability distribution from input to output, and gradients of the output expectation w.r.t. an input. Difficulty: nonlinear activation functions such as ReLU (f(x) = max(0, x)). (Figure: a mean model and an error model with a shared input.) A closed-form sketch for a single ReLU unit is given after this transcript.
  8. Use Case: independent neural networks for each (sub)dynamical system.
  9. Stochastic Differential Dynamic Programming. (A much-simplified stand-in sketch is given after this transcript; the full method is in the Humanoids 2015 paper under More Information.)
  10. Results of Experiments. DNN+DDP was better than LWR+DDP. Using redundant features did not affect the learning performance. The approach worked in pouring with a PR2 robot. Video: https://youtu.be/aM3hE1J5W98
  11. More Information: http://akihikoy.net/ and https://www.youtube.com/AkihikoYamaguchi
     Akihiko Yamaguchi and Christopher G. Atkeson: Neural Networks and Differential Dynamic Programming for Reinforcement Learning Problems, in Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA 2016), Stockholm, Sweden, May 2016. https://www.researchgate.net/publication/294729454
     Akihiko Yamaguchi and Christopher G. Atkeson: Differential Dynamic Programming with Temporally Decomposed Dynamics, in Proceedings of the 15th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2015), pp. 696-703, Seoul, 2015. https://www.researchgate.net/publication/282157952
     Akihiko Yamaguchi, Christopher G. Atkeson, and Tsukasa Ogasawara: Pouring Skills with Planning and Learning Modeled from Human Demonstrations, International Journal of Humanoid Robotics, Vol. 12, No. 3, pp. 1550030, July 2015. https://www.researchgate.net/publication/280733055
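
The following sketch illustrates the decomposition on slides 6 and 8: independently learned (sub)task-level dynamics (parameters → F_grasp → grasp result → F_flow_ctrl → flow result) chained while the state is kept as a Gaussian. It is a hypothetical toy, not the authors' code: F_grasp, F_flow, and every constant are invented, and a generic unscented transform stands in for the stochastic-neural-network propagation of slide 7.

    # Hypothetical sketch (not the authors' code): chaining two learned
    # (sub)task-level dynamics models while keeping the state as a Gaussian.
    # F_grasp, F_flow, and all constants are invented for illustration.
    import numpy as np

    def unscented_propagate(mean, cov, f, kappa=1.0):
        """Push N(mean, cov) through f using sigma points; return output mean and cov."""
        n = mean.size
        L = np.linalg.cholesky((n + kappa) * cov)
        pts = [mean] + [mean + L[:, i] for i in range(n)] + [mean - L[:, i] for i in range(n)]
        w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
        w[0] = kappa / (n + kappa)
        ys = np.array([f(p) for p in pts])
        m = w @ ys
        d = ys - m
        return m, (d.T * w) @ d

    # Stand-ins for two independently learned sub-dynamics (cf. slide 8):
    F_grasp = lambda p: np.tanh(2.0 * p)        # action parameters -> grasp result
    F_flow  = lambda g: 0.5 * g + g ** 2        # grasp result -> flow-control result

    mean, cov = np.array([0.3]), np.array([[0.05]])      # Gaussian over parameters
    mean, cov = unscented_propagate(mean, cov, F_grasp)  # Gaussian over grasp result
    mean, cov = unscented_propagate(mean, cov, F_flow)   # Gaussian over flow result
    print(mean, cov)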
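
Slide 7 points out that nonlinear activations such as ReLU make distribution propagation difficult. For a single ReLU unit with a scalar Gaussian pre-activation the first two output moments have a standard closed form (rectified-Gaussian moments); the sketch below is an illustrative implementation, not code from the paper.

    # Hypothetical sketch: mean and variance of max(0, x) for x ~ N(mu, var),
    # the elementwise building block for pushing a Gaussian through a ReLU layer.
    import numpy as np
    from scipy.stats import norm

    def relu_gaussian_moments(mu, var):
        sigma = np.sqrt(var)
        z = mu / sigma
        pdf, cdf = norm.pdf(z), norm.cdf(z)
        mean = mu * cdf + sigma * pdf                      # E[max(0, x)]
        second = (mu ** 2 + var) * cdf + mu * sigma * pdf  # E[max(0, x)^2]
        return mean, np.maximum(second - mean ** 2, 0.0)   # variance (clipped at 0)

    m, v = relu_gaussian_moments(np.array([0.5]), np.array([1.0]))
    print(m, v)   # roughly 0.70 and 0.55 for a N(0.5, 1) pre-activation

A moment-matched Gaussian only approximates the true output distribution, which has a point mass at zero; this is one reason propagating uncertainty through nonlinear networks is nontrivial.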
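
Slide 9 only names the planner; the full stochastic differential dynamic programming formulation is in the Humanoids 2015 paper listed on slide 11. As a much-simplified, purely illustrative stand-in, the sketch below chooses a single action parameter by gradient descent on the expected cost of a toy Gaussian outcome prediction (all functions and numbers are invented). Slide 7's emphasis on gradients of the output expectation w.r.t. the input suggests why such gradients matter to the planner.

    # Hypothetical stand-in, NOT the SDDP of the paper: choose an action
    # parameter by descending the expected cost of a Gaussian outcome prediction.
    import numpy as np

    def predict(a):
        """Toy learned dynamics: action parameter -> Gaussian over the outcome."""
        mean = np.tanh(a)                 # e.g. predicted amount poured
        var = 0.02 + 0.1 * a ** 2         # toy model: less certain for extreme actions
        return mean, var

    def expected_cost(a, target=0.8):
        # E[(y - target)^2] = (E[y] - target)^2 + Var[y] for a scalar Gaussian y
        mean, var = predict(a)
        return (mean - target) ** 2 + var

    a, lr, eps = 0.0, 0.5, 1e-4
    for _ in range(200):                  # finite-difference gradient descent
        g = (expected_cost(a + eps) - expected_cost(a - eps)) / (2 * eps)
        a -= lr * g
    print(a, expected_cost(a))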
