Robot Motor Skill Coordination with EM-based Reinforcement Learning

A Barrett WAM robot learns to flip pancakes by reinforcement learning.

The motion is encoded in a mixture of basis force fields through an extension of Dynamic Movement Primitives (DMP) that represents the synergies across the different variables through stiffness matrices. An Inverse Dynamics controller with variable stiffness is used for reproduction.

The skill is first demonstrated via kinesthetic teaching, and then refined by the Policy learning by Weighting Exploration with the Returns (PoWER) algorithm. After about 50 trials, the robot learns that the first part of the task requires a stiff behavior to throw the pancake in the air, while the second part requires the hand to be compliant in order to catch the pancake without letting it bounce off the pan.

  • Some tasks can be learned very efficiently through this kinesthetic teaching mechanism. Other tasks, however, are more difficult to learn this way: when generalizing the skill from several observations, it can be difficult to extract the important information from the observed motion. For highly dynamic tasks such as pancake flipping, the controller learned by imitation does not generalize the task correctly. The skill can nevertheless be refined through reinforcement learning. After about 50 trials, the robot learns that the first part of the task requires a stiff behavior to throw the pancake in the air, while the second part requires the hand to be more compliant to catch the pancake without letting it bounce off the pan.
  • Use of a gravity compensation controller as a user-friendly means of transferring a skill through kinesthetic teaching, and for physical human-robot interaction tasks where safety issues need to be considered.
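The gravity-compensation controller mentioned above can be illustrated on a toy planar 2-link arm: commanding exactly the gravity torque makes the arm feel weightless, so a teacher can move it freely. This is only a sketch under assumed point masses and link lengths, not the WAM's actual gravity model; the function name and parameters are illustrative.

```python
import numpy as np

def gravity_torque_2link(q, m=(1.0, 1.0), l=(0.5, 0.5), g=9.81):
    """Gravity torque G(q) for a planar 2-link arm with point masses at
    the link ends (a toy stand-in for the WAM's gravity model).
    Commanding tau = G(q) cancels gravity, so the arm can be pushed
    around freely during kinesthetic teaching."""
    q1, q2 = q
    m1, m2 = m
    l1, l2 = l
    # torque on joint 1: both masses act through their horizontal lever arms
    g1 = (m1 + m2) * g * l1 * np.cos(q1) + m2 * g * l2 * np.cos(q1 + q2)
    # torque on joint 2: only the distal mass contributes
    g2 = m2 * g * l2 * np.cos(q1 + q2)
    return np.array([g1, g2])
```

With the arm pointing straight up the lever arms vanish and the compensation torque is zero; with the arm horizontal, the proximal joint carries the larger load.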

    1. Robot Motor Skill Coordination with EM-based Reinforcement Learning
       Petar Kormushev, Sylvain Calinon, Darwin G. Caldwell
       Italian Institute of Technology, Advanced Robotics dept., http://www.iit.it
       October 20, 2010, IROS 2010
    2. Motivation
       How to learn complex motor skills which also require variable stiffness?
       How to demonstrate the required stiffness/compliance?
       How to teach highly dynamic tasks?
       Petar Kormushev, Italian Institute of Technology
    3. Background
       Learning adaptive stiffness by extracting variability and correlation information from multiple demonstrations
       Sylvain Calinon et al., IROS 2010
    4. Robot Motor Skill Learning
       Motion capture, kinesthetic teaching, imitation learning, reinforcement learning, shared representation (encoding)
    5. Skill representation (encoding)
       Time dependent vs. time independent
       Trajectory-based, via-points, DMP, GMM/GMR, DS-based
    6. Dynamic Movement Primitives (DMP)
       Ijspeert, Nakanishi, Schaal, IROS 2001
       Demonstrated trajectory encoded as a sequence of attractors
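The DMP idea on this slide can be sketched in a few lines: a canonical phase variable decays from 1 to 0 and gates a learned forcing term built from Gaussian basis functions, so the movement always ends at the goal attractor. This is a minimal one-dimensional sketch; the gains, basis placement, and function name are illustrative, not the paper's implementation.

```python
import numpy as np

def dmp_rollout(y0, goal, weights, tau=1.0, alpha=25.0, beta=6.25,
                alpha_s=4.0, dt=0.01, T=1.0):
    """Integrate a minimal 1-D discrete DMP.

    The phase s decays from 1 to 0 (canonical system) and drives a
    forcing term of weighted Gaussian basis functions; because the
    forcing term vanishes with s, the spring-damper attractor
    guarantees convergence to the goal."""
    n = len(weights)
    centers = np.exp(-alpha_s * np.linspace(0.0, 1.0, n))  # basis centers in phase space
    widths = n ** 1.5 / centers                            # narrower bases as s -> 0
    y, dy, s = float(y0), 0.0, 1.0
    traj = [y]
    for _ in range(int(T / dt)):
        psi = np.exp(-widths * (s - centers) ** 2)         # basis activations
        f = (psi @ weights) / (psi.sum() + 1e-10) * s * (goal - y0)
        ddy = (alpha * (beta * (goal - y) - dy) + f) / tau  # attractor + forcing
        dy += ddy * dt
        y += dy * dt
        s += -alpha_s * s * dt / tau                       # canonical system
        traj.append(y)
    return np.array(traj)
```

With zero weights the forcing term disappears and the rollout is a pure point-to-point reach to the goal; learned weights reshape the transient while keeping the same endpoint.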
    7. Extended DMP to include coordination
       Stiffness gain (scalar) → coordination matrix (full stiffness matrix)
       Advantages:
       • capture correlations between the different motion variables
       • reduce the number of primitives
       Proposal: use reinforcement learning to learn the coordination matrices
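The extension above replaces the scalar stiffness gain with a full stiffness matrix per attractor, so off-diagonal entries couple the motion variables. A minimal sketch of one integration step of such a mixture of attractors, assuming the generic form acc = Σᵢ hᵢ [Kᵢ(μᵢ − x) − kv·dx]; the function name, gains, and integration scheme are illustrative, not the paper's controller.

```python
import numpy as np

def coordinated_step(x, dx, attractors, stiffness, kv, h, dt=0.01):
    """One semi-implicit Euler step of a mixture of attractors with full
    stiffness (coordination) matrices:
        acc = sum_i h_i [ K_i (mu_i - x) - kv * dx ].
    Off-diagonal entries of K_i couple the motion variables, which is
    what lets a single primitive encode coordination."""
    acc = np.zeros_like(x)
    for hi, mu, K in zip(h, attractors, stiffness):
        acc += hi * (K @ (mu - x) - kv * dx)
    dx = dx + acc * dt
    x = x + dx * dt
    return x, dx
```

With several attractors and time-varying activations hᵢ(t), repeatedly applying this step reproduces the mixture-of-basis-force-fields behavior described in the abstract.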
    8. Example: Reaching task with obstacle
       Using full coordination matrices vs. using diagonal matrices
       Expected returns: 0.61 and 0.73
    9. EM-based Reinforcement Learning (RL)
       PoWER algorithm: Policy learning by Weighting Exploration with the Returns (Jens Kober and Jan Peters, NIPS 2009)
       Advantages over policy-gradient-based RL:
       • no learning rate needed
       • can use importance sampling
       • a single rollout is enough to update the policy
    10. RL implementation
        Policy parameters: full coordination matrices and attractor vectors
        Policy update rule with importance sampling: uses the best σ rollouts so far
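The PoWER-style update with importance sampling can be sketched on a toy problem: keep the best σ rollouts seen so far and move the policy parameters to their reward-weighted mean, with no learning rate. The 2-D parameter vector, the toy return, and all constants below are hypothetical stand-ins for the coordination-matrix and attractor parameters, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([0.5, -0.3])         # hypothetical optimum of the toy return

def rollout_return(theta):
    """Toy return, peaked at theta_star (stand-in for the pancake reward)."""
    return float(np.exp(-np.sum((theta - theta_star) ** 2)))

theta = np.zeros(2)                        # policy initialized e.g. from imitation
elite = []                                 # importance sampling: best rollouts so far
sigma_best = 5
for _ in range(60):
    candidate = theta + rng.normal(0.0, 0.2, size=2)   # parameter-space exploration
    elite.append((rollout_return(candidate), candidate))
    elite = sorted(elite, key=lambda p: p[0], reverse=True)[:sigma_best]
    returns = np.array([r for r, _ in elite])
    params = np.array([p for _, p in elite])
    # EM-style update: reward-weighted mean of the best parameter samples,
    # equivalent to theta + sum_k R_k eps_k / sum_k R_k (no learning rate)
    theta = (returns[:, None] * params).sum(axis=0) / returns.sum()
```

Note that a single new rollout already changes the elite set and hence the policy, which is the "single rollout is enough to update the policy" property from the slide.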
    11. Pancake flipping: Experimental setup
        Barrett WAM 7-DOF robot with a frying pan mounted on the end-effector
        Artificial pancake with 4 passive markers (more robust to occlusions)
    12. Evaluation: Tracking of the pancake
        NaturalPoint OptiTrack motion capture system (12 cameras)
        100 Hz camera fps, 40 Hz real-time capturing
    13. Reward function
        The cumulative return of a rollout is computed from a reward function with orientation, position, and height terms
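One plausible shaping of those three terms, purely as a hypothetical illustration: the exact weights and functional forms used in the paper are not reproduced here, and every constant and name below is an assumption.

```python
import numpy as np

def pancake_reward(pos, pan_pos, flip_angle, max_height,
                   w_ori=1.0, w_pos=1.0, w_h=0.5):
    """Hypothetical combination of the slide's three reward terms:
    - orientation: closeness of the total flip to 180 degrees,
    - position:    pancake landing near the pan centre,
    - height:      saturating bonus for tossing the pancake high enough.
    Returns a scalar in [0, w_ori + w_pos + w_h]."""
    r_ori = np.cos((flip_angle - np.pi) / 2.0) ** 2   # 1 at a perfect 180° flip
    r_pos = np.exp(-np.sum((pos - pan_pos) ** 2))     # 1 when landing on the pan centre
    r_h = 1.0 - np.exp(-max_height)                   # saturating height bonus
    return w_ori * r_ori + w_pos * r_pos + w_h * r_h
```

Under this shaping, a full 180° flip caught at the pan centre scores strictly higher than the 90° flips the robot produces early in learning, which is the gradient the trial-and-error phase needs.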
    14. Kinesthetic demonstration of the task
    15. Learning by trial and error
    16. Finally learned skill
    17. Motion capture to evaluate rollouts
    18. Captured pancake trajectory: 90° flip vs. 180° flip
    19. Performance
    20. Reproduction control strategy: gravity compensation and task execution
    21. Conclusion
        Combining imitation learning + RL to learn motor skills with variable stiffness:
        • imitation used to initialize the policy
        • RL to learn the coordination matrices
        • variable stiffness learned during reproduction
        Future work: other representations, other RL algorithms
    22. Thanks for your attention!
