Learning the skill of archery by a humanoid robot iCub


Humanoid robot iCub learns the skill of archery. After being instructed how to hold the bow and release the arrow, the robot learns by itself to aim and shoot arrows at the target. It learns to hit the center of the target in only 8 trials.

Published in: Technology

  • The problem of detecting where the target is, and what the relative position of the arrow is with respect to the center of the target, is solved by image processing. We use color-based detection of the target and the tip of the arrow based on a Gaussian Mixture Model (GMM). The color detection is done in YUV color space, where Y is the luminance and UV is the chrominance. Only the U and V components are used, to ensure robustness to changes in luminosity.
    In a calibration phase, prior to conducting an archery experiment, the user explicitly defines on a camera image the position and size of the target and the position of the arrow's tip. Then, the user manually selects $N_T$ pixels lying inside the target in the image, and $N_A$ pixels from the arrow's tip in the image. The selected points produce two datasets: $c_T \in \mathbb{R}^{2 \times N_T}$ and $c_A \in \mathbb{R}^{2 \times N_A}$, respectively.
    From the two datasets $c_T$ and $c_A$, a Gaussian Mixture Model (GMM) is used to learn a compact model of the color characteristics in UV space of the relevant objects. Each GMM is described by the set of parameters $\{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$, representing respectively the prior probabilities, centers and covariance matrices of the model (full covariances are considered here). The prior probabilities $\pi_k$ satisfy $\pi_k \in \mathbb{R}_{[0,1]}$ and $\sum_{k=1}^{K} \pi_k = 1$. A Bayesian Information Criterion (BIC) [13] is used to select the appropriate number of Gaussians $K_T$ and $K_A$ to represent effectively the features to track.
    After each reproduction attempt, a camera snapshot is taken to re-estimate the position of the arrow and the target. From the image $c_I \in \mathbb{R}^{2 \times N_x \times N_y}$ of $N_x \times N_y$ pixels in UV color space, the center $m$ of each object on the image is estimated through the weighted sum
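The weighted sum itself is not reproduced in these notes. A minimal sketch of one plausible form, assuming each pixel position is weighted by the GMM likelihood of that pixel's UV value and the weights are normalised, could look like the following. The function names (`gaussian_2d`, `gmm_likelihood`, `weighted_center`) and the nested-list image layout are hypothetical and chosen for illustration only.

```python
import math


def gaussian_2d(x, mean, cov):
    """Density of a 2D Gaussian with full covariance `cov` at point `x`."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def gmm_likelihood(uv, priors, means, covs):
    """Mixture likelihood of one UV value under priors/means/covariances."""
    return sum(p * gaussian_2d(uv, m, c) for p, m, c in zip(priors, means, covs))


def weighted_center(uv_image, priors, means, covs):
    """Likelihood-weighted mean pixel position (x, y).

    Assumption: uv_image[y][x] is a (u, v) pair. Returns None if every
    pixel has zero likelihood.
    """
    total = sx = sy = 0.0
    for y, row in enumerate(uv_image):
        for x, uv in enumerate(row):
            w = gmm_likelihood(uv, priors, means, covs)
            total += w
            sx += w * x
            sy += w * y
    if total == 0.0:
        return None
    return (sx / total, sy / total)
```

One such call per object (target model, arrow-tip model) would give the two image positions from which the reward is computed.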
  • Transcript

    • 1. Learning the skill of archery by a humanoid robot iCub
      Petar Kormushev, Sylvain Calinon, Ryo Saegusa, Giorgio Metta
      Italian Institute of Technology (IIT), Advanced Robotics dept., RBCS dept.
      http://www.iit.it
      Humanoids 2010, Nashville, TN, USA, December 6-8, 2010
    • 2. Motivation
      How can a robot learn complex motor skills?
      Why the archery task?
      bi-manual coordination
      integration of image processing, motor control and learning parts in one coherent task
      using tools (bow and arrow) to affect an external object (target)
      appropriate task for testing different learning algorithms, because the reward is inherently defined by the goal of the task
      Petar Kormushev, Italian Institute of Technology (IIT)
      2/20
    • 3. The archery task
      Different societies
      Different embodiments
      Zashiki karakuri, 18th-19th century (mechanical automatons)
      Kyudo (Japanese archery)
      Differences in the learned skill
    • 4. iCub archery skill
      iCub is an open-source humanoid robot with dimensions comparable to those of a 3.5-year-old child: 104 cm tall, with 53 DOF.
      Static grasp of the bow
      Aiming skill
    • 5. Problem definition
      How to learn to shoot the arrow so that it hits the center of the target:
      • aim at the target
      • recognize the arrow's position w.r.t. the target
      Assumptions:
      • Prior knowledge about how to hold the bow and release the arrow
      • Prior knowledge about the colors of the target and the arrow
    • 6. Proposed approach
      For learning bi-manual aiming:
      PoWER: EM-based Reinforcement Learning
      ARCHER: Chained vector regression algorithm
      For hands position/orientation control:
      IK motion controller for the two arms
      For image recognition of the target and arrow:
      color-based detection based on GMM
    • 7. Learning algorithm #1: PoWER
      Policy learning by Weighting Exploration with the Returns (PoWER)
      Reasons to select PoWER:
      state-of-the-art EM-based RL algorithm
      no need of learning rate (unlike policy-gradient methods)
      efficient use of past experience via importance sampling
      single rollout enough to update policy
      Jens Kober and Jan Peters, NIPS 2009
    • 8. PoWER - implementation
      Policy parameters:
      relative position of the two hands (3D vector from right to left hand)
      Policy update rule:
      Importance sampling
      uses best σ rollouts so far
      relative exploration
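The update rule itself did not survive in this transcript. A minimal sketch, assuming the commonly cited PoWER form in which the new parameters are the current ones plus a return-weighted average of the exploration offsets of the best σ rollouts, might look like this. The function name `power_update` and the data layout are hypothetical.

```python
def power_update(theta, rollouts, sigma):
    """One PoWER-style policy update (illustrative sketch, not the authors' code).

    theta:    current policy parameters, e.g. the 3D hand-to-hand vector
    rollouts: list of (theta_k, return_k) pairs observed so far
    sigma:    number of best rollouts to reuse (importance sampling)
    """
    # Keep only the sigma rollouts with the highest return.
    best = sorted(rollouts, key=lambda r: r[1], reverse=True)[:sigma]
    total_return = sum(ret for _, ret in best)
    if total_return <= 0.0:
        return list(theta)  # nothing informative to learn from
    new_theta = []
    for i, th in enumerate(theta):
        # Exploration is expressed relative to the current parameters.
        weighted_offset = sum((th_k[i] - th) * ret for th_k, ret in best)
        new_theta.append(th + weighted_offset / total_return)
    return new_theta
```

No learning rate appears in the update: the step size is set by the returns themselves, which matches the motivation given for choosing PoWER.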
    • 9. PoWER - reward function
      Return of an arrow shooting rollout :
      Estimated target center position
      Estimated arrow tip position
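The return formula is missing from this transcript. A minimal sketch, assuming the return decays exponentially with the distance between the estimated target center and the estimated arrow tip, is shown below. The function name `rollout_return` and the choice of image coordinates are assumptions.

```python
import math


def rollout_return(target_center, arrow_tip):
    """Scalar return of one shot: 1.0 for a perfect hit, decaying with distance.

    Assumption: both arguments are 2D positions in the same image coordinates.
    """
    distance = math.dist(target_center, arrow_tip)
    return math.exp(-distance)
```

A return of this shape is always positive, which suits a return-weighted update such as PoWER.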
    • 10. Learning algorithm #2: ARCHER
      Augmented Reward CHainEd Regression
      Multi-dimensional reward vector
      Iteratively converging process
      Using regression to estimate new parameters
      ARCHER can be viewed as a linear vector regression with a shrinking support region.
    • 11. Learning algorithm #2: ARCHER
      rollouts
      input parameters
      observed result
      target reward
      matrix form
      least-norm approximation of the weights:
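The equations for this slide are missing from the transcript. A minimal sketch of the regression idea, assuming rewards and parameters are expressed relative to a reference rollout and the weights are the least-norm solution of the resulting linear system, could be written as follows. The names `least_norm_weights` and `archer_estimate` are hypothetical, and the shrinking support region is omitted.

```python
def least_norm_weights(columns, target):
    """Least-norm w with sum_j w[j] * columns[j] == target (2D vectors).

    Uses w = D^T (D D^T)^{-1} target, with the 2x2 inverse in closed form.
    """
    g11 = sum(c[0] * c[0] for c in columns)
    g12 = sum(c[0] * c[1] for c in columns)
    g22 = sum(c[1] * c[1] for c in columns)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("rollout results do not span the 2D reward space")
    y0 = (g22 * target[0] - g12 * target[1]) / det
    y1 = (g11 * target[1] - g12 * target[0]) / det
    return [c[0] * y0 + c[1] * y1 for c in columns]


def archer_estimate(thetas, results, target):
    """Estimate parameters expected to yield the 2D reward vector `target`.

    thetas[i]:  input parameters of rollout i
    results[i]: observed 2D reward vector of rollout i
    Rollout 0 serves as the reference point (an assumption of this sketch).
    """
    th0, r0 = thetas[0], results[0]
    diffs = [(r[0] - r0[0], r[1] - r0[1]) for r in results[1:]]
    w = least_norm_weights(diffs, (target[0] - r0[0], target[1] - r0[1]))
    return [
        th0[i] + sum(wj * (th[i] - th0[i]) for wj, th in zip(w, thetas[1:]))
        for i in range(len(th0))
    ]
```

If the mapping from parameters to reward vector is locally linear, the estimate reproduces the target reward exactly; on a real task it would be refined iteratively as new rollouts arrive.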
    • 12. Learning algorithm #2: ARCHER
      ARCHER is suitable for problems for which:
      a priori knowledge about the desired goal reward is available
      the reward can be decomposed into separate components
      the task has a smooth solution space
      Makes use of multi-dimensional reward, unlike standard RL, which only uses scalar reward
    • 13. Simulation experiment
      Convergence criterion: distance to the center < 5 cm
      PoWER: 19 rollouts to converge
      ARCHER: 5 rollouts to converge
    • 14. Speed of convergence
      Averaged over 40 runs with 60 rollouts in each run:
      ARCHER converges faster than PoWER due to:
      • Using 2D reward to estimate parameters
      • Using prior knowledge about the goal's reward
      PoWER achieves reasonable performance despite using only 1D feedback information.
      First 3 rollouts with high random exploration
    • 15. Image recognition
      • Automatic detection of target and arrow
      • YUV color space (Y - luminance, UV - chrominance)
      • GMM for color-based detection
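As an illustration of the color-space step, here is a minimal sketch that converts an RGB pixel to its U and V chrominance components and discards the luminance Y. It assumes the standard BT.601 analog YUV coefficients; the slides do not say which conversion was actually used, and the function name `rgb_to_uv` is hypothetical.

```python
def rgb_to_uv(r, g, b):
    """Return the (U, V) chrominance of an RGB pixel, dropping luminance Y.

    Assumption: BT.601 weights for Y and the usual U/V scale factors.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return u, v
```

Gray pixels of any brightness map to approximately (0, 0) in the UV plane, so the color model is not driven by how bright a neutral surface appears.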
      Estimated reward vector:
    • 16. Robot motion controller
      Pattacini et al., IROS 2010
      Minimum-jerk IK Cartesian controller
      Hands orientation control
      Posture and grasping configuration
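To give a sense of what "minimum-jerk" means, here is a minimal sketch of the classic fifth-order minimum-jerk profile for a point-to-point move along one coordinate. It only illustrates the trajectory shape and is not the controller cited on this slide; the function name `minimum_jerk` is hypothetical.

```python
def minimum_jerk(x0, xf, t, duration):
    """Position at time t on a minimum-jerk move from x0 to xf.

    The profile starts and ends with zero velocity and zero acceleration.
    Times outside [0, duration] are clamped to the endpoints.
    """
    tau = min(max(t / duration, 0.0), 1.0)
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (xf - x0) * s
```

Applying the same profile to each Cartesian coordinate of a hand would give a smooth straight-line reach to the commanded position.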
    • 17. Real-world experiment
    • 18. Real-world performance
      Distance between robot and target: 220 cm; Height of the robot: 104 cm
      Diameter of target: 50 cm
      Convergence is slightly slower than in simulation because of:
      • Noise of image position estimation
      • Position/orientation error of controller
      • Nonlinearities of task
      ARCHER converges in fewer than 10 rollouts
    • 19. Conclusion
      • iCub learned to aim and hit the center of the target
      • Two learning algorithms were used to coordinate the posture of the hands:
        • PoWER: EM-based reinforcement learning
        • ARCHER: local vector regression with shrinking support region
      • Reward was extracted autonomously from visual feedback via color-based image processing using GMM
      • ARCHER converges faster than PoWER due to:
        • multi-dimensional reward
        • known target reward
        • regression-based parameter estimation
      • Future work: use imitation learning to teach the robot the whole movement for grasping and pulling the arrow
    • 20. Thank you for your kind attention!
      Petar Kormushev, Italian Institute of Technology (IIT)
      More information:
      http://kormushev.com/