Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Robotic Clothing Assistance

Presentation made at the Recruit Seminar on NIPS 2016. The proceedings for this presentation are available at: https://www.researchgate.net/publication/312157083_Bayesian_Nonparametric_Motor-skill_Representations_for_Efficient_Learning_of_Robotic_Clothing_Assistance

1. Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Robotic Clothing Assistance
Workshop on Practical Bayesian Nonparametrics, NIPS 2016
Nishanth Koganti (1,2), Tomoya Tamei (1), Kazushi Ikeda (1), Tomohiro Shibata (2)
(1) Nara Institute of Science and Technology, Ikoma, Japan
(2) Kyushu Institute of Technology, Kitakyushu, Japan
February 11, 2017
2. Robotic Clothing Assistance
Aging causes a loss of the motor functions needed to perform dexterous tasks.
Goal: Develop a learning framework for humanoid robots to perform clothing assistance.
Challenge: Close interaction of the robot with clothes and the human:
- Non-rigid clothing material
- Varying posture of the human
Figure: left, Ramisa et al., 2011; right, Dan MacLeod posture study
3. Reinforcement Learning for Clothing Assistance
Markov Decision Process (MDP) formulated with low-dimensional state and policy representations. [1]
[1] Tamei, T. et al., "Reinforcement learning of clothing assistance", in IEEE-RAS Humanoids 2011
4. Clothing Assistance Framework [1]: Outline
[1] Tamei, T. et al., "Reinforcement learning of clothing assistance", in IEEE-RAS Humanoids 2011
5. Clothing Assistance Framework [1]: Policy
Control policy parametrized by via-points [2] of the trajectory.
A finite-difference policy gradient method is used for the policy update:
$\frac{\partial \eta(\theta)}{\partial \theta} \approx \frac{r(\theta_i + \Delta\theta) - r(\theta_i - \Delta\theta)}{2\Delta\theta}, \qquad \theta \leftarrow \theta + \alpha \frac{\partial \eta(\theta)}{\partial \theta}$
[1] Tamei, T. et al., "Reinforcement learning of clothing assistance", in IEEE-RAS Humanoids 2011
[2] Wada, Y. et al., "A theory for handwriting based on the minimization principle", in Biological Cybernetics, 1995
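As a minimal NumPy sketch of the update above: `rollout_reward` is a hypothetical stand-in for executing the via-point policy on the robot and returning the episode reward r(θ).

```python
import numpy as np

def fd_policy_gradient(theta, rollout_reward, delta=0.05):
    """Central-difference estimate of the policy gradient d eta / d theta."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = delta
        # Perturb one via-point parameter at a time and difference the rewards.
        grad[i] = (rollout_reward(theta + d) - rollout_reward(theta - d)) / (2 * delta)
    return grad

def policy_update(theta, rollout_reward, alpha=0.1, delta=0.05):
    # theta <- theta + alpha * d eta / d theta
    return theta + alpha * fd_policy_gradient(theta, rollout_reward, delta)
```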
6. Problem: Adaptive Learning of Clothing Skills
The design of a robust motor-skills learning framework is crucial for real-world implementation on low-cost robots:
- Tight coupling with the cloth and close proximity to the human.
- The optimal policy varies with initial conditions.
- Non-rigid clothing material.
- Varying posture of the human.
Figure: left, Ramisa et al., 2011; right, Dan MacLeod posture study
7. Reinforcement Learning in Latent Space
Combining motor-skills learning with dimensionality reduction:
- Tractable search space, reducing learning time.
- The latent space can be modeled to capture task-space constraints.
- Existing methods rely on linear models or a MAP estimate of the latent space (Bitzer et al., 2010 [1]; Luck et al., 2014 [2]).
[1] Bitzer, S. et al., "Using dimensionality reduction in reinforcement learning" in IEEE/RSJ IROS, 2010
[2] Luck, K. S. et al., "Latent space policy search for robotics" in IEEE/RSJ IROS, 2014
8. Motor-skill Learning in Latent Spaces
Use Bayesian nonparametric nonlinear dimensionality reduction for efficient learning of clothing skills. [1]
[1] Koganti, N. et al., "Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance" in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016
9. Bayesian Gaussian Process Latent Variable Model
Latent variable model (Titsias et al., 2010 [1]):
$y = f(x) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I)$
- $y \in \mathbb{R}^D$: observed variable
- $x \in \mathbb{R}^Q$ ($Q \ll D$): unknown latent variable
- $f : x \to y$: mapping given by a Gaussian process
$p(Y|X) = \prod_{d=1}^{D} \mathcal{N}(y_d \mid 0, K_{NN} + \beta^{-1} I_N)$
[1] Titsias, M. K. et al., "Bayesian Gaussian Process Latent Variable Model", in AISTATS 2010
10. BGPLVM: Manifold Learning
Bayesian inference: posterior distribution on the latent space.
$p(Y) = \int p(Y|X)\, p(X)\, dX$
Marginalization is made tractable using variational inference:
$q(X) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu_n, S_n)$
$\log p(Y) \geq \int q(X) \log p(Y|X)\, dX - \int q(X) \log \frac{q(X)}{p(X)}\, dX$
Automatic dimensionality reduction is possible using the ARD kernel:
$k(x, x') = \sigma_f^2 \exp\left( -\frac{1}{2} \sum_{q=1}^{Q} w_q (x_q - x'_q)^2 \right)$
[1] Titsias, M. K. et al., "Bayesian Gaussian Process Latent Variable Model", in AISTATS 2010
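As a concrete reference for the ARD kernel above, a minimal NumPy sketch: dimensions with small weight $w_q$ contribute little to the kernel, which is what lets BGPLVM switch off unneeded latent dimensions automatically.

```python
import numpy as np

def ard_kernel(X1, X2, w, sigma_f=1.0):
    """k(x, x') = sigma_f^2 * exp(-0.5 * sum_q w_q * (x_q - x'_q)^2)."""
    diff = X1[:, None, :] - X2[None, :, :]          # pairwise differences, (N1, N2, Q)
    sqdist = np.einsum('ijq,q->ij', diff ** 2, w)   # ARD-weighted squared distances
    return sigma_f ** 2 * np.exp(-0.5 * sqdist)
```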
11. Motor-skills Transfer through Latent Space
A BGPLVM model is trained on robot joint angles $\in \mathbb{R}^{14}$ recorded from kinesthetic demonstrations of clothing assistance. [1]
[1] Koganti, N. et al., "Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance" in RSJ Annual Conference, 2016
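To make the training step concrete, a minimal sketch using the GPy library (an assumption on tooling, not confirmed as the authors' setup; any variational GPLVM implementation would do). The data and latent dimensionality here are hypothetical stand-ins.

```python
import numpy as np
import GPy

Y = np.random.randn(100, 14)   # stand-in for the N x 14 demonstrated joint angles
Q = 5                          # upper bound on the latent dimensionality

# RBF kernel with ARD: per-dimension lengthscales play the role of the
# weights w_q, pruning latent dimensions the data does not need.
kernel = GPy.kern.RBF(Q, ARD=True)
model = GPy.models.BayesianGPLVM(Y, input_dim=Q, kernel=kernel, num_inducing=30)
model.optimize(messages=True)  # maximizes the variational lower bound on log p(Y)

print(kernel.lengthscale)      # large lengthscale -> irrelevant latent dimension
```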
12. Reinforcement Learning in BGPLVM Space
Apply the Cross Entropy Method (CEM) to perform policy improvement:
$\theta^* \sim \mathcal{N}(\theta \mid \mu^*, \Sigma^*), \qquad \mu^* := \mathrm{mean}(\text{best } \theta_{\text{old}}), \quad \Sigma^* := \mathrm{var}(\text{best } \theta_{\text{old}})$
Represent the policy using a Dynamic Movement Primitive (DMP):
$\tau \ddot{x} = K(g - x) - D \dot{x} + (g - x_0) f$
$f(s) = \frac{\sum_i w_i \psi_i(s)\, s}{\sum_i \psi_i(s)}, \qquad \text{where } \tau \dot{s} = -\alpha s$
[1] Koganti, N. et al., "Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance" in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016
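A minimal sketch of one CEM iteration, using the rollout counts from slide 16 (50 rollouts, 5 elites) and a diagonal covariance for simplicity; `evaluate` is a hypothetical stand-in that runs the DMP with weights theta in the latent space and returns a higher-is-better reward (e.g. the negated via-point distance of slide 13).

```python
import numpy as np

def cem_update(mu, sigma, evaluate, n_rollouts=50, n_elite=5):
    """One Cross Entropy Method iteration: sample, rank, refit to the elites."""
    thetas = mu + sigma * np.random.randn(n_rollouts, mu.size)   # sample policies
    rewards = np.array([evaluate(theta) for theta in thetas])    # one rollout each
    elites = thetas[np.argsort(rewards)[-n_elite:]]              # keep the best 5
    return elites.mean(axis=0), elites.std(axis=0)               # new mu*, diag Sigma*
```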
13. Reinforcement Learning in BGPLVM Space
Represent the reward function by the distance of the current policy from the desired via-points:
$R(\pi(\theta)) = \sum_{i=1}^{n_{\mathrm{dims}}} \sum_{j=1}^{n_{\mathrm{via}}} \left\| V_{i,j} - \pi_i(\theta, t_{i,j}) \right\|^2$
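As a sketch of this cost, assuming `policy(theta, i, t)` evaluates latent dimension i of the rollout at time t, and arrays `V` and `T` hold the via-point targets and timings (a hypothetical layout, not taken from the slides):

```python
import numpy as np

def via_point_distance(theta, policy, V, T):
    """Sum of squared distances between desired via-points and the rollout."""
    n_dims, n_via = V.shape
    total = 0.0
    for i in range(n_dims):
        for j in range(n_via):
            total += (V[i, j] - policy(theta, i, T[i, j])) ** 2
    return total  # negate to use as a higher-is-better reward in CEM
```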
14. Latent Space Controller for Clothing Tasks [1]
[1] Koganti, N. et al., "Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance" in RSJ Annual Conference, 2016
15. Generalization in Latent Space
Evaluation: reconstruction error of the latent space, measured as RMS error. [1]
Dataset: clothing trajectories for 4 postures, shoulder angle $\in \{65^\circ, 70^\circ, 75^\circ, 80^\circ\}$.
Models compared: PCA, GPLVM, BGPLVM.
[1] Koganti, N. et al., "Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance" in RSJ Annual Conference, 2016
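For reference, a minimal sketch of this evaluation for the PCA baseline using scikit-learn (tooling assumed; the data here is a random stand-in for the trajectory splits). The GPLVM and BGPLVM reconstructions would analogously map Y to the latent space and back before computing the same error.

```python
import numpy as np
from sklearn.decomposition import PCA

def rms_error(Y, Y_rec):
    """Root-mean-square reconstruction error."""
    return np.sqrt(np.mean((Y - Y_rec) ** 2))

# Stand-in data: in the slides this would be joint-angle trajectories per posture.
Y_train, Y_test = np.random.randn(200, 14), np.random.randn(50, 14)
pca = PCA(n_components=2).fit(Y_train)
Y_rec = pca.inverse_transform(pca.transform(Y_test))  # to latent space and back
print(rms_error(Y_test, Y_rec))
```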
16. Reinforcement Learning in Latent Space
Apply reinforcement learning in different action spaces with the same formulation and reward function.
- Parameters: $50 \times n_{\mathrm{dims}}$ basis functions.
- CEM: 50 rollouts per iteration.
- Policy update: 5 best rollouts per iteration.
[1] Koganti, N. et al., "Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance" in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016
17. Moving Forward
Immediate goal: latent spaces for robotics applications:
- Auto-regressive prior on the latent space to capture task dynamics.
- Explicit model of human-robot interaction as a constraint.
Ambitious goal: combine policy-search RL and BGPLVM:
- Non-linear dimensionality reduction.
- Data-efficient learning with Bayesian inference. [1]
[1] Deisenroth, M. P. et al., "Gaussian processes for data-efficient learning in robotics and control" in IEEE Transactions on PAMI, 2015
18. Appendix
19. Topology Coordinates
To approximate a Markov Decision Process, the relationship between the cloth and the subject needs to be observed as fully as possible, and low-dimensional representations are needed for a fast learning time. Topology coordinates were introduced to address both requirements; the concept was proposed by Ho et al. (2009). [1]
Given two line segments, the amount of twist (writhe) between them is given by the Gaussian Linking Integral (GLI):
$w = \mathrm{GLI}(\gamma_1, \gamma_2) = \frac{1}{4\pi} \int_{\gamma_1} \int_{\gamma_2} \frac{(d\gamma_1 \times d\gamma_2) \cdot (\gamma_1 - \gamma_2)}{\|\gamma_1 - \gamma_2\|^3} \qquad (1)$
[1] Ho, E. S. L. and Komura, T., "Character motion synthesis by topology coordinates", Eurographics 2009
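A minimal numerical sketch of the GLI: the double line integral in (1) is replaced by a double sum over polyline segments, with the integrand evaluated at segment midpoints (a coarse midpoint rule, adequate when the curves are well separated).

```python
import numpy as np

def gli(curve1, curve2):
    """Approximate GLI between two polylines given as (n, 3) vertex arrays."""
    d1 = np.diff(curve1, axis=0)              # segment vectors d(gamma1)
    d2 = np.diff(curve2, axis=0)              # segment vectors d(gamma2)
    m1 = 0.5 * (curve1[:-1] + curve1[1:])     # segment midpoints on gamma1
    m2 = 0.5 * (curve2[:-1] + curve2[1:])     # segment midpoints on gamma2
    w = 0.0
    for i in range(len(d1)):
        for j in range(len(d2)):
            r = m1[i] - m2[j]
            w += np.cross(d1[i], d2[j]) @ r / np.linalg.norm(r) ** 3
    return w / (4.0 * np.pi)
```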
20. Topology Space
The relationship between line segments is defined by the writhe matrix $T_{n \times m}$. Given line segments $S_1$, $S_2$ with $n$ and $m$ links respectively, $T$ is given by:
$T_{ij} = \mathrm{GLI}(S_1^i, S_2^j)$
The parameters writhe, center, and density are computed from the writhe matrix and form the topology space.
[1] Ho, E. S. L. and Komura, T., "Character motion synthesis by topology coordinates", Eurographics 2009
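Built on the `gli` helper sketched above, the writhe matrix reduces to a pair of loops over the links of the two chains:

```python
import numpy as np

def writhe_matrix(S1, S2):
    """S1: (n+1, 3) vertices of an n-link chain; S2: (m+1, 3) vertices likewise."""
    n, m = len(S1) - 1, len(S2) - 1
    T = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            # Each link is passed to gli() as a two-vertex polyline.
            T[i, j] = gli(S1[i:i + 2], S2[j:j + 2])
    return T
```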
21. Clothing Assistance Framework [1]: State and Reward
Low-dimensional state representation using topology coordinates. [2]
Reward given by the distance between the final state and the target state:
$r_i = -\left| s_i^{\mathrm{target}} - s_i \right| \quad (i = 1, 2, 3), \qquad r(s) = \sum_{i=1}^{3} \frac{r_i - \mu_i}{\sigma_i}$
[1] Tamei, T. et al., "Reinforcement learning of clothing assistance", in IEEE-RAS Humanoids 2011
[2] Ho, E. S. et al., "Character synthesis by topology coordinates", in Computer Graphics Forum 2009
22. Combining DR and RL
Policy representation:
$a = W(Z^T \Phi) + M\Phi + E\Phi$
Expectation step: posterior distribution over the latent variables:
$p_{\theta_{\mathrm{old}}}(Z^T \Phi \mid a) = \mathcal{N}\left( C W^T (a - M\Phi),\; \sigma^2\, \mathrm{tr}(\Phi \Phi^T)\, C \right), \qquad C = (\sigma^2 I + W^T W)^{-1}$
Maximization step: compute gradients of the lower bound with respect to the policy parameters:
$\frac{\partial \ln p(a) Q_\pi^t}{\partial M}, \quad \frac{\partial \ln p(a) Q_\pi^t}{\partial W}, \quad \frac{\partial \ln p(a) Q_\pi^t}{\partial \sigma^2}$
[1] Luck, K. S. et al., "Latent space policy search for robotics" in IEEE/RSJ IROS, 2014
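A minimal sketch of the E-step posterior, assuming the slide's $C$ denotes the inverse $(\sigma^2 I + W^T W)^{-1}$ (the exponent is elided in the slide) and treating $\Phi$ as a feature vector:

```python
import numpy as np

def e_step(a, Phi, W, M, sigma2):
    """Gaussian posterior over the latent action z = Z^T Phi given action a."""
    C = np.linalg.inv(sigma2 * np.eye(W.shape[1]) + W.T @ W)
    mean = C @ W.T @ (a - M @ Phi)
    cov = sigma2 * (Phi @ Phi) * C   # sigma^2 * tr(Phi Phi^T) * C from the slide
    return mean, cov
```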
23. DR as Preprocessing for RL
Bitzer et al. (2010) [1]: GPLVM-based latent space encoding task-space constraints.
- Non-linear dimensionality reduction.
- Data-efficient learning with the GP mapping.
- Value-function reinforcement learning (TD(0)) applied in the tractable search space.
[1] Bitzer, S. et al., "Using dimensionality reduction in reinforcement learning" in IEEE/RSJ IROS, 2010
24. Combining DR and RL
Luck et al. (2014) [1]: joint learning of the latent space and the optimal policy.
$a = W(Z^T \Phi) + M\Phi + E\Phi \qquad (2)$
- PePPER: an Expectation-Maximization formulation based on a KL-divergence lower bound.
- Probabilistic PCA used as the model for learning the latent space.
[1] Luck, K. S. et al., "Latent space policy search for robotics" in IEEE/RSJ IROS, 2014
25. Combining DR and RL
Results from Luck et al. (2014) [1]:
- Inverse kinematics: planning in the joint-angle space of a highly redundant robot (20 DOF).
- Standing on one leg: applied to a full humanoid robot, with the policy learned from scratch.
[1] Luck, K. S. et al., "Latent space policy search for robotics" in IEEE/RSJ IROS, 2014
26. Discussion
Robotic clothing assistance involves several coupled problems; we propose the use of dimensionality reduction with RL for efficient motor-skills learning.
Future work:
- Implement the latent-space RL framework for the clothing assistance task.
- Combine real-time state estimation with the motor-skills learning framework.
27. References
- Tamei, Tomoya, et al. "Reinforcement learning of clothing assistance with a dual-arm robot." 11th IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2011.
- Ho, Edmond S. L., and Taku Komura. "Character motion synthesis by topology coordinates." Computer Graphics Forum, vol. 28, no. 2, 2009.
- Pohl, William F. "The self-linking number of a closed space curve." Journal of Mathematics and Mechanics 17 (1968): 975-985.
- Miyamoto, Hiroyuki, et al. "A Kendama learning robot based on bi-directional theory." Neural Networks 9.8 (1996): 1281-1302.
- Koganti, Nishanth, et al. "Cloth dynamics modeling in latent spaces and its application to robotic clothing assistance." IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015.
- Deisenroth, Marc Peter, Dieter Fox, and Carl Edward Rasmussen. "Gaussian processes for data-efficient learning in robotics and control." IEEE Transactions on Pattern Analysis and Machine Intelligence 37.2 (2015): 408-423.
- Levine, Sergey, et al. "End-to-end training of deep visuomotor policies." arXiv preprint arXiv:1504.00702 (2015).
