
- 1. Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Robotic Clothing Assistance. Workshop on Practical Bayesian Nonparametrics, NIPS 2016. Nishanth Koganti (1, 2), Tomoya Tamei (1), Kazushi Ikeda (1), Tomohiro Shibata (2). (1) Nara Institute of Science and Technology, Ikoma, Japan; (2) Kyushu Institute of Technology, Kitakyushu, Japan. February 11, 2017.
- 2. Robotic Clothing Assistance. Aging causes loss of the motor functions needed to perform dexterous tasks. Goal: develop a learning framework for humanoid robots to perform clothing assistance. Challenge: close interaction of the robot with clothes and the human; non-rigid clothing material; varying posture of the human. (Figure left: Ramisa et al., 2011; right: Dan MacLeod posture study.)
- 3. Reinforcement Learning for Clothing Assistance. A Markov Decision Process (MDP) is formulated with low-dimensional state and policy representations. [Tamei, T. et al., "Reinforcement learning of clothing assistance", IEEE-RAS Humanoids 2011]
- 4. Clothing Assistance Framework: Outline. [Tamei, T. et al., "Reinforcement learning of clothing assistance", IEEE-RAS Humanoids 2011]
- 5. Clothing Assistance Framework: Policy. The control policy is parametrized by via-points of the trajectory [Wada, Y. et al., "Theory for handwriting on minimization principle", Biological Cybernetics, 1995]. A finite-difference policy gradient method is used for the policy update: ∂η(θ)/∂θ ≈ [r(θᵢ + Δθ) − r(θᵢ − Δθ)] / (2Δθ), followed by θ ← θ + α ∂η(θ)/∂θ. [Tamei, T. et al., "Reinforcement learning of clothing assistance", IEEE-RAS Humanoids 2011]
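The finite-difference update above can be sketched in a few lines of NumPy. The quadratic `reward` in the usage example is a hypothetical stand-in for the rollout return r(θ), not the framework's actual reward:

```python
import numpy as np

def fd_policy_gradient(reward, theta, delta=1e-2):
    """Central-difference estimate of the policy gradient dη(θ)/dθ,
    perturbing one via-point parameter at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = delta
        grad[i] = (reward(theta + e) - reward(theta - e)) / (2.0 * delta)
    return grad

def policy_update(reward, theta, alpha=0.1):
    """Gradient-ascent step: θ ← θ + α dη(θ)/dθ."""
    return theta + alpha * fd_policy_gradient(reward, theta)
```

Each gradient estimate costs 2·dim(θ) rollouts, which is why a low-dimensional via-point parametrization matters here.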
- 6. Problem: Adaptive Learning of Clothing Skills. The design of a robust motor-skills learning framework is crucial for real-world implementation on low-cost robots: tight coupling with the cloth and close proximity to the human; the optimal policy varies with initial conditions; non-rigid clothing material; varying posture of the human. (Figure left: Ramisa et al., 2011; right: Dan MacLeod posture study.)
- 7. Reinforcement Learning in Latent Space. Combining motor-skills learning with dimensionality reduction: a tractable search space reduces learning time, and the latent space can be modeled to capture task-space constraints. Existing methods rely on linear models or a MAP estimate of the latent space. [Bitzer, S. et al., "Using dimensionality reduction in reinforcement learning", IEEE/RSJ IROS, 2010; Luck, K. S. et al., "Latent space policy search for robotics", IEEE/RSJ IROS, 2014]
- 8. Motor-skill Learning in Latent Spaces. Use Bayesian nonparametric nonlinear dimensionality reduction for efficient learning of clothing skills. [Koganti, N. et al., "Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance", Workshop on Practical Bayesian Nonparametrics, NIPS, 2016]
- 9. Bayesian Gaussian Process Latent Variable Model. Latent variable model (Titsias et al., 2011): y = f(x) + ε, ε ∼ N(0, σ²I), where y ∈ R^D is the observed variable, x ∈ R^Q (Q ≪ D) is the unknown latent variable, and f: x → y is a mapping given by a Gaussian Process. The likelihood is p(Y|X) = ∏_{d=1}^{D} N(y_d | 0, K_NN + β⁻¹ I_N). [Titsias, M. K. et al., "Bayesian Gaussian Process Latent Variable Model", AISTATS 2011]
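The GP likelihood on this slide, p(Y|X) = ∏_d N(y_d | 0, K_NN + β⁻¹I_N), can be evaluated directly. A minimal sketch, assuming the kernel matrix `K` has already been computed from the latent points:

```python
import numpy as np

def gp_log_marginal(Y, K, beta):
    """log p(Y|X) = Σ_d log N(y_d | 0, C) with C = K_NN + β⁻¹ I_N,
    summed over the D output dimensions (columns of Y)."""
    N, D = Y.shape
    C = K + np.eye(N) / beta
    _, logdet = np.linalg.slogdet(C)
    quad = np.sum(Y * np.linalg.solve(C, Y))   # Σ_d y_dᵀ C⁻¹ y_d
    return -0.5 * (D * (N * np.log(2 * np.pi) + logdet) + quad)
```

In practice a Cholesky factorization of C is reused for both the log-determinant and the solve; the `slogdet`/`solve` form above is kept for readability.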
- 10. BGPLVM: Manifold Learning. Bayesian inference yields a posterior distribution on the latent space, with marginal likelihood p(Y) = ∫ p(Y|X) p(X) dX. Marginalization is made tractable using variational inference: q(X) = ∏_{n=1}^{N} N(x_n | μ_n, S_n), giving the lower bound log p(Y) ≥ ∫ q(X) log p(Y|X) dX − ∫ q(X) log [q(X)/p(X)] dX. Automatic dimensionality selection is possible using the ARD kernel: k(x, x′) = σ_f² exp(−½ ∑_{q=1}^{Q} w_q (x_q − x′_q)²). [Titsias, M. K. et al., "Bayesian Gaussian Process Latent Variable Model", AISTATS 2011]
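The ARD kernel itself is simple to implement; weights w_q near zero switch off latent dimension q, which is what makes the automatic dimensionality selection work. A minimal NumPy sketch:

```python
import numpy as np

def ard_kernel(X1, X2, w, sigma_f=1.0):
    """k(x, x') = σ_f² exp(-½ Σ_q w_q (x_q - x'_q)²) for rows of X1, X2."""
    diff = X1[:, None, :] - X2[None, :, :]                 # (N1, N2, Q)
    sqdist = np.einsum('ijq,q->ij', diff ** 2, np.asarray(w, float))
    return sigma_f ** 2 * np.exp(-0.5 * sqdist)
```

In BGPLVM the weights w_q are optimized jointly with the variational parameters; dimensions whose weights shrink toward zero can be pruned from the latent space.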
- 11. Motor-skills Transfer through Latent Space. A BGPLVM model is trained on robot joint angles ∈ R¹⁴ from a kinesthetic demonstration of clothing assistance. [Koganti, N. et al., "Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance", RSJ Annual Conference, 2016]
- 12. Reinforcement Learning in BGPLVM Space. Apply the Cross Entropy Method (CEM) for policy improvement: θ* ∼ N(θ | μ*, Σ*), with μ* := mean(elite θ_old) and Σ* := var(elite θ_old). Represent the policy using a Dynamic Movement Primitive (DMP): τẍ = K(g − x) − Dẋ + (g − x₀)f, with forcing term f(s) = ∑ᵢ wᵢ ψᵢ(s) s / ∑ᵢ ψᵢ(s), where the phase follows τṡ = −αs. [Koganti, N. et al., "Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance", Workshop on Practical Bayesian Nonparametrics, NIPS, 2016]
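A minimal CEM loop matching this scheme (50 rollouts and 5 elites per iteration, as on slide 16). `reward` here is a hypothetical stand-in for executing the DMP and scoring the resulting trajectory:

```python
import numpy as np

def cem(reward, mu, sigma, n_rollouts=50, n_elite=5, n_iters=20, seed=0):
    """Cross Entropy Method: sample θ ~ N(μ, diag(σ²)), keep the best
    rollouts, and refit μ*, Σ* from the elite set."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        theta = rng.normal(mu, sigma, size=(n_rollouts, mu.size))
        scores = np.array([reward(t) for t in theta])
        elite = theta[np.argsort(scores)[-n_elite:]]   # best rollouts
        mu = elite.mean(axis=0)                        # μ* := mean(elite)
        sigma = elite.std(axis=0) + 1e-6               # Σ* := std(elite)
    return mu
```

Because CEM only needs rollout returns, not gradients, it pairs naturally with the low-dimensional BGPLVM action space.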
- 13. Reinforcement Learning in BGPLVM Space. Represent the reward function by the distance of the current policy from the desired via-points: R(π(θ)) = ∑_{i=1}^{ndims} ∑_{j=1}^{nvia} ‖V_{i,j} − π_i(θ, t_{i,j})‖².
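This via-point objective is a sum of squared distances, so it is a cost that CEM minimizes (equivalently, its negative is maximized). A sketch under the assumption that `policy(i, t)` returns dimension i of the rolled-out trajectory at time t (names hypothetical):

```python
import numpy as np

def viapoint_cost(V, T, policy):
    """R(π(θ)) = Σ_i Σ_j ‖V[i, j] − policy(i, T[i, j])‖² over all
    dimensions i and via-points j; lower is better."""
    ndims, nvia = V.shape
    return sum((V[i, j] - policy(i, T[i, j])) ** 2
               for i in range(ndims) for j in range(nvia))
```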
- 14. Latent Space Controller for Clothing Tasks. [Koganti, N. et al., "Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance", RSJ Annual Conference, 2016]
- 15. Generalization in Latent Space. Evaluation: reconstruction error of the latent space, measured by RMS error. Dataset: clothing trajectories for 4 postures, shoulder angle ∈ {65°, 70°, 75°, 80°}. Compared models: PCA, GPLVM, BGPLVM. [Koganti, N. et al., "Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance", RSJ Annual Conference, 2016]
- 16. Reinforcement Learning in Latent Space. Apply reinforcement learning in different action spaces with the same formulation and reward function. Parameters: 50 × ndims basis functions; CEM with 50 rollouts per iteration; policy update from the 5 best rollouts per iteration. [Koganti, N. et al., "Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance", Workshop on Practical Bayesian Nonparametrics, NIPS, 2016]
- 17. Moving Forward. Immediate goal, latent spaces for robotics applications: an auto-regressive prior on the latent space to capture task dynamics, and an explicit model of human-robot interaction as a constraint. Ambitious goal, combine policy-search RL and BGPLVM: nonlinear dimensionality reduction with Bayesian, data-efficient learning. [Deisenroth, M. P. et al., "Gaussian processes for data-efficient learning in robotics and control", IEEE Transactions on PAMI, 2015]
- 18. Appendix
- 19. Topology Coordinates. To approximate a Markov Decision Process, the relationship between the cloth and the subject needs to be observed as fully as possible, while low-dimensional representations are needed for fast learning. Topology coordinates, proposed by Ho et al. (2009), address both requirements. Given two curves γ₁, γ₂, the amount of twist (writhe) between them is given by the Gaussian Linking Integral (GLI): w = GLI(γ₁, γ₂) = (1/4π) ∮_{γ₁} ∮_{γ₂} (dγ₁ × dγ₂) · (γ₁ − γ₂) / ‖γ₁ − γ₂‖³. [Ho, E. S. L. and Komura, T., "Character motion synthesis by topology coordinates", Eurographics 2009]
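The GLI can be approximated numerically by discretizing both curves into segments and replacing the double integral with a double sum over segment midpoints. A NumPy sketch, assuming each curve is given as an (N, 3) array of points:

```python
import numpy as np

def gli(gamma1, gamma2):
    """Discretized Gaussian Linking Integral: (1/4π) Σᵢ Σⱼ
    (dγ₁ᵢ × dγ₂ⱼ) · (rᵢⱼ) / ‖rᵢⱼ‖³, with rᵢⱼ the midpoint offset."""
    d1, d2 = np.diff(gamma1, axis=0), np.diff(gamma2, axis=0)
    m1 = 0.5 * (gamma1[:-1] + gamma1[1:])      # segment midpoints
    m2 = 0.5 * (gamma2[:-1] + gamma2[1:])
    total = 0.0
    for i in range(len(d1)):
        r = m1[i] - m2                         # (M, 3) midpoint offsets
        num = np.sum(np.cross(d1[i], d2) * r, axis=1)
        total += np.sum(num / np.linalg.norm(r, axis=1) ** 3)
    return total / (4.0 * np.pi)
```

Two long perpendicular skew lines give |GLI| approaching 1/2 (one signed crossing in almost every projection), while well-separated short segments give values near zero, which is what makes the writhe informative about how far the arm has entered the sleeve.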
- 20. Topology Space. The relationship between two line segments is defined by the writhe matrix T ∈ R^{n×m}. Given line segments S₁, S₂ with n and m links respectively, T is given by T_{ij} = GLI(S₁ⁱ, S₂ʲ). The parameters writhe, center, and density are derived from the writhe matrix and together form the topology space. [Ho, E. S. L. and Komura, T., "Character motion synthesis by topology coordinates", Eurographics 2009]
- 21. Clothing Assistance Framework: State and Reward. Low-dimensional state representation using topology coordinates [Ho, E. S. L. and Komura, T., "Character motion synthesis by topology coordinates", Computer Graphics Forum 2009]. Reward given by the distance between the final state and the target state: rᵢ = −‖sᵢ^target − sᵢ‖ (i = 1, 2, 3), r(s) = ∑_{i=1}^{3} (rᵢ − μᵢ)/σᵢ. [Tamei, T. et al., "Reinforcement learning of clothing assistance", IEEE-RAS Humanoids 2011]
- 22. Combining DR and RL. Policy representation: a = W(ZᵀΦ) + MΦ + EΦ. Expectation step: posterior distribution over the latent variables, p_{θold}(ZᵀΦ | a) = N(CWᵀ(a − MΦ), Cσ² tr(ΦΦᵀ)), with C = (σ²I + WᵀW)⁻¹. Maximization step: compute gradients of the lower bound ∑ₜ ln p(a) Qᵗ_π with respect to the policy parameters M, W, and σ². [Luck, K. S. et al., "Latent space policy search for robotics", IEEE/RSJ IROS, 2014]
- 23. DR as Preprocessing for RL. Bitzer et al. (2010): a GPLVM-based latent space encoding task-space constraints, giving nonlinear dimensionality reduction and data-efficient learning via the GP mapping. Value-function reinforcement learning (TD(0)) is then applied in the tractable search space. [Bitzer, S. et al., "Using dimensionality reduction in reinforcement learning", IEEE/RSJ IROS, 2010]
- 24. Combining DR and RL. Luck et al. (2014): joint learning of the latent space and the optimal policy, with policy representation a = W(ZᵀΦ) + MΦ + EΦ. PePPER: an Expectation-Maximization formulation based on a KL-divergence lower bound, with probabilistic PCA as the model for learning the latent space. [Luck, K. S. et al., "Latent space policy search for robotics", IEEE/RSJ IROS, 2014]
- 25. Combining DR and RL. Inverse kinematics: planning in the joint-angle space of a highly redundant robot (20 DOF). Standing on one leg: applied to a full humanoid robot, with the policy learned from scratch. [Luck, K. S. et al., "Latent space policy search for robotics", IEEE/RSJ IROS, 2014]
- 26. Discussion. Robotic clothing assistance involves several open problems; we propose combining dimensionality reduction with RL for efficient motor-skills learning. Future work: implement the latent-space RL framework for clothing assistance, and combine real-time state estimation with the motor-skills learning framework.
- 27. References.
  Tamei, Tomoya, et al. "Reinforcement learning of clothing assistance with a dual-arm robot." 11th IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2011.
  Ho, Edmond S. L., and Taku Komura. "Character motion synthesis by topology coordinates." Computer Graphics Forum 28.2 (2009).
  Pohl, William F. "The self-linking number of a closed space curve." Journal of Mathematics and Mechanics 17 (1968): 975-985.
  Miyamoto, Hiroyuki, et al. "A kendama learning robot based on bi-directional theory." Neural Networks 9.8 (1996): 1281-1302.
  Koganti, Nishanth, et al. "Cloth dynamics modeling in latent spaces and its application to robotic clothing assistance." IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015.
  Deisenroth, Marc Peter, Dieter Fox, and Carl Edward Rasmussen. "Gaussian processes for data-efficient learning in robotics and control." IEEE Transactions on Pattern Analysis and Machine Intelligence 37.2 (2015): 408-423.
  Levine, Sergey, et al. "End-to-end training of deep visuomotor policies." arXiv preprint arXiv:1504.00702, 2015.
