Learning To Hop Using Guided Policy Search / ETH Zurich Computer Science Master Ceremony

A 5-minute presentation on the thesis "Learning To Hop Using Guided Policy Search", presented during the ETH Zurich Computer Science Master Ceremony 2017.

Published in: Science

  1. 1. Learning To Hop Using Guided Policy Search
 Julian Viereck. Supervisors: Felix Berkenkamp 1), Alexander Herzog 2), Ludovic Righetti 2), Prof. Andreas Krause 1)
 1) Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich
 2) Autonomous Motion Department, Max Planck Institute for Intelligent Systems, Tübingen
 9 June 2017 | ETH Zurich Computer Science Master Ceremony | Zurich
  2. 2. Development: more autonomous devices; hard wiring ➡ self learning. Links shown: https://am.is.tuebingen.mpg.de, blog.americansafetycouncil.com, http://blog.robotiq.com/
  3. 3. Development: more autonomous devices; hard wiring ➡ self learning. Links shown: https://am.is.tuebingen.mpg.de, blog.americansafetycouncil.com, http://blog.robotiq.com/
  4. 4. Reinforcement Learning. Diagram labels: Actions, Environment, State, Reward.
  5. 5. Guided Policy Search. Diagram labels: Actions, Environment, State, Cost, Dynamics.
  6. 6. Diagram labels: Environment, Cost, Dynamics, Local Behavior, Global Behavior, multiple, State, Actions.
  7. 7. Diagram labels: Environment, Cost, Dynamics, Local Behavior, Global Behavior, multiple, State, Actions, Optimization objective.
  8. 8. Diagram labels: Environment, Cost, Dynamics, Local Behavior, Global Behavior, multiple, State, Actions.
  9. 9. Dynamics
  10. 10. Dynamics
  11. 11. Dynamics
  12. 12. Dynamics
  13. 13. Dynamics
  14. 14. Dynamics
  15. 15. Dynamics, GMM Prior: $p(x) = \mathcal{N}(x \mid \mu_t^{D}, \Sigma_t^{D})$
  16. 16. Dynamics, GMM Prior: $p(x) = \mathcal{N}(x \mid \mu_t^{D}, \Sigma_t^{D})$ →
  17. 17. Dynamics, GMM Prior: $p(x) = \mathcal{N}(x \mid \mu_t^{D}, \Sigma_t^{D})$ → $\mu_t^{D\star}, \Sigma_t^{D\star} = \arg\max \mathrm{NIW}(\mu, \Sigma \mid \mu_t^{D}, \Sigma_t^{D}, \mu_t^{gmm}, \Sigma_t^{gmm}, k_0, n_0)$
  18. 18. Local Behavior
  19. 19. Local Behavior
  20. 20. Local Behavior
  21. 21. Local Behavior
  22. 22. Local Behavior
  23. 23. Local Behavior
  24. 24. Local Behavior
  25. 25. Local Behavior
  26. 26. Local Behavior
  27. 27. Local Behavior
  28. 28. Local Behavior
  29. 29. Local Behavior
  30. 30. Local Behavior
  31. 31. Local Behavior
  32. 32. Local Behavior
  33. 33. Local Behavior
  34. 34. Landing Local Behavior
  35. 35. Landing Jumping Local Behavior
  36. 36. Global Behavior
 $D_t = \left\{ \begin{pmatrix} x_t \\ u_t \\ x_{t+1} \end{pmatrix}_1, \ldots, \begin{pmatrix} x_t \\ u_t \\ x_{t+1} \end{pmatrix}_J \right\}$, $D_t \sim \mathcal{N}(\mu_t^{D}, \Sigma_t^{D})$
 $x_{t+1} \sim \mathcal{N}(\mu^{D}_{t, x_{t+1} \mid x_t, u_t}, \Sigma^{D}_{t, x_{t+1} \mid x_t, u_t}) \sim \mathcal{N}\left( f_{xut} \begin{bmatrix} x_t \\ u_t \end{bmatrix} + f_{ct}, F_t \right)$
 $p(x) = \mathcal{N}(x \mid \mu_t^{D}, \Sigma_t^{D})$
 $\mu_t^{D\star}, \Sigma_t^{D\star} = \arg\max \mathrm{NIW}(\mu, \Sigma \mid \mu_t^{D}, \Sigma_t^{D}, \mu_t^{gmm}, \Sigma_t^{gmm}, k_0, n_0)$
 Other symbols shown: $p(x_{t+1} \mid x_t, u_t)$, $\hat{p}_i$, $p_i$, $u_t$, $o_t$, $t$, $\begin{bmatrix} x_t \\ u_t \\ x_{t+1} \end{bmatrix}$, $\pi_\theta(u_t \mid o_t)$
  37. 37. Without Noise With Noise
  38. 38. https://twitter.com/Copvids911/status/837332885984137216
  39. 39. Thank you!
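
The dynamics slides fit a Gaussian to sampled (state, action, next state) triples at each time step and condition it on the state and action to obtain a linear-Gaussian model. The following is a minimal sketch of that conditioning step, not the thesis implementation. It uses plain maximum-likelihood estimates and omits the GMM/NIW prior named on the slides. The function name, the `reg` ridge term, and the synthetic test system are assumptions made for illustration.

```python
import numpy as np


def fit_linear_gaussian_dynamics(X, U, X_next, reg=1e-6):
    """Fit x_next ~ N(F_xu @ [x; u] + f_c, F) from J samples of one time step.

    X: (J, dx) states, U: (J, du) actions, X_next: (J, dx) successor states.
    `reg` is a hypothetical ridge term added for numerical stability.
    """
    D = np.hstack([X, U, X_next])               # one row per sample (x, u, x_next)
    mu = D.mean(axis=0)                         # joint mean
    centered = D - mu
    sigma = centered.T @ centered / D.shape[0]  # joint ML covariance

    n_in = X.shape[1] + U.shape[1]              # size of the conditioning block [x; u]
    sigma_in = sigma[:n_in, :n_in] + reg * np.eye(n_in)
    sigma_out_in = sigma[n_in:, :n_in]          # Cov(x_next, [x; u])

    # Gaussian conditioning: F_xu = sigma_out_in @ inv(sigma_in)
    F_xu = np.linalg.solve(sigma_in, sigma_out_in.T).T
    f_c = mu[n_in:] - F_xu @ mu[:n_in]
    F = sigma[n_in:, n_in:] - F_xu @ sigma_in @ F_xu.T
    return F_xu, f_c, 0.5 * (F + F.T)           # symmetrize against round-off


# Usage on synthetic data from a known linear system (all values made up).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
c = np.array([0.0, -0.05])
X = rng.normal(size=(500, 2))
U = rng.normal(size=(500, 1))
X_next = X @ A.T + U @ B.T + c + 0.01 * rng.normal(size=(500, 2))

F_xu, f_c, F = fit_linear_gaussian_dynamics(X, U, X_next)
print(np.round(F_xu, 2))  # should be close to the stacked matrix [A, B]
```

With only a handful of samples per time step the plain estimate becomes unreliable, which is presumably where a prior such as the GMM shown on the slides would help.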
