Learning To Hop Using Guided Policy Search
Julian Viereck
Supervisors: Felix Berkenkamp (1), Alexander Herzog (2), Ludovic Righetti (2), Prof. Andreas Krause (1)
(1) Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich
(2) Autonomous Motion Department, Max Planck Institute for Intelligent Systems, Tübingen
9 June 2017 | ETH Zurich Computer Science Master Ceremony | Zurich
Development
More autonomous devices
Hard wiring ➡ Self learning
Image sources: https://am.is.tuebingen.mpg.de, blog.americansafetycouncil.com, http://blog.robotiq.com/
Reinforcement Learning
[Figure: a policy $\pi_\theta(u_t \mid o_t)$ sends Actions $u_t$ to the Environment and receives its State $x_t$ and a Reward $r_t$]
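The loop in the figure can be made concrete with a minimal sketch. Everything here is an assumption for illustration: `ToyEnvironment` is a hypothetical one-dimensional system whose dynamics and reward are made up, and it is not the hopping robot from the talk.

```python
from typing import Callable, List, Tuple


class ToyEnvironment:
    """Hypothetical 1-D environment (illustration only): the action u_t shifts
    the state x_t, and the reward penalizes distance from 0 and large actions."""

    def __init__(self, x0: float = 1.0) -> None:
        self.x = x0

    def step(self, u: float) -> Tuple[float, float]:
        """Apply action u_t and return (x_{t+1}, r_t)."""
        self.x = self.x + u
        reward = -self.x ** 2 - 0.1 * u ** 2
        return self.x, reward


def rollout(env: ToyEnvironment,
            policy: Callable[[float], float],
            horizon: int) -> Tuple[List[float], float]:
    """Run the agent-environment loop and return the visited states and total reward."""
    states = [env.x]
    total_reward = 0.0
    for _ in range(horizon):
        u = policy(states[-1])       # the policy chooses an action from the current state
        x_next, r = env.step(u)      # the environment returns the next state and a reward
        states.append(x_next)
        total_reward += r
    return states, total_reward


states, total = rollout(ToyEnvironment(1.0), lambda x: -0.5 * x, horizon=3)
print(states)  # [1.0, 0.5, 0.25, 0.125]
```

Reinforcement learning adjusts the policy to increase the accumulated reward. The simple proportional policy above is fixed by hand only to show the data flow.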
Guided Policy Search
[Figure: the same loop between the policy $\pi_\theta(u_t \mid o_t)$ and the Environment, now labeled with a Cost $\ell(x_t, u_t)$ in place of the reward, and with the environment Dynamics]
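With a cost $\ell(x_t, u_t)$ and linear-Gaussian dynamics $x_{t+1} \sim \mathcal{N}(f_{xu,t}[x_t; u_t] + f_{c,t}, F_t)$, the expected cost of a trajectory can be computed in closed form by propagating a Gaussian state distribution forward in time. The sketch below makes two assumptions: the cost is quadratic ($x^\top Q x + u^\top R u$), and the controller is a deterministic linear one, $u_t = K_t x_t + k_t$. The local controllers in guided policy search are usually stochastic linear-Gaussian, and that controller noise is omitted here for brevity. All argument names are hypothetical.

```python
import numpy as np


def expected_cost(f_xu, f_c, F, K, k, Q, R, mu0, sigma0):
    """Expected total cost sum_t E[x_t^T Q x_t + u_t^T R u_t] of the linear
    controller u_t = K[t] x_t + k[t] under linear-Gaussian dynamics
    x_{t+1} ~ N(f_xu[t] [x_t; u_t] + f_c[t], F[t]), with x_0 ~ N(mu0, sigma0).
    Per-time-step arguments are lists of NumPy arrays (hypothetical names)."""
    n, m = Q.shape[0], R.shape[0]
    C = np.block([[Q, np.zeros((n, m))],
                  [np.zeros((m, n)), R]])   # cost matrix on z_t = [x_t; u_t]
    mu, sigma = mu0, sigma0
    total = 0.0
    for t in range(len(K)):
        # Gaussian over z_t = [x_t; u_t] induced by the deterministic controller.
        mu_z = np.concatenate([mu, K[t] @ mu + k[t]])
        sigma_z = np.block([[sigma, sigma @ K[t].T],
                            [K[t] @ sigma, K[t] @ sigma @ K[t].T]])
        # For z ~ N(mu_z, sigma_z): E[z^T C z] = mu_z^T C mu_z + tr(C sigma_z).
        total += mu_z @ C @ mu_z + np.trace(C @ sigma_z)
        # Push the state distribution through the dynamics to obtain x_{t+1}.
        mu = f_xu[t] @ mu_z + f_c[t]
        sigma = f_xu[t] @ sigma_z @ f_xu[t].T + F[t]
    return float(total)


T = 2
cost = expected_cost(
    f_xu=[np.array([[1.0, 1.0]])] * T, f_c=[np.array([0.0])] * T,
    F=[np.array([[0.1]])] * T,
    K=[np.array([[-0.5]])] * T, k=[np.array([0.0])] * T,
    Q=np.array([[1.0]]), R=np.array([[0.1]]),
    mu0=np.array([1.0]), sigma0=np.zeros((1, 1)))
# cost is approximately 1.38375
```

The dynamics noise $F_t$ enters the cost only through the trace term. This is why noisy dynamics raise the expected cost even when the mean trajectory is unchanged.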
[Figure: the Environment with its Cost and Dynamics, multiple Local Behaviors, and a single Global Behavior $\pi_\theta(u_t \mid o_t)$ mapping State to Actions]
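In guided policy search, the global policy is trained by supervised learning to reproduce the actions chosen by the multiple local behaviors. The sketch below replaces the neural-network policy with a linear least-squares model $u = Wx + b$ acting on the state $x$ rather than on observations $o_t$, and the two local controllers are hypothetical. It shows that when local behaviors disagree, a plain regression averages them. This is one reason GPS-style methods also push the local behaviors toward agreement with the global policy.

```python
import numpy as np


def fit_global_linear_policy(local_controllers, states_per_controller):
    """Least-squares fit of one global linear policy u = W x + b to the actions
    that several local linear controllers u = K x + k take at their own states."""
    features, targets = [], []
    for (K, k), states in zip(local_controllers, states_per_controller):
        for x in states:
            features.append(np.append(x, 1.0))   # state plus a constant bias feature
            targets.append(K @ x + k)            # action chosen by this local controller
    X, U = np.array(features), np.array(targets)
    theta, *_ = np.linalg.lstsq(X, U, rcond=None)  # minimizes ||X theta - U||^2
    W, b = theta[:-1].T, theta[-1]
    return W, b


# Two hypothetical local controllers that disagree on the same states.
local = [(np.array([[-1.0]]), np.array([0.0])),
         (np.array([[-3.0]]), np.array([0.0]))]
states = [[np.array([-1.0]), np.array([1.0])]] * 2
W, b = fit_global_linear_policy(local, states)
# W is approximately [[-2.0]] (the average gain), b is approximately [0.0]
```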
Optimization objective
[Figure: the Guided Policy Search overview (Environment, Cost, Dynamics, multiple Local Behaviors, Global Behavior), annotated with the optimization objective]
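The extracted text does not preserve the objective shown on this slide. For orientation only, here is the constrained form commonly used in the guided policy search literature (Levine et al.). Whether the talk uses exactly this form is an assumption. In this form, $p_1, \ldots, p_N$ are the local behaviors, $\pi_\theta$ is the global behavior, and $\ell$ is the cost. The constraint requires every local behavior to agree with the global one. It is written with the state $x_t$; when the policy acts on observations, $\pi_\theta(u_t \mid x_t)$ is obtained by marginalizing $\pi_\theta(u_t \mid o_t)$ over $o_t$.

$$\min_{\theta,\, p_1, \ldots, p_N} \; \sum_{i=1}^{N} \sum_{t=1}^{T} \mathbb{E}_{p_i(x_t, u_t)}\left[\ell(x_t, u_t)\right] \quad \text{s.t.} \quad p_i(u_t \mid x_t) = \pi_\theta(u_t \mid x_t) \quad \forall\, x_t, u_t, t, i$$

Methods in this family typically alternate between two steps. First, they improve the local behaviors using fitted dynamics. Second, they fit $\pi_\theta$ to those behaviors by supervised learning.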
Dynamics
GMM Prior
$$\begin{pmatrix} x_t \\ u_t \\ x_{t+1} \end{pmatrix} \sim \mathcal{N}\left(\mu^{D}_t, \Sigma^{D}_t\right), \qquad p(x) = \mathcal{N}\left(x \mid \mu^{D}_t, \Sigma^{D}_t\right)$$
$$\mu^{D\star}_t, \Sigma^{D\star}_t = \arg\max_{\mu,\,\Sigma}\; \mathrm{NIW}\left(\mu, \Sigma \mid \mu^{D}_t, \Sigma^{D}_t, \mu^{\mathrm{gmm}}_t, \Sigma^{\mathrm{gmm}}_t, \kappa_0, n_0\right)$$
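The MAP step can be sketched with the standard conjugate normal-inverse-Wishart (NIW) update. The GMM supplies the prior mean $\mu^{\mathrm{gmm}}_t$ and covariance $\Sigma^{\mathrm{gmm}}_t$, and a few samples of $[x_t; u_t; x_{t+1}]$ supply the data. This is a textbook NIW posterior mode, not necessarily the exact formula used in the thesis. The mapping from $(\Sigma^{\mathrm{gmm}}_t, n_0)$ to an inverse-Wishart scale $\Psi_0 = n_0 \Sigma^{\mathrm{gmm}}_t$ with $n_0$ degrees of freedom is an assumption, and the function name is hypothetical.

```python
import numpy as np


def niw_map_estimate(samples, mu_prior, sigma_prior, kappa0, n0):
    """Joint MAP estimate (mode) of a Gaussian's mean and covariance under a
    normal-inverse-Wishart prior.

    samples:     (N, d) array; each row is one [x_t, u_t, x_{t+1}] sample
    mu_prior:    (d,) prior mean, e.g. mu^gmm_t
    sigma_prior: (d, d) prior covariance, e.g. Sigma^gmm_t
    kappa0, n0:  pseudo-counts weighting the prior mean and covariance
    Assumption: inverse-Wishart scale Psi_0 = n0 * sigma_prior with n0 degrees of freedom.
    """
    N, d = samples.shape
    x_bar = samples.mean(axis=0)
    centered = samples - x_bar
    scatter = centered.T @ centered            # equals N times the empirical covariance
    diff = x_bar - mu_prior
    # Standard conjugate update of the NIW hyperparameters.
    kappa_n = kappa0 + N
    nu_n = n0 + N
    mu_n = (kappa0 * mu_prior + N * x_bar) / kappa_n
    psi_n = n0 * sigma_prior + scatter + (kappa0 * N / kappa_n) * np.outer(diff, diff)
    # The joint mode of NIW(mu_n, kappa_n, psi_n, nu_n) is (mu_n, psi_n / (nu_n + d + 2)).
    sigma_map = psi_n / (nu_n + d + 2)
    return mu_n, sigma_map


mu_map, sigma_map = niw_map_estimate(
    np.array([[1.0], [3.0]]), mu_prior=np.array([0.0]),
    sigma_prior=np.array([[1.0]]), kappa0=2.0, n0=2.0)
# mu_map is approximately [1.0], sigma_map approximately [[8/7]]
```

With only a few samples the GMM prior dominates, which keeps the fit well conditioned. As the number of samples grows, the estimate moves toward the empirical statistics.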
Local Behavior
Landing
Jumping
Global Behavior
$$\mathcal{D}_t = \left\{ \begin{pmatrix} x_t \\ u_t \\ x_{t+1} \end{pmatrix}_1, \ldots, \begin{pmatrix} x_t \\ u_t \\ x_{t+1} \end{pmatrix}_J \right\}, \qquad \mathcal{D}_t \sim \mathcal{N}\left(\mu^{D}_t, \Sigma^{D}_t\right)$$
$$x_{t+1} \sim \mathcal{N}\left(\mu^{D}_{t,\,x_{t+1}|x_t,u_t},\; \Sigma^{D}_{t,\,x_{t+1}|x_t,u_t}\right) = \mathcal{N}\left(f_{xu,t} \begin{pmatrix} x_t \\ u_t \end{pmatrix} + f_{c,t},\; F_t\right)$$
$$\mu^{D\star}_t, \Sigma^{D\star}_t = \arg\max_{\mu,\,\Sigma}\; \mathrm{NIW}\left(\mu, \Sigma \mid \mu^{D}_t, \Sigma^{D}_t, \mu^{\mathrm{gmm}}_t, \Sigma^{\mathrm{gmm}}_t, \kappa_0, n_0\right)$$
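The step from the joint Gaussian over $[x_t; u_t; x_{t+1}]$ to linear-Gaussian dynamics is ordinary Gaussian conditioning: $f_{xu,t} = \Sigma_{yz}\Sigma_{zz}^{-1}$, $f_{c,t} = \mu_y - f_{xu,t}\mu_z$, and $F_t = \Sigma_{yy} - f_{xu,t}\Sigma_{zz}f_{xu,t}^\top$, where $z = [x_t; u_t]$ and $y = x_{t+1}$. The following is a minimal sketch of that step (the function name is hypothetical). The example joint distribution was built by hand from made-up dynamics $x_{t+1} = 0.9x_t + 0.5u_t + 0.1$ plus noise of variance $0.01$, so conditioning should recover those numbers.

```python
import numpy as np


def condition_dynamics(mu, sigma, dx, du):
    """Given a joint Gaussian N(mu, sigma) over [x_t; u_t; x_{t+1}], return
    (f_xu, f_c, F) such that x_{t+1} | x_t, u_t ~ N(f_xu [x_t; u_t] + f_c, F)."""
    n = dx + du
    sigma_zz = sigma[:n, :n]              # covariance of z = [x_t; u_t]
    sigma_zy = sigma[:n, n:n + dx]        # cross-covariance between z and x_{t+1}
    sigma_yy = sigma[n:n + dx, n:n + dx]  # covariance of x_{t+1}
    # f_xu = Sigma_yz Sigma_zz^{-1}, computed with a solve instead of an explicit inverse.
    f_xu = np.linalg.solve(sigma_zz, sigma_zy).T
    f_c = mu[n:n + dx] - f_xu @ mu[:n]
    F = sigma_yy - f_xu @ sigma_zz @ f_xu.T
    return f_xu, f_c, 0.5 * (F + F.T)     # symmetrize against round-off


mu = np.array([1.0, 2.0, 2.0])
sigma = np.array([[1.0, 0.0, 0.9],
                  [0.0, 1.0, 0.5],
                  [0.9, 0.5, 1.07]])
f_xu, f_c, F = condition_dynamics(mu, sigma, dx=1, du=1)
# f_xu is approximately [[0.9, 0.5]], f_c approximately [0.1], F approximately [[0.01]]
```

In a full pipeline, `mu` and `sigma` would be the regularized estimates $\mu^{D\star}_t$ and $\Sigma^{D\star}_t$ rather than a hand-built example.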
[Video panels: Without Noise | With Noise]
https://twitter.com/Copvids911/status/837332885984137216
Thank you!
