
[DL Reading Group] State Representation Learning for Reinforcement Learning: Toward Acquiring Better "World Models"


Published on 2018/10/26
Deep Learning JP:
http://deeplearning.jp/seminar-2/

  1. Tatsuya Matsushima (@__tmats__), Matsuo Lab
  3. State representation learning (SRL)
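  6. SRL setup: at each step the agent takes an action $a_t \in \mathcal{A}$ and receives an observation $o_t \in \mathcal{O}$. The environment's true state is $\tilde{s}_t \in \tilde{\mathcal{S}}$. SRL learns a state $s_t \in \mathcal{S}$ from the observation history: $s_t = \phi(o_{1:t})$.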
  7. SRL
  8. SRL by reconstructing observations: encoder $s_t = \phi(o_t; \theta_\phi)$, decoder $\hat{o}_t = \phi^{-1}(s_t; \theta_{\phi^{-1}})$
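Slide 8's reconstruction objective can be made concrete with a minimal sketch. It assumes linear maps for $\phi$ and $\phi^{-1}$ and a squared-error loss; both are illustrative choices, not taken from the slide. For a linear auto-encoder trained with squared error, the best rank-$k$ reconstruction is given by the top right singular vectors of the data matrix. The fit below therefore uses an SVD instead of gradient descent. `fit_linear_autoencoder` and `reconstruction_loss` are hypothetical helper names.

```python
import numpy as np

def fit_linear_autoencoder(O, k):
    """Fit a linear encoder/decoder to observations O (n x d) with k-dim states.

    The best rank-k squared-error reconstruction of O is O @ V_k @ V_k.T
    (Eckart-Young), so the top-k right singular vectors serve as the encoder.
    """
    _, _, Vt = np.linalg.svd(O, full_matrices=False)
    W_enc = Vt[:k]        # phi: s_t = W_enc @ o_t
    W_dec = W_enc.T       # phi^{-1}: o_hat_t = W_dec @ s_t
    return W_enc, W_dec

def reconstruction_loss(O, W_enc, W_dec):
    """Mean squared reconstruction error ||o_hat_t - o_t||^2 over the rows of O."""
    S = O @ W_enc.T       # encode every observation
    O_hat = S @ W_dec.T   # decode back to observation space
    return float(np.mean(np.sum((O_hat - O) ** 2, axis=1)))

# Toy data: 5-dim observations that secretly depend on a 2-dim state.
rng = np.random.default_rng(0)
true_states = rng.standard_normal((100, 2))
O = true_states @ rng.standard_normal((2, 5))
W_enc, W_dec = fit_linear_autoencoder(O, k=2)
print(reconstruction_loss(O, W_enc, W_dec))  # close to 0
```

With $k = 2$ the 2-dimensional structure is recovered almost exactly. A smaller $k$ leaves a positive reconstruction error.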
  9. SRL with a forward model: $s_t = \phi(o_t; \theta_\phi)$, $\hat{s}_{t+1} = f(s_t, a_t; \theta_{fwd})$
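Slide 9's forward-model objective compares the predicted next state with the encoding of the actual next observation. Below is a minimal sketch. It assumes a squared-error loss and plain Python callables for $\phi$ and $f$; the names `phi`, `f`, and `forward_model_loss` are hypothetical.

```python
import numpy as np

def forward_model_loss(phi, f, o_t, a_t, o_next):
    """Squared error between f(phi(o_t), a_t) and phi(o_{t+1})."""
    s_t = phi(o_t)                 # s_t = phi(o_t; theta_phi)
    s_next = phi(o_next)           # target: encoding of the real next observation
    s_next_hat = f(s_t, a_t)       # s_hat_{t+1} = f(s_t, a_t; theta_fwd)
    return float(np.sum((s_next_hat - s_next) ** 2))

# Toy example: identity encoder and additive dynamics s_{t+1} = s_t + a_t.
phi = lambda o: np.asarray(o, dtype=float)
f = lambda s, a: s + np.asarray(a, dtype=float)
print(forward_model_loss(phi, f, [0.0, 0.0], [1.0, 2.0], [1.0, 2.0]))  # 0.0
print(forward_model_loss(phi, f, [0.0, 0.0], [1.0, 2.0], [1.0, 3.0]))  # 1.0
```

If $\phi$ is trained on this loss alone, a constant encoder drives it to zero. This is one reason forward losses are usually paired with another objective; ICM, for example, also trains an inverse model.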
  10. SRL with an inverse model: $s_t = \phi(o_t; \theta_\phi)$, $\hat{a}_t = g(s_t, s_{t+1}; \theta_{inv})$
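For discrete actions, the inverse model on slide 10 is typically a classifier trained with cross-entropy. The sketch below assumes a single linear layer with a softmax over the concatenation $[s_t; s_{t+1}]$. The weight matrix `W` and the helper name are hypothetical.

```python
import numpy as np

def inverse_model_loss(W, s_t, s_next, action):
    """Cross-entropy of predicting the discrete action a_t from (s_t, s_{t+1}).

    W has shape (num_actions, 2 * state_dim); g(s_t, s_{t+1}) = softmax(W @ [s_t; s_{t+1}]).
    """
    x = np.concatenate([s_t, s_next])
    logits = W @ x
    logits = logits - logits.max()                        # numerical stability
    log_probs = logits - np.log(np.sum(np.exp(logits)))   # log-softmax
    return float(-log_probs[action])

# With all-zero weights every action is equally likely, so the loss is log(3).
W = np.zeros((3, 4))
print(inverse_model_loss(W, np.array([0.0, 1.0]), np.array([1.0, 1.0]), action=2))
```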

  11. SRL with priors: $\text{Loss} = \mathcal{L}_{prior}(s_{1:n}; \theta_\phi \mid c)$, where $s_t = \phi(o_t; \theta_\phi)$
  12. Why SRL?
  17. Linear transition model: $\hat{s}_{t+1} = W s_t + U a_t + V$
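Given transitions $(s_t, a_t, s_{t+1})$ with known states, the $W$, $U$, and $V$ of slide 17 can be estimated by ordinary least squares. Below is a minimal sketch. It assumes $V$ is a bias vector and that states and actions are stored row-wise in arrays; `fit_linear_dynamics` is a hypothetical name.

```python
import numpy as np

def fit_linear_dynamics(S, A, S_next):
    """Least-squares fit of s_{t+1} ~ W s_t + U a_t + V.

    S: (n, ds) states, A: (n, da) actions, S_next: (n, ds) next states.
    """
    n, ds = S.shape
    da = A.shape[1]
    X = np.hstack([S, A, np.ones((n, 1))])        # one row [s_t, a_t, 1] per transition
    theta, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    W = theta[:ds].T                               # (ds, ds)
    U = theta[ds:ds + da].T                        # (ds, da)
    V = theta[-1]                                  # (ds,)
    return W, U, V

# Recover known dynamics from noise-free synthetic transitions.
rng = np.random.default_rng(1)
W_true = np.array([[0.9, 0.1], [0.0, 0.8]])
U_true = np.array([[0.0], [0.5]])
V_true = np.array([0.1, -0.2])
S = rng.standard_normal((50, 2))
A = rng.standard_normal((50, 1))
S_next = S @ W_true.T + A @ U_true.T + V_true
W, U, V = fit_linear_dynamics(S, A, S_next)
print(np.allclose(W, W_true), np.allclose(U, U_true), np.allclose(V, V_true))  # True True True
```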
  18. E2C [Watter+ 2015]: $\hat{s}_{t+1} \sim \mathcal{N}(\mu = W s_t + U a_t + V, \sigma)$
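The Gaussian transition on slide 18 can be sketched in two parts: sampling $\hat{s}_{t+1}$, and scoring an observed $s_{t+1}$ by its log-likelihood. The sketch assumes an isotropic covariance $\sigma^2 I$ and keeps $W$, $U$, $V$ fixed, as written on the slide. In E2C itself the transition is locally linear, meaning the matrices are computed from the current state. The function names are hypothetical.

```python
import numpy as np

def transition_mean(s_t, a_t, W, U, V):
    """Mean of the transition: W s_t + U a_t + V."""
    return W @ s_t + U @ a_t + V

def sample_next_state(s_t, a_t, W, U, V, sigma, rng):
    """Draw s_hat_{t+1} ~ N(W s_t + U a_t + V, sigma^2 I)."""
    mu = transition_mean(s_t, a_t, W, U, V)
    return mu + sigma * rng.standard_normal(mu.shape)

def transition_log_likelihood(s_next, s_t, a_t, W, U, V, sigma):
    """log N(s_{t+1}; W s_t + U a_t + V, sigma^2 I)."""
    mu = transition_mean(s_t, a_t, W, U, V)
    d = mu.shape[0]
    sq_err = np.sum((s_next - mu) ** 2)
    return float(-0.5 * sq_err / sigma**2 - d * np.log(sigma) - 0.5 * d * np.log(2 * np.pi))

W, U, V = np.eye(2), np.array([[1.0], [0.0]]), np.zeros(2)
s_t, a_t = np.array([0.0, 1.0]), np.array([0.5])
rng = np.random.default_rng(0)
print(sample_next_state(s_t, a_t, W, U, V, sigma=0.1, rng=rng))  # near [0.5, 1.0]
print(transition_log_likelihood(np.array([0.5, 1.0]), s_t, a_t, W, U, V, sigma=1.0))  # -log(2*pi)
```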
  19. World Model [Ha+ 2018]
  20. (Figure: $l_t$, $\theta_t$, $p_t$)
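  21. ICM [Pathak+ 2017]: forward loss $\mathcal{L}_{fwd}(\hat{\phi}(o_{t+1}), \hat{f}(\hat{\phi}(o_t), a_t)) = \frac{1}{2} \| \hat{f}(\hat{\phi}(o_t), a_t) - \hat{\phi}(o_{t+1}) \|_2^2$; overall objective $\min_{\theta_P, \theta_I, \theta_F} \left[ -\lambda \mathbb{E}_{\pi(s_t; \theta_P)}\left[\sum_t r_t\right] + (1 - \beta)\mathcal{L}_{inv} + \beta \mathcal{L}_{fwd} \right]$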
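Once the feature vectors and the individual losses are available, the two ICM formulas on slide 21 can be evaluated directly. Below is a minimal sketch; the function names are hypothetical. `lam` and `beta` stand for $\lambda$ and $\beta$, and `expected_return` stands for an estimate of $\mathbb{E}_{\pi}[\sum_t r_t]$.

```python
import numpy as np

def icm_forward_loss(phi_next, phi_next_pred):
    """L_fwd = 1/2 * || f(phi(o_t), a_t) - phi(o_{t+1}) ||_2^2."""
    diff = np.asarray(phi_next_pred, dtype=float) - np.asarray(phi_next, dtype=float)
    return 0.5 * float(np.sum(diff ** 2))

def icm_objective(expected_return, loss_inv, loss_fwd, lam, beta):
    """-lambda * E[sum_t r_t] + (1 - beta) * L_inv + beta * L_fwd (to be minimised)."""
    return -lam * expected_return + (1.0 - beta) * loss_inv + beta * loss_fwd

l_fwd = icm_forward_loss([1.0, 2.0], [1.0, 0.0])   # 0.5 * (0^2 + 2^2) = 2.0
print(l_fwd)
print(icm_objective(expected_return=10.0, loss_inv=1.0, loss_fwd=l_fwd, lam=0.1, beta=0.2))
```

Here $\beta \in [0, 1]$ trades off the inverse and forward losses, and $\lambda$ weights the policy term against the representation-learning losses.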
  22. $\min_{G, Q, \mathcal{M}} \max_D V(G, D) - \lambda I_{VLB}(G, Q)$

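  24. Robotic priors: slowness $\mathcal{L}_{Slowness}(D, \phi) = \mathbb{E}\left[\|\Delta s_t\|^2\right]$; variability $\mathcal{L}_{Variability}(D, \phi) = \mathbb{E}\left[e^{-\|s_{t_1} - s_{t_2}\|}\right]$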
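Both priors on slide 24 can be computed from a single trajectory of learned states. Below is a minimal sketch. It assumes the states are the rows of an array `S`, in time order. For variability, it averages over all pairs of distinct time steps rather than sampling pairs. The function names are hypothetical.

```python
import numpy as np

def slowness_loss(S):
    """E[||Delta s_t||^2] with Delta s_t = s_{t+1} - s_t; S has shape (T, d)."""
    delta = np.diff(S, axis=0)
    return float(np.mean(np.sum(delta ** 2, axis=1)))

def variability_loss(S):
    """E[exp(-||s_t1 - s_t2||)] over all pairs t1 < t2."""
    i, j = np.triu_indices(len(S), k=1)
    dist = np.linalg.norm(S[i] - S[j], axis=1)
    return float(np.mean(np.exp(-dist)))

S = np.array([[0.0], [1.0], [3.0]])
print(slowness_loss(S))     # (1^2 + 2^2) / 2 = 2.5
print(variability_loss(S))  # mean of exp(-1), exp(-3), exp(-2)
```

Slowness pulls consecutive states together, while variability pushes arbitrary pairs apart. A constant representation minimizes slowness but maximizes the variability loss, so the two together rule it out.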
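  25. Proportionality $\mathcal{L}_{Prop}(D, \phi) = \mathbb{E}\left[(\|\Delta s_{t_2}\| - \|\Delta s_{t_1}\|)^2 \mid a_{t_1} = a_{t_2}\right]$; repeatability $\mathcal{L}_{Rep}(D, \phi) = \mathbb{E}\left[e^{-\|s_{t_2} - s_{t_1}\|^2} \|\Delta s_{t_2} - \Delta s_{t_1}\|^2 \mid a_{t_1} = a_{t_2}\right]$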
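The proportionality and repeatability priors on slide 25 compare only time steps at which the same action was taken. Below is a minimal sketch for discrete actions. It assumes `actions[t]` is the action that led from `S[t]` to `S[t+1]`, so `len(actions) == len(S) - 1`. It also assumes at least one same-action pair exists. The function names are hypothetical.

```python
import numpy as np

def same_action_pairs(actions):
    """All index pairs (t1, t2), t1 < t2, with a_t1 == a_t2."""
    n = len(actions)
    return [(t1, t2) for t1 in range(n) for t2 in range(t1 + 1, n)
            if actions[t1] == actions[t2]]

def proportionality_loss(S, actions):
    """E[(||Delta s_t2|| - ||Delta s_t1||)^2 | a_t1 = a_t2]."""
    delta = np.diff(S, axis=0)
    vals = [(np.linalg.norm(delta[t2]) - np.linalg.norm(delta[t1])) ** 2
            for t1, t2 in same_action_pairs(actions)]
    return float(np.mean(vals))

def repeatability_loss(S, actions):
    """E[exp(-||s_t2 - s_t1||^2) * ||Delta s_t2 - Delta s_t1||^2 | a_t1 = a_t2]."""
    delta = np.diff(S, axis=0)
    vals = [np.exp(-np.sum((S[t2] - S[t1]) ** 2)) * np.sum((delta[t2] - delta[t1]) ** 2)
            for t1, t2 in same_action_pairs(actions)]
    return float(np.mean(vals))

S = np.array([[0.0], [1.0], [3.0], [4.0]])
actions = [0, 0, 1]                         # only steps 0 and 1 share an action
print(proportionality_loss(S, actions))     # (2 - 1)^2 = 1.0
print(repeatability_loss(S, actions))       # exp(-1) * (2 - 1)^2
```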
  26. Comparison of methods:
 E2C [Watter+ 2015]: ✔ ✔ ✔ ✔
 World Model [Ha+ 2018]: ✔ ✔ ✔
 ICM [Pathak+ 2017]: ✔ ✔ ✔
 Causal InfoGAN [Kurutach+ 2018]: ✔ ✔ ✔ ✔
 VPN [Oh+ 2017]: ✔ ✔
 Robotic Priors [Jonschkowski+ 2015]: ✔ ✔
  27. Experimental settings:
 Robotic Priors [Jonschkowski+ 2015]: slot car racing, 16×16×3, 2 (25)
 E2C [Watter+ 2015]: cart-pole, 80×80×3, 8
 ICM [Pathak+ 2017]: Mario Bros., 42×42×3, 2 (14)
  28. KNN-MSE: $\text{KNN-MSE}(s) = \frac{1}{k} \sum_{s' \in \text{KNN}(s, k)} \|\tilde{s} - \tilde{s}'\|^2$
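KNN-MSE on slide 28 scores a learned representation in two steps. First, it finds each sample's $k$ nearest neighbors in the learned state space. Then it measures how far apart those same samples are in the ground-truth state space $\tilde{s}$. A low value means neighborhoods are preserved. Below is a minimal brute-force sketch. It assumes Euclidean distance and excludes each sample from its own neighbor set. The function name is hypothetical.

```python
import numpy as np

def knn_mse(learned, true, k):
    """Per-sample KNN-MSE.

    learned: (n, d) learned states s; true: (n, d_true) ground-truth states s~.
    For each i, average ||s~_i - s~_j||^2 over the k nearest neighbors j of s_i
    in the learned space.
    """
    dist = np.linalg.norm(learned[:, None, :] - learned[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbor
    scores = np.empty(len(learned))
    for i in range(len(learned)):
        nn = np.argsort(dist[i])[:k]          # indices of KNN(s_i, k)
        scores[i] = np.mean(np.sum((true[i] - true[nn]) ** 2, axis=1))
    return scores

states = np.array([[0.0], [1.0], [10.0]])
print(knn_mse(states, states, k=1))           # per-sample scores 1, 1, 81
```

The full pairwise distance matrix costs $O(n^2)$ memory, which is fine for a sketch. For large evaluation sets, a tree-based neighbor search would be the usual replacement.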
  33. S-RL Toolbox
  36. Appendix
  37. References
