[DL Reading Group] Off-Policy Meta-Reinforcement Learning

Published on 2019/04/05
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Published in: Technology


  1. Off-policy meta-reinforcement learning: "Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables" and "Guided Meta-Policy Search". Presenter: Tatsuya Matsushima (@__tmats__), Matsuo Lab
  2. • Off-policy meta-RL papers from arXiv • Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables [Rakelly+ 2019] (2019/3/19) • Guided Meta-Policy Search [Mendonca+ 2019] (2019/4/1) • MAML, meta-training, off-policy
  3.
  4. Meta learning • Wiki: http://ibisforest.org/index.php?%E3%83%A1%E3%82%BF%E5%AD%A6%E7%BF%92 • [DL Reading Group] Meta-Learning Probabilistic Inference for Prediction • https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-for-prediction-126167192
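  5. MAML (Model-Agnostic Meta-Learning) [Finn+ 2017] • adapt • MAML • second-order derivatives • first-order approximation [Nichol+ 2018] • meta-test
     Meta-training objective, with meta-parameters $\theta$ and task-adapted parameters $\phi_{\mathcal{T}}$:
     $$\min_\theta \sum_{\mathcal{T}} \mathcal{L}\left(\theta - \alpha \nabla_\theta \mathcal{L}(\theta, \mathcal{D}^{tr}_{\mathcal{T}}),\ \mathcal{D}^{val}_{\mathcal{T}}\right) = \min_\theta \sum_{\mathcal{T}} \mathcal{L}\left(\phi_{\mathcal{T}},\ \mathcal{D}^{val}_{\mathcal{T}}\right)$$
     Adaptation at meta-test:
     $$\phi_{\mathcal{T}_{test}} = \theta - \alpha \nabla_\theta \mathcal{L}(\theta, \mathcal{D}^{tr}_{\mathcal{T}_{test}})$$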
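To make the MAML objective above concrete, here is a minimal NumPy sketch on a toy problem. Everything in it is an assumption for illustration and not taken from the slides: a scalar linear model `y = theta * x`, a squared-error loss, and the helper names `adapt`, `maml_meta_grad`, `meta_loss`, and `meta_train`. Because the loss is quadratic, the second-order term of the meta-gradient is available in closed form, which also shows what the first-order approximation drops.

```python
import numpy as np

def loss(theta, x, y):
    """Mean squared error of the scalar model y_hat = theta * x."""
    return float(np.mean((theta * x - y) ** 2))

def grad(theta, x, y):
    """d loss / d theta."""
    return float(2.0 * np.mean(x * (theta * x - y)))

def hessian(x):
    """d^2 loss / d theta^2 (independent of theta for this quadratic loss)."""
    return float(2.0 * np.mean(x ** 2))

def adapt(theta, x_tr, y_tr, alpha):
    """Inner step: phi = theta - alpha * grad L(theta, D_tr)."""
    return theta - alpha * grad(theta, x_tr, y_tr)

def maml_meta_grad(theta, task, alpha, first_order=False):
    """Gradient of L(phi, D_val) with respect to theta for one task."""
    x_tr, y_tr, x_val, y_val = task
    phi = adapt(theta, x_tr, y_tr, alpha)
    g_val = grad(phi, x_val, y_val)
    if first_order:
        return g_val  # first-order approximation: treat d phi / d theta as 1
    # Chain rule: d phi / d theta = 1 - alpha * Hessian of the training loss.
    return g_val * (1.0 - alpha * hessian(x_tr))

def meta_loss(theta, tasks, alpha):
    """Sum over tasks of the post-adaptation validation loss."""
    return sum(loss(adapt(theta, t[0], t[1], alpha), t[2], t[3]) for t in tasks)

def make_task(rng, slope, n=20):
    """Hypothetical task: noiseless regression onto y = slope * x."""
    x_tr, x_val = rng.normal(size=n), rng.normal(size=n)
    return x_tr, slope * x_tr, x_val, slope * x_val

def meta_train(tasks, alpha=0.1, beta=0.05, steps=200):
    """Outer loop: plain gradient descent on the summed meta-objective."""
    theta = 0.0
    for _ in range(steps):
        theta -= beta * sum(maml_meta_grad(theta, t, alpha) for t in tasks)
    return theta

rng = np.random.default_rng(0)
tasks = [make_task(rng, s) for s in (1.0, 2.0, 3.0)]
theta_star = meta_train(tasks)
```

Passing `first_order=True` skips the Hessian factor, which is the only place second-order information enters in this toy setting.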
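  6. MAML • loss: RL loss • MAML, model-based [Nagabandi+ 2018], [Gupta+ 2018] • [DL Reading Group] Meta Reinforcement Learning • https://www.slideshare.net/DeepLearningJP2016/dl-130067084
     $$\mathcal{L}_{RL}(\phi, \mathcal{D}_{\mathcal{T}_i}) = -\frac{1}{|\mathcal{D}_{\mathcal{T}_i}|} \sum_{s_t, a_t \in \mathcal{D}} r_i(s_t, a_t) = -\mathbb{E}_{s_t, a_t \sim \pi_\phi, q_{\mathcal{T}_i}}\left[\frac{1}{H} \sum_{t=1}^{H} r_i(s_t, a_t)\right]$$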
  7. On-policy vs. off-policy • On-policy • e.g. ε-greedy • Off-policy • Note: MAML, train, test (= off-policy)
  8. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
  9. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables • https://arxiv.org/abs/1903.08254 (Submitted on 19 Mar 2019) • Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, Sergey Levine • UC Berkeley (BAIR) • Deep RL "UC Berkeley" • Code: https://github.com/katerakelly/oyster • PyTorch, rlkit (BAIR)
  10. TL;DR • off-policy meta learning (PEARL) • context • permutation invariant • 20-100x more sample-efficient
  11. (MAML) • meta-training adaptation on-policy • MAML meta-train meta-test off-policy • adapt
  12.
  13. • off-policy RL (soft actor-critic, SAC [Haarnoja+ 2018]) context (PEARL) • meta-training adapt • meta-train policy context • meta-test context policy adapt • policy off-policy meta-train meta-test on-policy
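  14. MDP • task distribution: $p(\mathcal{T})$ • task: $\mathcal{T} = \{p(s_0),\ p(s_{t+1} \mid s_t, a_t),\ r(s_t, a_t)\}$ • context (1 transition of task $\mathcal{T}$): $c_n^{\mathcal{T}} = (s_n, a_n, r_n, s'_n)$ • $c = c_{1:N}^{\mathcal{T}}$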
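  15. context • adapt • latent variable $z$ • inference network $q_\phi(z \mid c)$, parameters $\phi$ • prior $p(z)$: Gaussian • meta-train meta-test
     $$\mathbb{E}_{\mathcal{T}}\left[\mathbb{E}_{z \sim q_\phi(z \mid c^{\mathcal{T}})}\left[R(\mathcal{T}, z) + \beta D_{KL}\left(q_\phi(z \mid c^{\mathcal{T}}) \,\|\, p(z)\right)\right]\right]$$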
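The KL term in the objective above has a closed form when the posterior is a diagonal Gaussian and the prior is Gaussian. The sketch below assumes a standard normal prior $p(z) = \mathcal{N}(0, I)$, which is one common choice and is not stated explicitly on the slide; the function names and the `task_term` argument (standing in for $R(\mathcal{T}, z)$) are hypothetical.

```python
import numpy as np

def kl_to_standard_normal(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ) for a diagonal Gaussian posterior."""
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    return float(0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var)))

def regularized_objective(task_term, mu, var, beta):
    """R(T, z) + beta * KL(q(z|c) || p(z)), with R(T, z) passed in as a number."""
    return task_term + beta * kl_to_standard_normal(mu, var)
```

With `beta = 0` the latent variable is unconstrained; larger values pull the posterior toward the prior, which acts as an information bottleneck on $z$.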
  16. context • MDP: $\{s_i, a_i, s'_i, r_i\}$ • permutation invariant • inference network: $q_\phi(z \mid c_{1:N}) \propto \prod_{n=1}^{N} \Psi_\phi(z \mid c_n)$ • Gaussian: $\Psi_\phi(z \mid c_n) = \mathcal{N}\left(f^\mu_\phi(c_n),\ f^\sigma_\phi(c_n)\right)$
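The product of Gaussian factors above is itself Gaussian: precisions add, and the mean is the precision-weighted average of the factor means. The following sketch shows that computation and why it is permutation invariant. It assumes the second argument of each factor is a variance and that the per-transition means and variances have already been produced by some encoder; the function name is hypothetical.

```python
import numpy as np

def product_of_gaussians(mus, variances):
    """Combine N diagonal Gaussian factors into one Gaussian.

    mus, variances: arrays of shape (N, latent_dim), one row per transition c_n.
    Returns the mean and variance of the normalized product.
    """
    mus = np.asarray(mus, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    var = 1.0 / np.sum(precisions, axis=0)        # precisions add up
    mu = var * np.sum(precisions * mus, axis=0)   # precision-weighted mean
    return mu, var
```

Both results depend on the factors only through sums over `n`, so reordering the transitions leaves the posterior unchanged.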
  17. off-policy • policy • actor critic • on-policy • on-policy test • symbols: $q_\phi(z \mid c)$, $\mathcal{B}$, $\mathcal{S}_c$
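  18. off-policy • Soft Actor-Critic (SAC) [Haarnoja+ 2018] context • SAC: maxEnt RL, off-policy actor-critic • actor critic reparameterization trick
     • critic loss: $$\mathcal{L}_{critic} = \mathbb{E}_{(s, a, r, s') \sim \mathcal{B},\ z \sim q_\phi(z \mid c)}\left[Q_\theta(s, a, z) - \left(r + V(s', z)\right)\right]^2$$
     • actor loss: $$\mathcal{L}_{actor} = \mathbb{E}_{s \sim \mathcal{B},\ a \sim \pi_\theta}\left[D_{KL}\left(\pi_\theta(a \mid s, z) \,\Big\|\, \frac{\exp\left(Q_\theta(s, a, z)\right)}{\mathcal{Z}_\theta(s)}\right)\right]$$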
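As a rough illustration of the critic loss above, the sketch below samples $z$ with the reparameterization trick and evaluates the squared Bellman error on a batch. It is a NumPy-only sketch under stated assumptions: the Q-values and next-state values are passed in as precomputed arrays rather than produced by networks, the target follows the slide's formula with no discount factor, and the names `sample_z` and `critic_loss` are hypothetical.

```python
import numpy as np

def sample_z(mu, var, rng):
    """Reparameterized sample z = mu + sqrt(var) * eps, with eps ~ N(0, I)."""
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    eps = rng.standard_normal(mu.shape)
    return mu + np.sqrt(var) * eps

def critic_loss(q_values, rewards, next_values):
    """Batch mean of (Q(s, a, z) - (r + V(s', z)))^2, as written on the slide."""
    q_values = np.asarray(q_values, dtype=float)
    target = np.asarray(rewards, dtype=float) + np.asarray(next_values, dtype=float)
    return float(np.mean((q_values - target) ** 2))
```

In an autodiff framework, writing the sample as `mu + sqrt(var) * eps` is what lets gradients of the critic loss reach the inference network parameters.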
  19.
  20. • 6 MuJoCo tasks • Half-Cheetah, Humanoid, Ant, Walker (Half-Cheetah and Ant: 2 each) • adapt • 20-100x • meta-training
  21. • on-policy (MAESN [Gupta+ 2018]) • sparse navigation • meta-test • context • MAESN
  22. Ablation Study • Half-Cheetah-Vel • RNN • RNN-tran: de-correlated • RNN-traj • permutation invariant
  23. Ablation Study • Half-Cheetah-Vel • off-policy: off-policy • off-policy RL-batch: policy • (PEARL)
  24. Ablation Study • context • sparse navigation • context
  25.
  26. • off-policy (PEARL) • context policy context off-policy • meta-training
  27. Guided Meta-Policy Search
  28. Guided Meta-Policy Search • https://arxiv.org/abs/1904.00956 (Submitted on 1 Apr 2019) • Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn • UC Berkeley (BAIR) • … • Code: https://github.com/RussellM2020/GMPS • Website: https://sites.google.com/berkeley.edu/guided-metapolicy-search
  29. TL;DR • off-policy meta learning (GMPS) • meta-train RL • meta-train meta-objective: imitation learning (behaviour cloning) • meta-training in 2 phases: task learning and meta-learning
  30. (MAML) • meta-training adaptation on-policy • [Rakelly+ 2019] • meta-training meta-test
  31.
  32. • meta-train meta-objective: (behaviour cloning) • meta-training in 2 phases • task learning: meta-training policy • policy meta-test expert • meta-learning: policy meta-level supervised
  33. [Rakelly+ 2019] • $p(\mathcal{T})$ • $\mathcal{T} = \{p(s_0),\ p(s_{t+1} \mid s_t, a_t),\ r(s_t, a_t)\}$
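  34. task learning • meta-training tasks $\mathcal{T}_i$ / policy $\{\pi_i^*\}$ • meta-learning • MAML • adapt: $\mathcal{L}_{RL}(\phi_i, \mathcal{D}_i)$, $\phi_i$, $\mathcal{T}_i$ • MAML • (behaviour cloning):
     $$\mathcal{L}_{BC}(\phi_i, \mathcal{D}_i) \triangleq -\sum_{(s_t, a_t) \in \mathcal{D}} \log \pi_\phi(a_t \mid s_t)$$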
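The behaviour cloning loss above is a negative log-likelihood of the demonstrated actions under the policy. The sketch below evaluates it for a hypothetical discrete-action softmax policy with linear logits; the policy class, the `weights` parameterization, and the function names are assumptions for illustration only, since the slides do not specify the policy form.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def bc_loss(weights, states, actions):
    """L_BC = - sum over (s_t, a_t) in D of log pi(a_t | s_t).

    weights: (state_dim, num_actions) parameters of a linear softmax policy.
    states:  (N, state_dim) demonstration states.
    actions: (N,) integer demonstration actions.
    """
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=int)
    log_probs = log_softmax(states @ np.asarray(weights, dtype=float))
    return float(-np.sum(log_probs[np.arange(len(actions)), actions]))
```

For example, with all-zero weights the policy is uniform, so the loss is the number of demonstration pairs times the log of the number of actions.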
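  35. meta-learning • meta-training tasks $\mathcal{T}_i$, policies $\pi_i^*$, data $\mathcal{D}_i^*$ • policy $\theta$, meta-objective:
     $$\min_\theta \sum_{\mathcal{T}_i} \sum_{\mathcal{D}_i^{val} \sim \mathcal{D}_i^*} \mathbb{E}_{\mathcal{D}_i^{tr} \sim \pi_\theta}\left[\mathcal{L}_{BC}\left(\theta - \alpha \nabla_\theta \mathcal{L}_{RL}(\theta, \mathcal{D}_i^{tr}),\ \mathcal{D}_i^{val}\right)\right]$$
     • $\theta$, $\mathcal{T}_i$, $\phi_i$, $\mathcal{D}_i^*$ • behaviour cloning compounding error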
  36. • meta-learning task learning meta-learning • policy • meta-training • e.g. reward shaping • MAML
  37. policy • policy: contextual policy $\pi_\theta(a_t \mid s_t, \omega)$ • $\omega$ (ID) • meta-training • meta-test meta-training • soft actor-critic (SAC) [Haarnoja+ 2018]
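  38. • Behaviour cloning meta-objective • $\theta$, $\phi_i$, $\pi_\theta$, $A_i$
     $$\phi_i = \theta + \alpha\, \mathbb{E}_{\tau \sim \pi_\theta}\left[\frac{\pi_\theta(\tau)}{\pi_{\theta_{init}}(\tau)} \nabla_\theta \log \pi_\theta(\tau)\, A_i(\tau)\right]$$
     • Behaviour cloning:
     $$\theta \leftarrow \theta - \beta \nabla_\theta \mathcal{L}_{BC}(\phi_i, \mathcal{D}_i^{val})$$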
  39.
  40. • Pushing (full state) • Pushing (vision) • Door opening • (Ant) • https://sites.google.com/berkeley.edu/guided-metapolicy-search
  41. • meta-training task context • SAC • meta-training
  42. • Door Opening, Ant • pushing
  43.
  44. • off-policy (GMPS) • meta-training in 2 phases, task learning and meta-learning (behaviour cloning) • meta-training
  45.
  46. • 2 • one-step update adapt (BAIR) • e.g. MAML [Finn+ 2017] • adapt (DeepMind) • e.g. Neural Processes [Garnelo+ 2018], GQN [Eslami+ 2018] • [DL Reading Group] Meta-Learning Probabilistic Inference for Prediction • https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-for-prediction-126167192 • pro-con
  47. Appendix
  48. References
     [Eslami+ 2018] S. M. Ali Eslami, Danilo Jimenez Rezende, Frédéric Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil C. Rabinowitz, Helen King, Chloe Hillier, Matthew M. Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis. "Neural scene representation and rendering". Science 360 (2018): 1204-1210. http://science.sciencemag.org/content/360/6394/1204
     [Finn+ 2017] Chelsea Finn, Pieter Abbeel and Sergey Levine. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1126-1135, 2017. http://proceedings.mlr.press/v70/finn17a.html
     [Garnelo+ 2018] Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami and Yee Whye Teh. "Neural Processes". https://arxiv.org/abs/1807.01622
     [Gupta+ 2018] Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel and Sergey Levine. "Meta-Reinforcement Learning of Structured Exploration Strategies". In Advances in Neural Information Processing Systems, 2018. https://nips.cc/Conferences/2018/Schedule?showEvent=12658
     [Haarnoja+ 2018] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel and Sergey Levine. "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor". Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1861-1870, 2018. http://proceedings.mlr.press/v80/haarnoja18b.html
     [Mendonca+ 2019] Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine and Chelsea Finn. "Guided Meta-Policy Search". https://arxiv.org/abs/1904.00956
     [Nagabandi+ 2018] Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine and Chelsea Finn. "Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning". https://arxiv.org/abs/1803.11347
     [Nichol+ 2018] Alex Nichol, Joshua Achiam and John Schulman. "On First-Order Meta-Learning Algorithms". https://arxiv.org/abs/1803.02999
     [Rakelly+ 2019] Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn and Sergey Levine. "Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables". https://arxiv.org/abs/1903.08254