1
Off-Policy Meta-Reinforcement Learning
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
Guided Meta-Policy Search
Presenter: Tatsuya Matsushima (@__tmats__), Matsuo Lab
• off-policy arXiv
• Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
[Rakelly+ 2019] (2019/3/19)
• Guided Meta-Policy Search [Mendonca+ 2019] (2019/4/1)
• MAML meta-training
off-policy
2
3
 (meta learning)
• : Wiki http://ibisforest.org/index.php?%E3%83%A1%E3%82%BF%E5%AD%A6%E7%BF%92
• [DL ]Meta-Learning Probabilistic Inference for Prediction ( )
• https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-for-prediction-126167192
4
MAML
MAML (Model Agnostic Meta-Learning) [Finn+ 2017]
•
• adapt
• MAML
• 2
• 1 [Nichol+2018]
• meta-test 

5
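$$\min_{\theta} \sum_{\mathcal{T}} \mathcal{L}\big(\theta - \alpha \nabla_{\theta} \mathcal{L}(\theta, \mathcal{D}^{tr}_{\mathcal{T}}),\ \mathcal{D}^{val}_{\mathcal{T}}\big) = \min_{\theta} \sum_{\mathcal{T}} \mathcal{L}\big(\phi_{\mathcal{T}},\ \mathcal{D}^{val}_{\mathcal{T}}\big)$$
$\theta$: meta-parameters; $\phi_{\mathcal{T}}$: parameters adapted to task $\mathcal{T}$
$$\phi_{\mathcal{T}_{test}} = \theta - \alpha \nabla_{\theta} \mathcal{L}(\theta, \mathcal{D}^{tr}_{\mathcal{T}_{test}})$$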
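A minimal sketch of the MAML update on a toy problem, assuming a scalar parameter and the quadratic loss L(θ, D) = mean((θ − x)²). The function names (`loss`, `grad`, `adapt`, `meta_gradient`, `meta_train`) and the toy tasks are illustrative assumptions, not code from [Finn+ 2017]. For this particular loss the second-order term reduces to the constant factor (1 − 2α).

```python
from statistics import fmean


def loss(theta, data):
    """Mean squared distance between the scalar parameter and the data points."""
    return fmean((theta - x) ** 2 for x in data)


def grad(theta, data):
    """Analytic gradient of `loss` with respect to theta."""
    return 2.0 * (theta - fmean(data))


def adapt(theta, train_data, alpha):
    """Inner step: phi_T = theta - alpha * grad L(theta, D_tr)."""
    return theta - alpha * grad(theta, train_data)


def meta_gradient(theta, tasks, alpha):
    """Gradient of sum_T L(phi_T, D_val) with respect to theta (chain rule).

    For this quadratic loss d(phi_T)/d(theta) = 1 - 2 * alpha; a first-order
    approximation would replace that factor by 1.
    """
    total = 0.0
    for train_data, val_data in tasks:
        phi = adapt(theta, train_data, alpha)
        total += grad(phi, val_data) * (1.0 - 2.0 * alpha)
    return total


def meta_train(tasks, theta=0.0, alpha=0.1, beta=0.05, steps=200):
    """Outer loop: plain gradient descent on the meta-objective."""
    for _ in range(steps):
        theta -= beta * meta_gradient(theta, tasks, alpha)
    return theta


if __name__ == "__main__":
    # Two hypothetical tasks whose data are centred at 1 and 3.
    toy_tasks = [([1.0, 1.0], [1.0, 1.0]), ([3.0, 3.0], [3.0, 3.0])]
    print(meta_train(toy_tasks))
```

With these two toy tasks the meta-parameter settles midway between the task optima, so a single inner step moves it toward either task.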
MAML
• loss loss( )
• MAML model-based [Nagabandi+ 2018] [Gupta+ 2018]
• [DL ]Meta Reinforcement Learning ( )
• https://www.slideshare.net/DeepLearningJP2016/dl-130067084
6
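$$\mathcal{L}_{RL}(\phi, \mathcal{D}_{\mathcal{T}_i}) = -\frac{1}{|\mathcal{D}_{\mathcal{T}_i}|} \sum_{s_t, a_t \in \mathcal{D}_{\mathcal{T}_i}} r_i(s_t, a_t) = -\mathbb{E}_{s_t, a_t \sim \pi_{\phi}, q_{\mathcal{T}_i}} \left[ \frac{1}{H} \sum_{t=1}^{H} r_i(s_t, a_t) \right]$$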
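As a small illustration of the RL loss, the sketch below computes the negative average reward in both forms: over a flat set of transitions, and as the mean over trajectories of the per-step average over horizon H. The function names and the reward values are hypothetical; the two forms agree here because every trajectory has the same horizon.

```python
def rl_loss_from_transitions(rewards):
    """-(1/|D|) * sum of rewards over all transitions in D."""
    if not rewards:
        raise ValueError("need at least one transition")
    return -sum(rewards) / len(rewards)


def rl_loss_from_trajectories(trajectories):
    """Negative mean over trajectories of (1/H) * sum_t r_t."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    per_trajectory = [sum(traj) / len(traj) for traj in trajectories]
    return -sum(per_trajectory) / len(per_trajectory)


if __name__ == "__main__":
    # Hypothetical rewards: two trajectories with horizon H = 2.
    trajectories = [[1.0, 2.0], [3.0, 4.0]]
    flat = [r for traj in trajectories for r in traj]
    print(rl_loss_from_transitions(flat), rl_loss_from_trajectories(trajectories))
```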
On-policy vs. off-policy
On-policy ( )
• ( )
•
• ) ε-greedy
Off-policy ( )
•
•
※ MAML train
test (= off-policy )
7
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
8
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
• https://arxiv.org/abs/1903.08254 (Submitted on 19 Mar 2019)
• Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, Sergey Levine
• UC Berkeley (BAIR)
• Deep RL ”UC Berkeley”
•
• https://github.com/katerakelly/oyster
• (BAIR )PyTorch rlkit
9
TL; DR
• meta learning off-policy (PEARL)
• (context)
• permutation invariant
• 20-100
10
( MAML )
• meta-training adaptation on-policy
• MAML meta-train meta-test off-policy
• adapt
•
11
12
• off-policy RL (soft actor-critic, SAC [Haarnoja+ 2018]) 

context (PEARL)
• Meta-training adapt
• meta-train policy
context
• meta-test context policy
adapt
• policy off-policy meta-train meta-
test on-policy
13
MDP
•
•
•
• :
• :
• 1
•
•
14
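$p(\mathcal{T})$: distribution over tasks
$\mathcal{T} = \{p(s_0),\ p(s_{t+1} \mid s_t, a_t),\ r(s_t, a_t)\}$: a task
$c^{\mathcal{T}}_n = (s_n, a_n, r_n, s'_n)$: one transition in task $\mathcal{T}$
$c = c^{\mathcal{T}}_{1:N}$: context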
context
• adapt
•
• (Inference network)
•
• prior Gaussian
• meta-train meta-test
15
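$z$: latent context variable; $q_\phi(z|c)$: inference network with parameters $\phi$; $p(z)$: prior over $z$
$$\mathbb{E}_{\mathcal{T}} \Big[ \mathbb{E}_{z \sim q_\phi(z|c^{\mathcal{T}})} \big[ R(\mathcal{T}, z) + \beta D_{KL}\big( q_\phi(z|c^{\mathcal{T}}) \,\|\, p(z) \big) \big] \Big]$$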
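The KL term in this objective has a closed form when the posterior is a diagonal Gaussian and the prior is a unit Gaussian. The sketch below assumes exactly that setting; the function name `kl_to_unit_gaussian` and the value of `beta` are hypothetical, and the inputs are per-dimension means and variances.

```python
import math


def kl_to_unit_gaussian(mean, var):
    """KL( N(mean, diag(var)) || N(0, I) ), summed over latent dimensions."""
    if len(mean) != len(var):
        raise ValueError("mean and var must have the same length")
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mean, var))


if __name__ == "__main__":
    beta = 0.1  # hypothetical weight on the KL term
    # A posterior equal to the prior gives KL = 0.
    print(kl_to_unit_gaussian([0.0, 0.0], [1.0, 1.0]))
    # Shifting one mean by 1 costs 0.5 nats before weighting.
    print(beta * kl_to_unit_gaussian([1.0], [1.0]))
```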
context
• MDP
• permutation invariant
• Inference network
• Gaussian
16
$\{s_i, a_i, s'_i, r_i\}$
$$q_\phi(z|c_{1:N}) \propto \prod_{n=1}^{N} \Psi_\phi(z|c_n)$$
$$\Psi_\phi(z|c_n) = \mathcal{N}\big(f^{\mu}_{\phi}(c_n),\ f^{\sigma}_{\phi}(c_n)\big)$$
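A product of independent Gaussian factors is again Gaussian, with precisions that add up, which is why the resulting posterior does not depend on the order of the transitions. The sketch below works per latent dimension and assumes that the second output of the network is a variance; `product_of_gaussians` and the example numbers are hypothetical, not code from the PEARL repository.

```python
def product_of_gaussians(means, variances):
    """Combine N Gaussian factors into one Gaussian, per latent dimension.

    means, variances: lists of length N, each entry a list with one value
    per latent dimension. Returns (posterior_mean, posterior_variance).
    """
    dims = len(means[0])
    post_mean, post_var = [], []
    for d in range(dims):
        # Precisions (inverse variances) of the factors add up.
        precision = sum(1.0 / v[d] for v in variances)
        # The mean is the precision-weighted average of the factor means.
        weighted = sum(m[d] / v[d] for m, v in zip(means, variances))
        post_var.append(1.0 / precision)
        post_mean.append(weighted / precision)
    return post_mean, post_var


if __name__ == "__main__":
    mean, var = product_of_gaussians([[0.0], [2.0]], [[1.0], [1.0]])
    print(mean, var)
    # Reordering the factors leaves the result unchanged.
    print(product_of_gaussians([[2.0], [0.0]], [[1.0], [1.0]]))
```

Each additional transition adds precision, so the posterior variance shrinks as more context is collected.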
off-policy
• policy 

• actor-critic 

• 

• on-policy 

on-policy test
17
$q_\phi(z|c)$: inference network; $\mathcal{B}$: replay buffer; $\mathcal{S}_c$: context sampler
off-policy
• Soft Actor-Critic (SAC) [Haarnoja+ 2018] context
• SAC maxEntRL( ) off-policy actor-critic
• actor critic reparameterization trick
• critic loss: 

• actor loss:
18
$$\mathcal{L}_{critic} = \mathbb{E}_{(s, a, r, s') \sim \mathcal{B},\ z \sim q_\phi(z|c)} \big[ Q_\theta(s, a, z) - (r + V(s', z)) \big]^2$$
z
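$$\mathcal{L}_{actor} = \mathbb{E}_{s \sim \mathcal{B},\ a \sim \pi_\theta} \left[ D_{KL}\left( \pi_\theta(a|s, z) \,\Big\|\, \frac{\exp(Q_\theta(s, a, z))}{\mathcal{Z}_\theta(s)} \right) \right]$$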
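A minimal sketch of the critic loss as a Monte Carlo average over a replay-buffer batch, for a single sampled z. It assumes `q_fn(s, a, z)` and `v_fn(s_next, z)` are plain callables standing in for the Q-network and value network, and it uses the target r + V(s', z) exactly as written on the slide (no discount factor). All names and numbers are hypothetical.

```python
def critic_loss(q_fn, v_fn, batch, z):
    """Mean squared Bellman error over a batch of (s, a, r, s_next) tuples."""
    if not batch:
        raise ValueError("batch must not be empty")
    errors = [
        (q_fn(s, a, z) - (r + v_fn(s_next, z))) ** 2
        for (s, a, r, s_next) in batch
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy scalar stand-ins for the networks.
    def q_fn(s, a, z):
        return s + a + z

    def v_fn(s_next, z):
        return s_next * z

    batch = [(1.0, 2.0, 0.5, 3.0)]
    print(critic_loss(q_fn, v_fn, batch, z=1.0))
```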
19
• MuJoCo 6
• Half-Cheetah, Humanoid, Ant, Walker (Half-Cheetah Ant 2 )
•
• adapt
• 20-100 

• : meta-training
• :
20
• on-policy (MAESN[Gupta+ 2018])
• sparse navigation
• meta-test 

• 

• context
• MAESN
21
Ablation Study
•
• Half-Cheetah-Vel
• RNN
• RNN-tran: de-correlated
• RNN-traj:
• permutation invariant 

22
Ablation Study
•
• Half-Cheetah-Vel
•
• off-policy: off-policy( )
• off-policy RL-batch: policy
• 

(PEARL)
23
Ablation Study
• context
• sparse navigation
• context
• 

24
25
• off-policy (PEARL)
• context policy context
off-policy
• meta-training
26
Guided Meta-Policy Search
27
Guided Meta-Policy Search
• https://arxiv.org/abs/1904.00956 (Submitted on 1 Apr 2019)
• Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn
• UC Berkeley (BAIR)
• …
•
• https://github.com/RussellM2020/GMPS
• Website
• https://sites.google.com/berkeley.edu/guided-metapolicy-search
28
TL; DR
• meta learning off-policy (GMPS)
• meta-train RL
• meta-train meta-objective( ) imitation learning (behaviour cloning)
• meta-training task learning meta-learning 2
29
( MAML )
• meta-training adaptation on-policy
• [Rakelly+ 2019]
• meta-training meta-test
30
31
• meta-train meta-objective( ) (behaviour cloning)
• meta-training 2
• task learning: meta-training policy
• policy meta-test expert
• meta-learning: policy meta-level supervised
32
[Rakelly+ 2019]
•
•
•
33
$p(\mathcal{T})$: distribution over tasks
$\mathcal{T} = \{p(s_0),\ p(s_{t+1} \mid s_t, a_t),\ r(s_t, a_t)\}$: a task
task learning
• meta-training 

/ policy
•
meta-learning
• MAML
• adapt
• MAML
• (behaviour cloning)
34
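$\mathcal{T}_i$: task; $\{\pi^*_i\}$: per-task policies; $\mathcal{L}_{RL}(\phi_i, \mathcal{D}_i)$: RL loss; $\phi_i$: parameters adapted to task $\mathcal{T}_i$
$$\mathcal{L}_{BC}(\phi_i, \mathcal{D}_i) \triangleq -\sum_{(s_t, a_t) \in \mathcal{D}_i} \log \pi_{\phi_i}(a_t | s_t)$$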
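The behaviour cloning loss is the negative log-likelihood of the demonstrated actions under the policy. The sketch below assumes a discrete action space and a softmax policy over logits, a choice made here only for illustration; `policy_logits`, `log_softmax`, `bc_loss` and the demonstrations are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities from unnormalised logits."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_norm for x in logits]


def bc_loss(policy_logits, demos):
    """-sum over (state, action) pairs of log pi(action | state).

    policy_logits: callable mapping a state to a list of action logits.
    demos: list of (state, action_index) pairs.
    """
    return -sum(log_softmax(policy_logits(s))[a] for s, a in demos)


if __name__ == "__main__":
    # A uniform policy over two actions pays log(2) per demonstration.
    demos = [(0, 0), (1, 1), (2, 0)]
    print(bc_loss(lambda s: [0.0, 0.0], demos))
```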
meta-learning
• meta-training 



• policy 

meta-objective
• 



behaviour cloning compounding error 

35
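$\mathcal{T}_i$: task; $\pi^*_i$: policy for task $\mathcal{T}_i$; $\mathcal{D}^*_i$: data from $\pi^*_i$
$$\min_{\theta} \sum_{\mathcal{T}_i} \sum_{\mathcal{D}^{val}_i \sim \mathcal{D}^*_i} \mathbb{E}_{\mathcal{D}^{tr}_i \sim \pi_\theta} \left[ \mathcal{L}_{BC}\big( \theta - \alpha \nabla_\theta \mathcal{L}_{RL}(\theta, \mathcal{D}^{tr}_i),\ \mathcal{D}^{val}_i \big) \right]$$
$\theta$: meta-parameters; $\phi_i$: parameters adapted to task $\mathcal{T}_i$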
• meta-learning task learning meta-learning
• policy
•
• meta-training
• ) reward shaping
• MAML
36
policy
• policy 

contextual policy
• ( ID )
• meta-training
• meta-test meta-training
• soft actor-critic(SAC) [Haarnoja+ 2018]
37
$\pi_\theta(a_t | s_t, \omega)$
$\omega$
• Behaviour cloning meta-objective 

• 

• 



•
• Behaviour cloning
38
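$\theta$, $\phi_i$, $\pi_\theta$
$$\phi_i = \theta + \alpha\, \mathbb{E}_{\tau \sim \pi_\theta} \left[ \frac{\pi_\theta(\tau)}{\pi_{\theta_{init}}(\tau)} \nabla_\theta \log \pi_\theta(\tau)\, A_i(\tau) \right]$$
$A_i$
$$\theta \leftarrow \theta - \beta \nabla_\theta \mathcal{L}_{BC}(\phi_i, \mathcal{D}^{val}_i)$$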
39
•
• Pushing (full state)
•
•
• Pushing (vision)
•
• Door opening
•
•
• (Ant)
•
https://sites.google.com/berkeley.edu/guided-metapolicy-search 40
•
• meta-training task context( )
• SAC
• : meta-training :
41
•
• Door Opening Ant
•
• pushing
•
42
43
• off-policy (GMPS)
• meta-training task learning meta-learning 2
(behaviour cloning)
• meta-training
44
45
• 2
• one-step update adapt (BAIR )
• ) MAML[Finn+ 2017]
• adapt (DeepMind )
• ) Neural Processes[Garnelo+ 2018], GQN[Eslami+ 2018]
•
•
• [DL ]Meta-Learning Probabilistic Inference for Prediction
• https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-for-prediction-126167192
• pro-con
46
Appendix
47
References
[Eslami+ 2018] S. M. Ali Eslami, Danilo Jimenez Rezende, Frédéric Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil C. Rabinowitz, Helen King, Chloe Hillier, Matthew M. Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis. “Neural scene representation and rendering”. Science 360 (2018): 1204-1210. http://science.sciencemag.org/content/360/6394/1204
[Finn+ 2017] Chelsea Finn, Pieter Abbeel and Sergey Levine. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”. Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1126-1135, 2017. http://proceedings.mlr.press/v70/finn17a.html
[Garnelo+ 2018] Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami and Yee Whye Teh. “Neural Processes”. https://arxiv.org/abs/1807.01622
[Gupta+ 2018] Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel and Sergey Levine. “Meta-Reinforcement Learning of Structured Exploration Strategies”. In Advances in Neural Information Processing Systems, 2018. https://nips.cc/Conferences/2018/Schedule?showEvent=12658
[Haarnoja+ 2018] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel and Sergey Levine. “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor”. Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1861-1870, 2018. http://proceedings.mlr.press/v80/haarnoja18b.html
[Mendonca+ 2019] Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine and Chelsea Finn. “Guided Meta-Policy Search”. https://arxiv.org/abs/1904.00956
[Nagabandi+ 2018] Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine and Chelsea Finn. “Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning”. https://arxiv.org/abs/1803.11347
[Nichol+ 2018] Alex Nichol, Joshua Achiam and John Schulman. “On First-Order Meta-Learning Algorithms”. https://arxiv.org/abs/1803.02999
[Rakelly+ 2019] Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn and Sergey Levine. “Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables”. https://arxiv.org/abs/1903.08254
48

More Related Content

What's hot

[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...
[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...
[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...
Deep Learning JP
 
方策勾配型強化学習の基礎と応用
方策勾配型強化学習の基礎と応用方策勾配型強化学習の基礎と応用
方策勾配型強化学習の基礎と応用
Ryo Iwaki
 
深層強化学習の分散化・RNN利用の動向〜R2D2の紹介をもとに〜
深層強化学習の分散化・RNN利用の動向〜R2D2の紹介をもとに〜深層強化学習の分散化・RNN利用の動向〜R2D2の紹介をもとに〜
深層強化学習の分散化・RNN利用の動向〜R2D2の紹介をもとに〜
Jun Okumura
 
[DL輪読会]Deep Learning 第15章 表現学習
[DL輪読会]Deep Learning 第15章 表現学習[DL輪読会]Deep Learning 第15章 表現学習
[DL輪読会]Deep Learning 第15章 表現学習
Deep Learning JP
 
報酬設計と逆強化学習
報酬設計と逆強化学習報酬設計と逆強化学習
報酬設計と逆強化学習
Yusuke Nakata
 
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem
Deep Learning JP
 
[DL輪読会]GQNと関連研究,世界モデルとの関係について
[DL輪読会]GQNと関連研究,世界モデルとの関係について[DL輪読会]GQNと関連研究,世界モデルとの関係について
[DL輪読会]GQNと関連研究,世界モデルとの関係について
Deep Learning JP
 
強化学習アルゴリズムPPOの解説と実験
強化学習アルゴリズムPPOの解説と実験強化学習アルゴリズムPPOの解説と実験
強化学習アルゴリズムPPOの解説と実験
克海 納谷
 
A3C解説
A3C解説A3C解説
A3C解説
harmonylab
 
DQNからRainbowまで 〜深層強化学習の最新動向〜
DQNからRainbowまで 〜深層強化学習の最新動向〜DQNからRainbowまで 〜深層強化学習の最新動向〜
DQNからRainbowまで 〜深層強化学習の最新動向〜
Jun Okumura
 
Introduction to A3C model
Introduction to A3C modelIntroduction to A3C model
Introduction to A3C model
WEBFARMER. ltd.
 
論文紹介-Multi-Objective Deep Reinforcement Learning
論文紹介-Multi-Objective Deep Reinforcement Learning論文紹介-Multi-Objective Deep Reinforcement Learning
論文紹介-Multi-Objective Deep Reinforcement Learning
Shunta Nomura
 
Optimizer入門&最新動向
Optimizer入門&最新動向Optimizer入門&最新動向
Optimizer入門&最新動向
Motokawa Tetsuya
 
[DL輪読会]Control as Inferenceと発展
[DL輪読会]Control as Inferenceと発展[DL輪読会]Control as Inferenceと発展
[DL輪読会]Control as Inferenceと発展
Deep Learning JP
 
強化学習と逆強化学習を組み合わせた模倣学習
強化学習と逆強化学習を組み合わせた模倣学習強化学習と逆強化学習を組み合わせた模倣学習
強化学習と逆強化学習を組み合わせた模倣学習
Eiji Uchibe
 
強化学習 DQNからPPOまで
強化学習 DQNからPPOまで強化学習 DQNからPPOまで
強化学習 DQNからPPOまで
harmonylab
 
[DL輪読会]Inverse Constrained Reinforcement Learning
[DL輪読会]Inverse Constrained Reinforcement Learning[DL輪読会]Inverse Constrained Reinforcement Learning
[DL輪読会]Inverse Constrained Reinforcement Learning
Deep Learning JP
 
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...
Deep Learning JP
 
[DL輪読会] マルチエージェント強化学習と心の理論
[DL輪読会] マルチエージェント強化学習と心の理論[DL輪読会] マルチエージェント強化学習と心の理論
[DL輪読会] マルチエージェント強化学習と心の理論
Deep Learning JP
 
[DL輪読会]World Models
[DL輪読会]World Models[DL輪読会]World Models
[DL輪読会]World Models
Deep Learning JP
 

What's hot (20)

[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...
[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...
[DL輪読会]近年のオフライン強化学習のまとめ —Offline Reinforcement Learning: Tutorial, Review, an...
 
方策勾配型強化学習の基礎と応用
方策勾配型強化学習の基礎と応用方策勾配型強化学習の基礎と応用
方策勾配型強化学習の基礎と応用
 
深層強化学習の分散化・RNN利用の動向〜R2D2の紹介をもとに〜
深層強化学習の分散化・RNN利用の動向〜R2D2の紹介をもとに〜深層強化学習の分散化・RNN利用の動向〜R2D2の紹介をもとに〜
深層強化学習の分散化・RNN利用の動向〜R2D2の紹介をもとに〜
 
[DL輪読会]Deep Learning 第15章 表現学習
[DL輪読会]Deep Learning 第15章 表現学習[DL輪読会]Deep Learning 第15章 表現学習
[DL輪読会]Deep Learning 第15章 表現学習
 
報酬設計と逆強化学習
報酬設計と逆強化学習報酬設計と逆強化学習
報酬設計と逆強化学習
 
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem
【DL輪読会】論文解説:Offline Reinforcement Learning as One Big Sequence Modeling Problem
 
[DL輪読会]GQNと関連研究,世界モデルとの関係について
[DL輪読会]GQNと関連研究,世界モデルとの関係について[DL輪読会]GQNと関連研究,世界モデルとの関係について
[DL輪読会]GQNと関連研究,世界モデルとの関係について
 
強化学習アルゴリズムPPOの解説と実験
強化学習アルゴリズムPPOの解説と実験強化学習アルゴリズムPPOの解説と実験
強化学習アルゴリズムPPOの解説と実験
 
A3C解説
A3C解説A3C解説
A3C解説
 
DQNからRainbowまで 〜深層強化学習の最新動向〜
DQNからRainbowまで 〜深層強化学習の最新動向〜DQNからRainbowまで 〜深層強化学習の最新動向〜
DQNからRainbowまで 〜深層強化学習の最新動向〜
 
Introduction to A3C model
Introduction to A3C modelIntroduction to A3C model
Introduction to A3C model
 
論文紹介-Multi-Objective Deep Reinforcement Learning
論文紹介-Multi-Objective Deep Reinforcement Learning論文紹介-Multi-Objective Deep Reinforcement Learning
論文紹介-Multi-Objective Deep Reinforcement Learning
 
Optimizer入門&最新動向
Optimizer入門&最新動向Optimizer入門&最新動向
Optimizer入門&最新動向
 
[DL輪読会]Control as Inferenceと発展
[DL輪読会]Control as Inferenceと発展[DL輪読会]Control as Inferenceと発展
[DL輪読会]Control as Inferenceと発展
 
強化学習と逆強化学習を組み合わせた模倣学習
強化学習と逆強化学習を組み合わせた模倣学習強化学習と逆強化学習を組み合わせた模倣学習
強化学習と逆強化学習を組み合わせた模倣学習
 
強化学習 DQNからPPOまで
強化学習 DQNからPPOまで強化学習 DQNからPPOまで
強化学習 DQNからPPOまで
 
[DL輪読会]Inverse Constrained Reinforcement Learning
[DL輪読会]Inverse Constrained Reinforcement Learning[DL輪読会]Inverse Constrained Reinforcement Learning
[DL輪読会]Inverse Constrained Reinforcement Learning
 
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...
[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...
 
[DL輪読会] マルチエージェント強化学習と心の理論
[DL輪読会] マルチエージェント強化学習と心の理論[DL輪読会] マルチエージェント強化学習と心の理論
[DL輪読会] マルチエージェント強化学習と心の理論
 
[DL輪読会]World Models
[DL輪読会]World Models[DL輪読会]World Models
[DL輪読会]World Models
 

Similar to [DL輪読会] off-policyなメタ強化学習

Dominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender SystemsDominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender Systems
Dominik Kowald
 
[DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...
 [DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se... [DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...
[DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...
Deep Learning JP
 
SEM on MIDUS Dataset
SEM on MIDUS DatasetSEM on MIDUS Dataset
SEM on MIDUS Dataset
Kan Yuenyong
 
Hierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureHierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic Architecture
Necip Oguz Serbetci
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Andreas Metzger
 
Deep Meta Learning
Deep Meta Learning Deep Meta Learning
Deep Meta Learning
Changhoon Jeong
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Jack Clark
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
Joaquin Vanschoren
 
Semantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media PostsSemantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media Posts
Giulio Carducci
 
From data lakes to actionable data (adventures in data curation)
From data lakes to actionable data (adventures in data curation)From data lakes to actionable data (adventures in data curation)
From data lakes to actionable data (adventures in data curation)
Novartis Institutes for BioMedical Research
 
Introduction Machine Learning Syllabus
Introduction Machine Learning SyllabusIntroduction Machine Learning Syllabus
Introduction Machine Learning Syllabus
Andres Mendez-Vazquez
 
Strata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
Strata 2013: Tutorial-- How to Create Predictive Models in R using EnsemblesStrata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
Strata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
Intuit Inc.
 
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
MOVING Project
 
Course outline
Course outlineCourse outline
Course outline
SumbalImran2
 
Licentiate Defense Slide
Licentiate Defense SlideLicentiate Defense Slide
Licentiate Defense Slide
Rerngvit Yanggratoke
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx
Shree Shree
 
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Ingo Frommholz
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
NAVER Engineering
 
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...
Marcel Schmitz
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
Christopher Wilson
 

Similar to [DL輪読会] off-policyなメタ強化学習 (20)

Dominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender SystemsDominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender Systems
 
[DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...
 [DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se... [DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...
[DL輪読会]"CyCADA: Cycle-Consistent Adversarial Domain Adaptation"&"Learning Se...
 
SEM on MIDUS Dataset
SEM on MIDUS DatasetSEM on MIDUS Dataset
SEM on MIDUS Dataset
 
Hierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureHierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic Architecture
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
 
Deep Meta Learning
Deep Meta Learning Deep Meta Learning
Deep Meta Learning
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
Semantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media PostsSemantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media Posts
 
From data lakes to actionable data (adventures in data curation)
From data lakes to actionable data (adventures in data curation)From data lakes to actionable data (adventures in data curation)
From data lakes to actionable data (adventures in data curation)
 
Introduction Machine Learning Syllabus
Introduction Machine Learning SyllabusIntroduction Machine Learning Syllabus
Introduction Machine Learning Syllabus
 
Strata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
Strata 2013: Tutorial-- How to Create Predictive Models in R using EnsemblesStrata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
Strata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
 
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
 
Course outline
Course outlineCourse outline
Course outline
 
Licentiate Defense Slide
Licentiate Defense SlideLicentiate Defense Slide
Licentiate Defense Slide
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx
 
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...
Wat is Learning Analytics en hoe kan het in het (hoger) onderwijs worden inge...
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
 

More from Deep Learning JP

【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
Deep Learning JP
 
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて
Deep Learning JP
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
Deep Learning JP
 
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
Deep Learning JP
 
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
Deep Learning JP
 
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM
Deep Learning JP
 
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo... 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
Deep Learning JP
 
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
Deep Learning JP
 
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?
Deep Learning JP
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
Deep Learning JP
 
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
Deep Learning JP
 
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
Deep Learning JP
 
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
Deep Learning JP
 
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
Deep Learning JP
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
Deep Learning JP
 
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
Deep Learning JP
 
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
Deep Learning JP
 
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
Deep Learning JP
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
Deep Learning JP
 
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
Deep Learning JP
 

More from Deep Learning JP (20)

【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
 
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
 
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
 
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
 
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM
 
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo... 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...



[DL Reading Group] Off-Policy Meta-Reinforcement Learning

  • 1. Off-policy meta-reinforcement learning: "Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables" and "Guided Meta-Policy Search". Presenter: Tatsuya Matsushima (@__tmats__), Matsuo Lab
  • 2. Two arXiv papers on off-policy meta-RL
      • Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables [Rakelly+ 2019] (2019/3/19)
      • Guided Meta-Policy Search [Mendonca+ 2019] (2019/4/1)
      • MAML; meta-training; off-policy
  • 4. Meta learning
      • Reference (wiki): http://ibisforest.org/index.php?%E3%83%A1%E3%82%BF%E5%AD%A6%E7%BF%92
      • [DL Reading Group] Meta-Learning Probabilistic Inference for Prediction
      • https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-for-prediction-126167192
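  • 5. MAML (Model-Agnostic Meta-Learning) [Finn+ 2017]
      • Meta-training learns initial parameters $\theta$; one gradient step on a task's training data adapts them to task parameters $\phi_{\mathcal{T}}$:
        $\min_\theta \sum_{\mathcal{T}} \mathcal{L}\left(\theta - \alpha \nabla_\theta \mathcal{L}(\theta, \mathcal{D}^{tr}_{\mathcal{T}}), \mathcal{D}^{val}_{\mathcal{T}}\right) = \min_\theta \sum_{\mathcal{T}} \mathcal{L}\left(\phi_{\mathcal{T}}, \mathcal{D}^{val}_{\mathcal{T}}\right)$
      • The MAML objective involves second-order derivatives; a first-order approximation also exists [Nichol+2018]
      • At meta-test, the same update is applied to the new task:
        $\phi_{\mathcal{T}_{test}} = \theta - \alpha \nabla_\theta \mathcal{L}(\theta, \mathcal{D}^{tr}_{\mathcal{T}_{test}})$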
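The MAML update on this slide can be sketched on a toy problem. The block below is a minimal illustration, not the authors' code: the scalar model `y_hat = theta * x`, the three task slopes, and the step sizes `alpha` and `beta` are all assumptions chosen so that the gradients can be written by hand.

```python
import numpy as np

def loss(theta, x, y):
    """Mean squared error of the scalar model y_hat = theta * x."""
    return float(np.mean((theta * x - y) ** 2))

def grad(theta, x, y):
    """Derivative of the loss with respect to theta."""
    return float(np.mean(2.0 * x * (theta * x - y)))

def adapt(theta, x_tr, y_tr, alpha):
    """Inner step: phi_T = theta - alpha * grad L(theta, D_tr)."""
    return theta - alpha * grad(theta, x_tr, y_tr)

def meta_gradient(theta, tasks, alpha, first_order=False):
    """Gradient of sum_T L(phi_T, D_val) with respect to theta."""
    total = 0.0
    for x_tr, y_tr, x_val, y_val in tasks:
        phi = adapt(theta, x_tr, y_tr, alpha)
        outer = grad(phi, x_val, y_val)
        if not first_order:
            # Chain rule through the inner step:
            # d phi / d theta = 1 - alpha * (second derivative of the training loss).
            outer *= 1.0 - alpha * float(np.mean(2.0 * x_tr ** 2))
        total += outer
    return total

def make_task(slope, rng, n=20):
    """Hypothetical task: noiseless regression onto y = slope * x."""
    x_tr, x_val = rng.normal(size=n), rng.normal(size=n)
    return x_tr, slope * x_tr, x_val, slope * x_val

rng = np.random.default_rng(0)
tasks = [make_task(s, rng) for s in (1.0, 2.0, 3.0)]
theta, alpha, beta = 0.0, 0.1, 0.05
for _ in range(200):
    theta -= beta * meta_gradient(theta, tasks, alpha)
```

Passing `first_order=True` drops the second-derivative factor, which is the kind of simplification that first-order variants make.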
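  • 6. MAML in reinforcement learning
      • The loss is replaced by an RL loss:
        $\mathcal{L}_{RL}(\phi, \mathcal{D}_{\mathcal{T}_i}) = -\frac{1}{|\mathcal{D}_{\mathcal{T}_i}|} \sum_{s_t, a_t \in \mathcal{D}_{\mathcal{T}_i}} r_i(s_t, a_t) = -\mathbb{E}_{s_t, a_t \sim \pi_\phi, q_{\mathcal{T}_i}} \left[ \frac{1}{H} \sum_{t=1}^{H} r_i(s_t, a_t) \right]$
      • MAML-based work: model-based [Nagabandi+ 2018], [Gupta+ 2018]
      • [DL Reading Group] Meta Reinforcement Learning
      • https://www.slideshare.net/DeepLearningJP2016/dl-130067084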
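As a small numerical illustration of the RL loss on this slide: when every rollout has the same horizon H, the average over all sampled (s_t, a_t) pairs equals the average over rollouts of the per-rollout mean reward. The reward values below are made up for the example.

```python
import numpy as np

def rl_loss(rewards):
    """Negative mean reward over all sampled (s_t, a_t) pairs in D."""
    rewards = np.asarray(rewards, dtype=float)
    return -float(rewards.mean())

# Two hypothetical rollouts of horizon H = 3 collected with the adapted policy.
rollouts = np.array([[1.0, 0.0, 2.0],
                     [0.5, 0.5, 2.0]])

loss_value = rl_loss(rollouts)
# Same quantity written as an average over rollouts of (1/H) * sum_t r_t.
per_rollout_form = -float(np.mean(rollouts.mean(axis=1)))
```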
  • 7. On-policy vs. off-policy
      • On-policy
          • e.g. ε-greedy
      • Off-policy
      • ※ MAML: train and test (= off-policy)
  • 8. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
  • 9. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
      • https://arxiv.org/abs/1903.08254 (submitted 19 Mar 2019)
      • Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, Sergey Levine
      • UC Berkeley (BAIR)
      • Deep RL: "UC Berkeley"
      • Code: https://github.com/katerakelly/oyster
      • Built on rlkit, a PyTorch RL library (BAIR)
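  • 10. TL;DR
      • Proposes PEARL, an off-policy method for meta learning in RL
      • Task information is inferred as a latent variable (context)
      • The inference is permutation invariant
      • 20-100x better sample efficiency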
  • 11. (MAML and related methods)
      • meta-training and adaptation: on-policy
      • MAML: meta-train and meta-test; off-policy
      • adapt
  • 13. PEARL
      • Combines off-policy RL (soft actor-critic, SAC [Haarnoja+ 2018]) with context inference (PEARL)
      • Meta-training learns how to adapt
          • meta-train: learn a policy conditioned on the context
          • meta-test: infer the context and adapt the policy
      • policy: off-policy; meta-train and meta-test: on-policy
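  • 14. MDP
      • Tasks are drawn from a task distribution $p(\mathcal{T})$
      • Task: $\mathcal{T} = \{p(s_0), p(s_{t+1} \mid s_t, a_t), r(s_t, a_t)\}$
      • Context: $c^{\mathcal{T}}_n = (s_n, a_n, r_n, s'_n)$, one transition in task $\mathcal{T}$
      • Collected context: $c = c^{\mathcal{T}}_{1:N}$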
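  • 15. context
      • adapt: infer a latent context variable $z$
      • Inference network: $q_\phi(z \mid c)$ with parameters $\phi$
      • Objective:
        $\mathbb{E}_{\mathcal{T}} \left[ \mathbb{E}_{z \sim q_\phi(z \mid c^{\mathcal{T}})} \left[ R(\mathcal{T}, z) + \beta D_{KL}\left( q_\phi(z \mid c^{\mathcal{T}}) \,\|\, p(z) \right) \right] \right]$
      • The prior $p(z)$ is Gaussian
      • meta-train and meta-test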
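The KL term in this objective has a closed form when both distributions are Gaussian. The sketch below assumes a diagonal Gaussian posterior and a unit Gaussian prior N(0, I); the slide only says the prior is Gaussian, so the unit variance is an assumption, as are the example values of `mu`, `sigma`, `task_loss`, and `beta`.

```python
import numpy as np

def kl_to_unit_gaussian(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return float(0.5 * np.sum(sigma ** 2 + mu ** 2 - 1.0 - 2.0 * np.log(sigma)))

# Hypothetical posterior parameters for a 2-dimensional z.
mu = np.array([0.5, -0.5])
sigma = np.array([1.0, 2.0])

task_loss = 1.3   # placeholder for R(T, z); not a value from the paper
beta = 0.1        # assumed KL weight
regularized_objective = task_loss + beta * kl_to_unit_gaussian(mu, sigma)
```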
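  • 16. context
      • An MDP can be identified from a set of transitions $\{s_i, a_i, s'_i, r_i\}$ regardless of their order
      • The encoder is therefore permutation invariant
      • Inference network:
        $q_\phi(z \mid c_{1:N}) \propto \prod_{n=1}^{N} \Psi_\phi(z \mid c_n)$
      • Each factor is Gaussian:
        $\Psi_\phi(z \mid c_n) = \mathcal{N}\left( f^{\mu}_\phi(c_n), f^{\sigma}_\phi(c_n) \right)$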
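A product of independent Gaussian factors is again Gaussian, and its parameters do not depend on the order of the factors. The sketch below shows this for diagonal Gaussians. It assumes the second argument holds variances (the slide does not say whether $f^{\sigma}_\phi$ is a variance or a standard deviation), and random numbers stand in for the encoder outputs.

```python
import numpy as np

def product_of_gaussians(mus, variances):
    """Mean and variance of the normalized product of N Gaussian factors.

    mus, variances: arrays of shape (N, latent_dim).
    """
    mus = np.asarray(mus, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precision = np.sum(1.0 / variances, axis=0)        # precisions add up
    var = 1.0 / precision
    mean = var * np.sum(mus / variances, axis=0)       # precision-weighted mean
    return mean, var

rng = np.random.default_rng(0)
factor_mus = rng.normal(size=(5, 2))                   # stand-in for f_mu(c_n)
factor_vars = rng.uniform(0.5, 2.0, size=(5, 2))       # stand-in for f_sigma(c_n)

mean, var = product_of_gaussians(factor_mus, factor_vars)

# Shuffling the transitions leaves the posterior unchanged.
perm = rng.permutation(5)
mean_perm, var_perm = product_of_gaussians(factor_mus[perm], factor_vars[perm])
```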
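  • 17. off-policy
      • policy
      • actor and critic: trained on batches from the replay buffer $\mathcal{B}$
      • context for $q_\phi(z \mid c)$: drawn by the sampler $\mathcal{S}_c$
      • on-policy; on-policy at test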
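  • 18. off-policy
      • Soft Actor-Critic (SAC) [Haarnoja+ 2018] with context
      • SAC: an off-policy actor-critic method based on maxEnt RL (maximum entropy RL)
      • actor and critic; reparameterization trick
      • critic loss:
        $\mathcal{L}_{critic} = \mathbb{E}_{(s, a, r, s') \sim \mathcal{B},\; z \sim q_\phi(z \mid c)} \left[ Q_\theta(s, a, z) - (r + V(s', z)) \right]^2$
      • actor loss:
        $\mathcal{L}_{actor} = \mathbb{E}_{s \sim \mathcal{B},\, a \sim \pi_\theta} D_{KL}\left( \pi_\theta(a \mid s, z) \,\Big\|\, \frac{\exp(Q_\theta(s, a, z))}{\mathcal{Z}_\theta(s)} \right)$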
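The critic loss on this slide is a mean squared Bellman error over a replay batch. The sketch below evaluates it for already-computed values. It follows the slide's formula literally, so the target is `r + V(s', z)` with no discount factor, and the arrays are hypothetical network outputs rather than anything from the PEARL code.

```python
import numpy as np

def critic_loss(q_values, rewards, next_values):
    """Mean of [Q(s, a, z) - (r + V(s', z))]^2 over a batch from the replay buffer."""
    q_values = np.asarray(q_values, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    td_error = q_values - (rewards + next_values)
    return float(np.mean(td_error ** 2))

# Hypothetical batch of two transitions, all evaluated with one sampled z.
q_batch = np.array([1.0, 2.0])      # Q_theta(s, a, z)
r_batch = np.array([0.5, 1.0])      # r
v_next_batch = np.array([0.5, 0.0]) # V(s', z)

batch_loss = critic_loss(q_batch, r_batch, v_next_batch)
```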
  • 20. Experiments
      • Six MuJoCo tasks
      • Half-Cheetah, Humanoid, Ant, Walker (two settings each for Half-Cheetah and Ant)
      • adapt
      • 20-100x
      • meta-training
  • 21. Comparison with an on-policy method (MAESN [Gupta+ 2018])
      • sparse navigation
      • meta-test
      • context
      • MAESN
  • 22. Ablation Study
      • Half-Cheetah-Vel
      • RNN
          • RNN-tran: de-correlated
          • RNN-traj
      • permutation invariant
  • 23. Ablation Study
      • Half-Cheetah-Vel
      • off-policy: off-policy
      • off-policy RL-batch: policy
      • (PEARL)
  • 24. Ablation Study
      • context
      • sparse navigation
      • context
  • 26. Summary
      • Off-policy meta-RL (PEARL)
      • context; policy; context; off-policy
      • meta-training
  • 28. Guided Meta-Policy Search
      • https://arxiv.org/abs/1904.00956 (submitted 1 Apr 2019)
      • Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn
      • UC Berkeley (BAIR)
      • …
      • Code: https://github.com/RussellM2020/GMPS
      • Website: https://sites.google.com/berkeley.edu/guided-metapolicy-search
  • 29. TL;DR
      • Proposes GMPS, an off-policy method for meta learning in RL
      • meta-train: RL
      • The meta-objective at meta-train is imitation learning (behaviour cloning)
      • meta-training is split into two phases: task learning and meta-learning
  • 30. (MAML and related methods)
      • meta-training and adaptation: on-policy
      • [Rakelly+ 2019]
      • meta-training and meta-test
  • 32. GMPS
      • The meta-objective at meta-train is supervised (behaviour cloning)
      • meta-training has two phases
          • task learning: learn a policy for each meta-training task
              • policy; meta-test; expert
          • meta-learning: train the policy with meta-level supervision (supervised)
  • 33. Problem setting (as in [Rakelly+ 2019])
      • Tasks are drawn from a task distribution $p(\mathcal{T})$
      • Task: $\mathcal{T} = \{p(s_0), p(s_{t+1} \mid s_t, a_t), r(s_t, a_t)\}$
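  • 34. task learning
      • For each meta-training task $\mathcal{T}_i$, learn an expert policy, giving $\{\pi^*_i\}$
      • meta-learning
          • As in MAML, adapt to task $\mathcal{T}_i$ with $\mathcal{L}_{RL}(\phi_i, \mathcal{D}_i)$ to obtain $\phi_i$
          • Unlike MAML, the outer objective is supervised (behaviour cloning):
            $\mathcal{L}_{BC}(\phi_i, \mathcal{D}_i) \triangleq -\sum_{(s_t, a_t) \in \mathcal{D}_i} \log \pi_{\phi_i}(a_t \mid s_t)$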
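The behaviour cloning loss on this slide is the negative log-likelihood of the expert's actions under the adapted policy. The sketch below evaluates it for a discrete softmax policy; the discrete action space, the logits, and the expert actions are assumptions made for illustration, and the paper's policies may be parameterized differently.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def bc_loss(logits, expert_actions):
    """-sum over (s_t, a_t) in D of log pi(a_t | s_t) for a softmax policy.

    logits: array of shape (num_states, num_actions), the policy output per state.
    expert_actions: integer action chosen by the expert in each state.
    """
    log_probs = log_softmax(np.asarray(logits, dtype=float))
    rows = np.arange(len(expert_actions))
    return -float(log_probs[rows, expert_actions].sum())

expert_actions = [0, 3]
uniform_logits = np.zeros((2, 4))                 # policy with no preference
matched_logits = np.array([[2.0, 0.0, 0.0, 0.0],  # policy favouring the expert's choices
                           [0.0, 0.0, 0.0, 2.0]])

loss_uniform = bc_loss(uniform_logits, expert_actions)
loss_matched = bc_loss(matched_logits, expert_actions)
```

A policy that puts more probability on the expert's actions gets a lower loss, which is what the outer update pushes towards.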
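  • 35. meta-learning
      • For each meta-training task $\mathcal{T}_i$, the expert $\pi^*_i$ provides data $\mathcal{D}^*_i$
      • The policy is trained with the meta-objective
        $\min_\theta \sum_{\mathcal{T}_i} \sum_{\mathcal{D}^{val}_i \sim \mathcal{D}^*_i} \mathbb{E}_{\mathcal{D}^{tr}_i \sim \pi_\theta} \left[ \mathcal{L}_{BC}\left( \theta - \alpha \nabla_\theta \mathcal{L}_{RL}(\theta, \mathcal{D}^{tr}_i), \mathcal{D}^{val}_i \right) \right]$
        where $\theta$ is adapted to $\phi_i$ on task $\mathcal{T}_i$
      • $\mathcal{D}^*_i$; behaviour cloning is prone to compounding error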
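The structure of this meta-objective (an RL inner step followed by a supervised outer loss) can be sketched on a toy problem. The block below is only an illustration under strong assumptions: a stateless Gaussian policy a ~ N(theta, 1), reward -(a - goal)^2, and the expected RL gradient used in place of a gradient estimated from sampled $\mathcal{D}^{tr}_i$. The goals, step sizes, and expert noise are made up.

```python
import numpy as np

ALPHA = 0.25  # inner step size (assumed value)

def adapt(theta, goal):
    """Inner RL step. With a ~ N(theta, 1) and reward -(a - goal)^2,
    the expected RL loss is (theta - goal)^2 + 1, whose gradient is 2 * (theta - goal)."""
    return theta - ALPHA * 2.0 * (theta - goal)

def bc_loss(phi, expert_actions):
    """Negative log-likelihood of expert actions under N(phi, 1), up to a constant."""
    return 0.5 * float(np.sum((expert_actions - phi) ** 2))

def meta_gradient(theta, tasks):
    """Gradient of sum_i L_BC(adapt(theta, goal_i), D_val_i) with respect to theta."""
    total = 0.0
    for goal, expert_actions in tasks:
        phi = adapt(theta, goal)
        dphi_dtheta = 1.0 - 2.0 * ALPHA          # derivative of the inner step
        total += -float(np.sum(expert_actions - phi)) * dphi_dtheta
    return total

rng = np.random.default_rng(0)
goals = (-1.0, 0.5, 2.0)
# Hypothetical expert data: actions close to each task's goal.
tasks = [(g, g + 0.1 * rng.normal(size=50)) for g in goals]

theta = 0.0
for _ in range(100):
    theta -= 0.01 * meta_gradient(theta, tasks)
```

Because the outer loss is supervised, its gradient does not need fresh rollouts, which is the point of using behaviour cloning at the meta level.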
  • 36. Separating task learning from meta-learning
      • policy
      • meta-training
      • e.g. reward shaping
      • MAML
  • 37. policy
      • Instead of one policy per task, a single contextual policy $\pi_\theta(a_t \mid s_t, \omega)$ can be learned
      • $\omega$: task context (e.g. an ID)
      • meta-training
      • meta-test; meta-training
      • soft actor-critic (SAC) [Haarnoja+ 2018]
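  • 38. Behaviour cloning as the meta-objective
      • Inner update from $\theta$ to $\phi_i$ with policy $\pi_\theta$, importance weights, and advantage $A_i$:
        $\phi_i = \theta + \alpha \mathbb{E}_{\tau \sim \pi_\theta} \left[ \frac{\pi_\theta(\tau)}{\pi_{\theta_{init}}(\tau)} \nabla_\theta \log \pi_\theta(\tau) A_i(\tau) \right]$
      • Outer update:
        $\theta \leftarrow \theta - \beta \nabla_\theta \mathcal{L}_{BC}(\phi_i, \mathcal{D}^{val}_i)$
      • Behaviour cloning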
  • 40. Experiments
      • Pushing (full state)
      • Pushing (vision)
      • Door opening
      • (Ant)
      • https://sites.google.com/berkeley.edu/guided-metapolicy-search
  • 41. Results
      • meta-training; task context
      • SAC
      • meta-training
  • 42. Results
      • Door Opening, Ant
      • pushing
  • 44. Summary
      • Off-policy meta-RL (GMPS)
      • meta-training is split into two phases, task learning and meta-learning, with a supervised (behaviour cloning) meta-objective
      • meta-training
  • 46. Two approaches
      • adapt with a one-step update (BAIR)
          • e.g. MAML [Finn+ 2017]
      • adapt (DeepMind)
          • e.g. Neural Processes [Garnelo+ 2018], GQN [Eslami+ 2018]
      • [DL Reading Group] Meta-Learning Probabilistic Inference for Prediction
      • https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-for-prediction-126167192
      • pro-con
  • 48. References
      [Eslami+ 2018] S. M. Ali Eslami, Danilo Jimenez Rezende, Frédéric Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil C. Rabinowitz, Helen King, Chloe Hillier, Matthew M. Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis. "Neural scene representation and rendering". Science 360 (2018): 1204-1210. http://science.sciencemag.org/content/360/6394/1204
      [Finn+ 2017] Chelsea Finn, Pieter Abbeel and Sergey Levine. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1126-1135, 2017. http://proceedings.mlr.press/v70/finn17a.html
      [Garnelo+ 2018] Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami and Yee Whye Teh. "Neural Processes". https://arxiv.org/abs/1807.01622
      [Gupta+ 2018] Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel and Sergey Levine. "Meta-Reinforcement Learning of Structured Exploration Strategies". In Advances in Neural Information Processing Systems, 2018. https://nips.cc/Conferences/2018/Schedule?showEvent=12658
      [Haarnoja+ 2018] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel and Sergey Levine. "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor". Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1861-1870, 2018. http://proceedings.mlr.press/v80/haarnoja18b.html
      [Mendonca+ 2019] Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine and Chelsea Finn. "Guided Meta-Policy Search". https://arxiv.org/abs/1904.00956
      [Nagabandi+ 2018] Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine and Chelsea Finn. "Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning". https://arxiv.org/abs/1803.11347
      [Nichol+2018] Alex Nichol, Joshua Achiam and John Schulman. "On First-Order Meta-Learning Algorithms". https://arxiv.org/abs/1803.02999
      [Rakelly+ 2019] Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn and Sergey Levine. "Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables". https://arxiv.org/abs/1903.08254