SlideShare a Scribd company logo
Asynchronous Methods for Deep
Reinforcement Learning
B4 織田 智矢
1
論文情報
• タイトル
– Asynchronous Methods for Deep Reinforcement Learning
– URL : https://arxiv.org/abs/1602.01783
• 発表学会
– ICML2016
• 著者
– Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirz
• 所属
– Google DeepMind・Montreal Institute for Learning
Algorithms (MILA), University of Montreal
2
RL Map 3
https://qiita.com/sugulu/items/3c7d6cbe600d455e853b
Abstract
• 概念的にシンプルでライトウェイトな深層強化学習フレーム
ワークを提案する.
• 4つの標準強化学習(SARSA, 1-step Q, n-step Q, Advantage
Actor-Critic)アルゴリズムを非同期にして試したが,並列なア
クターがトレーニングに対して安定化効果を持ち,4つの方法
すべてが,NNコントローラーを上手く訓練できた.
• 一番の手法である非同期のactor-criticは現在のAtariドメインの
SOTAを凌駕しながら,GPUの代わりにマルチコアCPUで半分
の時間でトレーニングする.
• さらに行動空間が連続な,モーター制御問題も成功することを
示した.
4
RL Map 5
https://qiita.com/sugulu/items/3c7d6cbe600d455e853b
• DQN+Advantage+actor-critic+非同期分散処理の流れ
• 基本的なアーキテクチャはGorilaの基づいたもの
• A3CはAsynchronous Advantage Actor-Criticの略称
①
②
③ ④
DQNおさらい 6
• Q(s,a)が最大となるaを行い報酬rを環境から受け取る (s:state, a:action, r:reward)
• Replay Memoryに(s,a,r,s’)を入れる (s’:next state)
• MemoryからTarget Q Network with DQN Lossを計算しQ Networkを更新
• N update毎にTarget Q NetworkへQ Networkをコピー
Massively Parallel Methods for Deep Reinforcement Learning
https://arxiv.org/pdf/1507.04296.pdf
①“Vanilla” Policy Gradient
• 方策勾配法の元祖(どれが元祖かは不明)
• 行動の方策𝜋 𝜃 (各行動の確率) に対して期待収益 𝐽(𝜃)を最大化
• 広大な行動空間に対して価値化関数Qを設計するのは難しい
• そこで実際に得られた報酬rの合計で価値関数を近似すると
• このままだと分散が大きくなるのである関数b(s)を引く(ベースライン除
去というテクニック, 期待値をそのままに分散を小さくする目的)
• b(s)は報酬との2乗和誤差が最小になるように毎step調整
7
(価値関数 ∶ 𝑄 𝜋 𝜃)
②Actor-critic
• Value-Basedな手法(Q学習など)と
Policy-Basedな手法(方策勾配法)の
組み合わせ
• 行動確率p = 𝜋(a|s)を出力する (Actor)
• 状態価値関数R=Q(s,a) (Critic)
• 方策(Policy)に基づきある状態sで,あ
る行動aを取ったときの状態を批判し
方策を学習,そして同時に行動価値関
数も学習する.
8
③Advantage Loss Function
• Q学習での例
– 1-StepQ学習のloss関数
• 𝐿𝑖 𝜃𝑖 = 𝐸(𝑟 + 𝛾 max
𝑎′
𝑄 𝑠′, 𝑎′; 𝜃𝑖−1 − 𝑄(𝑠, 𝑎; 𝜃𝑖)) 2
• Rewardが直接作用するのはこのときのs,aペアのみ
– n-StepQ学習loss関数
• 𝐿𝑖 𝜃𝑖 = 𝐸( 𝑘
𝑛
𝛾 𝑘 𝑟𝑡+𝑘 + 𝛾 𝑛max
𝑎′
𝑄 𝑠′, 𝑎′; 𝜃𝑖−1 − 𝑄(𝑠, 𝑎; 𝜃𝑖)) 2
• n-step先まで行動して,更新する(1回行動するたびに学習する
より早く収束する)
• Advantage-Actor-Criticのloss関数
– 𝐿𝑖 𝜃𝑖 = ∇ 𝜃 log 𝜋 𝑎 𝑡 𝑠𝑡; 𝜃′ 𝐴(𝑎 𝑡, 𝑠𝑡; 𝜃, 𝜃𝑣)
– Where 𝐴 𝑎 𝑡, 𝑠𝑡; 𝜃, 𝜃𝑣 = 𝑘=0
𝑛−1
𝛾 𝑘 𝑟𝑡+𝑘 + 𝛾 𝑛 𝑉 𝑠𝑡+𝑛; 𝜃𝑣 − 𝑉(𝑠𝑡; 𝜃𝑣)
9
A3C
• Policy Gradientからの変更点
– ある関数b(s)をcriticとしてニューラルネットで近似
• b(s)がそのまま状態価値関数になる
– 期待収益にエントロピー項追加
• 目的関数の正則化の意味で導入
– 分散非同期にする
• 学習の収束が早くなる
10
④ Gorila 11
https://arxiv.org/pdf/1507.04296.pdf
(動物のゴリラのスペルはgorilla)
• DQNを非同期分散処理で実装したもの
• Replay Memoryをすべてのスレッド(Actor)で共有している点が大きな特徴
Architecture 1 12
Parameter Server θ
thread 1
Environment Network
Gradients
Learner with A3C
Loss
Actor Memory
thread k
Environment Network
Gradients
Learner with A3C
Loss
Actor
Memory
パラメータサーバから重
みをコピー
Parameter Server θ
Network
Architecture 2 13
Parameter Server θ
thread 1
Environment Network
Gradients
Learner with A3C
Loss
Actor Memory
thread k
Environment Network
Gradients
Learner with A3C
Loss
Actor
Memory
メモリに経験を貯める
(tmax or Doneまで)
Parameter Server θ
Network
Architecture 3 14
Parameter Server θ
thread 1
Environment Network
Gradients
Learner with A3C
Loss
Actor Memory
thread k
Environment Network
Gradients
Learner with A3C
Loss
Actor
Memory
MemoryからLossを計算し勾
配を求める
Network
Architecture 4 15
thread 1
Environment Network
Gradients
Learner with A3C
Loss
Actor Memory
thread k
Environment Network
Gradients
Learner with A3C
Loss
Actor
Memory
Parameter Server θ
Network
非同期に勾配をServerに渡して,
Serverのネットワークを更新
1に戻るをTmax繰り返す
Advantage actor-critic 16
• 𝜃𝑣は状態価値関数のニューラルネットワーク
• log 𝜋 𝑎 𝑠; 𝜃 (𝑅 𝑡 − 𝑉 𝑡 )を最大化するように更新すればいい(方策勾配定理 )
• 方策の学習も,状態価値関数の学習もAdvantage A=R-Vが使われている
• A = R-Vは行動価値から状態の価値を引いている→行動の確率×行動の価値が欲しい
1
4
2
3
Results 17
Conclusions
• 共有モデルを更新するために,並列なアクターを使用すること
が,学習プロセスに安定化効果を持つことである.
• Value-basedな手法はQ値の過大評価の偏りを減らす様々な方
法から恩恵を受けることができた.
• Replay Memoryを使用していないため,LSTM等の時系列モデ
ルが使用可能.
18
A2C
• A3Cが誕生した当初は非常に有力な手法
• しかしこの非同期性がパフォーマンスの向上につながったのか
不明であった.
• 実際に非同期更新せずにすべてのスレッドが終わるまで待ち,
すべての平均を取って更新する手法を試した.(この手法は
GPUをより効率的に使用できる)
• 結果,A3Cよりパフォーマンスが優れていた.
• 非同期によって発生するノイズはパフォーマンスの向上
にならない.
• よってA3Cを使うならA2Cの方が費用対効果がよい.
19
https://openai.com/blog/baselines-acktr-a2c/

More Related Content

What's hot

Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement ...
Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement ...Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement ...
Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement ...
Keisuke Nakata
 
[DL輪読会] Learning Finite State Representations of Recurrent Policy Networks (I...
[DL輪読会] Learning Finite State Representations of Recurrent Policy Networks (I...[DL輪読会] Learning Finite State Representations of Recurrent Policy Networks (I...
[DL輪読会] Learning Finite State Representations of Recurrent Policy Networks (I...
Deep Learning JP
 
[DL輪読会]Differentiable Mapping Networks: Learning Structured Map Representatio...
[DL輪読会]Differentiable Mapping Networks: Learning Structured Map Representatio...[DL輪読会]Differentiable Mapping Networks: Learning Structured Map Representatio...
[DL輪読会]Differentiable Mapping Networks: Learning Structured Map Representatio...
Deep Learning JP
 
[DL輪読会]Geometric Unsupervised Domain Adaptation for Semantic Segmentation
[DL輪読会]Geometric Unsupervised Domain Adaptation for Semantic Segmentation[DL輪読会]Geometric Unsupervised Domain Adaptation for Semantic Segmentation
[DL輪読会]Geometric Unsupervised Domain Adaptation for Semantic Segmentation
Deep Learning JP
 
Rainbow
RainbowRainbow
[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...
[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...
[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...
Deep Learning JP
 
0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)
MasanoriSuganuma
 
[DL輪読会]Parallel WaveNet: Fast High-Fidelity Speech Synthesis
[DL輪読会]Parallel WaveNet: Fast High-Fidelity Speech Synthesis[DL輪読会]Parallel WaveNet: Fast High-Fidelity Speech Synthesis
[DL輪読会]Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Deep Learning JP
 
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
Deep Learning JP
 
夏のトップカンファレンス論文読み会 / Realtime Multi-Person 2D Pose Estimation using Part Affin...
夏のトップカンファレンス論文読み会 / Realtime Multi-Person 2D Pose Estimation using Part Affin...夏のトップカンファレンス論文読み会 / Realtime Multi-Person 2D Pose Estimation using Part Affin...
夏のトップカンファレンス論文読み会 / Realtime Multi-Person 2D Pose Estimation using Part Affin...
Shunsuke Ono
 
CNNの構造最適化手法について
CNNの構造最適化手法についてCNNの構造最適化手法について
CNNの構造最適化手法について
MasanoriSuganuma
 
Kaggle参加報告: Champs Predicting Molecular Properties
Kaggle参加報告: Champs Predicting Molecular PropertiesKaggle参加報告: Champs Predicting Molecular Properties
Kaggle参加報告: Champs Predicting Molecular Properties
Kazuki Fujikawa
 
PRML6.4
PRML6.4PRML6.4
[DL輪読会] Spectral Norm Regularization for Improving the Generalizability of De...
[DL輪読会] Spectral Norm Regularization for Improving the Generalizability of De...[DL輪読会] Spectral Norm Regularization for Improving the Generalizability of De...
[DL輪読会] Spectral Norm Regularization for Improving the Generalizability of De...
Deep Learning JP
 
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
Deep Learning JP
 
猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder
Sho Tatsuno
 
論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"
Yuta Koreeda
 
ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東
ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東
ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東
Yukiyoshi Sasao
 
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
Deep Learning JP
 
K means tracker study
K means tracker studyK means tracker study
K means tracker study
Hoang Hai
 

What's hot (20)

Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement ...
Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement ...Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement ...
Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement ...
 
[DL輪読会] Learning Finite State Representations of Recurrent Policy Networks (I...
[DL輪読会] Learning Finite State Representations of Recurrent Policy Networks (I...[DL輪読会] Learning Finite State Representations of Recurrent Policy Networks (I...
[DL輪読会] Learning Finite State Representations of Recurrent Policy Networks (I...
 
[DL輪読会]Differentiable Mapping Networks: Learning Structured Map Representatio...
[DL輪読会]Differentiable Mapping Networks: Learning Structured Map Representatio...[DL輪読会]Differentiable Mapping Networks: Learning Structured Map Representatio...
[DL輪読会]Differentiable Mapping Networks: Learning Structured Map Representatio...
 
[DL輪読会]Geometric Unsupervised Domain Adaptation for Semantic Segmentation
[DL輪読会]Geometric Unsupervised Domain Adaptation for Semantic Segmentation[DL輪読会]Geometric Unsupervised Domain Adaptation for Semantic Segmentation
[DL輪読会]Geometric Unsupervised Domain Adaptation for Semantic Segmentation
 
Rainbow
RainbowRainbow
Rainbow
 
[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...
[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...
[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...
 
0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)
 
[DL輪読会]Parallel WaveNet: Fast High-Fidelity Speech Synthesis
[DL輪読会]Parallel WaveNet: Fast High-Fidelity Speech Synthesis[DL輪読会]Parallel WaveNet: Fast High-Fidelity Speech Synthesis
[DL輪読会]Parallel WaveNet: Fast High-Fidelity Speech Synthesis
 
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
 
夏のトップカンファレンス論文読み会 / Realtime Multi-Person 2D Pose Estimation using Part Affin...
夏のトップカンファレンス論文読み会 / Realtime Multi-Person 2D Pose Estimation using Part Affin...夏のトップカンファレンス論文読み会 / Realtime Multi-Person 2D Pose Estimation using Part Affin...
夏のトップカンファレンス論文読み会 / Realtime Multi-Person 2D Pose Estimation using Part Affin...
 
CNNの構造最適化手法について
CNNの構造最適化手法についてCNNの構造最適化手法について
CNNの構造最適化手法について
 
Kaggle参加報告: Champs Predicting Molecular Properties
Kaggle参加報告: Champs Predicting Molecular PropertiesKaggle参加報告: Champs Predicting Molecular Properties
Kaggle参加報告: Champs Predicting Molecular Properties
 
PRML6.4
PRML6.4PRML6.4
PRML6.4
 
[DL輪読会] Spectral Norm Regularization for Improving the Generalizability of De...
[DL輪読会] Spectral Norm Regularization for Improving the Generalizability of De...[DL輪読会] Spectral Norm Regularization for Improving the Generalizability of De...
[DL輪読会] Spectral Norm Regularization for Improving the Generalizability of De...
 
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
 
猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder
 
論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"
 
ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東
ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東
ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東
 
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
 
K means tracker study
K means tracker studyK means tracker study
K means tracker study
 

Similar to 北大調和系 DLゼミ A3C

[DL輪読会]High-Fidelity Image Generation with Fewer Labels
[DL輪読会]High-Fidelity Image Generation with Fewer Labels[DL輪読会]High-Fidelity Image Generation with Fewer Labels
[DL輪読会]High-Fidelity Image Generation with Fewer Labels
Deep Learning JP
 
分散型強化学習手法の最近の動向と分散計算フレームワークRayによる実装の試み
分散型強化学習手法の最近の動向と分散計算フレームワークRayによる実装の試み分散型強化学習手法の最近の動向と分散計算フレームワークRayによる実装の試み
分散型強化学習手法の最近の動向と分散計算フレームワークRayによる実装の試み
SusumuOTA
 
Paper intoduction "Playing Atari with deep reinforcement learning"
Paper intoduction   "Playing Atari with deep reinforcement learning"Paper intoduction   "Playing Atari with deep reinforcement learning"
Paper intoduction "Playing Atari with deep reinforcement learning"
Hiroshi Tsukahara
 
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
Deep Learning JP
 
[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANs[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANs
Deep Learning JP
 
1017 論文紹介第四回
1017 論文紹介第四回1017 論文紹介第四回
1017 論文紹介第四回
Kohei Wakamatsu
 
LCCC2010:Learning on Cores, Clusters and Cloudsの解説
LCCC2010:Learning on Cores,  Clusters and Cloudsの解説LCCC2010:Learning on Cores,  Clusters and Cloudsの解説
LCCC2010:Learning on Cores, Clusters and Cloudsの解説
Preferred Networks
 
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...
Deep Learning JP
 
Computational Motor Control: Reinforcement Learning (JAIST summer course)
Computational Motor Control: Reinforcement Learning (JAIST summer course) Computational Motor Control: Reinforcement Learning (JAIST summer course)
Computational Motor Control: Reinforcement Learning (JAIST summer course)
hirokazutanaka
 
Rainbow: Combining Improvements in Deep Reinforcement Learning (AAAI2018 unde...
Rainbow: Combining Improvements in Deep Reinforcement Learning (AAAI2018 unde...Rainbow: Combining Improvements in Deep Reinforcement Learning (AAAI2018 unde...
Rainbow: Combining Improvements in Deep Reinforcement Learning (AAAI2018 unde...
Toru Fujino
 
Top-K Off-Policy Correction for a REINFORCE Recommender System
Top-K Off-Policy Correction for a REINFORCE Recommender SystemTop-K Off-Policy Correction for a REINFORCE Recommender System
Top-K Off-Policy Correction for a REINFORCE Recommender System
harmonylab
 
Decision Transformer: Reinforcement Learning via Sequence Modeling
Decision Transformer: Reinforcement Learning via Sequence ModelingDecision Transformer: Reinforcement Learning via Sequence Modeling
Decision Transformer: Reinforcement Learning via Sequence Modeling
Yasunori Ozaki
 
DeepLoco
DeepLocoDeepLoco
DeepLoco
harmonylab
 
Feature engineering for predictive modeling using reinforcement learning
Feature engineering for predictive modeling using reinforcement learningFeature engineering for predictive modeling using reinforcement learning
Feature engineering for predictive modeling using reinforcement learning
harmonylab
 
論文 Solo Advent Calendar
論文 Solo Advent Calendar論文 Solo Advent Calendar
論文 Solo Advent Calendar
諒介 荒木
 
[DL輪読会]EfficientDet: Scalable and Efficient Object Detection
[DL輪読会]EfficientDet: Scalable and Efficient Object Detection[DL輪読会]EfficientDet: Scalable and Efficient Object Detection
[DL輪読会]EfficientDet: Scalable and Efficient Object Detection
Deep Learning JP
 
Introduction to Chainer (LL Ring Recursive)
Introduction to Chainer (LL Ring Recursive)Introduction to Chainer (LL Ring Recursive)
Introduction to Chainer (LL Ring Recursive)
Kenta Oono
 
[DL輪読会]Training RNNs as Fast as CNNs
[DL輪読会]Training RNNs as Fast as CNNs[DL輪読会]Training RNNs as Fast as CNNs
[DL輪読会]Training RNNs as Fast as CNNs
Deep Learning JP
 
NIPS KANSAI Reading Group #7: Temporal Difference Models: Model-Free Deep RL ...
NIPS KANSAI Reading Group #7: Temporal Difference Models: Model-Free Deep RL ...NIPS KANSAI Reading Group #7: Temporal Difference Models: Model-Free Deep RL ...
NIPS KANSAI Reading Group #7: Temporal Difference Models: Model-Free Deep RL ...
Eiji Uchibe
 
【DL輪読会】Learning Instance-Specific Adaptation for Cross-Domain Segmentation (E...
【DL輪読会】Learning Instance-Specific Adaptation for Cross-Domain Segmentation (E...【DL輪読会】Learning Instance-Specific Adaptation for Cross-Domain Segmentation (E...
【DL輪読会】Learning Instance-Specific Adaptation for Cross-Domain Segmentation (E...
Deep Learning JP
 

Similar to 北大調和系 DLゼミ A3C (20)

[DL輪読会]High-Fidelity Image Generation with Fewer Labels
[DL輪読会]High-Fidelity Image Generation with Fewer Labels[DL輪読会]High-Fidelity Image Generation with Fewer Labels
[DL輪読会]High-Fidelity Image Generation with Fewer Labels
 
分散型強化学習手法の最近の動向と分散計算フレームワークRayによる実装の試み
分散型強化学習手法の最近の動向と分散計算フレームワークRayによる実装の試み分散型強化学習手法の最近の動向と分散計算フレームワークRayによる実装の試み
分散型強化学習手法の最近の動向と分散計算フレームワークRayによる実装の試み
 
Paper intoduction "Playing Atari with deep reinforcement learning"
Paper intoduction   "Playing Atari with deep reinforcement learning"Paper intoduction   "Playing Atari with deep reinforcement learning"
Paper intoduction "Playing Atari with deep reinforcement learning"
 
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
 
[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANs[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANs
 
1017 論文紹介第四回
1017 論文紹介第四回1017 論文紹介第四回
1017 論文紹介第四回
 
LCCC2010:Learning on Cores, Clusters and Cloudsの解説
LCCC2010:Learning on Cores,  Clusters and Cloudsの解説LCCC2010:Learning on Cores,  Clusters and Cloudsの解説
LCCC2010:Learning on Cores, Clusters and Cloudsの解説
 
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...
 
Computational Motor Control: Reinforcement Learning (JAIST summer course)
Computational Motor Control: Reinforcement Learning (JAIST summer course) Computational Motor Control: Reinforcement Learning (JAIST summer course)
Computational Motor Control: Reinforcement Learning (JAIST summer course)
 
Rainbow: Combining Improvements in Deep Reinforcement Learning (AAAI2018 unde...
Rainbow: Combining Improvements in Deep Reinforcement Learning (AAAI2018 unde...Rainbow: Combining Improvements in Deep Reinforcement Learning (AAAI2018 unde...
Rainbow: Combining Improvements in Deep Reinforcement Learning (AAAI2018 unde...
 
Top-K Off-Policy Correction for a REINFORCE Recommender System
Top-K Off-Policy Correction for a REINFORCE Recommender SystemTop-K Off-Policy Correction for a REINFORCE Recommender System
Top-K Off-Policy Correction for a REINFORCE Recommender System
 
Decision Transformer: Reinforcement Learning via Sequence Modeling
Decision Transformer: Reinforcement Learning via Sequence ModelingDecision Transformer: Reinforcement Learning via Sequence Modeling
Decision Transformer: Reinforcement Learning via Sequence Modeling
 
DeepLoco
DeepLocoDeepLoco
DeepLoco
 
Feature engineering for predictive modeling using reinforcement learning
Feature engineering for predictive modeling using reinforcement learningFeature engineering for predictive modeling using reinforcement learning
Feature engineering for predictive modeling using reinforcement learning
 
論文 Solo Advent Calendar
論文 Solo Advent Calendar論文 Solo Advent Calendar
論文 Solo Advent Calendar
 
[DL輪読会]EfficientDet: Scalable and Efficient Object Detection
[DL輪読会]EfficientDet: Scalable and Efficient Object Detection[DL輪読会]EfficientDet: Scalable and Efficient Object Detection
[DL輪読会]EfficientDet: Scalable and Efficient Object Detection
 
Introduction to Chainer (LL Ring Recursive)
Introduction to Chainer (LL Ring Recursive)Introduction to Chainer (LL Ring Recursive)
Introduction to Chainer (LL Ring Recursive)
 
[DL輪読会]Training RNNs as Fast as CNNs
[DL輪読会]Training RNNs as Fast as CNNs[DL輪読会]Training RNNs as Fast as CNNs
[DL輪読会]Training RNNs as Fast as CNNs
 
NIPS KANSAI Reading Group #7: Temporal Difference Models: Model-Free Deep RL ...
NIPS KANSAI Reading Group #7: Temporal Difference Models: Model-Free Deep RL ...NIPS KANSAI Reading Group #7: Temporal Difference Models: Model-Free Deep RL ...
NIPS KANSAI Reading Group #7: Temporal Difference Models: Model-Free Deep RL ...
 
【DL輪読会】Learning Instance-Specific Adaptation for Cross-Domain Segmentation (E...
【DL輪読会】Learning Instance-Specific Adaptation for Cross-Domain Segmentation (E...【DL輪読会】Learning Instance-Specific Adaptation for Cross-Domain Segmentation (E...
【DL輪読会】Learning Instance-Specific Adaptation for Cross-Domain Segmentation (E...
 

Recently uploaded

論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
論文紹介:Deep Learning-Based Human Pose Estimation: A Survey論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
Toru Tamaki
 
Generating Automatic Feedback on UI Mockups with Large Language Models
Generating Automatic Feedback on UI Mockups with Large Language ModelsGenerating Automatic Feedback on UI Mockups with Large Language Models
Generating Automatic Feedback on UI Mockups with Large Language Models
harmonylab
 
Humanoid Virtual Athletics Challenge2024 技術講習会 スライド
Humanoid Virtual Athletics Challenge2024 技術講習会 スライドHumanoid Virtual Athletics Challenge2024 技術講習会 スライド
Humanoid Virtual Athletics Challenge2024 技術講習会 スライド
tazaki1
 
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
Osaka University
 
ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMMハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
osamut
 
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
嶋 是一 (Yoshikazu SHIMA)
 
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobodyロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
azuma satoshi
 

Recently uploaded (7)

論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
論文紹介:Deep Learning-Based Human Pose Estimation: A Survey論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
 
Generating Automatic Feedback on UI Mockups with Large Language Models
Generating Automatic Feedback on UI Mockups with Large Language ModelsGenerating Automatic Feedback on UI Mockups with Large Language Models
Generating Automatic Feedback on UI Mockups with Large Language Models
 
Humanoid Virtual Athletics Challenge2024 技術講習会 スライド
Humanoid Virtual Athletics Challenge2024 技術講習会 スライドHumanoid Virtual Athletics Challenge2024 技術講習会 スライド
Humanoid Virtual Athletics Challenge2024 技術講習会 スライド
 
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
 
ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMMハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
 
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
 
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobodyロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
 

北大調和系 DLゼミ A3C