From paper
高速追従型2眼能動カメラシステム (High-Speed Object Tracking System using Active Camera )
Hiroshi OIKE, Haiyuan WU, Chunsheng HUA, and Toshikazu WADA
Transactions of the Institute of Systems, Control and Information Engineers 20(3), 114-121, 2007-03-15
Computational Motor Control: Reinforcement Learning (JAIST summer course) hirokazutanaka
These are the lecture 6 notes for the JAIST summer school on computational motor control (Hirokazu Tanaka & Hiroyuki Kambara). Lecture video: https://www.youtube.com/watch?v=GHMcx5F0_j8
2. Paper information
• Title
– Asynchronous Methods for Deep Reinforcement Learning
– URL : https://arxiv.org/abs/1602.01783
• Venue
– ICML2016
• Authors
– Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirz
• Affiliations
– Google DeepMind; Montreal Institute for Learning Algorithms (MILA), University of Montreal
12. Architecture 1
[Diagram: Parameter Server (θ, Network) and worker threads 1 to k; each thread holds an Environment, Actor, Memory, Network, and a Learner with A3C (Loss, Gradients)]
Copy the weights from the parameter server.
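The weight-copy step can be sketched as follows. This is a minimal illustration, not the slide author's implementation: the `ParameterServer` class, its `snapshot` method, and the dictionary layout of the weights are all hypothetical names chosen for this example, and the lock is an assumption added for simplicity.

```python
import copy
import threading


class ParameterServer:
    """Hypothetical holder of the shared weights theta."""

    def __init__(self, weights):
        self._weights = weights
        self._lock = threading.Lock()  # assumption: guard reads with a lock

    def snapshot(self):
        # Return an independent copy so a worker can modify it freely.
        with self._lock:
            return copy.deepcopy(self._weights)


server = ParameterServer({"w": [0.1, -0.2], "b": [0.0]})

# A worker thread starts each iteration by copying the server weights.
local_weights = server.snapshot()
local_weights["w"][0] = 9.9  # a local change does not touch the server
```

Because each thread works on its own copy, the rollout and gradient computation that follow never read half-updated shared weights.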
13. Architecture 2
[Diagram: Parameter Server θ and threads 1 to k, as in Architecture 1]
Accumulate experience in Memory (until tmax steps or Done).
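A rollout that stops at tmax steps or at episode end might look like the sketch below. `ToyEnv`, `collect_rollout`, and the `(state, reward, done)` return convention of `step` are assumptions made for this example only; a real environment and policy would replace them.

```python
class ToyEnv:
    """Hypothetical environment whose episode ends after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0
        done = self.t >= self.length
        return self.t, reward, done


def collect_rollout(env, state, policy, t_max):
    """Store (state, action, reward) tuples until t_max steps or Done."""
    memory = []
    done = False
    for _ in range(t_max):
        action = policy(state)
        next_state, reward, done = env.step(action)
        memory.append((state, action, reward))
        state = next_state
        if done:
            break
    return memory, state, done


env = ToyEnv(length=3)
memory, last_state, done = collect_rollout(env, env.reset(), lambda s: 0, t_max=5)
```

Here the episode ends before tmax is reached, so the memory holds three transitions. The returned `last_state` and `done` flag are what a learner would need to decide whether to bootstrap from a value estimate.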
14. Architecture 3
[Diagram: Parameter Server θ and threads 1 to k, as in Architecture 1]
Compute the Loss from Memory and obtain the gradients.
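The slide does not spell out the loss, so the sketch below follows the n-step return and advantage form used in the referenced paper: returns are accumulated backwards through the stored rewards, and the loss combines a policy term and a value term. The function names, the `value_coef` weight of 0.5, and the omission of the entropy bonus are assumptions of this example. Gradients would come from an autodiff framework, which is left out here.

```python
import math


def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Accumulate R <- r + gamma * R backwards over the stored rewards.

    bootstrap_value is 0 if the rollout ended with Done, else V(last state).
    """
    R = bootstrap_value
    returns = []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return returns


def a3c_loss(log_probs, values, returns, value_coef=0.5):
    """Scalar loss: policy term plus weighted squared value error."""
    policy_loss = 0.0
    value_loss = 0.0
    for logp, v, R in zip(log_probs, values, returns):
        advantage = R - v  # treated as a constant in the policy term
        policy_loss += -logp * advantage
        value_loss += advantage ** 2
    return policy_loss + value_coef * value_loss


returns = n_step_returns([1.0, 1.0], bootstrap_value=0.0, gamma=0.5)
loss = a3c_loss([math.log(0.5)], [0.0], [1.0])
```

With a discount of 0.5 and two rewards of 1.0, the returns work out to 1.5 for the first step and 1.0 for the second.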
15. Architecture 4
[Diagram: Parameter Server θ and threads 1 to k, as in Architecture 1]
Asynchronously send the gradients to the Server and update the Server's network.
Return to step 1 (Architecture 1) and repeat until Tmax.
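The four steps can be tied together in a small threaded sketch. Everything here is illustrative: `ParameterServer`, `worker`, and `toy_gradients` are hypothetical names, the toy gradient stands in for the rollout and A3C loss, plain SGD stands in for the optimizer, and a lock protects the update for simplicity. The counter `T` counts applied updates in this sketch, which is also an assumption.

```python
import threading


class ParameterServer:
    """Hypothetical shared weights theta with a global update counter T."""

    def __init__(self, theta, lr, T_max):
        self.theta = list(theta)
        self.lr = lr
        self.T = 0
        self.T_max = T_max
        self._lock = threading.Lock()  # assumption: locked updates

    def snapshot(self):
        with self._lock:
            return list(self.theta)

    def apply_gradients(self, grads):
        """Apply one SGD step; return False once T_max updates are done."""
        with self._lock:
            if self.T >= self.T_max:
                return False
            self.theta = [p - self.lr * g for p, g in zip(self.theta, grads)]
            self.T += 1
            return True


def worker(server, compute_gradients):
    while True:
        local_theta = server.snapshot()          # step 1: copy weights
        grads = compute_gradients(local_theta)   # steps 2-3: rollout, loss, gradients
        if not server.apply_gradients(grads):    # step 4: asynchronous update
            break


def toy_gradients(theta):
    # Gradient of 0.5 * sum(theta_i ** 2); a stand-in for the real A3C gradient.
    return list(theta)


server = ParameterServer([1.0, -2.0], lr=0.1, T_max=50)
threads = [threading.Thread(target=worker, args=(server, toy_gradients))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Workers never wait for one another between updates, so a gradient may be computed from slightly stale weights. That staleness is the trade-off of the asynchronous scheme.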