This slide deck introduces the Dueling Network, a model in the deep Q-network family. The Dueling Network is the successor to DQN and DDQN. The deck aims to make the Dueling Network architecture easy to understand.
Paper introduction: Dueling Network Architectures for Deep Reinforcement Learning
Kazuki Adachi
Wang, Ziyu, et al. "Dueling network architectures for deep reinforcement learning." Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1995-2003, 2016.
2. The paper covered here
[1] Z. Wang, et al. "Dueling Network Architectures for Deep Reinforcement Learning." arXiv:1511.06581, 2016.
Separating the Q-value into the V-value and an action-dependent part (action a) improved performance!
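6. First, the basics of reinforcement learning
The value of the state-action pair:
$$Q^{\pi}(s,a) = \mathbb{E}\left[R_t \mid s_t = s,\ a_t = a,\ \pi\right]$$
The value of the state:
$$V^{\pi}(s) = \mathbb{E}_{a \sim \pi(s)}\left[Q^{\pi}(s,a)\right]$$
[Figure: a tree rooted at state $s_t$, branching through actions $a_t^1$, $a_t^2$, $a_t^3$ to successor states $s_{t+1}$ and $s_{t+2}$, with $Q^{\pi}(s,a)$ and $V^{\pi}(s)$ marked on the diagram.]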
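As a small illustration of the second definition, the state value can be computed as the policy-weighted average of the action values. This is only a sketch for a single state with three actions; the numbers and the name `state_value` are hypothetical and do not come from the slides.

```python
# Toy example (hypothetical numbers): one state s with three actions.
q_values = [1.0, 2.0, 4.0]    # Q^pi(s, a) for a = a1, a2, a3
policy = [0.5, 0.25, 0.25]    # pi(a | s), must sum to 1


def state_value(q_values, policy):
    """Return V^pi(s) = sum over a of pi(a | s) * Q^pi(s, a)."""
    return sum(p * q for p, q in zip(policy, q_values))


v = state_value(q_values, policy)
print(v)  # 0.5*1.0 + 0.25*2.0 + 0.25*4.0 = 2.0
```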
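7. Defining the advantage function
The value of the state-action pair:
$$Q^{\pi}(s,a) = \mathbb{E}\left[R_t \mid s_t = s,\ a_t = a,\ \pi\right]$$
The value of the state:
$$V^{\pi}(s) = \mathbb{E}_{a \sim \pi(s)}\left[Q^{\pi}(s,a)\right]$$
The advantage function:
$$A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)$$
[Figure: a tree rooted at state $s_t$, branching through actions $a_t^1$, $a_t^2$, $a_t^3$ to successor states $s_{t+1}$ and $s_{t+2}$, with $Q^{\pi}(s,a)$ and $V^{\pi}(s)$ marked on the diagram.]
It is simply a difference: subtract $V^{\pi}$ from $Q^{\pi}$ to obtain $A^{\pi}$.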
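The subtraction can be sketched numerically for a single state. The values below and the helper names `state_value` and `advantages` are hypothetical, chosen only for illustration. The example also checks a direct consequence of the definitions: the policy-weighted average of the advantages is zero.

```python
# Toy example (hypothetical numbers): one state s with three actions.
q_values = [1.0, 2.0, 4.0]    # Q^pi(s, a)
policy = [0.5, 0.25, 0.25]    # pi(a | s)


def state_value(q_values, policy):
    """V^pi(s): expectation of Q^pi(s, a) over actions drawn from pi."""
    return sum(p * q for p, q in zip(policy, q_values))


def advantages(q_values, policy):
    """A^pi(s, a) = Q^pi(s, a) - V^pi(s) for every action a."""
    v = state_value(q_values, policy)
    return [q - v for q in q_values]


adv = advantages(q_values, policy)
print(adv)  # [-1.0, 0.0, 2.0]

# The expected advantage under the policy is zero by construction.
mean_adv = sum(p * a for p, a in zip(policy, adv))
print(mean_adv)  # 0.0
```

A positive advantage means the action is better than the policy's average behavior in that state, and a negative one means it is worse.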