introduction to Dueling network

Latest Trends in Deep Learning
Collaboration with Reinforcement Learning, Part 3: Dueling Network
2016/7/5
株式会社ウェブファーマー (WEBFARMER. ltd.)
大政　孝充

The paper covered this time
[1] Z. Wang, et al. "Dueling Network Architectures for Deep Reinforcement Learning." arXiv:1511.06581. 2016.
It improves performance by splitting the Q-value into a state value V and a per-action advantage A!

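Where to find explanations of DQN and DDQN
For DQN, please see my [2] "Latest Trends in Deep Learning: Collaboration with Reinforcement Learning, Part 1: DQN"
http://www.slideshare.net/ssuser07aa33/introduction-to-deep-q-learning
For DDQN, please see my [3] "Latest Trends in Deep Learning: Collaboration with Reinforcement Learning, Part 2: DDQN"
http://www.slideshare.net/ssuser07aa33/introduction-to-double-deep-qlearning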

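How the Dueling Network works
[Figure: Figure 1 of [1], showing the DQN architecture and the Dueling Network architecture. An annotation marks the part of the Dueling Network that is its distinctive feature.]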

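From DQN to Dueling Network
DQN (NIPS 2013)
DQN (Nature 2015): periodically copies Q (into a target network)
DDQN: separates the Q used for evaluation from the Q used for selection
Prioritized Replay: selects the training data?
Dueling Networks: separates the value of state s from the advantage of action a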

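First, the basics of reinforcement learning
The value of the state-action pair:
$Q^{\pi}(s,a) = \mathbb{E}\left[ R_t \mid s_t = s,\ a_t = a,\ \pi \right]$
The value of the state:
$V^{\pi}(s) = \mathbb{E}_{a \sim \pi(s)}\left[ Q^{\pi}(s,a) \right]$
[Figure: a tree rooted at state $s_t$; actions $a_t^1$, $a_t^2$, $a_t^3$ lead to successor states $s_{t+1}$ and then $s_{t+2}$, with $Q^{\pi}(s,a)$ and $V^{\pi}(s)$ marked.]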

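Defining the advantage function
The value of the state-action pair:
$Q^{\pi}(s,a) = \mathbb{E}\left[ R_t \mid s_t = s,\ a_t = a,\ \pi \right]$
The value of the state:
$V^{\pi}(s) = \mathbb{E}_{a \sim \pi(s)}\left[ Q^{\pi}(s,a) \right]$
The advantage function:
$A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)$
It is a difference: subtract $V^{\pi}$ from $Q^{\pi}$ to obtain $A^{\pi}$.
[Figure: the same tree rooted at $s_t$ with actions $a_t^1$, $a_t^2$, $a_t^3$, with $Q^{\pi}(s,a)$ and $V^{\pi}(s)$ marked.]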
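The three definitions on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `state_value_and_advantage`, the dictionary layout, and the example numbers are assumptions made here, not something taken from [1].

```python
def state_value_and_advantage(q_values, policy):
    """Return (V, A) for a single state.

    q_values: dict mapping action -> Q^pi(s, a)
    policy:   dict mapping action -> pi(a | s), probabilities summing to 1
    (Both layouts are assumptions made for this sketch.)
    """
    # V^pi(s) = E_{a ~ pi(s)}[Q^pi(s, a)]
    v = sum(policy[a] * q for a, q in q_values.items())
    # A^pi(s, a) = Q^pi(s, a) - V^pi(s)
    advantages = {a: q - v for a, q in q_values.items()}
    return v, advantages


# Hypothetical numbers: two actions, and a policy that prefers "right".
q = {"left": 1.0, "right": 5.0}
pi = {"left": 0.25, "right": 0.75}
v, adv = state_value_and_advantage(q, pi)
print(v)    # 4.0
print(adv)  # {'left': -3.0, 'right': 1.0}
```

A positive advantage means the action is better than what the policy achieves on average from that state; a negative one means it is worse.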

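What is the advantage function?
What does that mean?
For example, suppose the $Q$ values for the actions $a_t$ available from state $s_t$ are:
$Q^{\pi}(s,a^1) = 3$
$Q^{\pi}(s,a^2) = 4$
$Q^{\pi}(s,a^3) = 2$
[Figure: state $s_t$ with actions $a_t^1$, $a_t^2$, $a_t^3$, each leading to a successor state $s_{t+1}$.]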

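$V$ is roughly ...
$V^{\pi}(s) = \mathbb{E}_{a \sim \pi(s)}\left[ Q^{\pi}(s,a) \right] = \frac{3 + 4 + 2}{3} = 3$
(a simple average, treating the three actions as equally likely)
[Figure: state $s_t$ with actions $a_t^1$, $a_t^2$, $a_t^3$ and $Q^{\pi}(s,a^1) = 3$, $Q^{\pi}(s,a^2) = 4$, $Q^{\pi}(s,a^3) = 2$; $V^{\pi}(s)$ is marked.]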

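$A$ is ...
$A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)$, which gives:
$A^{\pi}(s,a^1) = 3 - 3 = 0$
$A^{\pi}(s,a^2) = 4 - 3 = 1$
$A^{\pi}(s,a^3) = 2 - 3 = -1$
[Figure: state $s_t$ with actions $a_t^1$, $a_t^2$, $a_t^3$ and $Q^{\pi}(s,a^1) = 3$, $Q^{\pi}(s,a^2) = 4$, $Q^{\pi}(s,a^3) = 2$; $V^{\pi}(s)$ and the advantages $A^{\pi}(s,a^1)$, $A^{\pi}(s,a^2)$, $A^{\pi}(s,a^3)$ are marked.]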
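One property worth spelling out follows directly from the two definitions: the advantages average to zero under the policy. The derivation below is a short sketch; the numeric check reuses the three advantage values from this example and, like the slide, treats the three actions as equally likely.

```latex
\begin{align*}
\mathbb{E}_{a \sim \pi(s)}\left[ A^{\pi}(s,a) \right]
  &= \mathbb{E}_{a \sim \pi(s)}\left[ Q^{\pi}(s,a) \right] - V^{\pi}(s) \\
  &= V^{\pi}(s) - V^{\pi}(s) = 0, \\
\text{check: } \frac{0 + 1 + (-1)}{3} &= 0 .
\end{align*}
```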

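The Dueling Network model
In the actual model it looks like this:
$V^{\pi}$ is computed here (one stream)
$A^{\pi}$ is computed here (the other stream)
and the two are added to give $Q^{\pi}$
[Figure: the tree rooted at $s_t$ with actions $a_t^1$, $a_t^2$, $a_t^3$ and successor states $s_{t+1}$, next to the network with its $V^{\pi}$, $A^{\pi}$, and $Q^{\pi}$ parts marked.]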
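The two-stream idea on this slide can be sketched with plain Python lists. Everything here is a toy assumption: the helper names `linear` and `two_stream_forward`, the 2-dimensional feature vector, and the hand-picked weights, which were chosen so that the outputs match the numbers used in the earlier example. A real network would learn its weights from data.

```python
def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def two_stream_forward(features, value_head, advantage_head):
    """Feed shared features to a V stream and an A stream, then add them."""
    v = linear(*value_head, features)[0]    # scalar V(s)
    a = linear(*advantage_head, features)   # one A(s, a) per action
    q = [v + a_i for a_i in a]              # plain sum of the two streams
    return v, a, q


# Hypothetical shared features and hand-picked weights for 3 actions.
features = [1.0, 2.0]
value_head = ([[0.5, 1.0]], [0.5])
advantage_head = ([[1.0, -0.5], [0.0, 0.5], [-1.0, 0.0]], [0.0, 0.0, 0.0])

v, a, q = two_stream_forward(features, value_head, advantage_head)
print(v, a, q)  # 3.0 [0.0, 1.0, -1.0] [3.0, 4.0, 2.0]
```

The point of the split is that the value stream produces one number per state, while the advantage stream produces one number per action.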

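The actual computation
The two streams are added after shifting $A$ so that its mean is zero:
$Q(s,a;\theta,\alpha,\beta) = V(s;\theta,\beta) + \left( A(s,a;\theta,\alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s,a';\theta,\alpha) \right)$
The term $\frac{1}{|\mathcal{A}|} \sum_{a'} A(s,a';\theta,\alpha)$ subtracts the mean.
[Figure: the network with its outputs $Q(s,a;\theta,\alpha,\beta)$, $V(s;\theta,\beta)$, and $A(s,a;\theta,\alpha)$ marked.]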
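The mean-subtracted combination can be sketched as a small Python function. The name `dueling_q` and the raw stream outputs below are hypothetical values chosen for illustration; only the formula itself comes from the slide.

```python
def dueling_q(value, advantages):
    """Combine V(s) and raw A(s, a) outputs with the mean of A subtracted."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + (a - mean_adv) for a in advantages]


# Hypothetical raw stream outputs for one state with three actions.
q = dueling_q(3.0, [5.0, 6.0, 4.0])
print(q)  # [3.0, 4.0, 2.0]

# Adding a constant to every raw advantage leaves Q unchanged,
# because the constant is removed again together with the mean.
q_shifted = dueling_q(3.0, [15.0, 16.0, 14.0])
print(q_shifted == q)  # True
```

With this combination the average of the resulting Q values over the actions equals the value stream's output, so an arbitrary offset cannot drift between the two streams.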
