27. Optimal Action Value Q*(s,a)
Q*(s,a) = max_π 𝔼[R_t | s_t = s, a_t = a, π]
Approximate it with a parameterized function: Q(s,a;θ) ≈ Q*(s,a)
Minimize the loss function L_i(θ_i):
L_i(θ_i) = 𝔼_{s,a ~ p(∙)} [(y_i − Q(s,a;θ_i))²],
y_i = 𝔼_{s′~Ɛ} [r + γ max_{a′} Q(s′,a′;θ_{i−1}) | s,a]
p(s,a) is a probability distribution over sequences s and actions a
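The target y_i and the squared-error loss above can be sketched for a small batch of transitions. This is a minimal NumPy illustration, not the slides' implementation: `dqn_targets`, `dqn_loss`, and the toy batch values are made up for the example, and the expectation over p(∙) is approximated by a batch mean.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, gamma=0.99, terminal=None):
    # y_i = r + gamma * max_a' Q(s', a'; theta_{i-1}); the old-parameter
    # network is represented here simply by the precomputed next_q_values.
    max_next = next_q_values.max(axis=1)
    if terminal is not None:
        max_next = np.where(terminal, 0.0, max_next)  # no bootstrap at episode end
    return rewards + gamma * max_next

def dqn_loss(q_values, actions, targets):
    # L_i(theta_i) ~ mean over the batch of (y_i - Q(s, a; theta_i))^2,
    # taking Q only at the actions actually chosen.
    q_sa = q_values[np.arange(len(actions)), actions]
    return np.mean((targets - q_sa) ** 2)

# Toy batch: 2 transitions, 3 actions.
rewards = np.array([1.0, 0.0])
next_q = np.array([[0.5, 2.0, 1.0],
                   [0.0, 0.0, 0.0]])
y = dqn_targets(rewards, next_q, gamma=0.9)   # [1.0 + 0.9*2.0, 0.0] = [2.8, 0.0]
q = np.array([[2.8, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
loss = dqn_loss(q, np.array([0, 0]), y)       # mean of (2.8-2.8)^2 and (0.0-1.0)^2 = 0.5
```

In practice only θ_i is differentiated; θ_{i−1} (the target network) is held fixed, which is why `next_q_values` enters here as plain data.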
32. Book
Reinforcement Learning: An Introduction (Sutton & Barto)
http://incompleteideas.net/book/the-book-2nd.html
Courses
David Silver’s UCL Course on RL
http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html
Berkeley CS 294: Deep Reinforcement Learning
rll.berkeley.edu/deeprlcourse/
Implementations
Denny Britz
https://github.com/dennybritz/reinforcement-learning
Article
Deep Reinforcement Learning Doesn’t Work Yet
https://www.alexirpan.com/2018/02/14/rl-hard.html
Code
PyTorch Deep Q Learning
http://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
Papers
Playing Atari with Deep Reinforcement Learning
https://deepmind.com/research/publications/playing-atari-deep-reinforcement-learning/
Human-level Control through Deep Reinforcement Learning
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf