Deep Sarsa, Deep Q-learning, DQN
Uijin Jung
Applying deep learning to reinforcement learning
• Predict, label
• Loss function (cross-entropy, MSE, etc.)
• Optimization (a minimal sketch of the whole recipe is given below)
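A minimal sketch of this recipe in Python with PyTorch (the tiny network, the random data, and the hyperparameters are illustrative assumptions, not from the slides):

import torch
import torch.nn as nn

# Hypothetical tiny network: 4-dimensional input, 1 output.
net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.randn(8, 4)       # a batch of inputs
label = torch.randn(8, 1)   # the labels the network should match

prediction = net(x)                               # 1. predict
loss = nn.functional.mse_loss(prediction, label)  # 2. loss function (MSE here)
optimizer.zero_grad()
loss.backward()                                   # 3. optimization step
optimizer.step()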
Sarsa
Q(s, a) ← Q(s, a) + α(R + γ·Q(s′, a′) − Q(s, a))
The bootstrapped target R + γ·Q(s′, a′) plays the role of the label, and Q(s, a) is the predicted Q function that gets updated toward it (see the sketch below).
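For reference, the tabular Sarsa update written out in Python (the table size and hyperparameters are illustrative assumptions):

import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))   # tabular Q-function
alpha, gamma = 0.1, 0.99

def sarsa_update(s, a, r, s_next, a_next):
    # The bootstrapped target r + gamma * Q(s', a') plays the role of the label.
    target = r + gamma * Q[s_next, a_next]
    # The prediction Q(s, a) is moved a small step toward that label.
    Q[s, a] += alpha * (target - Q[s, a])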
Wait... Mean Squared Error??
Deep Sarsa
• Create a neural network that takes a state as input and returns a Q value as output.
• Loss function (one possible update step is sketched below):
(R + γ·Q(s′, a′) − Q(s, a))²
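One possible Deep Sarsa update step in PyTorch, assuming a network q_net that maps a state to one Q value per action; all names and sizes here are illustrative, not the slides' implementation:

import torch
import torch.nn as nn

state_dim, n_actions = 4, 2
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def deep_sarsa_step(s, a, r, s_next, a_next, done):
    # One gradient step on (R + gamma * Q(s', a') - Q(s, a))^2; done is 0.0 or 1.0.
    q_sa = q_net(s)[a]                      # predicted Q(s, a)
    with torch.no_grad():                   # the bootstrapped label is held constant
        label = r + gamma * q_net(s_next)[a_next] * (1.0 - done)
    loss = (label - q_sa) ** 2              # the squared-error loss above
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()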
Deep Q-Learning
• Create a neural network that takes a state as input and returns a Q value as output.
• Loss function (sketched below):
(R + γ·max_a′ Q(s′, a′) − Q(s, a))²
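The Deep Q-learning step differs only in the label: it bootstraps from max_a′ Q(s′, a′) instead of from the action actually taken next. A sketch reusing q_net, optimizer, and gamma from the Deep Sarsa sketch above:

def deep_q_learning_step(s, a, r, s_next, done):
    # One gradient step on (R + gamma * max_a' Q(s', a') - Q(s, a))^2.
    q_sa = q_net(s)[a]                      # predicted Q(s, a)
    with torch.no_grad():
        label = r + gamma * q_net(s_next).max() * (1.0 - done)
    loss = (label - q_sa) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()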
Limit of Deep Q-learning (and Deep Sarsa)
• Consecutive samples are correlated and the bootstrapped label moves at every update, so training is unstable and can perform worse than tabular Q-learning.
• So DeepMind made DQN.
DQN
• Deep Q-Learning with Experience Replay and a Target Network
Experience Replay
• Stores millions of transition tuples (state, action, reward, next state) in an Experience Replay memory.
• Samples a random mini-batch of transitions from the memory and trains on it.
• Prevents overfitting to recent, correlated experience (a minimal buffer is sketched below).
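A minimal replay memory along these lines (the capacity and tuple layout are assumptions, not DeepMind's exact implementation):

import random
from collections import deque

class ReplayMemory:
    # Fixed-size buffer of (state, action, reward, next_state, done) tuples.
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)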
Target network
• Bootstrapping causes bias: the label moves whenever the network is updated.
• Copy the neural network into a separate target network and use it to generate the bootstrapped label.
• Update the target network only once every several steps (this reduces the bias by keeping the label from moving at every step; see the sketch below).
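A sketch of the target-network mechanism, reusing q_net and gamma from the earlier sketches (the sync period is an assumed hyperparameter):

import copy
import torch

target_net = copy.deepcopy(q_net)   # frozen copy used only to generate labels
TARGET_UPDATE_PERIOD = 1_000        # assumed sync interval

def compute_label(r, s_next, done):
    # Bootstrapped label from the target network, so it does not move at every step.
    with torch.no_grad():
        return r + gamma * target_net(s_next).max() * (1.0 - done)

def maybe_sync_target(step):
    if step % TARGET_UPDATE_PERIOD == 0:
        target_net.load_state_dict(q_net.state_dict())   # copy the online weights over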
DQN performance
https://towardsdatascience.com/welcome-to-deep-reinforcement-learning-part-1-dqn-c3cab4d41b6b
DQN pseudocode
https://towardsdatascience.com/introduction-to-various-reinforcement-learning-algorithms-i-q-learning-sarsa-dqn-ddpg-72a5e0cb6287
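The linked pseudocode corresponds roughly to the loop below, a hedged sketch that combines the pieces above (q_net, optimizer, gamma, ReplayMemory, compute_label, maybe_sync_target); env is assumed to follow the classic Gym reset/step API and the hyperparameters are illustrative:

import torch

memory = ReplayMemory(capacity=100_000)
batch_size, epsilon, total_steps = 32, 0.1, 50_000

state = env.reset()
for step in range(total_steps):
    # Epsilon-greedy action selection from the online network.
    if torch.rand(1).item() < epsilon:
        action = env.action_space.sample()
    else:
        with torch.no_grad():
            action = int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

    next_state, reward, done, _ = env.step(action)
    memory.push(state, action, reward, next_state, done)
    state = env.reset() if done else next_state

    if len(memory) >= batch_size:
        for s, a, r, s_next, d in memory.sample(batch_size):
            s = torch.as_tensor(s, dtype=torch.float32)
            s_next = torch.as_tensor(s_next, dtype=torch.float32)
            q_sa = q_net(s)[a]                          # predicted Q(s, a)
            label = compute_label(r, s_next, float(d))  # label from the target network
            loss = (label - q_sa) ** 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    maybe_sync_target(step)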
The end.
