DQN (Deep Q-Network)

1. Deep Q-Network
guodong

2. Value Iteration and Q-learning
• Model-free control: iteratively optimise the value function and the policy
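
For reference, this is the standard tabular Q-learning update the later slides build on, with learning rate $\alpha$ and discount factor $\gamma$:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$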

3. Value Function Approximation
• A lookup table is not practical; approximation lets us:
  • generalize to unobserved states
  • handle large state/action spaces (including continuous states and actions)
• Transform RL into a supervised learning problem:
  • model (hypothesis space)
  • loss/cost function
  • optimization
  • i.i.d. assumption on the data
• RL is unstable or divergent when the action-value function Q is approximated with a nonlinear function such as a neural network:
  • states are correlated and the data distribution changes as the policy improves, compounded by a complex model
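
This yields the DQN objective used in the papers cited below: regress $Q(s,a;\theta)$ toward a bootstrapped target computed from a frozen parameter copy $\theta^{-}$, over transitions sampled from a replay buffer $\mathcal{D}$:

$$L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]$$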

4. Deep Q-Network
• A first step towards "general artificial intelligence"
• DQN = Q-learning + function approximation + deep network
• Stabilizes training with experience replay and a target network
• An end-to-end RL approach, and quite flexible

5. DQN Algorithm
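
The original slide shows the algorithm as a figure. Below is a minimal sketch of the loop it depicts (epsilon-greedy exploration, experience replay, periodically synced target network), assuming PyTorch and Gymnasium are available; the CartPole environment, network sizes, and hyperparameters are illustrative stand-ins rather than the paper's Atari setup:

```python
import random
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
n_obs = env.observation_space.shape[0]
n_act = env.action_space.n

def make_net() -> nn.Module:
    return nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_act))

q_net, target_net = make_net(), make_net()
target_net.load_state_dict(q_net.state_dict())          # fixed target starts as a copy

opt = torch.optim.RMSprop(q_net.parameters(), lr=1e-3)  # RMSProp, per the tips slide
replay = deque(maxlen=100_000)                          # experience replay buffer
gamma, batch_size, eps = 0.99, 32, 1.0

state, _ = env.reset()
for step in range(50_000):
    # Epsilon-greedy exploration, annealed from 1.0 down to 0.1
    if random.random() < eps:
        action = env.action_space.sample()
    else:
        with torch.no_grad():
            action = q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()
    eps = max(0.1, eps - 0.9 / 10_000)

    next_state, reward, terminated, truncated, _ = env.step(action)
    replay.append((state, action, reward, next_state, terminated))
    state = next_state if not (terminated or truncated) else env.reset()[0]

    if len(replay) >= batch_size:
        # Sampling a random mini-batch breaks correlation between consecutive states
        s, a, r, s2, done = zip(*random.sample(replay, batch_size))
        s = torch.as_tensor(np.stack(s), dtype=torch.float32)
        s2 = torch.as_tensor(np.stack(s2), dtype=torch.float32)
        a = torch.as_tensor(a)
        r = torch.as_tensor(r, dtype=torch.float32)
        done = torch.as_tensor(done, dtype=torch.float32)
        with torch.no_grad():                           # target network gives a fixed target
            target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q, target)
        opt.zero_grad(); loss.backward(); opt.step()

    if step % 1_000 == 0:
        target_net.load_state_dict(q_net.state_dict())  # periodically sync target
```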

6. Practical Tips
• Stable training: experience replay (1M transitions) + a fixed target network
• Mini-batch updates
• Exploration vs. exploitation: epsilon-greedy with epsilon annealed from 1.0 to 0.1
• The input to the Q-network includes the 4 most recent frames
• Skip frames
• Discounted reward with γ = 0.99
• Use RMSProp instead of SGD
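
A sketch of the "4 recent frames" input mentioned above, assuming Atari-style 210x160 RGB frames; this `preprocess` is a crude stand-in for the paper's grayscale-and-resize to 84x84:

```python
from collections import deque

import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Crudely grayscale and downsample a raw RGB frame to 84x84."""
    gray = frame.mean(axis=2).astype(np.uint8)                 # RGB -> grayscale
    return gray[::frame.shape[0] // 84, ::frame.shape[1] // 84][:84, :84]

frames = deque(maxlen=4)                                       # 4 most recent frames

def stacked_input(new_frame: np.ndarray) -> np.ndarray:
    frames.append(preprocess(new_frame))
    while len(frames) < 4:                                     # pad at episode start
        frames.append(frames[-1])
    return np.stack(frames, axis=0)                            # shape (4, 84, 84)
```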

7. DQN Variants
• Double DQN
• Prioritized Experience Replay
• Dueling Architecture
• Asynchronous Methods
• Continuous DQN

8. Double Q-learning
• Motivation: reduce overestimation by decomposing the max operation in the target into action selection and action evaluation
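
Concretely, Double Q-learning (van Hasselt, 2010) keeps two estimators $Q^{A}$ and $Q^{B}$; one selects the greedy action while the other evaluates it, with the roles swapped at random:

$$Q^{A}(s,a) \leftarrow Q^{A}(s,a) + \alpha \left[ r + \gamma \, Q^{B}\!\big(s', \operatorname*{argmax}_{a'} Q^{A}(s', a')\big) - Q^{A}(s,a) \right]$$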

9. Double DQN
• From Double Q-learning to DDQN
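
In DDQN the online network (parameters $\theta_t$) selects the action and the target network ($\theta_t^{-}$) evaluates it, so only the target term changes relative to DQN:

$$Y_t^{\text{DoubleDQN}} = r_{t+1} + \gamma \, Q\big(s_{t+1}, \operatorname*{argmax}_{a} Q(s_{t+1}, a; \theta_t);\, \theta_t^{-}\big)$$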

10. Prioritized Experience Replay
• Motivation: replay transitions that carry more information more frequently
• Key components:
  • criterion of importance: TD error
  • stochastic prioritization instead of greedy prioritization
  • importance sampling to correct the resulting bias
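
A sketch of proportional prioritization under those three components: sampling probabilities $p_i^{\alpha} / \sum_k p_k^{\alpha}$ and importance weights $(N \cdot P(i))^{-\beta}$ as in the PER paper. The class layout and names here are illustrative; real implementations use a sum-tree for efficiency:

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized replay (illustrative list-based version)."""

    def __init__(self, capacity: int, alpha: float = 0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, transition, td_error: float):
        # Priority comes from the TD error (plus a small epsilon so it stays nonzero)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)
        if len(self.data) > self.capacity:
            self.data.pop(0); self.priorities.pop(0)

    def sample(self, batch_size: int, beta: float = 0.4):
        p = np.asarray(self.priorities)
        probs = p / p.sum()                        # stochastic prioritization
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)  # importance-sampling weights
        weights /= weights.max()                   # normalize for stability
        return [self.data[i] for i in idx], idx, weights
```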

11. Prioritized Experience Replay: Algorithm

12. Performance Comparison

13. Dueling Architecture - Motivation
• Motivation: for many states, estimating the state value matters more than estimating each state-action value
• Better approximates the state value, and leverages the power of the advantage function
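
The advantage function referenced here is the standard decomposition of the action value into a state-value part and a per-action part:

$$A(s,a) = Q(s,a) - V(s)$$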

14. Dueling Architecture - Details
• Drops into existing DQN algorithms (the output of the dueling network is still a Q function)
• Estimates the value function and the advantage function separately, then combines them to estimate the action-value function
• The value and advantage estimates are learned automatically through back-propagation; no extra supervision is needed
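
A sketch of the dueling head using the mean-subtracted combination from the dueling paper; feature and layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Combines separate V and A streams into Q; output shape matches plain DQN."""

    def __init__(self, n_features: int, n_actions: int):
        super().__init__()
        self.value = nn.Linear(n_features, 1)               # V(s): one scalar per state
        self.advantage = nn.Linear(n_features, n_actions)   # A(s,a): one per action

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)                            # (batch, 1)
        a = self.advantage(features)                        # (batch, n_actions)
        # Subtract the mean advantage so V and A are identifiable
        return v + a - a.mean(dim=1, keepdim=True)

q = DuelingHead(n_features=64, n_actions=4)(torch.randn(2, 64))  # -> shape (2, 4)
```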

15. Dueling Architecture - Performance
• Converges faster
• More robust: the differences between Q-values for a given state are often small, so noise can make a nearly greedy policy switch actions abruptly; separating V and A mitigates this
• Achieves better performance on Atari games (the advantage grows as the number of actions increases)

16. More Variants
• Continuous action control + DQN (see the sketch after this list)
  • NAF: continuous variant of the Q-learning algorithm
  • DDPG: Deep Deterministic Policy Gradient
• Asynchronous methods + DQN
  • multiple agents in parallel + a parameter server
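
NAF's key idea is to constrain the advantage to be quadratic in the action, so the maximizing action is available in closed form as $\mu(s)$ (notation follows Gu et al., 2016):

$$Q(s,a) = V(s) + A(s,a), \qquad A(s,a) = -\tfrac{1}{2}\,(a - \mu(s))^{\top} P(s)\,(a - \mu(s))$$

where $P(s)$ is a state-dependent positive-definite matrix, so $\operatorname*{argmax}_{a} Q(s,a) = \mu(s)$.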

17. References
• Playing Atari with Deep Reinforcement Learning
• Human-level Control through Deep Reinforcement Learning
• Deep Reinforcement Learning with Double Q-learning
• Prioritized Experience Replay
• Dueling Network Architectures for Deep Reinforcement Learning
• Asynchronous Methods for Deep Reinforcement Learning
• Continuous Control with Deep Reinforcement Learning
• Continuous Deep Q-Learning with Model-based Acceleration
• Double Q-learning
• Deep Reinforcement Learning: An Overview
