DQN (Deep Q-Network)

Reinforcement Learning

  1. Deep Q-Network (guodong)
  2. Value Iteration and Q-learning
     • Model-free control: iteratively optimize the value function and the policy
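
A minimal sketch of the tabular Q-learning update behind this slide. The `env` interface (`reset`, `step`, `sample_action`) and all hyperparameters are illustrative assumptions, not part of the original deck.

```python
# Tabular Q-learning: model-free control with an epsilon-greedy behaviour policy
# and a greedy (max) bootstrap target. Everything env-related here is assumed.
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))          # the "lookup table"
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration
            a = env.sample_action() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # off-policy TD target: bootstrap from the greedy action in s_next
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```
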
  3. Value Function Approximation
     • A lookup table is not practical; we need to:
       • generalize to unobserved states
       • handle large (or continuous) state/action spaces
     • Transform RL into a supervised learning problem:
       • model (hypothesis space)
       • loss/cost function
       • optimization
       • i.i.d. assumption
     • RL is unstable or divergent when the action-value function Q is approximated with a nonlinear function such as a neural network: states are correlated, the data distribution changes, and the model is complex
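
To make the jump away from the lookup table concrete, here is a sketch of linear value-function approximation with a semi-gradient TD update; the feature map `phi` and all constants are assumptions used only for illustration.

```python
# Q(s, a; w) = w[a] . phi(s): the table is replaced by per-action weight vectors
# over hand-crafted features. The squared TD error is treated as a regression
# loss, with the target held fixed (semi-gradient update).
import numpy as np

def td_step(w, phi_s, a, r, phi_s_next, done, gamma=0.99, lr=1e-3):
    # w: array of shape (n_actions, n_features); phi_s, phi_s_next: feature vectors
    target = r + (0.0 if done else gamma * max(w[b] @ phi_s_next for b in range(len(w))))
    error = target - w[a] @ phi_s
    w[a] += lr * error * phi_s   # gradient of 0.5 * error^2 w.r.t. w[a]
    return w
```
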
  4. Deep Q-Network
     • A first step towards "general artificial intelligence"
     • DQN = Q-learning + function approximation + deep network
     • Stabilizes training with experience replay and a target network
     • End-to-end RL approach, and quite flexible
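
One of the two stabilizing tricks named above is experience replay; below is a sketch of a uniform replay buffer. Capacity and batch size are assumptions (the paper used roughly 1M transitions).

```python
# Uniform experience replay: store transitions and sample random mini-batches,
# which breaks the temporal correlation between consecutive states.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```
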
  5. DQN Algorithm
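
A sketch of a single DQN learning step using the second stabilizing trick, a fixed target network. `q_net`, `target_net`, `optimizer`, and the pre-batched tensors are assumed helpers; shapes follow the usual (batch, n_actions) convention.

```python
# One DQN update: regress Q(s, a) towards r + gamma * max_a' Q_target(s', a').
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch   # assumed pre-built tensors

    # Q(s, a) for the actions actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Target built from the frozen target network; no gradients flow through it
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * max_next_q * (1.0 - dones)

    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Periodically copy the online weights into the target network:
# target_net.load_state_dict(q_net.state_dict())
```
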
  6. Practical Tips
     • Stable training: experience replay (1M transitions) + a fixed target network
     • Mini-batch updates
     • Exploration vs. exploitation via epsilon-greedy, with epsilon annealed from 1.0 to 0.1
     • The Q-network input stacks the 4 most recent frames
     • Skip frames
     • Discount factor of 0.99
     • Use RMSProp instead of plain SGD
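
The annealed epsilon-greedy schedule from the tips above could look like the sketch below; the number of annealing steps is an assumption.

```python
# Linearly anneal epsilon from 1.0 to 0.1 over a fixed number of steps,
# then keep it at the final value.
def epsilon_by_step(step, eps_start=1.0, eps_end=0.1, anneal_steps=1_000_000):
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```
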
  7. DQN Variants
     • Double DQN
     • Prioritized Experience Replay
     • Dueling Architecture
     • Asynchronous Methods
     • Continuous DQN
  8. Double Q-learning
     • Motivation: reduce overestimation by decomposing the max operation in the target into action selection and action evaluation
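
A sketch of the tabular Double Q-learning update: two estimates are kept, one selects the greedy action and the other evaluates it, which reduces the overestimation caused by taking a max over noisy values.

```python
# Double Q-learning with two tables QA and QB (numpy arrays of shape
# (n_states, n_actions)); on each step one table is updated using the other
# as the evaluator.
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    if np.random.rand() < 0.5:
        a_star = int(np.argmax(QA[s_next]))                         # selection with QA
        target = r + (0.0 if done else gamma * QB[s_next, a_star])  # evaluation with QB
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        b_star = int(np.argmax(QB[s_next]))                         # selection with QB
        target = r + (0.0 if done else gamma * QA[s_next, b_star])  # evaluation with QA
        QB[s, a] += alpha * (target - QB[s, a])
```
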
  9. Double DQN
     • From Double Q-learning to DDQN
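
In Double DQN the two estimators are already available: the online network selects the next action and the target network evaluates it. A sketch of just the target computation, using the same assumed tensors as the DQN update sketch above:

```python
import torch

def double_dqn_target(q_net, target_net, rewards, next_states, dones, gamma=0.99):
    with torch.no_grad():
        best_actions = q_net(next_states).argmax(dim=1, keepdim=True)        # selection (online net)
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)  # evaluation (target net)
        return rewards + gamma * next_q * (1.0 - dones)
```
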
  10. Prioritized Experience Replay
     • Motivation: replay transitions with high information content more frequently
     • Key components:
       • criterion of importance: TD error
       • stochastic prioritization instead of greedy prioritization
       • importance sampling to correct the bias introduced by prioritized sampling
  11. Algorithm
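
A compact sketch of proportional prioritized replay covering the components listed under slide 10: TD-error-based priorities, stochastic (proportional) sampling, and importance-sampling weights. The paper uses a sum-tree for efficiency; this sketch uses plain arrays, and the alpha/beta values are illustrative assumptions.

```python
import numpy as np

class PrioritizedReplay:
    def __init__(self, capacity=100_000, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.priorities = [], []

    def push(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size=32):
        p = np.asarray(self.priorities)
        p = p / p.sum()                                  # stochastic prioritization
        idx = np.random.choice(len(self.data), batch_size, p=p)
        w = (len(self.data) * p[idx]) ** (-self.beta)    # importance-sampling weights
        w = w / w.max()                                  # normalize for stability
        return [self.data[i] for i in idx], idx, w

    def update_priorities(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.priorities[i] = (abs(e) + self.eps) ** self.alpha
```
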
  12. Performance Comparison
  13. Dueling Architecture - Motivation
     • Motivation: for many states, estimating the state value matters more than estimating each state-action value
     • Better approximates the state value, and leverages the power of the advantage function
  14. Dueling Architecture - Details
     • Drops into existing DQN algorithms (the output of the dueling network is still a Q function)
     • Estimates the value function and the advantage function separately, then combines them into the action-value function
     • During back-propagation, the value and advantage estimates are learned automatically; no extra supervision is needed
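
A sketch of the dueling head: shared features split into a value stream and an advantage stream, and the mean-subtracted aggregation keeps the output a plain Q function, so it plugs into any DQN-style algorithm. Layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, x):
        h = self.features(x)
        v, a = self.value(h), self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); subtracting the mean makes the
        # decomposition identifiable while back-prop trains both streams end to end.
        return v + a - a.mean(dim=1, keepdim=True)
```
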
  15. Dueling Architecture - Performance
     • Converges faster
     • More robust (differences between Q-values for a given state are small, so noise could make a nearly greedy policy switch actions abruptly)
     • Achieves better performance on Atari games (the advantage grows as the number of actions increases)
  16. More Variants
     • Continuous action control + DQN
       • NAF: a continuous variant of the Q-learning algorithm
       • DDPG: deep deterministic policy gradient
     • Asynchronous methods + DQN
       • multiple agents in parallel + a parameter server
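
For the continuous-control branch, a rough sketch of a DDPG-style update: a deterministic actor is trained to maximize the critic's Q value, and the critic is trained with the usual TD target. The actor/critic networks, their target copies, the optimizers, and the batch tensors are all assumptions; soft target updates are omitted.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch

    # Critic: TD target built with the (frozen) target networks
    with torch.no_grad():
        next_q = target_critic(next_states, target_actor(next_states)).squeeze(1)
        target = rewards + gamma * next_q * (1.0 - dones)
    critic_loss = F.mse_loss(critic(states, actions).squeeze(1), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the critic's estimate, i.e. minimize -Q(s, mu(s))
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```
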
  17. References
     • Playing Atari with Deep Reinforcement Learning
     • Human-level Control through Deep Reinforcement Learning
     • Deep Reinforcement Learning with Double Q-learning
     • Prioritized Experience Replay
     • Dueling Network Architectures for Deep Reinforcement Learning
     • Asynchronous Methods for Deep Reinforcement Learning
     • Continuous Control with Deep Reinforcement Learning
     • Continuous Deep Q-Learning with Model-based Acceleration
     • Double Q-learning
     • Deep Reinforcement Learning: An Overview
