Deep Deterministic Policy Gradient
DDPG
History
ML methods
Supervised vs Unsupervised
Supervised process
Supervised uses
Unsupervised
Neural network types
Gradient Descent
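Gradient descent can be illustrated with a one-variable sketch (an assumed toy problem, not from the talk): minimize f(x) = (x - 3)^2 by repeatedly stepping against the gradient.

```python
# Minimal gradient-descent sketch (illustrative; starting point and
# learning rate are assumed toy values).

def grad(x):
    return 2.0 * (x - 3.0)  # derivative of (x - 3)^2

x = 0.0   # starting point (assumed)
lr = 0.1  # learning rate (assumed)
for _ in range(100):
    x -= lr * grad(x)  # step downhill along the negative gradient

print(round(x, 3))  # converges toward the minimum at x = 3
```

The same update rule, applied to a network's loss with respect to its weights, is what trains the neural networks mentioned above.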
Reinforcement learning
Grid worlds
Value function vs Policy
Actor critic
Actor critic method
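The actor-critic idea can be sketched on a 2-armed bandit (an illustrative toy, not the talk's implementation; reward means, learning rates, and noise scale are all assumed values): the critic learns a value baseline, and the actor nudges its softmax policy toward actions that beat that baseline.

```python
# Toy one-step actor-critic on a 2-armed bandit (hedged sketch).
import math
import random

random.seed(0)
prefs = [0.0, 0.0]       # actor: action preferences (softmax policy)
v = 0.0                  # critic: estimated value of the (single) state
rewards = [0.2, 1.0]     # true mean rewards per arm (assumed toy values)
alpha, beta = 0.1, 0.1   # critic / actor learning rates (assumed)

def softmax(p):
    e = [math.exp(x - max(p)) for x in p]
    s = sum(e)
    return [x / s for x in e]

for _ in range(5000):
    probs = softmax(prefs)
    a = 0 if random.random() < probs[0] else 1
    r = rewards[a] + random.gauss(0.0, 0.1)  # noisy observed reward
    td = r - v               # TD error: how much better than expected?
    v += alpha * td          # critic update
    for i in range(2):       # policy-gradient actor update
        grad_log = (1.0 if i == a else 0.0) - probs[i]
        prefs[i] += beta * td * grad_log

print(softmax(prefs))  # the policy should strongly prefer arm 1
```

DDPG follows the same actor-critic split, but with neural networks for both roles and a deterministic actor.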
DDPG
- Continuous state and action space
- Replay buffer
- Soft updates
- Exploration noise
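Two of the ingredients above can be sketched in a few lines (an illustrative toy, not the talk's implementation; buffer size, batch size, and tau are assumed values): the replay buffer stores transitions and samples random minibatches to break correlations between consecutive experiences, and the soft update lets target weights slowly track the learned network.

```python
# Hedged sketches of a replay buffer and DDPG's soft target update.
import random
from collections import deque

random.seed(1)

# Replay buffer: append (state, action, reward, next_state) transitions,
# then sample a random minibatch for training.
buffer = deque(maxlen=10_000)  # capacity (assumed value)
for t in range(100):
    buffer.append((float(t), 0.5, 1.0, float(t + 1)))
batch = random.sample(buffer, k=8)  # batch size (assumed value)

# Soft update: theta_target <- tau * theta + (1 - tau) * theta_target.
tau = 0.005  # small mixing coefficient (assumed value)

def soft_update(target, source, tau):
    # lists of floats standing in for network parameters
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]

target = soft_update([0.0, 0.0], [1.0, 2.0], tau)
print(len(batch), target)
```

Keeping tau small means the target networks change slowly, which is what stabilizes the bootstrapped critic updates.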
Pitfalls
- Designing a reward function is very hard
- Tends to get stuck in local optima
- Unstable
- Needs lots of training samples
Driving in simulator
