参考文献 I
[1] VolodymyrMnih et al. “Human-level control through deep reinforcement
learning”. In: Nature 518.7540 (2015), pp. 529–533.
[2] Volodymyr Mnih et al. “Playing Atari with Deep Reinforcement Learning”. In:
NIPS 2014 Deep Learning Workshop. 2013, pp. 1–9. arXiv:
arXiv:1312.5602v1.
[3] David Silver et al. “Deterministic Policy Gradient Algorithms”. In: ICML 2014.
2014, pp. 387–395.
[4] Richard S. Sutton et al. “Policy Gradient Methods for Reinforcement Learning
with Function Approximation”. In: In Advances in Neural Information
Processing Systems 12. 1999, pp. 1057–1063.
[5] Pawel Wawrzynski. “Real-time reinforcement learning by sequential
Actor-Critics and experience replay”. In: Neural Networks 22.10 (2009),
pp. 1484–1497.
[6] RJ Williams. “Simple statistical gradient-following algorithms for connectionist
reinforcement learning”. In: Reinforcement Learning 8.3-4 (1992), pp. 229–256.