Paper to read
John Schulman, Sergey Levine, Philipp Moritz, Michael I.
Jordan, Pieter Abbeel. Trust Region Policy Optimization.
ICML 2015.
▶ This is a talk about (deep) reinforcement learning
▶ Unlike DQN (Deep Q-Network) [Mnih et al. 2015; Mnih et al.
2013], the policy (rather than the value function) is represented
by a neural network and optimized directly (policy optimization)
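The sketch below is a minimal illustration of "represent the policy directly and optimize it." It makes three assumptions:

- The task is a toy multi-armed bandit.
- A plain parameter vector `theta` under a softmax stands in for the neural network.
- The update is vanilla REINFORCE (score-function gradient), not TRPO. No trust-region constraint is shown.

The helper names `softmax` and `reinforce_bandit` are hypothetical. For contrast, a DQN-style method would instead learn action values Q(a) and act greedily on them.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Hypothetical example: directly optimize a softmax policy
    pi_theta(a) = softmax(theta)[a] on a bandit with fixed per-arm rewards,
    using the REINFORCE gradient estimate r * grad log pi_theta(a)."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # policy parameters (stand-in for NN weights)
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy.
        a = rng.choices(range(len(theta)), weights=probs)[0]
        r = rewards[a]
        # Gradient of log softmax: d/dtheta_k log pi(a) = 1[k == a] - pi(k).
        for k in range(len(theta)):
            theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta)


probs = reinforce_bandit([0.0, 1.0])
print(probs)  # the policy should put most of its mass on arm 1
```

The policy parameters are updated directly from sampled rewards, with no value function learned along the way. TRPO builds on this kind of policy optimization by constraining how far each update may move the policy.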
References I
[1] Sham Kakade and John Langford. “Approximately Optimal Approximate
Reinforcement Learning”. In: ICML 2002. 2002.
[2] Volodymyr Mnih et al. “Human-level control through deep reinforcement
learning”. In: Nature 518.7540 (2015), pp. 529–533.
[3] Volodymyr Mnih et al. “Playing Atari with Deep Reinforcement Learning”. In:
NIPS 2013 Deep Learning Workshop. 2013, pp. 1–9. arXiv: 1312.5602v1.