71. 強化学習
(例)格闘ゲームTaoFeng におけるキャラクター学習
Ralf Herbrich, Thore Graepel, Joaquin Quiñonero Candela Applied Games Group,Microsoft Research Cambridge
"Forza, Halo, Xbox Live The Magic of Research in Microsoft Products"
http://research.microsoft.com/en-us/projects/drivatar/ukstudentday.pptx Microsoft Research Playing Machines: Machine Learning Applications in Computer Games
http://research.microsoft.com/en-us/projects/mlgames2008/
Video Games and Artificial Intelligence
http://research.microsoft.com/en-us/projects/ijcaiigames/
115. Deep Q-Learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves,
Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller (DeepMind Technologies)
Playing Atari with Deep Reinforcement Learning
http://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
画面を入力
操作はあらかじめ教える
スコアによる強化学習
119. 学習過程解析
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves,
Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller (DeepMind Technologies)
Playing Atari with Deep Reinforcement Learning
http://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
120. • Pπ ロールアウトポリシー(ロールアウトで討つ手を決める。Pπ
(a|s) sという状態でaを討つ確率)
• Pσ Supervised Learning Network プロの討つ手からその手を討つ
確率を決める。Pσ(a|s)sという状態でaを討つ確率。
• Pρ 強化学習ネットワーク。Pρ(学習済み)に初期化。
• Vθ(s’) 局面の状態 S’ を見たときに、勝敗の確率を予測する関数。
つまり、勝つか、負けるかを返します。 Mastering the game of Go with deep neural networks and tree search
http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html
https://deepmind.com/research/alphago/
121. Mastering the game of Go with deep neural networks and tree search
http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html
https://deepmind.com/research/alphago/