MctsAi

Here we describe the mechanism of our sample fighting game AI using Monte Carlo Tree Search.

Published in: Technology

  1. MctsAi. Team FightingICE, March 27, 2016. http://www.ice.ci.ritsumei.ac.jp/~ftgaic/
  2. Outline of MctsAi
     • A sample fighting game AI for the FightingICE platform that implements UCB applied to trees (UCT) [1]
     • A typical Monte-Carlo Tree Search (MCTS) algorithm [2]
     [1] Levente Kocsis and Csaba Szepesvári, “Bandit based Monte-Carlo Planning”
     [2] R. Coulom, “Efficient selectivity and backup operators in Monte-Carlo tree search”
  3. UCT
     • Repeat Selection → Expansion → Playout → Backpropagation until the predefined maximum time length or the maximum number of playouts is reached
     • Use the UCB1 value in Selection
     • Finally, select the action associated with the child node of the root that has the maximum number of visits
     [Figure: the four steps of one iteration: selection, expansion, playout, backpropagation]
  4. Upper Confidence Bound (UCB1) [3]
     $$\mathrm{UCB1}(i) = X_i + C \sqrt{\frac{2 \ln N_i^p}{N_i}}$$
     • $\mathrm{UCB1}(i)$: UCB1 value of node $i$
     • $X_i$: average evaluation value of node $i$
     • $C$: balancing parameter (empirically set to 3 in the sample AI)
     • $N_i^p$: number of visits to the parent node of node $i$
     • $N_i$: number of visits to node $i$
     • Effect: prefer nodes with a high evaluation value, with a bonus for nodes that have been visited less often
     [3] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem”
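The formula can be checked against the numbers shown in the example trees on the later slides. The following is a minimal sketch, not the sample AI's actual code; the function and parameter names are chosen here for illustration only.

```python
import math

def ucb1(avg_value: float, visits: int, parent_visits: int, c: float = 3.0) -> float:
    """UCB1(i) = X_i + C * sqrt(2 ln N_i^p / N_i). Requires visits > 0."""
    return avg_value + c * math.sqrt(2.0 * math.log(parent_visits) / visits)

# Depth-1 nodes from the example tree: (average evaluation value, visits),
# with a parent (the root) that has been visited 17 times.
example_nodes = [(0.3, 3), (2.5, 10), (0.5, 4)]
values = [round(ucb1(avg, n, 17), 2) for avg, n in example_nodes]
print(values)
```

With C = 3 these evaluate to 4.42, 4.76, and 4.07, which are the UCB1 values shown for those nodes in the example figure.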
  5. MctsAi Procedure
     1. Expand all child nodes of the root node at once
     2. Repeat an iteration of Selection, Expansion, Playout, and Backpropagation as many times as possible within 16.5 ms (also set empirically)
     3. Select an action to perform
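Step 2 of the procedure is a time-budgeted loop. A minimal sketch of such a loop is shown below, assuming a hypothetical `iterate` callback that performs one Selection, Expansion, Playout, and Backpropagation pass; the optional iteration cap corresponds to the "maximum number of playouts" stopping rule mentioned for UCT. None of these names come from the sample AI.

```python
import time

def run_for_budget(iterate, budget_s: float = 0.0165, max_iterations=None) -> int:
    """Call iterate() until the time budget or the iteration cap is reached.

    Returns the number of completed iterations.
    """
    deadline = time.perf_counter() + budget_s
    count = 0
    while time.perf_counter() < deadline:
        if max_iterations is not None and count >= max_iterations:
            break
        iterate()  # hypothetical: one full MCTS iteration
        count += 1
    return count

# Usage with a dummy iteration that does nothing.
done = run_for_budget(lambda: None, budget_s=0.0165, max_iterations=5)
```

Note that the budget is only checked between iterations, so a single slow iteration can overshoot it; a real implementation would need to leave some margin.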
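  6. Step 1: Expansion of all adjacent child nodes from the root node
     • Assign a very large random value to non-visited nodes as their initial UCB1 value
     [Figure: each node shows its UCB1 value, average evaluation value, and number of visits. The root has 0 visits; its three unvisited children have UCB1 values 10002, 10010, and 9999, average evaluation value NaN, and 0 visits.]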
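A very large initial value guarantees that every unvisited node is tried before any visited node is revisited, and the random part presumably breaks ties between unvisited siblings. The sketch below assumes a base of 9999 and a random offset below 50; these two constants are guesses that merely cover the values visible in the figures (9999 to 10030) and are not taken from the slides.

```python
import random

def initial_ucb1(rng: random.Random, base: int = 9999, spread: int = 50) -> int:
    """Placeholder UCB1 value for a node with zero visits.

    base and spread are assumed constants, not values stated in the slides.
    """
    return base + rng.randrange(spread)

rng = random.Random(0)
children = [initial_ucb1(rng) for _ in range(3)]  # three unvisited child nodes
```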
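  7. Step 2.1: Selection
     • Starting from the root, repeatedly select the child node with the highest UCB1 value until a leaf node is reached
     [Figure: Example 1: root with 0 visits and three unvisited children with UCB1 values 10002, 10010, and 9999. Example 2: root with 17 visits; depth-1 children with (UCB1, average evaluation value, visits) = (4.42, 0.3, 3), (4.76, 2.5, 10), and (4.07, 0.5, 4); three unvisited depth-2 nodes with UCB1 values 10030, 10028, and 10020.]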
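The descent can be sketched as a loop that follows the highest UCB1 value at each level. The tree below uses the numbers from Example 2; attaching the three depth-2 nodes to the 10-visit node is an assumption based on the expansion rule, and the `Node` class is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    ucb: float
    children: list = field(default_factory=list)

def select_leaf(root: Node) -> list:
    """Return the path from the root to a leaf, taking the max-UCB1 child each step."""
    path = [root]
    while path[-1].children:
        path.append(max(path[-1].children, key=lambda n: n.ucb))
    return path

# Example 2 (the root's own UCB1 value is never used, so it is set to 0).
expanded = Node(4.76, [Node(10030), Node(10028), Node(10020)])
root = Node(0.0, [Node(4.42), expanded, Node(4.07)])
path = select_leaf(root)
```

Here the path runs from the root through the node with UCB1 value 4.76 to the unvisited node with value 10030.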
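  8. Step 2.2: Expansion
     • If the leaf node reached is at depth 1 and has 10 visits, expand all of its child nodes at once
     [Figure: the example tree before and after expansion. Root with 17 visits; depth-1 children with (UCB1, average evaluation value, visits) = (4.42, 0.3, 3), (4.76, 2.5, 10), and (4.07, 0.5, 4); expansion adds three unvisited depth-2 nodes with UCB1 values 10030, 10028, and 10020.]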
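The expansion rule can be written as a guard followed by the creation of one child per available action. This is a sketch under assumptions: the threshold is treated as "at least 10 visits", and the `Node` class, the action names, and the parameter names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    depth: int
    visits: int = 0
    action: str = ""
    children: list = field(default_factory=list)

def maybe_expand(node: Node, actions, visit_threshold: int = 10, expand_depth: int = 1) -> bool:
    """Expand a leaf at expand_depth once it has visit_threshold visits.

    Returns True if child nodes were created.
    """
    if node.children or node.depth != expand_depth or node.visits < visit_threshold:
        return False
    node.children = [Node(depth=node.depth + 1, action=a) for a in actions]
    return True

leaf = Node(depth=1, visits=10)
expanded = maybe_expand(leaf, ["action_a", "action_b", "action_c"])  # hypothetical actions
```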
  9. Step 2.3: Playout
     • Perform a random simulation for 60 frames ahead
     [Figure: Example 1 and Example 2, showing the same two trees as on the Selection slide]
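The slide states only that the playout is random and looks 60 frames ahead. The sketch below illustrates the idea with one random action per frame; the `step` and `evaluate` callbacks, the toy HP state, and the scoring rule are all hypothetical stand-ins, since the slides do not describe the simulator or the evaluation function.

```python
import random

def playout(state, actions, step, evaluate, rng, frames=60):
    """Apply uniformly random actions for `frames` steps, then score the result."""
    start = state
    for _ in range(frames):
        state = step(state, rng.choice(actions))
    return evaluate(start, state)

# Hypothetical toy model: the state is (my_hp, opponent_hp).
def toy_step(state, action):
    my_hp, opp_hp = state
    if action == "attack":
        opp_hp -= 1
    return (my_hp, opp_hp)

def toy_evaluate(start, end):
    # Own HP change minus opponent HP change (an assumed scoring rule).
    return (end[0] - start[0]) - (end[1] - start[1])

score = playout((100, 100), ["attack", "wait"], toy_step, toy_evaluate, random.Random(0))
```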
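  10. Step 2.4: Backpropagation
     • Backpropagate the newly obtained evaluation value and update the UCB1 value and number of visits of all related nodes
     [Figure: before: root with 17 visits; depth-1 children with (UCB1, average evaluation value, visits) = (4.42, 0.3, 3), (4.76, 2.5, 10), and (4.07, 0.5, 4); unvisited depth-2 nodes with UCB1 values 10030, 10028, and 10020. After: root with 18 visits; depth-1 children (4.46, 0.3, 3), (4.44, 2.27, 11), and (4.10, 0.5, 4); depth-2 nodes (6.57, 0, 1), 10028, and 10020.]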
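The update in the figure can be reproduced with a running total per node. This is a sketch with a hypothetical `Node` class; the playout value of 0 is read from the figure (the new depth-2 node shows an average of 0 after one visit), and the parent of that node is assumed to be the 10-visit node.

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    visits: int = 0
    total: float = 0.0  # sum of evaluation values backed up through this node

    @property
    def average(self) -> float:
        return self.total / self.visits if self.visits else float("nan")

def backpropagate(path, value: float) -> None:
    """Add one visit and the new evaluation value to every node on the path."""
    for node in path:
        node.visits += 1
        node.total += value

def ucb1(node: Node, parent: Node, c: float = 3.0) -> float:
    return node.average + c * math.sqrt(2.0 * math.log(parent.visits) / node.visits)

root = Node(visits=17)               # the root's average is not needed here
child = Node(visits=10, total=25.0)  # average 2.5 over 10 visits
leaf = Node()                        # newly selected, unvisited depth-2 node
backpropagate([root, child, leaf], 0.0)
result = (root.visits, child.visits, round(child.average, 2), round(ucb1(leaf, child), 2))
```

After the update, `result` is `(18, 11, 2.27, 6.57)`, in line with the "after" tree in the figure. UCB1 values of sibling nodes also change because the parent's visit count appears in the formula.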
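  11. Step 3: Selection of an action
     • Finally, select the action associated with the child node of the root that has the highest number of visits
     [Figure: final tree. Root with 56 visits; depth-1 children with (UCB1, average evaluation value, visits) = (4.14, 2.53, 28), (3.76, 1.95, 22), and (3.81, 0.33, 6); depth-2 nodes (4.64, 0.33, 3), (4.64, 0.33, 3), (5.71, 2.66, 6), (5.98, 0.5, 2), (5.66, 2.2, 5), and (6.43, 4.09, 11).]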
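The final choice ignores UCB1 and looks only at visit counts of the root's children. A minimal sketch, using the visit counts from the final tree (28, 22, and 6) and hypothetical action names:

```python
def most_visited_action(children):
    """children: list of (action, visits) pairs for the root's child nodes."""
    action, _ = max(children, key=lambda pair: pair[1])
    return action

# Visit counts from the final example tree; the action names are placeholders.
root_children = [("action_a", 28), ("action_b", 22), ("action_c", 6)]
chosen = most_visited_action(root_children)
```

Here `chosen` is `"action_a"`, the action of the child with 28 visits.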
  12. That’s all folks!