Ant Wars: XCS Strikes Back


Published in: Technology, Education

    1. Ant Wars: XCS Strikes Back. Daniele Loiacono, Artificial Intelligence and Robotics Laboratory, Politecnico di Milano, Italy. GECCO'07, July 7-11, 2007, London, UK.
    2. Ant Wars as RL
         • The state is a 5x5 grid.
         • [Diagram labels: Ant, Environment, state]
    3. Ant Wars as RL
         • The ant moves on the grid in 8 possible directions.
         • [Diagram labels: Ant, Environment, state, action]
    4. Ant Wars and Reinforcement Learning
         • The reward is:
             • 1 if one piece of food is collected
             • 10 if the opponent is killed, -10 if killed by the opponent
             • 0 otherwise
         • [Diagram labels: Ant, Environment, state, action, delay, reward]
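The reward scheme on this slide can be written as a small lookup. This is a minimal sketch, not the author's code: the `Event` enum and the `reward` function are hypothetical names introduced only for illustration, and only the numeric values come from the slide.

```python
from enum import Enum


class Event(Enum):
    """Outcome of one ant move (hypothetical names, not from the slides)."""
    NOTHING = "nothing"
    FOOD_COLLECTED = "food_collected"
    OPPONENT_KILLED = "opponent_killed"
    KILLED_BY_OPPONENT = "killed_by_opponent"


# Reward values as listed on the slide.
REWARDS = {
    Event.FOOD_COLLECTED: 1,
    Event.OPPONENT_KILLED: 10,
    Event.KILLED_BY_OPPONENT: -10,
    Event.NOTHING: 0,
}


def reward(event: Event) -> int:
    """Return the scalar reward for the outcome of a single move."""
    return REWARDS[event]
```

Because killing or being killed is worth ten times a piece of food, a learner trained on this signal has a strong incentive to prioritize survival over collection.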
    5. Ant Wars as RL
         • We modeled Ant Wars as a Markov Decision Process (MDP).
         • Actually, Ant Wars is a Partially Observable MDP (POMDP).
         • To deal with a POMDP we would need memory and a probabilistic approach.
         • However, for simplicity, in this work we applied an MDP approach to Ant Wars, although we could not expect to find the optimal solution.
    6. The XCS Classifier System
         • “In state S, doing action A will result in a payoff P; my estimate is as accurate as F.”
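The rule described on this slide can be sketched as a small data structure. The slides do not show the rule representation, so this assumes the ternary conditions commonly used with XCS (`0`, `1`, and `#` as a don't-care symbol) and the usual fitness-weighted payoff estimate. `Classifier` and `predicted_payoff` are hypothetical names, and the sketch omits the error estimate, the update rules, and the genetic algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Classifier:
    condition: str     # string over '0', '1', '#' ('#' matches either bit)
    action: int        # action A advocated by the rule
    prediction: float  # estimated payoff P
    fitness: float     # accuracy-based fitness F of that estimate

    def matches(self, state: str) -> bool:
        """True if every non-'#' position of the condition equals the state bit."""
        return len(state) == len(self.condition) and all(
            c == "#" or c == s for c, s in zip(self.condition, state)
        )


def predicted_payoff(population: List[Classifier], state: str,
                     action: int) -> Optional[float]:
    """Fitness-weighted average prediction of the rules that match the
    state and advocate the action; None if no such rule exists."""
    matching = [cl for cl in population
                if cl.action == action and cl.matches(state)]
    total_fitness = sum(cl.fitness for cl in matching)
    if total_fitness == 0:
        return None
    return sum(cl.prediction * cl.fitness for cl in matching) / total_fitness
```

For example, a rule with condition `1#0` matches both `100` and `110`, so one accurate general rule can stand in for several specific ones.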
    7. XCS Ant
         • We used XCS to learn a playing strategy for Ant Wars.
         • The state is encoded as a 48-bit string, where each cell in the 5x5 grid (except the central one) is encoded with 2 bits:
             • “00” if the cell is empty
             • “01” if the cell contains food
             • “10” if the cell is near the enemy
             • “11” if the cell contains the opponent
         • The final Ant strategy is represented by the set of rules in the final population evolved by XCS.
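The 48-bit encoding follows from 24 non-central cells times 2 bits each. The sketch below illustrates it under two assumptions that the slides do not state: cells are visited in row-major order, and cell contents are given as the hypothetical labels `"empty"`, `"food"`, `"near_enemy"`, and `"opponent"`.

```python
from typing import List

# 2-bit codes as listed on the slide; the label names are hypothetical.
CODES = {"empty": "00", "food": "01", "near_enemy": "10", "opponent": "11"}


def encode_state(grid: List[List[str]]) -> str:
    """Encode a 5x5 grid of cell labels as a 48-bit string.

    The central cell (row 2, column 2), where the ant itself sits,
    is skipped. Row-major cell order is an assumption.
    """
    if len(grid) != 5 or any(len(row) != 5 for row in grid):
        raise ValueError("grid must be 5x5")
    bits = []
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if (r, c) == (2, 2):
                continue  # the ant's own cell carries no information
            bits.append(CODES[cell])
    return "".join(bits)
```

The 24-bit input used later for SmartieXCS Ant is the same idea with one bit per cell (food or no food).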
    8. Problem Decomposition
         • What are the skills of a smart Ant?
             1. Avoid being killed
             2. Kill the enemy (whenever possible)
             3. Collect the food quickly
         • Skills 1 and 2 are trivial to implement.
         • Skill 3 is the most complex: a greedy strategy may fail to maximize the food collected in the long run.
         • Can we learn Skill 3?
    9. SmartieXCS Ant
         • We applied XCS to learn only Skill 3, using as input a 24-bit string, where each cell in the 5x5 grid (except the central one) is encoded as “1” if it contains food and “0” otherwise.
         • Accordingly, a simplified version of Ant Wars without an opponent is considered (the goal is now simply to collect food).
         • The evolved set of rules is finally combined with a human-designed playing strategy.
    10. Experimental Result
         • We compared 4 ants:
             • Random Ant: plays completely at random
             • Smartie Ant: human designed
             • XCS Ant: completely evolved
             • SmartieXCS Ant: partially evolved
         • As performance measures we considered:
             • % of games won by the ants
             • % of games in which the ants have been killed
             • average points scored per game
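The three performance measures can be computed from per-game records as sketched below. The `GameResult` record and the `summarize` function are hypothetical; the slides do not describe how games were logged.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GameResult:
    """Outcome of one game for a single ant (hypothetical record layout)."""
    won: bool
    killed: bool
    score: float


def summarize(results: List[GameResult]) -> Dict[str, float]:
    """Return % of games won, % of games killed, and average score per game."""
    if not results:
        raise ValueError("need at least one game")
    n = len(results)
    return {
        "pct_won": 100.0 * sum(r.won for r in results) / n,
        "pct_killed": 100.0 * sum(r.killed for r in results) / n,
        "avg_score": sum(r.score for r in results) / n,
    }
```

Calling `summarize` once per ant yields one row of measures for each of the four ants compared.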
    11. Experimental Results
         • [Charts: % games won, % times killed, avg score]
    12. Conclusions
         • The bad
             • Our approach is limited to MDPs.
             • Our ants were not able to win against the human-designed one.
         • The good
             • We put almost no human effort into the evolutionary process: input and output are encoded in a straightforward way, without any preprocessing.
             • XCS Ant was able to learn the “smart skills” autonomously, with quite good performance (especially at avoiding being killed).