Evolving Neural Network Agents in the NERO Video Game


Published in: Technology

  1. Evolving Neural Network Agents in the NERO Video Game. Stelios Petrakis
  2. The Paper: “Evolving Neural Network Agents in the NERO Video Game” by Kenneth O. Stanley, Bobby D. Bryant, and Risto Miikkulainen. Winner of the ‘Computational Intelligence and Games’ 2005 Best Paper Award. The paper introduces the real-time NeuroEvolution of Augmenting Topologies (rtNEAT) method for evolving increasingly complex artificial neural networks in real time, as a game is being played.
  3. Problem: In most video games, the scripts that control the agents cannot learn or adapt. Opponents always make the same moves, and the game quickly becomes boring. The rtNEAT method allows agents to change and improve during the game. To demonstrate this concept, the player of the NeuroEvolving Robotic Operatives (NERO) game trains a team of robots for combat.
  4. Introduction to rtNEAT: rtNEAT is a real-time enhancement of the NeuroEvolution of Augmenting Topologies method (NEAT; Stanley and Miikkulainen 2002b, 2004). NEAT evolves increasingly complex neural networks. Real-time NEAT (rtNEAT) can complexify neural networks as the game is played, making it possible for agents to evolve increasingly sophisticated behaviors in real time.
  5. Properties that game AI should have: large state/action space; diverse behaviors; consistent individual behaviors; fast adaptation; memory of past states.
  6. Learning techniques for NPCs in games: • Supervised techniques (backpropagation, decision tree learning): not good, because agents need to learn online as the game is played. • Traditional reinforcement learning (RL) (Q-learning, Sarsa(λ)): not suitable, due to the constraints explained in the previous slide. • Neuroevolution (NE), the artificial evolution of neural networks using a genetic algorithm: suitable! The rtNEAT technique can be used.
  7. NEAT: NEAT combines the usual search for the appropriate network weights with complexification of the network structure, allowing the behavior of evolved neural networks to become increasingly sophisticated over generations. The paper reports that it outperforms other neuroevolution (NE) methods. NEAT was originally designed to run offline and is based on three key ideas.
  8. NEAT key ideas: • Evolving network structure requires a flexible genetic encoding. Each connection gene specifies an in-node, an out-node, a connection weight, whether the connection gene is expressed, and an innovation number. • Individuals compete primarily within their own niches instead of with the population at large. This protects topological innovations (explicit fitness sharing is the reproduction mechanism). • NEAT begins with a uniform population of simple networks with no hidden nodes.
  9. rtNEAT: In NEAT the entire population is replaced at each generation. In rtNEAT only a single individual is replaced every few game ticks: one of the worst individuals is removed and replaced with a child of parents chosen from among the best. Challenge: protecting innovation through speciation and complexification. Let f_i be the fitness of individual i. Fitness sharing adjusts it to f_i/|S|, where |S| is the number of individuals in the species. In other words, an individual's fitness is divided by the size of its species, so members of large species are penalized more.
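
The fitness-sharing adjustment f_i/|S| can be illustrated with a minimal sketch. The function name, the dictionary layout, and the sample values are assumptions made for this example and do not come from the paper or the NERO code.

```python
from collections import Counter

def adjusted_fitnesses(fitness, species_of):
    """Return {agent: f_i / |S|}, where |S| is the size of the agent's species.

    `fitness` maps agent -> raw fitness, `species_of` maps agent -> species id.
    Both layouts are hypothetical, chosen only for this illustration.
    """
    sizes = Counter(species_of.values())  # |S| for every species
    return {agent: f / sizes[species_of[agent]] for agent, f in fitness.items()}

# Two agents share species "s1"; one agent is alone in the new species "s2".
fitness = {"a": 6.0, "b": 4.0, "c": 3.0}
species_of = {"a": "s1", "b": "s1", "c": "s2"}
adjusted = adjusted_fitnesses(fitness, species_of)
print(adjusted)
```

Here agent "c" has the lowest raw fitness, but after sharing it is no longer the worst, which is how a small new species gets protection.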
  10. rtNEAT Main Loop: (1) Remove the agent with the worst adjusted fitness from the population, assuming one has been alive long enough to have been properly evaluated. (2) Re-estimate F for all species. (3) Choose a parent species to create the new offspring. (4) Adjust the compatibility threshold Ct dynamically and reassign all agents to species. (5) Place the new agent in the world.
  11. Removing the worst agent: The agent removed is the one with the worst adjusted fitness rather than the worst raw fitness. Adjusted fitness takes species size into account, so new, smaller species are not eliminated as soon as they appear. rtNEAT only removes agents that have played for more than the minimum amount of time m (not young ones).
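
A minimal sketch of the removal rule, under stated assumptions: the `Agent` class, its field names, and the choice to treat an agent as eligible once its age reaches m are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Agent:
    name: str
    adjusted_fitness: float  # fitness already divided by species size
    age: int                 # game ticks the agent has been alive

def pick_agent_to_remove(agents: List[Agent], m: int) -> Optional[Agent]:
    """Return the eligible agent with the worst adjusted fitness, or None.

    An agent is treated as eligible once age >= m (an assumption of this sketch).
    """
    eligible = [a for a in agents if a.age >= m]
    if not eligible:
        return None  # nobody has been evaluated long enough yet
    return min(eligible, key=lambda a: a.adjusted_fitness)

agents = [
    Agent("veteran_weak", 1.0, 900),
    Agent("veteran_strong", 4.0, 800),
    Agent("newcomer", 0.2, 50),  # worst score, but too young to be judged
]
victim = pick_agent_to_remove(agents, m=500)
print(victim.name)
```

The newcomer has the lowest score but is skipped, because it has not yet lived long enough for that score to be meaningful.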
  12. Re-estimating F: Removing the worst agent from a species decreases that species' size, so its average fitness F changes. F must be recalculated before the next step, choosing the parent species.
  13. Choosing the parent species: In NEAT, the number of offspring assigned to species k is $n_k = \frac{F_k}{F_{tot}} P$, where $F_k$ is the average fitness of species k, $P$ is the population size, and $F_{tot}$ is the sum of all the average species' fitnesses. In rtNEAT, the probability of choosing a given parent species is proportional to its average fitness: $\Pr(S_k) = \frac{F_k}{F_{tot}}$. The expected number of offspring for each species is therefore proportional to $n_k$, preserving the speciation dynamics of original NEAT.
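
The two formulas can be compared in a short sketch. The function names and the sample fitness values are assumptions for illustration; the sampling step uses Python's standard `random.choices`, which is one convenient way to draw in proportion to weights and not necessarily what the original implementation does.

```python
import random

def offspring_counts(avg_fitness, population_size):
    """Original NEAT: n_k = (F_k / F_tot) * P for every species k."""
    f_tot = sum(avg_fitness.values())
    return {k: f / f_tot * population_size for k, f in avg_fitness.items()}

def species_probabilities(avg_fitness):
    """rtNEAT: Pr(S_k) = F_k / F_tot."""
    f_tot = sum(avg_fitness.values())
    return {k: f / f_tot for k, f in avg_fitness.items()}

def choose_parent_species(avg_fitness, rng):
    """Draw one species id with probability proportional to its average fitness."""
    keys = list(avg_fitness)
    weights = [avg_fitness[k] for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]

avg_fitness = {"s1": 2.0, "s2": 6.0}  # hypothetical average fitnesses F_k
counts = offspring_counts(avg_fitness, population_size=20)
probs = species_probabilities(avg_fitness)
parent = choose_parent_species(avg_fitness, random.Random(0))
print(counts, probs, parent)
```

Because each single replacement draws from the same proportions that NEAT uses to split a whole generation, the long-run share of offspring per species matches.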
  14. Dynamic Compatibility Thresholding: Ct determines whether an individual is compatible with a species' representative. When there are too many species, Ct is increased to make species more inclusive. When there are too few, Ct is decreased to make them stricter. An advantage of this kind of dynamic compatibility thresholding is that it keeps the number of species relatively stable. After changing Ct in rtNEAT, the entire population must be reassigned to the existing species based on the new Ct. As in original NEAT, if a network does not belong in any species, a new species is created with that network as its representative.
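
A minimal sketch of threshold adjustment and reassignment follows. The target species count, the step size, the lower bound on Ct, and the toy distance function are all assumptions introduced for this example; the slides do not specify them. Genomes are represented as plain numbers purely to keep the example short.

```python
def adjust_threshold(ct, num_species, target, step=0.5, floor=0.5):
    """Raise Ct when there are too many species, lower it when there are too few."""
    if num_species > target:
        return ct + step               # more inclusive species
    if num_species < target:
        return max(floor, ct - step)   # stricter species, never below the floor
    return ct

def reassign(genomes, representatives, ct, distance):
    """Place each genome in the first species whose representative is within Ct.

    A genome that fits no existing species founds a new one and becomes
    its representative.
    """
    reps = list(representatives)       # copy so the caller's list is untouched
    species = [[] for _ in reps]
    for g in genomes:
        for i, r in enumerate(reps):
            if distance(g, r) < ct:
                species[i].append(g)
                break
        else:
            reps.append(g)
            species.append([g])
    return reps, species

new_ct = adjust_threshold(3.0, num_species=7, target=4)
reps, species = reassign([0.0, 0.5, 5.0], [0.0], ct=1.0,
                         distance=lambda a, b: abs(a - b))
print(new_ct, reps, species)
```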
  15. Replacing the old agent with the new one: Two options. • The neural network can be removed from the body and replaced without any change to the body of the agent. • If the agent is dead, both the body and the neural network are replaced.
  16. Determining Ticks Between Replacements: If agents are replaced too frequently, they do not live long enough to reach the minimum time m to be evaluated. On the other hand, if agents are replaced too infrequently, evolution slows down to a pace that the player no longer enjoys. A law of eligibility can be formulated that specifies what fraction of the population can be expected to be ineligible once evolution reaches a steady state: $I = \frac{m}{P n}$, where $m$ is the minimum time alive, $n$ is the number of ticks between replacements, $P$ is the population size, and $I$ is the fraction of the population that is too young and therefore cannot be replaced.
  17. Let rtNEAT select n: $I = \frac{m}{P n} \Rightarrow n = \frac{m}{P I}$. It is best to let the user choose I because in general it is most critical to performance; if too much of the population is ineligible at one time, the mating pool is not sufficiently large.
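
The law of eligibility is simple arithmetic, shown here as a short sketch. The function names and the sample values (m = 500 ticks, P = 50 agents, I = 0.5) are chosen only for illustration.

```python
def ticks_between_replacements(m, population_size, ineligible_fraction):
    """Solve the law of eligibility for n: n = m / (P * I)."""
    return m / (population_size * ineligible_fraction)

def ineligible_fraction(m, population_size, n):
    """Law of eligibility: I = m / (P * n)."""
    return m / (population_size * n)

# The user fixes I; rtNEAT derives how often to replace an agent.
n = ticks_between_replacements(m=500, population_size=50, ineligible_fraction=0.5)
i_check = ineligible_fraction(m=500, population_size=50, n=n)
print(n, i_check)
```

With these values one agent is replaced every 20 ticks, and plugging n back in recovers the chosen ineligible fraction.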
  18. NERO
  19. NERO (www.nerogame.org): In NERO, the learning agents are simulated robots, and the goal is to train a team of robots for military combat. The robots begin the game with no skills and only the ability to learn.
  20. Sensors: Robots have several types of sensors.
  21. DEMO / Trailer
  22. References: http://www.nerogame.org/ ; Stanley, Bryant, and Miikkulainen (2005), “Evolving Neural Network Agents in the NERO Video Game”.
  23. Thanks!