WCCI 2008 Tutorial on Computational Intelligence and Games, part 2 of 3
1. CIG case study: car racing
• A prolonged example of applying CI to a
game: car racing
• Sensor representation and input selection
• Incremental evolution
• Competitive coevolution
• Player modelling
• Content creation
2. Racing games
• On the charts for the last three decades
• Can be technically simple (computationally
cheap) or very sophisticated
• Easy to pick up and play, but possess almost
unlimited “depth” (a lifetime to master)
• Can be played on your own or with others
3. CI in racing games
• Learning to race
• on your own, against specific opponents,
against opponents in general, on one or
several tracks, using simple or complex
cars/physics models, etc.
• Modelling driving styles
• Creating entertaining game content: tracks
and opponent drivers
4. A simple car game
• Optimised for speed, not for prettiness
• 2D dynamics (momentum, understeer, etc.)
• Intended to qualitatively replicate a
standard toy R/C car driven on a table
• Bang-bang control (9 possible commands)
5. • Walls are solid
• Waypoints must be
passed in order
• Fitness: continuous
approximation of
waypoints passed in
700 time steps
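A minimal sketch of what a "continuous approximation of waypoints passed" could look like. The function name, argument names and the exact interpolation between the previous and next waypoint are assumptions made for illustration, not the tutorial's actual code.

```python
import math

def waypoint_fitness(passed, car_xy, prev_wp, next_wp, n_waypoints):
    """Laps completed, with partial credit for progress toward the next waypoint.

    Assumption: the partial credit is the share of the distance between the
    previous and next waypoint that the car has already covered.
    """
    d_prev = math.dist(car_xy, prev_wp)   # distance back to the last waypoint passed
    d_next = math.dist(car_xy, next_wp)   # distance ahead to the next waypoint
    total = d_prev + d_next
    partial = d_prev / total if total > 0 else 0.0
    return (passed + partial) / n_waypoints

# Two of four waypoints passed, car halfway to the third.
example = waypoint_fitness(2, (5.0, 0.0), (0.0, 0.0), (10.0, 0.0), 4)
```

The partial term gives evolution a smooth gradient between waypoints, so a controller that almost reaches the next waypoint scores higher than one that stalls right after the last one.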
6. • Inputs
• Six range-finder sensors (evolvable positions)
• Waypoint sensor, Speed, Bias
• Networks
• Standard MLP, 9:6:2
• Outputs interpreted as thrust/steering
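A rough sketch of a 9:6:2 network whose two outputs are turned into one of the nine bang-bang commands. The tanh activation, the 0.3 threshold, the random weights and the sample input values are all assumptions for illustration; the slides only state the layer sizes and that the outputs are interpreted as thrust and steering.

```python
import math
import random

def mlp_forward(inputs, w_hidden, w_out):
    """Feed-forward pass through one hidden layer (tanh is an assumed activation)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    return [math.tanh(sum(w * h for w, h in zip(row, hidden))) for row in w_out]

def to_command(outputs, threshold=0.3):
    """Map two real outputs to (thrust, steering), each in {-1, 0, 1}: 3 x 3 = 9 commands."""
    def tri(v):
        return -1 if v < -threshold else (1 if v > threshold else 0)
    thrust, steering = outputs
    return tri(thrust), tri(steering)

rng = random.Random(0)
w_hidden = [[rng.gauss(0, 1) for _ in range(9)] for _ in range(6)]  # 9 inputs -> 6 hidden
w_out = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(2)]     # 6 hidden -> 2 outputs

sensors = [0.0, 0.5, 0.6, 0.0, 0.0, 0.2]   # six range-finder readings (made-up values)
inputs = sensors + [0.1, 0.4, 1.0]         # waypoint sensor, speed, bias
command = to_command(mlp_forward(inputs, w_hidden, w_out))
```

In an evolutionary setup the two weight matrices would be flattened into the genome; here they are simply drawn at random.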
7. TABLE I. The fitness of the best controller of various generations on the different tracks, and number of runs producing proficient controllers (Pr.). Fitness averaged over 10 separate evolutionary runs; standard deviation between parentheses.
Track  10           50           100          200          Pr.
1      0.32 (0.07)  0.54 (0.2)   0.7 (0.38)   0.81 (0.5)   2
2      0.38 (0.24)  0.49 (0.38)  0.56 (0.36)  0.71 (0.5)   2
3      0.32 (0.09)  0.97 (0.5)   1.47 (0.63)  1.98 (0.66)  7
4      0.53 (0.17)  1.3 (0.48)   1.5 (0.54)   2.33 (0.59)  9
5      0.45 (0.08)  0.95 (0.6)   0.95 (0.58)  1.65 (0.45)  8
6      0.4 (0.08)   0.68 (0.27)  1.02 (0.74)  1.29 (0.76)  5
7      0.3 (0.07)   0.35 (0.05)  0.39 (0.09)  0.46 (0.13)  0
8      0.16 (0.02)  0.19 (0.03)  0.2 (0.01)   0.2 (0.01)   0
TABLE V. Fitness of a further evolved general controller with evolvable sensor parameters on the different tracks. Compound fitness 2.22 (0.09).
Track         1            2            3            4            5            6            7            8
Fitness (sd)  1.66 (0.12)  1.86 (0.02)  2.27 (0.45)  2.66 (0.3)   2.19 (0.23)  2.47 (0.18)  0.22 (0.15)  0.15 (0.01)
TABLE VI. Fitness of best controllers, evolving controllers specialised for each track, starting from a further evolved general controller with evolved sensor parameters.
Track  10           50           100          200          Pr.
1      1.9 (0.1)    1.99 (0.06)  2.02 (0.01)  2.04 (0.02)  10
2      2.06 (0.1)   2.12 (0.04)  2.14 (0)     2.15 (0.01)  10
3      3.25 (0.08)  3.4 (0.1)    3.45 (0.12)  3.57 (0.1)   10
4      3.35 (0.11)  3.58 (0.11)  3.61 (0.1)   3.67 (0.1)   10
5      2.66 (0.13)  2.84 (0.02)  2.88 (0.06)  2.88 (0.06)  10
6      2.64 (0)     2.71 (0.08)  2.72 (0.08)  2.82 (0.1)   10
7      1.53 (0.29)  1.84 (0.13)  1.88 (0.12)  1.9 (0.09)   10
8      0.59 (0.15)  0.73 (0.22)  0.85 (0.21)  0.93 (0.25)  0
Fig. 2. The initial sensor setup, which is kept throughout the evolutionary run for those runs where sensor parameters are not evolvable. Here, the car is seen in close-up moving upward-leftward. At this particular position, the front-right sensor returns a positive number very close to 0, as it detects a wall near the limit of its range; the front-left sensor returns a number close to 0.5, and the back sensor a slightly larger number. The front, left and right sensors do not detect any walls at all and thus return 0.
Fig. 5. Sensor setup of controller specialized for track 5. While more or less retaining the two longest-range sensors from the further evolved general controller it is based on, it has added medium-range sensors in the front and back, and a very short-range sensor to the left.
Fig. 6. Sensor setup of a controller specialized for, and able to consistently reach good fitness on, track 7. Presumably the use of all but one sensor and their angular spread reflects the large variety of different situations the car has to handle in order to navigate this more difficult track.
Fig. 7. Sensor setup of another con... in figure 6 seemingly using all i...
... one ... range 200 pixels, as has three sensors pointing forward-left, forward-right and backward respectively. The two other sensors, which point left and right, have reach 100; this is illustrated in figure 2.
... passed, divided by the number of waypoints in the track, plus an intermediate term representing how far it is on its way to the next waypoint, calculated from the relative distances between the car and the previous and next waypoint. A fitness of 1.0 thus means having completed one full track within the allotted time. Waypoints can only be passed in the correct order, and a waypoint is counted as passed when the centre of the car is within 30 pixels from the waypoint. In ...
... controllers. For each track, 10 evolutionary runs were made, where the initial population was seeded with the general controller and evolution was allowed to continue for 200 ...
VII. Observations on ev...
It has previously been found ...
10. Choose your inputs
(+their representation)
• Using third-person inputs (cartesian inputs)
seems not to work
• Either range-finders or waypoint sensor
can be taken away, but some fitness is lost
• A little bit of noise is not a problem;
it is actually desirable
• Adding extra inputs (while keeping core
inputs) can reduce evolvability drastically!
11. If you don’t know
your inputs...
• Memetic techniques (e.g. memetic ES) can
sort out useful from useless inputs
• Principle: evolve neural network weights
together with a mask: whether connections
are on or off
• Masks and weights are evolved at
different time scales; after every mask
mutation, weight space is searched; if
fitness does not increase, the mask is reverted
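The mask-and-weights principle above might be sketched as a two-time-scale hill climber. Everything here is an assumption made for illustration: the function names, the single-bit mask flip, the Gaussian weight perturbation, the step counts, the choice to revert the weights together with the mask, and the toy fitness function standing in for a driving evaluation.

```python
import random

def masked_search(fitness, n_weights, mask_steps=30, weight_steps=20, sigma=0.3, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * n_weights
    mask = [True] * n_weights              # True = connection switched on
    best = fitness(weights, mask)
    for _ in range(mask_steps):
        # Slow time scale: mutate the mask by flipping one connection.
        trial_mask = mask[:]
        i = rng.randrange(n_weights)
        trial_mask[i] = not trial_mask[i]
        # Fast time scale: search weight space under the trial mask.
        trial_w, trial_fit = weights[:], fitness(weights, trial_mask)
        for _ in range(weight_steps):
            cand = [w + rng.gauss(0, sigma) for w in trial_w]
            f = fitness(cand, trial_mask)
            if f > trial_fit:
                trial_w, trial_fit = cand, f
        # Keep the mask only if fitness increased; otherwise revert.
        if trial_fit > best:
            mask, weights, best = trial_mask, trial_w, trial_fit
    return mask, weights, best

def toy_fitness(weights, mask):
    """Hypothetical task: input 0 is useful (ideal weight 1.0), every other active input costs 0.1."""
    eff = [w if m else 0.0 for w, m in zip(weights, mask)]
    return -(eff[0] - 1.0) ** 2 - sum(0.1 for m in mask[1:] if m)

mask, weights, best = masked_search(toy_fitness, n_weights=4)
```

On the toy task the search has an incentive to switch off connections 1 to 3, which is the "sorting useful from useless inputs" effect in miniature.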
14. Generalization and
specialization
• A controller evolved for one track does
not necessarily perform well on other
tracks
• How do we achieve more general game-
playing skills?
• Is there a tradeoff between generality and
performance?
15. damaging such cars in collisions is ha
weight.
The dynamics of the car are based on
mechanical model, taking into account
car and bad grip on the surface, but is n
measurement [13][14]. The model is s
[4], and differs mainly in its improve
after more experience with the physical
response system was reimplemented to
realistic (and, as an effect, more undesir
may cause the car to get stuck if the
unfortunate angle, something often see
physical cars.
A track consists of a set of walls, a
and a set of starting positions and di
is added to a track in one of the sta
corresponding starting direction, both t
being subject to random alterations. Th
for fitness calculations.
For the experiments we have des
tracks, presented in figure 1. The tr
vary in difficulty, from easy to hard.
are versions of three other tracks wi
in reverse order, and the directions of
reversed.
The main differences between our
real R/C car racing problem have to
reported in Tanev et al. as well as [4]
not unimportant lag in the communica
computer and car, leading to the control
perceptions. Apart from that, there
in estimations of the car’s position a
overhead camera. In contrast, the sim
Fig. 1. The eight tracks. Notice how tracks 1 and 2 (at the top), 3 and 4, 5 and 6 differ in the clockwise/anti-clockwise layout of waypoints and associated starting points. Tracks 7 and 8 have no relation to each other
16. Incremental evolution
• Introduced by Gomez & Miikkulainen (1997)
• Change the fitness function f (to make it
more demanding) as soon as a certain
fitness is achieved
• In this case, add new tracks to f as soon as
the controller can drive 1.5 rounds on all
tracks currently in f
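The track-adding rule can be sketched as follows. The helper names and the use of a plain average to combine per-track progress are assumptions; the slide only states the 1.5-lap criterion for adding a track to f.

```python
def combined_fitness(progress, active_tracks):
    """Assumed combination: mean progress (in laps) over the tracks currently in f."""
    return sum(progress[t] for t in active_tracks) / len(active_tracks)

def maybe_add_track(progress, active_tracks, pending_tracks, threshold=1.5):
    """Add the next pending track once the controller reaches the threshold on all active ones."""
    if pending_tracks and all(progress[t] >= threshold for t in active_tracks):
        return active_tracks + [pending_tracks[0]], pending_tracks[1:]
    return active_tracks, pending_tracks

# Hypothetical progress of the current best controller, in laps per track.
progress = {"track1": 1.6, "track2": 0.4, "track3": 0.0}
active, pending = maybe_add_track(progress, ["track1"], ["track2", "track3"])
# track2 has now joined f; track3 waits until both active tracks reach 1.5 laps.
active_after, pending_after = maybe_add_track(progress, active, pending)
```

Calling `maybe_add_track` once per generation with the best controller's progress would make the fitness function grow more demanding only when the population is ready for it.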
18. • Controllers evolved for specific tracks
perform poorly on other tracks
• General controllers, that can drive almost
any track, can be incrementally evolved
• Starting from a general controller, a
controller can be further evolved for
specialization on a particular track
• drives faster than the general controller
• works even when evolution from scratch
did not work!
19. Two cars on a track
• Two cars with solo-evolved controllers on
one track: disaster
• they don’t even see each other!
• How do we train controllers that take
other drivers into account? (avoiding
collisions or using them to their advantage)
• Solution: car sensors (rangefinders, like the
wall sensors) and competitive coevolution
21. Competitive
coevolution
• The fitness function evaluates at least two
individuals
• One individual’s success is adversely affected
by the other’s (directly or indirectly)
• Very potent, but seldom straightforward;
e.g. Hillis (1991), Rosin and Belew (1996)
22. Competitive
coevolution
• Standard 15+15 ES; each individual is
evaluated through testing against the
current best individual in the population
• Fitness function a mix of...
• Absolute fitness: progress in n time steps
• Relative fitness: distance ahead of or
behind the other car after n time steps
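One generation of such a 15+15 scheme might look like the sketch below. The equal weighting of absolute and relative fitness, the Gaussian mutation, and the `race` callback (a stand-in for the two-car simulation) are assumptions, not details given in the slides.

```python
import random

def mixed_fitness(own_progress, other_progress, relative_weight=0.5):
    """Blend absolute progress with the lead over the opponent (weighting is assumed)."""
    absolute = own_progress
    relative = own_progress - other_progress
    return (1 - relative_weight) * absolute + relative_weight * relative

def es_generation(population, race, rng, mu=15, sigma=0.1):
    """population: genomes (lists of floats), current best first.
    race(a, b) -> (progress_a, progress_b) is a hypothetical two-car simulation."""
    champion = population[0]
    scored = []
    for genome in population:
        own, other = race(genome, champion)       # test against the current best
        scored.append((mixed_fitness(own, other), genome))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    parents = [g for _, g in scored[:mu]]          # keep the best 15
    offspring = [[x + rng.gauss(0, sigma) for x in g] for g in parents]
    return parents + offspring                     # 15 parents + 15 mutated copies

def toy_race(a, b):
    """Toy stand-in with no interaction between cars: progress is the genome sum."""
    return sum(a), sum(b)

rng = random.Random(1)
population = [[rng.random()] for _ in range(30)]
next_population = es_generation(population, toy_race, rng)
```

Keeping the absolute term in the mix means a controller still gets credit for driving well when it cannot yet beat the champion, which relates to the loss-of-gradient problem.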
26. Problems with
coevolution
• Over-specialization and cycling
• Can be battled with e.g. archives
• Loss of gradient
• Can be battled with careful fitness
function design, e.g. combining absolute
and relative fitness
• Much more research needed here!
27. Multi-population
coevolution
• Typically, competitive coevolution uses one
or two populations
• Many more populations can be used!
• Can help against cycling and
overspecialization
• The phenotypical diversity between
populations can be useful in itself
29. Player modelling
• Can we create players that drive just like
specific human players?
• The models need to be...
• Similar in terms of performance
• Similar in terms of playing (driving) style
• Robust
30. Direct modelling
• Let a player drive a number of tracks
• Use supervised learning to associate inputs
(sensors) with outputs (driving commands)
• e.g. MLP/Backpropagation or k-nearest
neighbour
• Suffers from generalization problems;
moreover, any approximation is likely to
lead to worse playing performance
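As an illustration of the k-nearest-neighbour option, a minimal sketch in which a recorded log of (sensor vector, command) pairs is queried by majority vote. The data layout, the Euclidean distance, k = 3 and the sample log are assumptions.

```python
import math
from collections import Counter

def knn_command(sensors, recorded, k=3):
    """Return the most common command among the k recorded situations closest to `sensors`."""
    nearest = sorted(recorded, key=lambda pair: math.dist(sensors, pair[0]))[:k]
    votes = Counter(command for _, command in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical log of a human player: (sensor readings, command issued).
recorded = [
    ((0.0, 0.0), "left"),
    ((0.1, 0.0), "left"),
    ((1.0, 1.0), "right"),
    ((0.9, 1.0), "right"),
    ((1.0, 0.9), "right"),
]
choice = knn_command((0.05, 0.0), recorded)
```

A model of this kind only knows the situations the player actually visited, which is one way to see the generalization problem named above: once the model drifts off the recorded trajectories, the nearest neighbours are no longer representative.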
31. Indirect modelling
• Let a human drive a test track, record
performance, speed and orthogonal
deviation at the various waypoints of the track
• Start from a good, general evolved neural
network controller, and evolve it further
• Fitness: negative difference between
controller and player for the three
measures above
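A sketch of how the three negative differences could be computed. The trace layout (a dict with overall progress plus per-waypoint speed and deviation), the use of mean absolute difference, and returning the three terms separately rather than as one number are assumptions.

```python
def modelling_fitness(controller, player):
    """Return three fitness terms; each is 0 for a perfect match and negative otherwise."""
    n = len(player["speed"])
    progress = -abs(controller["progress"] - player["progress"])
    speed = -sum(abs(c - p) for c, p in zip(controller["speed"], player["speed"])) / n
    deviation = -sum(abs(c - p) for c, p in zip(controller["deviation"], player["deviation"])) / n
    return progress, speed, deviation

# Hypothetical traces over a two-waypoint test track.
player = {"progress": 1.5, "speed": [3.0, 3.0], "deviation": [0.0, 2.0]}
controller = {"progress": 1.0, "speed": [2.0, 4.0], "deviation": [0.0, 2.0]}
terms = modelling_fitness(controller, player)
```

Because every term peaks at zero, evolution is rewarded for matching the player rather than for driving as fast as possible.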
32. The test track supposedly requires
a varied repertoire of driving skills
Fig. 2. The test track and the car.
Fig. 3. Evolving a ... (vertical axis: Fitness (progress, speed))
First of all, we design a test track, featuring a number of different types of racing challenges. The track, as pictured in (fig), has two long straight sections where the player can
33. Content creation
• Creating interesting, enjoyable levels, worlds,
tracks, opponents etc.
• Not the same as well-playing opponents
• Probably the area where commercial game
developers need most help
• What makes game content fun? Many
theories, e.g. Thomas Malone, Raph Koster,
Mihály Csíkszentmihályi
34. Track evolution
• Using the controllers we evolved to model
human players, we evolve tracks that are fun
to drive for the modelled player
• Fitness function:
• Right amount of progress
• Variation in progress
• High maximum speed
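The three components could be computed from repeated drives of the player model on a candidate track, roughly as below. The function name, the use of several noisy runs, the standard deviation as the measure of variation, and leaving the components uncombined are all assumptions.

```python
import statistics

def track_fitness(progress_runs, max_speed, target_progress):
    """Return three terms to maximise for a candidate track.

    progress_runs: progress of the player model over several runs on the track.
    """
    progress_term = -abs(statistics.mean(progress_runs) - target_progress)  # right amount of progress
    variation_term = statistics.pstdev(progress_runs)                       # variation in progress
    return progress_term, variation_term, max_speed                         # high maximum speed

# Hypothetical measurements: two runs of a modelled player, target progress 1.1.
terms = track_fitness([1.0, 1.2], max_speed=3.0, target_progress=1.1)
```

The target progress makes the track neither too easy nor too hard for the modelled player, while the variation term favours tracks where the outcome of a run is not a foregone conclusion.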
35. Fig. 5. Track evolved using the random walk initialisation and mutation.
e the representa-
nted with several
t the beginning
plementations of
nfigurations are
rd initial track
rners. Each mu-
ontrol points by
distribution with
y axes.
xperiments, mu-
onfiguration, but
ectangle track is
eds of mutations
those mutations
controller is not
e result of such a
ll drivable track.
ck and evolution
n, starts from an
rol points around
Fig. 6. A track evolved (using the radial method) to be fun for the first author, who plays too many racing games anyway. It is not easy to drive, which is just as it should be.
36. the results of ou
car racing [10].
In the section
describe a numb
value, most of wh
described here. D
sures would defin
urgent to study th
oft-cited hypothe
know there are n
entertainment me
games and types
needed.
Finally we no
different approach
in the beginning
viewed from sev
on using evoluti
in games is not
studying under w
perspective we h
Fig. 7. A track evolved (using the radial method) to be fun for the second author, who is a bit more careful in his driving. Note the absence of sharp turns.
37. ks by sampling
aken advantage
ack. First thick
side of the b-
ixels or subject
nt is set up. But
th of the track,
and sometimes
struction of the
ing the b-spline
middle of the
imately regular
e resulting track
est track which
e control points
Fig. 5. Track evolved using the random walk initialisation and mutation.
the representa-
ed with several
the beginning
38. but only sometimes causes the car to collide. Those elements are believed to
be the main source of final progress variability. These features are also notably
absent from track c, on which the good player model has very low variability.
The progress of the controller is instead limited by many broad curves.
Fig. 3. Three evolved tracks: (a) evolved for a bad player with target progress 1.1,
(b) evolved for a good player with target fitness 1.5, (c) evolved for a good player with
target progress 1.5 using only progress fitness.
41. More on these topics
• http://julian.togelius.com
• e.g. Togelius, Lucas and De Nardi:
“Computational Intelligence in Racing Games”
• Togelius, Gomez and Schmidhuber: “Learning
what to ignore” on Friday, 11.10, room 606
• Car Racing Competition on Tuesday 15.00,
room 402