A learning bot in first-person shooting game
Maung Aung Linn
Submitted for the Degree of
Master of Science in Internet Computing
from the
Department of Computing
Faculty of Engineering and Physical Sciences
University of Surrey
Guildford, Surrey, GU2 7XH, UK
February 2010
Supervised by: Dr. Matthew Casey
© Maung Aung Linn 2010
ACKNOWLEDGEMENT
First of all I would like to give my gratitude to my supervisor, Dr Matthew Casey, who was always smiling, helping, and guiding me throughout my time writing this dissertation. Without his guidance and encouraging smiles, I would not have completed this dissertation on my own.
Secondly, I would like to thank my family for supporting every aspect of my master's degree programme in one of the most expensive countries. I would also like to give special thanks to Sushant for his encouragement, delicious meals, and for letting me concentrate on my thesis in his room overnight.
Finally, I would like to thank my friends for their encouragement, understanding, and support.
ABSTRACT
Computer gaming is booming these days; even a hand-held mobile phone has games to play in leisure time. Games have developed dramatically from simple cathode-ray-tube displays to sophisticated three-dimensional environments. Recent advances in dedicated graphics hardware and artificial intelligence research provide the computational power to develop more complex artificial players alongside the games. Researchers and game developers are trying to create better and more challenging artificial intelligence players, and much of the research and development on artificial characters is based on reinforcement learning, artificial neural networks, and evolutionary algorithms.
Learning is vital to the evolving and creative processes of many life forms. The Q-learning technique can be used by a non-player character (NPC) to learn a control policy from its experiences in the game, and is commonly applied to problems formulated in terms of states and actions. This dissertation proposes, implements, and evaluates a learning artificial intelligence agent, an NPC, which learns to play a three-dimensional first-person shooting game using the Q-learning algorithm.
TABLE OF CONTENTS
Acknowledgement ..................................................................................................................................ii
Abstract..................................................................................................................................................iii
Table of contents....................................................................................................................................iv
List of figures........................................................................................................................................vii
List of tables.........................................................................................................................................viii
Chapter 1: Introduction........................................................................................................................... 1
1.1 Background and context....................................................................................................... 1
1.2 Motivation............................................................................................................................ 2
1.3 Aim....................................................................................................................................... 2
1.4 Objectives............................................................................................................................. 2
1.5 Feasibility............................................................................................................................. 3
1.6 Structure of the dissertation ................................................................................................. 3
Chapter 2: Development of computer games and use of artificial intelligence agents............................ 4
2.1 Computer games and artificial intelligent agents................................................................. 4
2.2 Challenges and problems solved with AI............................................................................. 6
2.2.1 NPC movements and path-finding.......................................................................... 6
2.2.2 Decision making ..................................................................................................... 6
2.2.3 Learning.................................................................................................................. 7
2.3 Unreal Tournament 2004 ..................................................................................................... 7
2.4 Relevant first-person shooting games to date................................................... 10
2.4.1 Wolfenstein 3D............................................................................. 10
2.4.2 Doom .................................................................................................................... 10
2.4.3 Duke Nukem 3D ................................................................................................... 11
2.4.4 Quake.................................................................................................................... 11
2.4.5 Quake II ................................................................................................................ 11
2.4.6 Unreal ................................................................................................................... 12
2.4.7 Half-life................................................................................................................. 12
2.4.8 Unreal Tournament............................................................................................... 13
2.4.9 Call of Duty (COD): Modern Warfare 2............................................................... 14
2.5 Relevant Techniques used for AI NPC .............................................................................. 15
2.5.1 Finite state machines............................................................................................. 15
2.5.2 Machine Learning................................................................................................. 16
2.5.3 Reinforcement Learning ....................................................................................... 16
2.5.4 Q-learning............................................................................................................. 17
2.5.5 Hierarchical reinforcement learning ..................................................................... 18
2.6 Related works..................................................................................................................... 19
2.7 Agent Architectures in related works................................................................................. 20
2.7.1 Action selection mechanism ................................................................................. 20
2.7.2 Behaviour trees ..................................................................................................... 21
2.8 AI implementation tools..................................................................................................... 22
2.8.1 JavaBots................................................................................................................ 23
2.8.2 Pogamut 2 ............................................................................................................. 24
2.8.3 Unified modelling language (UML)..................................................................... 24
2.8.4 UML state machine (UML statechart).................................................................. 27
2.9 Summary
Chapter 3: Method ................................................................................................................................ 29
3.1 Analysis of the game rules................................................................................................. 29
3.2 Analysis of states and events.............................................................................................. 29
3.3 Analysis of available actions.............................................................................................. 34
3.4 Mapping state and actions.................................................................................................. 35
3.5 Behaviours ......................................................................................................................... 36
3.6 Implementation .................................................................................................................. 37
3.6.1 Proposed architecture............................................................................................ 37
3.6.2 Structure of the Q-learning bot ............................................................................. 39
3.7 Summary ............................................................................................................................ 42
Chapter 4: Evaluation ........................................................................................................................... 43
4.1 Unit Testing........................................................................................................................ 43
4.2 Integration Testing ............................................................................................................. 43
4.3 Experiment......................................................................................................................... 46
4.3.1 Experiment 1......................................................................................................... 47
4.3.2 Experiment 2......................................................................................................... 48
4.3.3 Experiment 3......................................................................................................... 49
4.4 Summary ............................................................................................................................ 49
Chapter 5: Conclusion and future works............................................................................................... 50
References ....................................................................................................................................... 51
LIST OF FIGURES
Figure 2.1 Game play screen shot of the Unreal Tournament 2004 ....................................................... 9
Figure 2.2 Game play screen shot of the Unreal Tournament 2004 from first-person view................... 9
Figure 2.3 Wolfenstein 3D in game screen shot ..................................................................... 10
Figure 2.4 Doom in game screen shot................................................................................................... 10
Figure 2.5 Duke Nukem 3D in game screen shot ................................................................................. 11
Figure 2.6 Quake in game screen shot .................................................................................................. 11
Figure 2.7 Quake II in game screen shot .............................................................................................. 12
Figure 2.8 Unreal 1998 in game screen shot......................................................................................... 12
Figure 2.9 Half-life in game screen shot............................................................................................... 13
Figure 2.10 Unreal Tournament in game screen shot ........................................................................... 13
Figure 2.11 Modern Warfare 2,
Screen shot taken when shooting enemy while driving the snowmobile....................... 14
Figure 2.12 A light bulb example for finite state machine ................................................................... 15
Figure 2.13 Simple FSM for NPC ........................................................................................................ 15
Figure 2.14 Standard reinforcement learning model............................................................................. 17
Figure 2.15 Bot and environment basic interaction diagram ................................................................ 20
Figure 2.16 Conceptual layers of bot behaviours.................................................................................. 20
Figure 2.17 A sample behaviour tree for a guard bot ........................................................................... 21
Figure 2.18 JavaBot structure diagram ................................................................................................. 23
Figure 2.19 Classification of UML diagram types................................................................................ 26
Figure 3.1 State-event diagram............................................................................................................. 30
Figure 3.2 Overview game’s states diagram......................................................................................... 31
Figure 3.3 Detail explorer state diagram............................................................................................... 32
Figure 3.4 Detail engage state diagram................................................................................................. 32
Figure 3.5 Detail chase state diagram................................................................................................... 33
Figure 3.6 Detail retreat state diagram.................................................................................................. 33
Figure 3.7 An overview framework of a Java bot on UT 2004 game server ...................... 37
Figure 3.8 Flow-chart diagram for Q-learning bot................................................................................ 38
Figure 4.1 QBot trying to reach to the weapon..................................................................................... 44
Figure 4.2 QBot trying to shoot enemy at sight.................................................................................... 45
Figure 4.3 Qbot trying to reach items when low health........................................................................ 45
Figure 4.4 Qbot trying to retreat from enemy after low health............................................................. 45
LIST OF TABLES
Table 2.1 Q-learning mapping table, Q-table ......................................................................... 18
Table 2.2 Standard notations for state machine diagrams..................................................................... 19
Table 3.1 State-action mapped table..................................................................................................... 35
Table 4.1 Experiment 1 setup table....................................................................................................... 47
Table 4.2 Experiment 2 setup table....................................................................................................... 48
Table 4.3 Experiment 3 setup table....................................................................................................... 49
CHAPTER 1
INTRODUCTION
Apart from information and communication processing, playing computer games is one of the most popular uses of computer technology. Computer games have developed dramatically from text-based adventures all the way to three-dimensional graphical interfaces with complex and dynamic environments. Several uniquely identifiable systems are put together to provide a specific game environment, including graphics rendering, special sound effects and background music, user input, and AI. None of these systems alone can make a good game; it is all of these systems working together in synergy that makes a worthwhile computer game [1].
Nowadays, research and development of advanced Artificial Intelligence (AI) is booming among both academic and commercial developers. Cheaper, better-performing computer technology encourages the development of more sophisticated AI than in the past few decades. People are also spending more time playing video games, and even handheld devices such as mobile phones have video games to play. AI has come a long way from the 1940s to the present day. In May 1997, IBM's Deep Blue defeated the human world chess champion, Kasparov [51]. However, achieving human-like AI is still a long way off.
1.1 Background and context
Electronic interactive games were first developed in the 1950s. The Cathode Ray Tube Amusement Device was first patented on January 25, 1947 by Thomas T. Goldsmith Jr. and Estle Ray Mann [54]. During the 1950s and 1960s, most games could only be run on mainframe computers because of the limited performance of the hardware. From the late 1970s onwards, video games evolved from arcade machines to personal computers and, these days, handheld devices such as mobile phones. The types of game vary from text-based and classic board games such as chess to sophisticated three-dimensional graphical games such as Unreal Tournament 2004.
The nature of games is to compete with other players. The idea of substituting the human opponent with a computer player has developed alongside the games themselves. The earliest in-game AI appeared in the Atari games Qwak and Pursuit (1974). In the early days of AI non-player character (NPC) development, heuristic and brute-force approaches were used, and NPCs were statically programmed to perform specific tasks and actions. More recently, techniques have been developed that address AI challenges such as path-finding, movement, decision making, and learning.
The field of Artificial Intelligence (AI) research was born in 1956 at the Dartmouth conference, attended by scientists from fields including mathematics, psychology, engineering, economics, and political science [53]. The promise of machines performing intelligently has existed since the first electronic computer technology was developed in 1941 [53]. Nowadays, AI is used in a wide range of fields such as toys, computer games, machine control, web search engines, medical diagnosis, and financial stock market forecasting.
1.2 Motivation
People tend to lose interest in playing against static NPC opponents' strategies and outcomes, even when those games have good graphics, sound quality, and a background story. Heuristic, statically coded AI NPCs do not produce varied outcomes; their actions become predictable to human players after some time playing the game. There has also been AI cheating, such as AI characters being much more powerful, having more health points, or knowing the exact locations of the human players. Apart from impressive graphical detail and game play, game developers are now trying to deliver a good solo-gaming experience competing against NPCs. To be a worthwhile opponent, an NPC should be dynamic, evolving and adapting by learning from its opponent's strategies and tactics and from its own previous experiences, successes, and failures. New generations of games also involve dynamic environments: the environment's entities can be destructible, and pathways can be blocked or destroyed. The NPC opponents need to adapt to the dynamic environment and maximize their performance.
1.3 Aim
The aim of this dissertation is to propose, implement, and train an artificial intelligence agent that can learn through first-person shooting game play, and to evaluate its effectiveness. The agent will be developed in the commercial first-person shooting game environment Unreal Tournament 2004.
1.4 Objectives
The objectives of this dissertation are as follows:
1. To study existing game AI technologies
2. To study the Pogamut 2 platform
3. To implement a Non-player Character (NPC) which learns to play the first-person shooting game
4. To train the NPC via game-play experience
5. To evaluate the effectiveness of the NPC against the sample bots
1.5 Feasibility
Since the AI agent is developed using Java, which is platform independent, the agent can be run on any platform.
1.6 Structure of the dissertation
There are five main chapters in this dissertation. Chapter 1 is the introduction, which briefly discusses AI and games, the motivation, aim, objectives, and feasibility.
Chapter 2 is the literature review, which discusses computer games and the involvement of artificial intelligence, first-person shooting (FPS) games in detail, AI agent learning in FPS games, the architecture of AI agents in FPS games, and learning agents developed by other researchers. The later part of the chapter provides theoretical background on reinforcement learning, specifically the Q-learning algorithm, as it forms part of the development.
Chapter 3 is the practical part, covering the analysis, proposal, design, and implementation of the Java Q-learning bot. This chapter analyses the possible states, events, and actions for a Java bot in the first-person shooting game Unreal Tournament 2004.
Chapter 4 is the evaluation of the learning AI agent using the Q-learning algorithm. It consists of testing and three experiments carried out with the developed Java learning bot.
Chapter 5 presents the conclusion and future work, discussing the outcomes of the development and further work that could be done.
CHAPTER 2
DEVELOPMENT OF COMPUTER GAMES AND USE OF ARTIFICIAL
INTELLIGENCE (AI) AGENTS
This chapter surveys the current mainstream games on the market, possible solutions to specific challenges for artificial intelligence agents in these games, implementation tools, and relevant techniques for AI non-player characters (NPCs).
2.1 Computer games and artificial intelligence (AI) agents
As computer games are processor intensive, graphics and audio systems were usually emphasised over complex artificial intelligence in the early days. However, AI plays an important role in most games. A player needs to compete with an opponent, which is either another player or a computer opponent, the AI. Some players only notice how good the game graphics look, but if we removed the artificial intelligence from computer games, the games would be simplistic and offer no challenge to the players: they would be easy to win from the moment they start, and the in-game characters would not move seamlessly or be interactive.
There are different challenges in developing effective AI for different types of games. Brief information about these game genres follows, along with some of the challenges and some possible working solutions.
- First-person shooting games (FPS)
In this type of game, players see the environment through the eyes of the character they control. The typical rule set of most FPS games is Death Match, in which the objective is to score by killing opponents. Participants are spawned at random preset locations in the game world. Players have basic status values of health points and armour, ranging from 0% to 100%, and have to pick up weapons, ammunition, and power boosts in order to compete with their opponents. There are several other rule sets, for example Last Man Standing, Capture the Flag, and Domination. In this genre, a player's opponents can be human players or AI. Success mostly relies on fast decision making, teamwork, and knowledge of hidden power-ups and special weapons in the different maps. Reactive genetic decision behaviour AI [2] and teamwork strategies such as the "coordinated attack of squad bots" [3] are among the AI research interests. Unreal Tournament 3 [14], Crysis [15], Far Cry 2 [16], Call of Duty 4 [17], Halo 3 [19], and Half-Life 2 [18] are some of the renowned FPS games.
- Third-person shooting games (TPS)
Slightly different from the FPS games described above, the player views the controlled character from behind, and can see the character and the environment more freely than in FPS games. Some games, such as MDK 2 [7], switch to a first-person perspective view for precise aiming and shooting.
- Strategy games
Unlike first-person shooting games, players see the whole map or game world and manage units or characters and buildings in order to achieve the objectives. Real-time strategy and turn-based strategy games are the two main types of strategy game. In this type of game, AI involves controlling large numbers of units, long-term planning, managing resources, and spatial reasoning [5]. Some of the famous games in this genre are Command and Conquer [8], Warcraft III [9], and Age of Empires III [10].
- Role playing games (RPG)
The viewing perspective is either first-person or third-person. The player develops a character by achieving goals and quests according to the game's story line. Some games of this genre are Final Fantasy VII [11], The Witcher [13], and Resident Evil [12]. AI in this type of game is involved in the non-player characters that give quests and act as opponents to the player.
- Massive multiplayer online games (MMORPG)
This game type is similar to normal RPG games except that players play online with a massive number of other players. AI in these games takes the form of non-player characters (NPCs) and monsters through which players perform their quests and develop their characters. The formal AI involved is the game world's creatures and their reactions to the players. There are also several third-party AI programs, called bots, developed for some games of this genre. Examples of this genre are World of Warcraft [33], EVE Online [34], and AION [35]. An example of a third-party bot is SBOT [37] for the Silkroad Online game (SRO) [36]. The bot takes control of the player's character and performs tasks on the player's behalf; from personal experience in SRO, players have to kill millions of creatures repeatedly in order to raise their character's level. However, such bots are unauthorized and prohibited by the game providers. In World of Warcraft, there are user-interface add-ons that help the player to play the game better. For instance, QuestHelper [55] intelligently guides the player towards the nearest quests and monsters to kill at the current location, and ShockandAwe [56] suggests which action is best to perform next in the current situation.
2.2 Challenges and problems solved with AI
Nowadays, AI contributes to various aspects of modern computer games. AI techniques have solved most of the classic games such as Go-Moku [57] and Nine Men's Morris [58]; another milestone was IBM Deep Blue's victory over Kasparov in 1997. However, modern games have more varied aspects than the classic games: they involve very limited reasoning time, dynamic environments to adapt to, incomplete knowledge of the game world, and management of the game entities' resources. The most generally discussed topics in the game AI research area are unit movement, simulated perception, situation analysis, spatial reasoning, learning, group coordination, resource allocation, steering, flocking, target selection, and many more [4]. Even some animation and audio systems use AI. Above all of the problems mentioned, there are three main challenges that AI must address in most games: NPC movement, decision making, and learning.
2.2.1 NPC movements and path-finding
As seen from the examples of game genres in section 2.1, apart from turn-based strategy games, the other genres fundamentally require character or unit movement in either a static or a dynamically changing virtual world. Real-time strategy games and first/third-person shooting games have large numbers of entities such as NPCs, world-inhabiting creatures, and vehicles, and those entities require correct and believable motion [6]. The game must provide paths for those entities to move around the game world. Currently, path-finding and movement are achieved using a combination of scripting, Dijkstra's shortest-path algorithm [20], A*-based grid search [21] (an improved version of Dijkstra's algorithm), local reactive methods, and flocking. There is also research based on movements generated from the visual perspective (the first-person camera view), geometric path planning, and navigation through polygonal worlds, which can be applied to robotic movement via video camera [21].
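To make the contrast with Dijkstra's algorithm concrete, A* ranks the next node to expand using the standard textbook evaluation function (this is general A*, not a detail of any particular game engine):

f(n) = g(n) + h(n)

where g(n) is the cost of the path from the start node to n, and h(n) is a heuristic estimate of the remaining cost from n to the goal. With h(n) = 0 the search reduces to Dijkstra's algorithm; with an admissible (never overestimating) heuristic, A* still finds a shortest path while typically expanding far fewer nodes.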
2.2.2 Decision making
The previous section, 2.2.1, described how NPC movement is achieved. But how will an NPC know where the other players are, or how to carry out the objectives of the game? In order to achieve the main objective of the game, an NPC needs to decide on and pursue sub-objectives and goals. One of the challenges in developing AI is how to make an NPC act and react like a human. The game will not be fun to play if an NPC knows everything a player does; an NPC has to make decisions based on limited knowledge of the game world. For example, in Unreal Tournament an NPC has a limited viewing angle and limited hearing, and must decide whether another player is hiding behind a wall or pursuing from behind. There are several techniques for developing complex human-like reasoning for AI. Bayesian networks are one technique that allows an NPC to perform human-like complex reasoning [23]. Fuzzy logic, artificial neural networks, and evolutionary algorithms are also applied to NPC decision making.
2.2.3 Learning
NPC learning is vital in AI development, as learning and evolving are part of what it means to be intelligent. If an NPC has been pre-scripted with pre-defined logic and decision trees, the outcome will always be the same because there is no learning in progress. Players will figure out the AI's recurring patterns and overcome them with definite solutions whenever they face the AI opponents; hence there is no more fun once the same solution to beating an NPC has been figured out. There are several machine learning algorithms, organized according to their desired outcome: supervised learning [24], unsupervised learning [25], semi-supervised learning [26], reinforcement learning [27], transduction [28][29], inductive bias learning [30], and Pareto-based multi-objective learning [31][32].
2.3 Unreal Tournament 2004
First-person shooting games are popular in AI research because of their popularity among consumers, their applicability as a model of real-time decision making, their dynamic environments, and the fact that most commercial games offer enough resources to develop game AI and environments. Most first-person shooting games have similar attributes. This section focuses on the FPS game Unreal Tournament 2004 and its environment.
Unreal Tournament 2004 is the successor of UT: GOTY [14] and Unreal Tournament 2003 [14], developed by Epic Games [42]. The game is built on the famous award-winning Unreal Engine 2, which is used by more than 30 well-known first-person shooting games. Unreal Tournament 2004 is bundled with the Unreal level editor and an AI script editor. Papers [2][47][48][49] used Unreal Tournament 2004 as a test-bed FPS game for AI research. Moreover, the game can be run on any platform such as Mac, Linux, and Windows.
As in other FPS games, the player controls a virtual character in a first-person perspective view of the three-dimensional environment. Each character in the game has the following attributes (a simple data-structure sketch is given after this list):
i. Health – The player has health points (hit points) ranging from 0 to 100. Each character has 100 health points at the beginning of the game. If the character consumes a special health vial or health pack, it can reach a maximum of 199 health points. When the health points reach 0, the character dies.
ii. Armour – The player can equip armour, which is added on top of the character's health. The armour value is in the range 0 to 100. When the character is hit, health points only start to be deducted once the armour value reaches 0.
iii. Shield – The shield is a special rare item in UT2004 that counts as armour.
iv. Weapon – There are 10 different weapons in UT2004, varying in their attributes and applications. When the character picks up a weapon, it is added to the character's weapon list with a basic amount of ammunition.
v. Ammunition – Each weapon has its own ammunition, or ammo. Ammo values range from 0 up to various maximum amounts depending on the weapon type. When the ammo value reaches 0, the character can no longer shoot with that weapon.
vi. Adrenaline pills – The player can collect adrenaline pills throughout the map. The value ranges from 0 to 100. When the value reaches 100, the player can perform special abilities such as increased movement speed, invisibility, and health regeneration via special key combinations.
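To make the attribute list concrete, the following small Java data-structure sketch holds the per-character status values described above. The field names, limits, and damage rule follow the description in the text only; they are not taken from the UT2004 or Pogamut APIs.

// Illustrative data structure for the per-character status values listed above.
// Field names and limits follow the description in the text, not any real API.
public class CharacterStatus {

    private int health = 100;       // 0..199 (boosted above 100 by health vials/packs)
    private int armour = 0;         // 0..100, absorbs damage before health
    private int adrenaline = 0;     // 0..100, special ability available at 100

    /** Apply incoming damage: armour is deducted first, then health. */
    public void takeDamage(int damage) {
        int absorbed = Math.min(armour, damage);
        armour -= absorbed;
        health = Math.max(0, health - (damage - absorbed));
    }

    /** A health value of 0 means the character dies. */
    public boolean isDead() {
        return health <= 0;
    }

    /** Consume a health item, up to the 199-point cap described above. */
    public void pickUpHealth(int amount) {
        health = Math.min(199, health + amount);
    }
}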
The game environment is a 3D environment with event-triggered elements such as elevators, which allow a character to move to different levels, and portals, which can teleport characters from one room to another. The game maps provide navigation graphs for the NPC characters to move around the map. An A* algorithm [21] or its hierarchical variant is used to navigate through the level by searching for a path in this graph. The disadvantage of the navigation graph is that it does not provide any information when the NPC is not following its edges; therefore, a ray-casting sensor has to be used to detect obstacles, dangerous cliffs, and so on.
The actions available to characters are basic movements such as walking forward, backward, strafing left, strafing right, and jumping, plus improved movements such as dashing right, dashing left, and the double jump, in which the character jumps again in mid-air in order to reach a specific location. There are basically two kinds of weapon shooting modes, which vary from one weapon to another, and some weapons have more than two shooting options. For example, a rocket launcher's primary shooting mode launches a missile; if the primary fire button is held down, extra missiles are loaded and the rocket launcher can launch two or more missiles at the same time. The secondary shooting mode launches missile bombs and, as with the primary mechanism, holding the secondary fire button down allows two or more missile bombs to be launched. Figures 2.1 and 2.2 show in-game screen shots.
Figure 2.1, Game play screen shot of the Unreal Tournament 2004
Figure 2.2, Game play screen shot of the Unreal Tournament 2004 from first-person view
2.4 Relevant First-person Shooting Games to Date
Several first-person shooting games were developed before Unreal Tournament 2004, and several more famous FPS games have appeared since. The earliest first-person perspective game is Maze War (1973) [64].
2.4.1 Wolfenstein 3D
In 1992, Wolfenstein 3D by id Software set the standard for the 3D first-person shooter. The player navigates a maze-like 3D environment with the goal of escaping from the Nazi enemies within the maze. The enemy opponents are statically programmed to patrol within the maze and to shoot the player on sight. They chase after the player, but only in a limited way, and tend to lose track of the player. The opponent AI can open doors, but the enemies normally only try to reach the player within their line of sight; they stop chasing when the player moves out of sight or around a corner. [62]
Figure 2.3 Wolfenstein 3D in game screen shot
2.4.2 Doom
In 1993, the legendary first-person shooter Doom was created by id Software. It has better graphics and new flying creatures, yet the in-game AI opponents are not much improved over the earlier game Wolfenstein 3D. The game is also set in a maze-like map, with the objective of taking out the enemies within the maze. [62]
Figure 2.4 Doom in game screen shot
2.4.3 Duke Nukem 3D
Duke Nukem 3D, with a better graphics engine, was released by 3D Realms in 1996. It has ceilings, and in certain areas the player can swim. However, the AI in this game is not much improved over the former games. [62]
Figure 2.5 Duke Nukem 3D in game screen shot
2.4.4 Quake
In 1996, the FPS game Quake was developed by id Software. It noticeably improved graphics and game play: the player can look not only left and right but also up and down in the first-person perspective, and the opponents and items are modelled as real 3D objects, whereas earlier games used flat 2D pictures. The AI opponents interact with the player much as in earlier games, with a slight improvement in that they start to initiate attacks when they hear sound from the player. [62]
Figure 2.6 Quake in game screen shot
2.4.5 Quake II
Quake II was released by id Software the following year, in late 1997. The game improved not only in terms of game play and graphics but also in its AI opponents: the AI NPCs are able to evade incoming missiles and are better at pursuing the player when the player tries to run away. [62]
Figure 2.7 Quake II in game screen shot
2.4.6 Unreal
In 1998, the first-person shooting game Unreal was released by Epic. This was the first game to be bundled with in-game bots, or artificial players, which can substitute for human players in any game mode. These bots are noticeably better than the previous games' AI opponents. They can navigate through the game environment with the help of way-points in the game map, and can even use them to find specific secret items and power-ups in the level in order to attack and defend more effectively. [62]
Figure 2.8 Unreal 1998 in game screen shot
2.4.7 Half-Life
In 1998, Valve Software released the game Half-Life. The game is based on the story of a failed military experiment, and the player has to fight against both the military and the aliens that result from the failed experiment. The game additionally brought new ideas for AI behaviour. For example, some non-player characters are on the player's side and help perform specific jobs: scientists follow the player through the level and help open security doors, and security guards help the player to kill enemies on sight. The AI security guards do not take care to avoid friendly forces in their line of sight when shooting enemies, and occasionally harm friendly players while trying to take out the enemy. They also have no regard for their own lives, never abandon a fight, and keep attacking until either the enemy or they themselves are defeated. [62]
Figure 2.9 Half-life in game screen shot
2.4.8 Unreal Tournament
In 1999, Unreal Tournament was released by Epic. The game is aimed mostly at multiplayer play; however, in solo play the player has to defeat bots throughout a tournament, and even in multiplayer mode opposing players can be substituted with bots. Bots perform well in the team-play game types such as Capture the Flag (CTF), Assault, and Domination. As in the earlier game Unreal, the bots use navigation way-points to get around the 3D game environment. [62]
Figure 2.10 Unreal Tournament in game Screen shot
2.4.9 Call of Duty (COD): Modern Warfare 2
In November 2009, Call of Duty: Modern Warfare 2 was released by Infinity Ward. The game is available on Microsoft Windows, PlayStation 3, and Xbox 360, and is composed of three game-play modes: campaign, cooperative, and multiplayer. The game play consists of many aspects, such as shooting enemies while driving a snowmobile, as shown in figure 2.11. In campaign mode, usually known as solo mode, the player takes the role of various characters and carries out the objectives of the game's story; for example, the player needs to reach a particular check point, eliminate a specific enemy at a specific location, or defend a location from being taken over by enemies. In several campaign missions, AI NPCs cooperate with the player to carry out the mission. In cooperative mode, two or more players can join together to carry out cooperative missions. In multiplayer mode, a maximum of 18 players can join the game environment in two opposing teams (9 players versus 9) and pursue goals depending on the game mode. Game modes vary from Free for All, in which players eliminate opponents for points, to team-based modes such as Search and Destroy, Demolition, Domination, Team Deathmatch, and Capture the Flag. The developers used "Dynamic AI", which allows computer NPC opponents to act more independently: NPC opponents actively seek out the players, break away from a set of definite behaviours, and carry out unexpected tactics. Therefore, human players cannot predict that the computer NPCs will be in definite positions and will attack in the same way each time the game is played.
Figure 2.11 Modern Warfare 2, screen shot taken when shooting enemy while driving
the snowmobile [65]
2.5 Relevant Techniques used for AI NPC
As seen in section 2.4, first-person shooter games have been revolutionised from time to time, changing from simple point-and-shoot to complex actions such as driving a vehicle while confronting enemies at the same time. As the game play changes, the requirements for the AI NPC become more complex. The computer NPC cannot just patrol around a specific area; it needs to pursue its own goals, perform specific tasks, and cooperate with other NPCs in the game. Traditionally, heuristic approaches such as state machines, rule-based systems, and AI scripts have been used for AI NPCs in most commercial games. However, these are static, limited in expandability, and time-consuming when developing detailed parameters and tactics. The following subsections discuss AI techniques that can be used to develop the AI NPC in games.
2.5.1 Finite State Machines
A finite state machine (FSM) is a model of behaviour composed of a finite number of states, transitions between those states, and actions. As a real-world example, a state machine can be seen in a light bulb, which is either on or off. When the light is off, the machine is in the "off" state, and when the light is on, it is in the "on" state.
Figure 2.12 A light bulb example for finite state machine
Figure 2.13 Simple FSM for NPC [62]
In the light bulb example, there are only two states that the light bulb can be in: on or off. Normally there are more than two states, and the state transitions are often restricted; for instance, the light bulb can only turn on and become bright when a person flips the switch on. Any system with a limited number of states can be implemented as a finite state machine. Although there are other options for modelling how a human thinks and learns, finite state machines are often used for this simulation.
Finite state machines are often used to develop the decision-making model of the AI NPC. The different states can represent different behaviours and decisions, and there can be different states for different situations and environments in the game. The state transitions are usually based on certain events in the game environment. A very simple NPC can be modelled as a finite state machine using four states, as shown in figure 2.13. The NPC can be in a state of gathering items around the game environment. When the NPC sees an enemy, it can transition to the retreat, attack, or chase states. While attacking the enemy, if the NPC's health points become low, the NPC can decide to retreat or to gather items to restore its health to normal.
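The four-state NPC just described can be expressed directly in code. The following minimal Java sketch is illustrative only, assuming simplified boolean percepts (seesEnemy, lowHealth) supplied by the game on each update tick; it is not taken from the bot implemented in Chapter 3.

// Illustrative sketch: a minimal enum-based FSM for the four NPC states in figure 2.13.
public class SimpleFsmNpc {

    enum State { GATHER_ITEMS, ATTACK, CHASE, RETREAT }

    private State state = State.GATHER_ITEMS;

    /** One decision step: choose the next state from the current state and percepts. */
    public State update(boolean seesEnemy, boolean lowHealth) {
        switch (state) {
            case GATHER_ITEMS:
                if (seesEnemy) {
                    state = lowHealth ? State.RETREAT : State.ATTACK;
                }
                break;
            case ATTACK:
                if (lowHealth) {
                    state = State.RETREAT;          // break off and look for health
                } else if (!seesEnemy) {
                    state = State.CHASE;            // enemy moved out of sight
                }
                break;
            case CHASE:
                if (seesEnemy) {
                    state = State.ATTACK;
                } else if (lowHealth) {
                    state = State.RETREAT;
                }
                break;
            case RETREAT:
                if (!seesEnemy) {
                    state = State.GATHER_ITEMS;     // safe again: restore health and items
                }
                break;
        }
        return state;
    }
}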
2.5.2 Machine Learning
According to Arthur Samuel (1959), machine learning is "the field of study that gives computers the ability to learn without being explicitly programmed." According to Tom Mitchell, "a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." As mentioned in section 2.2.3, learning algorithms can be categorized into supervised, unsupervised, and reinforcement learning. In supervised learning, data sets with known classes are given and an agent is trained to produce the desired outputs. In unsupervised learning, data with unknown classes are given and an agent is trained to classify the data set. In reinforcement learning, an agent is rewarded or punished based on the outcomes of its actions in different states.
2.5.3 Reinforcement Learning
Reinforcement learning is a machine learning technique in which the agent learns from reward signals rather than from labelled examples. According to Sutton and Barto, "reinforcement learning is learning what to do--how to map situations to actions--so as to maximize a numerical reward signal" [46]. Think of an agent in an environment in which it can sense and act, and suppose it has no idea about the effects of its actions. There are rewards and punishments based on the actions performed. How should the agent choose its actions to maximize its rewards over the long run? In order to maximize the rewards, the agent needs to be able to predict how actions lead to rewards. The agent exists in an environment consisting of a set S of states, and decides which action a to perform from the set A of actions. The action performed at time t is a_t, and the agent receives the reward r_t = r(s_t, a_t).
Figure 2.14, Standard Reinforcement Learning Model [46]
At each time t, the agent observes its state s_t ∈ S and the set of possible actions A(s_t). It chooses an action a ∈ A(s_t). As the action is performed, the agent receives the new state s_{t+1} and the reward r_t. The mapping from states to the actions chosen is called the policy, π. A popular approach to storing the learned state-action values is the tabular approach, where a value for each state-action pair is stored in table format.
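One standard way of making "rewards over the long run" precise, taken from the general reinforcement learning literature rather than from this dissertation's own notation, is the discounted return that the agent tries to maximize:

G_t = r_t + γ r_{t+1} + γ² r_{t+2} + ... = Σ_{k=0}^{∞} γ^k r_{t+k}

where γ (0 ≤ γ ≤ 1) is a discount factor that weights immediate rewards more heavily than distant ones. The Q(s, a) values introduced in the next subsection are estimates of this quantity when action a is taken in state s and the best-known actions are followed afterwards.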
2.5.4 Q-learning
Q-learning (Watkins, 1989) [44] is one of the algorithms derived from reinforcement learning. Agents developed with Q-learning are capable of learning to act optimally in Markovian domains by experiencing the consequences of their actions [45]. One-step Q-learning is defined by:

Q(s, a) ← Q(s, a) + α [ R(s) + γ max_{a′} Q(s′, a′) − Q(s, a) ]

where:
α is the learning rate, 0 ≤ α ≤ 1. It defines how strongly the agent reacts to the reward given.
γ is the discount factor, 0 ≤ γ ≤ 1. It multiplies the Q-value of the best action the agent can take in the next state.
R(s) is the reward value given to the agent for the action performed in state s.
Think of Q-learning as using a mapping table, the Q-table shown in table 2.1. The rows represent all the states the environment could possibly be in, and the columns are the actions that the agent can perform. The Q(s, a) values in the table represent how favourable it is to take each action in a specific state of the environment.
Q-Table Action 1 Action 2 Action 3 Action 4
State 1 Q(s1, a1) Q(s1, a2) Q(s1, a3) Q(s1, a4)
State 2 Q(s2, a1) Q(s2, a2) Q(s2, a3) Q(s2, a4)
State 3 Q(s3, a1) Q(s3, a2) Q(s3, a3) Q(s3, a4)
... ... ... ... ...
Table 2.1, Q-Learning mapping table, Q-table
When the agent has to act, it chooses the best action it knows of for the current state of the environment. The procedural form of the Q-learning algorithm, quoted from [46], is as follows:

Initialize Q(s, a) arbitrarily
Repeat (for each episode):
    Initialize s
    Repeat (for each step of episode):
        Choose a from s using the policy derived from Q (e.g., ε-greedy)
        Take action a, observe r, s′
        Q(s, a) ← Q(s, a) + α [ R(s) + γ max_{a′} Q(s′, a′) − Q(s, a) ]
        s ← s′
    until s is terminal
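As a concrete illustration of the update rule and the Q-table above, the following minimal Java sketch stores Q(s, a) in a two-dimensional array indexed by state and action, chooses actions ε-greedily, and applies one learning step. The integer state/action encoding is a simplification for illustration and is not the actual implementation described in Chapter 3.

import java.util.Random;

// Minimal illustrative Q-learning sketch (not the Chapter 3 implementation):
// states and actions are assumed to be encoded as small integer indices.
public class QLearner {

    private final double[][] qTable;   // Q(s, a) values
    private final double alpha;        // learning rate, 0..1
    private final double gamma;        // discount factor, 0..1
    private final double epsilon;      // exploration probability
    private final Random random = new Random();

    public QLearner(int numStates, int numActions,
                    double alpha, double gamma, double epsilon) {
        this.qTable = new double[numStates][numActions];
        this.alpha = alpha;
        this.gamma = gamma;
        this.epsilon = epsilon;
    }

    /** Epsilon-greedy action selection for the current state. */
    public int chooseAction(int state) {
        if (random.nextDouble() < epsilon) {
            return random.nextInt(qTable[state].length);   // explore
        }
        int best = 0;
        for (int a = 1; a < qTable[state].length; a++) {
            if (qTable[state][a] > qTable[state][best]) {
                best = a;                                   // exploit best known action
            }
        }
        return best;
    }

    /** One-step update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
    public void update(int state, int action, double reward, int nextState) {
        double maxNext = qTable[nextState][0];
        for (int a = 1; a < qTable[nextState].length; a++) {
            maxNext = Math.max(maxNext, qTable[nextState][a]);
        }
        qTable[state][action] += alpha * (reward + gamma * maxNext - qTable[state][action]);
    }
}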
2.5.5 Hierarchical reinforcement learning
In hierarchical reinforcement learning, reinforcement learning methods are augmented with prior knowledge about the high-level structure of behaviour [70]. Adding this prior knowledge hierarchically accelerates the search for good policies compared with standard reinforcement learning's policy-search algorithms.
2.6 Related works
There has been a variety of research and development on learning agents in FPS and related games. In the paper "Automatic computer game balancing: a reinforcement learning approach," the Q-learning algorithm is used to develop an adaptive agent for the fighting game Knock'em [59]; in order to adapt quickly to the opponent in the online environment, the agent is first trained offline. The paper "The Team-oriented Evolutionary Adaptability Mechanism (TEAM)" introduced an evolutionary approach to evolving a team-based strategy for the Capture the Flag game mode in Quake III [60], and states that the TEAM AI outperformed the original Quake III AI through adaptively learned risky tactics. Another similar development can be found in "Cerberus: Applying supervised and reinforcement learning techniques to capture the flag games" [61], which used a reinforcement learning framework for team behaviour and neural networks to control the fighting behaviour of individual characters in the team.
In the paper "Reinforced Tactic Learning in Agent-Team Environments," the authors proposed the RETALIATE algorithm, an online reinforcement Q-learning algorithm for generating team-based tactics in the FPS game Unreal Tournament 2004 [47]. According to the paper, the RETALIATE agents adapt well to dynamic environments and show a significant capability to generate adaptive team tactics. The RETALIATE agent was extended in the paper "Recognizing the Enemy: Combining Reinforcement Learning with Strategy Selection using Case-Based Reasoning" [48], whose authors use case-based reasoning (CBR) to help the original RETALIATE agent adapt faster to changes in the environment.
In the paper "Creating a Multi-Purpose First Person Shooter Bot with Reinforcement Learning" [49], the author states that a hierarchical or rule-based approach combined with the reinforcement learning Sarsa algorithm is a promising way to create rich and diverse agent behaviours in FPS games. In the paper "Evolution of Intelligent Agent Behaviour in Computer Games," the author presents high-level behaviour optimization and low-level missile-dodging behaviour optimization using neural networks and genetic algorithms [2].
2.7 Agent architectures in related works
In first-person shooting games, the computer-controlled character (NPC) agents are called bots. Bots can sense the environment and perform appropriate actions in the same way as human players. The data flow between a bot and the environment can be seen in figure 2.15.
Figure 2.15, Bot and environment basic interaction diagram
2.7.1 Action Selection Mechanism (ASM)
A variety of bots are designed to perform specific tasks such as patrolling, attacking, and retreating. The behaviour of a bot is driven by its action selection mechanism (ASM) [2]. The most common layers of the ASM architecture for a bot can be seen in figure 2.16.
Figure 2.16 Conceptual layers of bot behaviours [2]
Each layer corresponds to a different level of abstraction, and each module to a different aspect of the bot's behaviour and reasoning. The first layer, high-level decision making, is the top level of the bot's action planning; this layer is responsible for tactics, winning strategies, teamwork, long-term planning, and so on. The second layer is responsible for the combat-behaviour decisions or movements required to achieve the goal set by the first layer. The third layer is responsible for the lowest-level conceptual blocks of the bot's behaviour, and includes weapon selection, dodging, aiming, steering, and navigation. In the weapon-selection module, the bot has to choose the right weapon based on the weapon's effective distance and the best offence against the enemy's armour; for example, choosing a shotgun for close-range combat and a sniper weapon when the enemy is beyond the shotgun's reachable range. In some cases, players can dodge the opponent's attacks, such as incoming missiles; in the dodging module, the bot senses such delayed-hit attacks and avoids them. In the accuracy module, the bot has to calculate where to shoot in order to hit a moving enemy. In the steering and navigation modules, the bot has to avoid obstacles along the planned path through the game environment. For example, if the bot needs a health pack to restore its health points and the nearest health pack is on the 2nd floor of the environment, which is reachable by elevator, the bot has to plan a path to reach the health pack via the elevator.
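As a small illustration of the kind of rule that lives in the weapon-selection module, the following Java sketch picks a weapon purely from the distance to the enemy. The weapon names and distance thresholds are invented for illustration and are not actual UT2004 values.

// Illustrative weapon-selection rule based only on distance to the enemy.
// Weapon names and distance thresholds are hypothetical, not UT2004 values.
public final class WeaponSelector {

    public enum Weapon { SHOTGUN, ASSAULT_RIFLE, SNIPER_RIFLE }

    private static final double CLOSE_RANGE = 500.0;    // assumed game units
    private static final double MEDIUM_RANGE = 2000.0;

    /** Choose a weapon from the distance to the currently visible enemy. */
    public static Weapon choose(double distanceToEnemy) {
        if (distanceToEnemy <= CLOSE_RANGE) {
            return Weapon.SHOTGUN;        // high damage up close
        } else if (distanceToEnemy <= MEDIUM_RANGE) {
            return Weapon.ASSAULT_RIFLE;  // general-purpose mid range
        }
        return Weapon.SNIPER_RIFLE;       // beyond the shotgun's reachable range
    }
}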
2.7.2 Behaviour Trees
Behaviour trees are a popular model for representing bot behaviour. They are also used in software development processes that need to simplify vast amounts of natural-language requirements. The leaves of the tree represent the executable behaviours of the system, while the internal nodes represent behaviours that can be decomposed into smaller sub-behaviours and decide which sub-behaviour should be executed. Compared to FSMs, the number of transitions is reduced because of the hierarchical nature of the tree. A sample behaviour tree for a guard bot can be seen in figure 2.17.
Figure 2.17 A sample behaviour tree for a guard bot [2]
Behaviour trees can be evaluated in two ways: (i) top-down and (ii) bottom-up.
(i) Top-down
In the top-down approach, evaluation starts from the root node. The root node only selects the next node or leaf to be executed, and this repeats at each selected node until a leaf is reached and its action is executed. The advantage of this approach is less computation and faster speed, as only one path from the root node to a leaf is evaluated; in addition, it gives well-defined behaviour.
(ii) Bottom-up
In the bottom-up approach, computation starts from the leaves. Each leaf computes and proposes an action and passes it to the upper level, its parent node; the parent nodes choose the best of the proposed actions and pass the winning action upwards. The process repeats until it reaches the root node, and the winning action is then executed. In this approach the whole tree has to be evaluated, so it has the relative disadvantage of a high computing cost. However, bottom-up behaviour trees can be modified in certain scenarios to enhance their performance [66]. A minimal code sketch of a top-down evaluated tree follows.
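The following minimal Java sketch illustrates the top-down idea for a guard bot like the one in figure 2.17: a selector node tries its children in order and runs the first one whose condition holds. The node and behaviour names are illustrative only and do not correspond to any particular engine's behaviour-tree API.

import java.util.Arrays;
import java.util.List;

// Minimal top-down behaviour tree sketch: a selector tries children in order
// and runs the first one whose condition holds. Names are illustrative only.
public class BehaviourTreeSketch {

    interface Node {
        /** Returns true if this node (or one of its children) executed an action. */
        boolean tick();
    }

    /** Composite node: picks the first child that succeeds (top-down selection). */
    static class Selector implements Node {
        private final List<Node> children;
        Selector(Node... children) { this.children = Arrays.asList(children); }
        public boolean tick() {
            for (Node child : children) {
                if (child.tick()) {
                    return true;    // only one path from root to leaf is evaluated
                }
            }
            return false;
        }
    }

    /** Leaf node: executes an action when its guard condition is met. */
    static class ConditionalAction implements Node {
        private final java.util.function.BooleanSupplier condition;
        private final Runnable action;
        ConditionalAction(java.util.function.BooleanSupplier condition, Runnable action) {
            this.condition = condition;
            this.action = action;
        }
        public boolean tick() {
            if (condition.getAsBoolean()) {
                action.run();
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        boolean enemyVisible = false;   // stand-in for a real sensory input
        Node guardBot = new Selector(
            new ConditionalAction(() -> enemyVisible, () -> System.out.println("Attack enemy")),
            new ConditionalAction(() -> true,         () -> System.out.println("Patrol area"))
        );
        guardBot.tick();                // prints "Patrol area"
    }
}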
2.8 AI Implementation Tools
In AI research and development, LISP and Prolog are the most commonly used languages. LISP stands for LISt Processor and was developed by John McCarthy in 1960 [38]; it is based on S-expressions and S-functions, which are written as parenthesised lists [38]. Prolog is a declarative logic programming language developed in 1971 by Alain Colmerauer and Philippe Roussel [39]. Other languages such as Smalltalk [41], MATLAB, C++, and Java are also used to implement AI.
AI in most modern games uses a scripting language so that AI developers can work without needing help from the game developers. Most commercial games such as Warcraft III [9], Unreal Tournament [14], and Quake 4 [40] are bundled with AI scripting and game map creators that let users customize the NPCs, the nature of the game play, and the game world. The bundled NPC characters are developed with the game's AI scripts. Other development kits assisting AI research, such as Gamebots (JavaBots), were developed by Andrew N. Marshall and Gal Kaminka using the UT scripting language to control Unreal Tournament AI agents. Based on Gamebots, Pogamut 2 is a NetBeans IDE plug-in that allows AI researchers to develop and control Unreal Tournament 2004 AI agents in the virtual world using the famous platform-independent programming language Java.
2.8.1 JavaBots
Figure 2.18 JavaBot Structure Diagram [68]
Unreal Tournament is the three-dimensional first-person shooter multiplayer network game which
was discussed in section 2.3. GameBots is a modification to the Unreal Tournament game that
allows artificial intelligent agents to connect to the game server and control a character in the
game. The GameBots project was started at the University of Southern California’s Information
Sciences Institute in order to turn the Unreal Tournament game into a domain for artificial
intelligence research [67]. The core of GameBots is a modification to the Unreal Tournament game
that allows the in-game characters to be controlled via the network. The game sends sensory information
to the client programs through the network. The client program can decide which actions to perform and
command the character to perform specific actions such as move, talk, shoot, etc.
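As a rough illustration of this sense-decide-act loop over the network, the following is a minimal sketch of such a client in Java. It assumes a line-based text protocol; the port number and the message and command strings used here are purely hypothetical and are not the actual GameBots message set.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class MinimalBotClient {
    public static void main(String[] args) throws Exception {
        // Connect to a (hypothetical) GameBots-style server on the local machine.
        try (Socket socket = new Socket("localhost", 3000);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()));
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {

            String message;
            // Each line from the server is treated as one piece of sensory information.
            while ((message = in.readLine()) != null) {
                // Decide on an action; the message and command names below are illustrative only.
                if (message.startsWith("SEE_ENEMY")) {
                    out.println("SHOOT");        // engage the visible enemy
                } else if (message.startsWith("SEE_ITEM")) {
                    out.println("RUNTO_ITEM");   // move towards the collectable item
                } else {
                    out.println("EXPLORE");      // default behaviour
                }
            }
        }
    }
}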
Figure 2.18 shows the different components assembled together to develop a Java-based
artificial intelligent agent player in the Unreal Tournament game. The GameBots Network API and the
Unreal Tournament server form the Gamebots infrastructure. The JavaBot API, the Java-based visualization tool,
and the Bot RunnerApp JavaBot interface come from the JavaBot package. On top of the JavaBot
infrastructure, the Java-based AI agents are developed. The benefits of using JavaBot are that an AI
developer can develop AI agents without needing much knowledge of the specifics of the
Gamebots protocols, that the API is easy to access online, and that there is a variety of example bots to use as references [68].
2.8.2 Pogamut 2
Pogamut 2 is a platform designed to facilitate developing and debugging JavaBots in Unreal
Tournament [69]. Pogamut 2 uses Unreal Tournament 2004 as the environment for the AI agents.
The main objective of Pogamut 2 is to simplify the “physical” part of agent creation. Pogamut
2 acts as a plug-in for the NetBeans IDE, which enables users to code, debug, and run the AI agents in the IDE
itself. Users have the advantage of using just one or two simple commands to perform complex agent tasks
such as path-finding and retrieving information from the agent’s memory. The platform can
also use a reactive planner for the agents.
2.8.3 Unified Modelling Language (UML)
Unified Modelling Language is used in various fields, such as all types of systems and
development phases in software modelling, business modelling, and the general modelling of any
construction that has both a static structure and a dynamic behaviour. The work was first started
in 1994 by Grady Booch and James Rumbaugh at Rational Software Corporation. Version 1.0 of the
UML was released in January 1997 [73]. The latest UML infrastructure can be seen in [72]. UML 2
describes 13 official diagram types, which are listed and described below [74]; their classification can be seen in figure 2.19.
i. Activity diagram
An activity diagram is used to describe work flow, business processes, and procedural logic,
with support for parallel behaviour.
ii. Class diagram
A class diagram is used to describe the types of objects in the system and the different
kinds of static relationships among them. A class model consists of the properties and
operations of the class. A class diagram also shows the constraints on the way the objects are
connected.
iii. Communication diagram
A communication diagram is an interaction diagram which emphasizes the data links between
the various participants.
iv. Component diagram
A component diagram is used to break down the system into independently upgradeable and
purchasable pieces. The reasons for dividing the system into components can be
marketing decisions or technical decisions.
v. Composite structure diagram
In order to break down complex objects, a composite structure diagram is used to
hierarchically decompose a class into an internal structure.
vi. Deployment diagram
Deployment diagrams show a system’s physical layout, revealing which pieces of software run on which
physical hardware.
vii. Interaction overview diagram
Interaction overview diagrams are a mix of activity diagrams and sequence diagrams.
viii. Object diagram
An object diagram is used to present a snapshot of the objects in a system at a point in time.
An object diagram is often called an instance diagram, as it shows instances rather than
classes. It can be used to show an example configuration of objects.
ix. Package diagram
A package diagram is used to group classes together into higher-level units with a
hierarchical structure. A package can be inside another higher-level package, and it can
contain lower-level sub-packages and classes.
x. Sequence diagram
A sequence diagram is used to present the behaviour of a single scenario. It shows a number of
example objects and the messages which are passed between these objects within the use
case.
xi. State machine diagram
A state machine diagram is used to describe the behaviour of a system. It also shows the
lifetime behaviour of a single object.
xii. Timing diagram
A timing diagram is another form of interaction diagram, but it emphasizes timing constraints.
xiii. Use case diagram
A use case diagram is used to capture the functional requirements of a system. It describes
the typical interactions between the users of a system and the system itself, providing
a narrative view of how the system works.
Figure 2.19 Classification of UML diagram types
2.8.4 UML State Machine (UML statechart)
State diagrams are often used to describe the behaviour of systems. State diagrams also
present the states an object can have and how events affect those states over time [73]. A state
diagram consists of the following components; a minimal code sketch illustrating them is given after the list.
State – A state is a stage in an entity’s behaviour in which it is satisfying some condition, waiting
for some external event, or performing some activity.
Superstate – A superstate is a collection of sub-states which share common transitions and
internal activities.
Activity State – An activity state is a state in which the object is doing some ongoing work.
Concurrent State – A concurrent state is a state that is active concurrently with other states
at the same time.
Event – An event is the occurrence of a stimulus which triggers a state transition.
Transition – A transition is composed of a source state, a target state, an event, and an
action. It is a relationship between two states indicating that when the first state performs specific actions,
it reaches the second state.
Self-transition – A self-transition occurs when the source state and the target state are the same.
Actions – An action is an executable activity performed at a moment. There are several types
of actions, as follows –
(a) An entry action is performed when entering the state.
(b) An exit action is performed when exiting the state.
(c) An input action is performed depending on a condition and the current state.
(d) A transition action is performed on a certain transition.
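To connect these notions to code, the following is a minimal sketch in Java of how states, events, transitions, and entry/exit actions might be realised for a simple two-state bot. The state and event names here are illustrative assumptions, not part of any existing library.

// States and events for a deliberately simple bot.
enum State { EXPLORE, ENGAGE }
enum Event { SEE_ENEMY, ENEMY_DEAD }

class BotStateMachine {
    private State current = State.EXPLORE;   // initial state

    // A transition: source state + event -> target state, with exit and entry actions.
    void handle(Event event) {
        if (current == State.EXPLORE && event == Event.SEE_ENEMY) {
            exitAction(current);
            current = State.ENGAGE;
            entryAction(current);
        } else if (current == State.ENGAGE && event == Event.ENEMY_DEAD) {
            exitAction(current);
            current = State.EXPLORE;
            entryAction(current);
        }
        // Any other (state, event) pair is ignored in this sketch.
    }

    private void entryAction(State s) { System.out.println("Entering " + s); }
    private void exitAction(State s)  { System.out.println("Exiting " + s); }

    public static void main(String[] args) {
        BotStateMachine machine = new BotStateMachine();
        machine.handle(Event.SEE_ENEMY);    // EXPLORE -> ENGAGE
        machine.handle(Event.ENEMY_DEAD);   // ENGAGE -> EXPLORE
    }
}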
The standard notations for state machine diagrams can be seen in table 2.2.
Table 2.2 Standard notations for state machine diagrams
(The table gives the graphical notation for each of the following elements: State, Composite State, Transition, Fork, Join, Decision, Initial State, Final State, Shallow History (H), and Deep History (H*).)
2.9 Summary
Thus from this chapter we can clearly see that games have developed to a very great extent
since their beginnings. The first single-player, stand-alone game, developed five decades ago with a
simple two-dimensional static graphic environment, has now been replaced by dynamic, high
definition, three-dimensional, multi-player, networked gaming environments. This has been a result of
the various requirements and expectations of the players, along with the development and availability
of the latest technology.
CHAPTER 3
METHOD
The previous chapter showed that various reinforcement learning techniques can be applied to develop
non-player characters (NPCs), or bots, for real-time first-person shooting games. This chapter
proposes how to develop an NPC bot which learns to play the first-person shooting game,
and which chooses and performs the best possible actions learned from previous experience of
game plays using the Q-learning technique.
3.1 Analysis of the game rules
In this dissertation, the Unreal Tournament 2004 game is chosen as the environment for the NPC
bots. The chosen game mode is called Death-match. The goal of the game is to reach the
preset kill count within the time limit, or to have the highest kill score when the time limit expires.
Killing an opponent gains the killer a kill point, dying to an opponent gives a point to the killer, and a suicide
counts as a negative score for the player.
The goal of the NPC bot is to achieve a high kill score by learning the best policy for
choosing the right actions.
3.2 Analysis of States and events
Analysis of states and events will be presented in this section. As the Q-learning technique uses
pairs of states and actions, the possible states of an NPC are listed as follows.
i. Spawn State – When first joining the game, or after the player dies, the player is
spawned at a random point with health and a basic weapon.
ii. Engage State – Attack the enemy with the currently equipped weapon.
iii. Collect State – The agent tries to reach an item and pick it up when it sees one.
iv. Retreat State – Run away from the enemy to a safer location.
v. Explore State – The agent explores the map looking for enemies, gatherable items, and
weapons.
The following events can occur, and the corresponding information can be retrieved from the game, during
game play; a minimal sketch of these states and events in Java follows the list.
i. E = See an enemy
ii. !E = No enemy in sight
iii. ED = Enemy is dead
iv. D = Agent is dead
v. I = See a collectable item
vi. HL = Agent’s health is low
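The following is a minimal sketch, in Java, of how the states and events above could be represented as enumerations. The identifiers are chosen here for illustration and are not the exact representation used in the implementation.

// States the bot can be in (section 3.2).
enum BotState {
    SPAWN,      // spawned at a random point with health and a basic weapon
    ENGAGE,     // attacking the enemy with the currently equipped weapon
    COLLECT,    // moving towards a visible collectable item
    RETREAT,    // running away from the enemy to a safer location
    EXPLORE     // roaming the map looking for enemies, items, and weapons
}

// Events that can occur during game play.
enum BotEvent {
    SEE_ENEMY,   // E
    NO_ENEMY,    // !E
    ENEMY_DEAD,  // ED
    AGENT_DEAD,  // D
    SEE_ITEM,    // I
    HEALTH_LOW   // HL
}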
Figure 3.1 State-event diagram
There are five possible states and six possible events for an NPC bot when it joins the game, as
shown in figure 3.1. At the beginning of the game, the NPC bot is spawned at one of the random
spawning points on the map. The explore state is the default state, in which the bot randomly goes around
the map and collects items whenever it sees them. If the bot is in the spawn state and sees an enemy, it
will engage the enemy opponent, which is the engage state. If the bot defeats the opponent, it will
choose either the collect state, in which it collects the nearest collectable items, or the explore state. A low
health event will trigger the bot to retreat from the engage state and look for health items in the
environment. If the bot dies, it will re-spawn at a random location, and this continues until the game is over.
The above state-event analysis is transformed into a UML state machine diagram, which can be seen in
figure 3.2. Figure 3.2 represents the overview of an NPC agent (bot) in the death match game
mode. When the game starts, the bot is randomly spawned at one of the spawn points in the game map. If
the bot doesn’t see any enemy after it is spawned, it goes to the explore state, in which the bot
wanders around the map, looking for collectable items and weapons, and searching for an enemy to
defeat. Detailed behaviours and states can be seen in figure 3.3. If the bot sees an enemy, it goes to the
engage state, in which the bot performs the attacking behaviours described in figure 3.4. When
the bot hears the sound of an enemy, is hit by an enemy, or loses sight of the enemy it is currently fighting, it
will move to the chase state, where the bot goes to the last seen enemy location and chases the enemy.
If the bot was hit severely and its health is lower than the threshold level, it will retreat and
actively look for health-increasing items. If the bot is defeated by an enemy and the game isn’t
over, it will be re-spawned at a random location in the map again. The game finishes when the time limit is reached or
the preset winning score is reached by either participant.
Figure 3.2 Overview Game’s States diagram
Figure 3.3 Detail Explorer State diagram
Figure 3.3 represents the detailed states of the explore super-state. If the bot doesn’t see any enemy or
the last enemy was defeated, it goes to the exploring state. It actively searches for collectable
items while going around the map. If the bot sees a collectable item, it moves to
the collect item state. While searching for items, if the bot hears an enemy sound, it goes to the chase
state. If the bot sees an enemy, it goes to the engage state and attacks the enemy.
Figure 3.4 Detail Engage State diagram
Figure 3.4 shows the detailed sub-states of the engage super-state. It has nested states for whether the bot is at normal
health or low health. The enemy can have a better weapon than the agent. During the fight, the
ammo of the current weapon could run out and the agent’s health could drop. If the current weapon is
not loaded, or if there is a better weapon, the bot changes to the better loaded weapon. The decision of
whether the bot should keep engaging the enemy or retreat is based on the previous experience stored in the Q-
table.
Figure 3.5 Detail Chase State diagram
Figure 3.5 shows the detailed states of the chase scenario. If the bot gets hit from behind, it doesn’t see
the enemy, so it looks around and searches for the enemy that is shooting it. If the bot is fighting an
enemy and the enemy retreats out of sight, the bot has to find the enemy and defeat it before the enemy
recovers health. If the bot is roaming around the map for items and hears enemy footsteps, the bot
should be alert and find the nearby enemy. If the bot doesn’t see any enemy after looking, it
should resume the explore state and gather items.
Figure 3.6 Detail Retreat State diagram
Figure 3.6 presents the detailed diagram of the retreat state of the bot. When the bot’s health is low,
the priority items to look for are health-restoring items. If the bot still has the enemy in sight, it will
shoot and also look for health packs concurrently. When it sees health packs, it will try to reach and
collect them. After its health is restored to a normal level, the bot will resume exploring
or engaging actions, depending on whether an enemy is present.
3.3 Analysis of available actions
There is a set of actions that an NPC bot can perform in game. They are as follows –
• Shoot – Shoot the enemy with the currently equipped weapon.
• Stop shoot – Stop shooting if the bot is currently shooting.
• Change weapon – Change the currently equipped weapon to another weapon in the
inventory. This action is helpful if the current weapon is out of ammo or the bot has a better
weapon in the inventory.
• Run to item/player – The bot can run to an item or player in order to collect the item or attack at close range.
• SetWalk – The default movement in the game is running, which makes footstep sounds. In order
to avoid making footstep sounds, a bot can be set to walking mode for movement.
• Stop – Make the bot stop moving.
• Jump – Make the bot jump. The bot can also jump again at the highest point of the first jump, which is called
a double jump.
• Turn to – Make the bot turn to face a certain player or location.
• Strafe – Strafe is a movement in which the bot keeps facing the same direction while moving left or right.
• Rotate – Rotate the bot by a specified angle in degrees.
3.4 Mapping States and Actions
Sections 3.2 and 3.3 analysed the possible states and actions the bot can have in a death
match game. This section maps those states and actions. There are conditions that normally occur
together in the engage state. For example, if a bot sees an enemy, its health is below the threshold level, its
weapon is loaded, and it has a better weapon, it must decide whether to pursue the enemy or retreat. The bot
will learn from previous experience whether engaging the enemy is more successful than
retreating. Table 3.1 below lists some of the possible state-action mappings; this mapping becomes the
Q-table, which records the reward values for each action the bot performs in a certain state (an illustrative
data-structure sketch in Java follows the table).
Table 3.1 State-action mapped table
Actions (table columns): Shoot, Stop Shoot, Look Around, Retreat, Change Best Weapon.
States (table rows):
See Enemy + Health is > 50% + Weapon Loaded + Has better weapon
See Enemy + Health is <= 50% + Weapon Loaded + Has better weapon
See Enemy + Health is > 50% + Weapon is not Loaded + Has better weapon
See Enemy + Health is <= 50% + Weapon is not Loaded + Has better weapon
See Enemy + Health is > 50% + Weapon Loaded
See Enemy + Health is <= 50% + Weapon Loaded
See Enemy + Health is > 50% + Weapon is not Loaded
See Enemy + Health is <= 50% + Weapon is not Loaded
No Enemy + Health is <= 50% + Weapon is not Loaded
No Enemy + Health is > 50% + Weapon is not Loaded
No Enemy + Health is > 50%
...etc
(Each cell of the table holds the reward value recorded for performing the given action in the given state.)
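One plausible way to hold such a table in code is a nested map from state to action to Q-value; the sketch below is illustrative only and is not the exact data structure used in the implementation.

import java.util.HashMap;
import java.util.Map;

// A minimal Q-table: state description -> (action -> Q-value).
class QTable {
    private final Map<String, Map<String, Double>> table = new HashMap<>();

    // Return the stored Q-value, defaulting to 0.0 for unseen state-action pairs.
    double get(String state, String action) {
        return table.getOrDefault(state, Map.of()).getOrDefault(action, 0.0);
    }

    void set(String state, String action, double value) {
        table.computeIfAbsent(state, s -> new HashMap<>()).put(action, value);
    }

    // Pick the action with the highest Q-value for the given state.
    String bestAction(String state, Iterable<String> actions) {
        String best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (String a : actions) {
            double v = get(state, a);
            if (v > bestValue) { bestValue = v; best = a; }
        }
        return best;
    }
}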
3.5 Behaviours
Section 3.3 listed the various actions a bot can perform. This section discusses combined actions
as the behaviours of the bot.
Offensive behaviour – In the offensive behaviour, the bot performs mostly aggressive attacks and
pursuing actions. It cares less about getting hit or taking cover from the opponent’s attacks. This
behaviour can occur when the bot is equipped with the best weapon, loaded, with full armour and full health.
Defensive behaviour – In the defensive behaviour, the bot cares about getting hit by the opponent,
looks for cover to avoid the attacks, and looks for more armour and health items.
Hiding behaviour – In the hiding behaviour, the bot tries to hide in one place, which is normally called
camping. It rarely moves from one place to another and waits for an opponent to appear so it can
attack. Alternatively, the bot has low health and is trying to break the enemy’s line of sight to recover
its health.
Gatherer behaviour – In the gatherer behaviour, most of the actions the bot performs are to roam
around the map and collect more items. This behaviour can occur at the beginning of the game, when
the bot has only the basic weapon.
Retreat behaviour – In the retreating behaviour, most of the actions the bot performs are to break the enemy’s
line of sight and collect the nearest health-recovering items in the map.
3.6 Implementation
3.6.1 Proposed architecture
Figure 3.7 An overview framework of an Java bot on UT 2004 game server
The Q-learning bot is developed in Java. The project uses NetBeans 6.8 as the development
environment, with the Pogamut 2.4 platform on top of NetBeans. Figure 3.7 shows the
overview framework of a Java bot connected to the Unreal Tournament 2004 server. When the
program starts, it initializes the state-action Q-table (table 3.1), or restores it if a previously
recorded Q-table exists. Then the game starts. After the game starts, the bot chooses either a random
action or the maximum-valued applicable action from the Q-table. After choosing the action, it executes the
chosen action. After the action is performed, the bot checks the result of the action and the resulting state. The bot
is given rewards or punishments according to the result and updates the Q-table. If the targeted game score is
reached or the time is up, the game ends; if the bot is dead and the game is not over, it will be re-
spawned and will resume choosing actions for the next state. The flow of this process can be seen as
a flow chart in figure 3.8. The implemented algorithm is as follows (a short Java sketch of the Q-value update step follows the pseudocode) –
Initialize Q(s, a) table
Repeat (for each combat sequence):
Check state s
Repeat (for each step of combat sequence)
Select action a based on s
Execute a, observe state, receive reward r
Calculate Q-value based on reward, action, state
Until agent is dead or enemy defeated
Until game is over
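As a concrete sketch of the “Calculate Q-value” step above, the standard Q-learning update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(s', a') - Q(s, a)) can be written as follows in Java, re-using the illustrative QTable sketch from section 3.4. The learning rate (alpha) and discount factor (gamma) correspond to the parameters listed in the experiment setups in chapter 4; the method itself is illustrative, not the exact implementation.

// One Q-learning update step:
// Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
double qLearningUpdate(QTable q, String s, String a, double r, String sNext,
                       Iterable<String> actions, double alpha, double gamma) {
    double maxNext = Double.NEGATIVE_INFINITY;
    for (String aNext : actions) {                 // find max_a' Q(s', a')
        maxNext = Math.max(maxNext, q.get(sNext, aNext));
    }
    if (maxNext == Double.NEGATIVE_INFINITY) maxNext = 0.0;   // no known actions yet
    double updated = q.get(s, a) + alpha * (r + gamma * maxNext - q.get(s, a));
    q.set(s, a, updated);
    return updated;
}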
(In figure 3.7, the set of commands, movements, and actions performed by the bot flows from the Q-Learning Java Bot through the JavaBot API and the NetBeans + Pogamut interface to the Unreal Tournament 2004 game server, while events and environment information flow back from the game server.)
Figure 3.8 Flow-chart Diagram for Q-learning Bot
(The flow chart: Start → initialize or restore the state-action Q-table → game start → choose either a random action (for exploring) or the max-valued applicable action from the state-action table → execute the action → observe the state → calculate the reward and update the state-action Q-table → if the game is not over, repeat (re-spawning first if the bot is dead); otherwise end.)
3.6.2 Structure of the Q-learning bot
There are six classes in the Q-learning bot development. They are as followed.
i. Main.java
The Main.java class is most of the agent’s brain. It has different sub-methods that perform the
logical operations, transactions, and commands to operate the bot on the Unreal Tournament
2004 game server.
ii. Agent.java
The Agent.java class is a wrapper class for the body, inventory, and map classes. All bots have to
be derived from this class and use the methods of its components.
iii. AgentBody.java
The AgentBody.java class takes care of performing commands and receiving messages from
Unreal.
iv. AgentInventory.java
AgentInventory.java takes care of the weapons the agent is carrying and provides useful
methods such as switching weapons or determining suitable visible ammo for the weapons the
agent has in its inventory.
v. AgentMemory.java
AgentMemory.java takes care of the recent items, enemies, and game world information.
vi. GameMap.java
GameMap.java provides the necessary navigation information for the bot to move around the
map, such as the nearest navigation path or a path to a wanted item.
The Agent.java, AgentBody.java, AgentInventory.java, and GameMap.java classes are included in the
Pogamut platform and were developed by Horatko and Jimmy [69].
There are several main methods and functions used to develop the Q-learning Java bot; they are as
follows (a sketch of how some of them might be combined is given after the list).
• void initiateQTable( );
This is the method that initializes the Q-Table.
• private state checkCurrentState( );
This is the function that checks the current state of the bot and returns it.
• private action chooseAction(state s);
This is the function that chooses the action based on the current state.
• private void stateEngage( );
This is the method that performs a list of attacking actions.
• private void stateRetreat( );
This is the method in which the bot tries to retreat from the enemy and reach the nearest health
items.
• private void stateRunAroundItems( );
This is the method that makes the bot run around the map collecting items.
• protected boolean hasBetterWeapon( );
This is the function that returns true if the bot has a better weapon.
• protected boolean canRunAlongMedKit( );
This is the function that returns true if the bot can go and gather a medical kit (health item).
• protected void postPrepareAgent( );
This is the method that prepares the agent after the game starts. For this project, it initializes the
gatherable items and weapons from the map and lists them as collectable items for the bot.
• protected void stateStucked( );
This is the method in which the bot chooses another random navigation path to pursue if it has
repeatedly collided with walls more than five times.
• protected void updateCurrentStatus( );
This is the method that updates the bot’s current health, armour, ammo, enemy in sight, and
loaded weapon.
• protected double getReward(State s, Action a, Result r);
This is the function that calculates and returns the Q reward value.
• protected void updateQTable( );
This is the method that updates the Q-Table.
• protected void recordQTable( );
This is the method that records the final Q-Table in text file format.
• public void shoot(Player target);
This is the method that commands the bot to shoot the target player.
• public void stopShoot( );
This is the method that commands the bot to stop shooting.
• public void jump( );
This is the method that commands the bot to jump.
• public void doubleJump( );
This is the method that commands the bot to double jump.
• public boolean getSeeAnyReachableAmmo( );
This is the function that returns true if the bot sees any reachable ammo in sight.
• public boolean getSeeAnyReachableWeapon( );
This is the function that returns true if the bot sees any reachable weapon in sight.
• public boolean getSeeAnyHealth( );
This is the function that returns true if the bot sees any reachable health item in sight.
• public void runToTarget(Player target);
This is the method that commands the bot to run to the player.
• public void runToTarget(Item target);
This is the method that commands the bot to run to the item to collect it.
• public int getAgentHealth( );
This is the function that returns the agent’s health as an integer.
• public void changeWeapon(AddWeapon newWeapon);
This is the method that commands the agent to change weapon.
• public void changeToBestWeapon( );
This is the method that commands the agent to change to the best weapon in its inventory.
• protected void doLogic( );
This is the method that performs most of the logical operations for the bot during each iteration.
It is the method which can be called the bot’s brain.
• protected void prePrepareAgent( );
This is the method that prepares the agent logic to run; for example, it starts initializing the neural
networks, etc.
• protected void postPrepareAgent( );
This is the method that prepares the bot’s logic according to information gathered from
startCommunication; for example, the bot chooses a plan or parameters according to the game type.
• protected void shutdownAgent( );
This is the method that cleans up after the end of the agent’s simulation.
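As an illustration of how some of these methods might fit together, the following is a minimal sketch of one doLogic( ) iteration. It is an assumption about the control flow rather than the actual implementation; the action constants shown in the switch and the currentResult( ) helper are hypothetical.

protected void doLogic() {
    updateCurrentStatus();                  // refresh health, armour, ammo, enemy in sight
    state s = checkCurrentState();          // determine the current state
    action a = chooseAction(s);             // random or max-valued action from the Q-table
    switch (a) {                            // dispatch the chosen action (illustrative cases only)
        case ENGAGE:  stateEngage();  break;
        case RETREAT: stateRetreat(); break;
        default:      stateRunAroundItems();
    }
    double reward = getReward(s, a, currentResult());  // currentResult() is a hypothetical helper
    updateQTable();                         // record the reward for (s, a) in the Q-table
}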
3.7 Summary
This chapter discussed the analysis of the states, actions, and behaviours the bot can have
in the death match game environment. Furthermore, it presented the actual implementation and the
methods used in the development of the project. The testing, evaluation, and in-game screen shots are
discussed in the following chapter 4.
CHAPTER 4
EVALUATION
Testing the developed software is part of the software engineering development cycle. It is the
process of validating whether the developed software meets the design requirements and performs as
expected. The aim of the project is to develop an NPC agent that learns how to play
the first-person shooting game. Unit testing and integration testing are performed in the testing
phase.
4.1 Unit Testing
Unit testing verifies the functionality of specific code at the function level. There
are several states and actions performed by the NPC agent (Qbot) in this project. The basic actions and
functionalities are tested in the following categories.
• Shooting – the bot shoots at the given target as commanded in the test.
• Stop shooting – the bot stops shooting if it is shooting.
• Jumping – the bot jumps according to the command, either a normal jump or a double jump.
• Movement – movement from navigation point to point is correctly performed.
• Change weapon – the bot changes to the next weapon in the inventory as commanded.
• Change best weapon in inventory – the bot changes to the best available weapon in its inventory.
• Initialization of classes – the bot agent is initialized as expected.
4.2 Integration Testing
In integration testing, the modules are combined into the whole system and tested. The following
scenarios are tested, as they are the most common in first-person shooting games.
• Exploring, navigating, and collecting items around the map
• Engaging with an enemy
• Collecting health packs, even when not in the retreat state, if the NPC agent has low health
• Retreating and collecting health items as a priority
• Pursuing the enemy
The following are the integration tests performed and some issues found during testing.
• If the Qbot sees collectable items, it has to try to reach the item in sight, which can be seen
in figure 4.1. The bot performed well in navigating through the map. Whenever it saw a
collectable item, it tried to reach the item and collected it into the inventory. There is a minor
issue in that the bot still tries to collect ammo a few times after the maximum amount of
ammo has been reached.
• If the bot sees an enemy against which it has a high probability of winning, it has to try to shoot the enemy in
sight, which can be seen in figure 4.2. The bot performed the engaging function when it saw an
enemy. It tried to change to the best available weapon during the fight.
• If the bot has low health, even when not in retreat mode, it has to try to restore its health, as shown
in figure 4.3. The bot performed well if the health packs were on reachable ground. However,
in situations where the health packs were on higher ground and the navigation to the items was
not well defined, the bot stopped near the item while trying to collect it. This issue could be
solved with manual ray-casting movements, but that was not in the scope of this project and is therefore
left for future development.
• During an engagement with an enemy, if the NPC is low on health, it should retreat. In the test
the bot tried to retreat from the enemy, which can be seen in figure 4.4. In this test, the bot
tried to run away and collect health packs along the way. However, it still could not easily break
the enemy’s line of sight, depending on the map.
Figure 4.1 QBot trying to reach to the weapon
Figure 4.2 QBot trying to shoot enemy at sight
Figure 4.3 QBot trying to reach to the health items when low health (white bar indicates
total health lost percentage)
Figure 4.4 QBot trying to retreat from enemy after low health (white bar indicates total
health lost percentage)
4.3 Experiment
In this section, the Qbot is evaluated against the Hunter bot, which is a sample heuristic bot
included in the Pogamut 2 platform. The Hunter bot collects weapons in the map, runs for health items when
its health is low, and tries to kill anyone from a different team on sight. The logic behind the Hunter bot is as
follows (a minimal rule-based sketch in Java appears after the list) –
• If the bot sees an enemy and possesses a better weapon, it changes to the better weapon.
• If the bot sees an enemy, it goes to ENGAGE, in which the bot starts shooting / hunts the enemy /
changes weapon.
• If the bot is shooting and loses its target, it stops shooting.
• If the bot is being shot, it turns around and tries to find the enemy.
• If the bot hits a wall, it checks the wall and tries to jump.
• If the bot is following an enemy, it runs to the enemy’s last position.
• If the bot sees an item, it picks the most suitable item and runs for it.
• If the bot is hurt, it runs around a list of health items such as MedKits and HealthVials.
• If the bot has nothing to do, it runs around the items on the map.
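The following is a minimal sketch of this rule-based logic in Java. It is an approximation for illustration only, not the actual Hunter bot source code, and all of the helper method names are hypothetical.

// Illustrative only: a rough rule-based loop in the spirit of the Hunter bot.
void hunterLogic() {
    if (seeEnemy() && hasBetterWeapon()) changeToBestWeapon();
    if (seeEnemy())                      engageEnemy();            // shoot / hunt / change weapon
    else if (isShooting())               stopShooting();           // lost the target
    else if (isBeingShot())              turnAroundAndSearch();
    else if (isHittingWall())            jump();
    else if (isFollowingEnemy())         runToLastEnemyPosition();
    else if (seeItem())                  runToMostSuitableItem();
    else if (isHurt())                   runAroundHealthItems();   // MedKits, HealthVials
    else                                 runAroundItems();         // nothing else to do
}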
The game type used for this evaluation is Death Match. In this game type, the first player to reach
the targeted game score wins the game.
4.3.1 Experiment 1
The first experiment is set up as follows.
Table 4.1 Experiment 1 setup table
Map: DM-Flux2
Opponent: Hunter bot
Game mode: Death Match
If agent gets a kill: Reward = 1
If agent dies: Reward = -1
Health pack gathered: Reward = 0.3
New weapon gathered: Reward = 0.2
Discount factor: 0.9
Learning rate: 0.8
Exploration: 0.8
In this experiment, the bot does not have a learned policy at the beginning of the game. It
randomly chose the actions to perform in different situations. The outcome was not very good
according to the game scores: the Hunter bot won the games as it reached the targeted killing score.
The average killing score that the Qbot acquired over 10 game runs was 5.4, and its maximum
killing score in a single game was 7.
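For reference, the exploration parameter in these setups can be read as the probability of taking a random action rather than the best-known one (an epsilon-greedy policy). The following is a minimal illustrative sketch, re-using the QTable sketch from section 3.4; it is an interpretation of the parameter, not the exact implementation.

import java.util.List;
import java.util.Random;

// Epsilon-greedy action selection: with probability epsilon (the "Exploration"
// parameter) pick a random action, otherwise pick the best-known action.
String chooseActionEpsilonGreedy(QTable q, String state, List<String> actions,
                                 double epsilon, Random rng) {
    if (rng.nextDouble() < epsilon) {
        return actions.get(rng.nextInt(actions.size()));  // explore
    }
    return q.bestAction(state, actions);                  // exploit
}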
4.3.2 Experiment 2
Table 4.2 Experiment 2 setup table
Map: DM-Flux2
Opponent: Hunter bot
Game mode: Death Match
If agent gets a kill: Reward = 1
If agent dies: Reward = -1
Health pack gathered: Reward = 0.3
New weapon gathered: Reward = 0.2
Discount factor: 0.9
Learning rate: 0.8
Exploration: 0.0
In this second experiment, the reward values and the layout are set up the same as in
experiment 1. The trained Q-table from experiment 1 is also used. The only difference
is that the exploration is set to 0, so the Qbot will try to perform the maximum-valued
action in most situations.
The outcome of this experiment is interesting. The Qbot tries to perform the best actions it
has experienced in the earlier games. The results noticeably improved: the Qbot won 5 out
of 10 games, and the average killing score increased to 7.6. However, from observing its
behaviour during play, the Qbot rarely retreated from the enemy, as it did not receive
much punishment for dying.
4.3.3 Experiment 3
Table 4.3 Experiment 3 setup table
Map: DM-Flux2
Opponent: Hunter bot
Game mode: Death Match
If agent gets a kill: Reward = 1
If agent dies: Reward = -5
Health pack gathered: Reward = 0.3
New weapon gathered: Reward = 0.2
Discount factor: 0.9
Learning rate: 0.8
Exploration: 0.5
In the third experiment, the punishment for death is increased to -5, and the exploration is set to
0.5. In addition, a dodging action is added to the action list. The remaining parameters are set the same as
in experiments 1 and 2. In this experiment, the winning rate increased to 7 out of
10 games. The bot’s behaviour changed: it tries to retreat from fights because the
punishment for death is increased. Another factor can be the increased exploration, which makes
the bot sometimes perform random actions to explore different strategies than in
experiment 2. The bot also performs dodging while engaging the enemy, as this gives it better
survivability than standing still and shooting.
4.4 Summary
In this chapter we have seen the various tests and experiments carried out on the
proposed architecture, and the variations in the results observed.
CHAPTER 5
CONCLUSION AND FUTURE WORKS
The main objectives of this dissertation were to propose, implement, and evaluate an NPC which
learns to play the first-person shooting game Unreal Tournament 2004. The bot was successfully
developed with the Pogamut 2 platform on the NetBeans 6.8 Java development environment using the Q-
learning algorithm. From the experiments performed, it has been observed that the bot successfully
learned not only how to play the first-person shooting game but also effective tactics to outperform the
opponent Hunter bot. According to the experiments, the bot’s behaviours changed based on the
reward values. There are many more variations that the bot could perform, with less code
than the heuristic bots.
In the future, game developers could use this technique to find tactics to defeat opponents and
to find more possible strategies in real-time first-person shooting games. This work has focused on high-
level decision making, but lower levels such as weapon selection, combat behaviours, and
movement could be implemented with unsupervised learning algorithms. More detailed
actions and combinations of actions or behaviours could be added to this work for better outcomes.
Further work will be to investigate different experimental setups and to add more possible combined
actions and states. Better and more challenging artificial intelligence players in games are coming in the near
future.
References:
[1] Charles Weddle, Artificial Intelligence and Computer Games, Florida State University, URL:
http://www.charlesweddle.com/s-misc/doc/cis5935-weddle.pdf [28 Sep 2009]
[2] Rudolf Kadlec, Evolution of intelligent agent behaviour in computer games, Charles University,
Prague, 2008
[3] O. Burkert. Unreal tournament twins, Bachelor’s thesis, Charles University, Prague, 2006
[4] P. Tozour, The Evolution of Game AI, AI Game Programming Wisdom, Charles River
Media,Inc., Hingham, MA, 2002
[5] Miles, C. Quiroz, J. Leigh, R. Louis, S.J., Co-Evolving Influence Map Tree Based Strategy Game
Players, Dept. of Comput. Sci. & Eng., Nevada Univ., Reno, NV;
[6] D. Nieuwenhuisen, A. Kamphuis, M.H. Overmars, High quality navigation in computer games,
Science of Computer Programming 67 (2007) 91–104, Available online 6 April 2007.
[7] MDK2, URL: http://www.bioware.com/games/mdk2/ [6 April 2009]
[8] Command and Conquer, URL: http://www.commandandconquer.com/ [6 April 2009]
[9] Warcraft III, URL: http://us.blizzard.com/en-gb/games/war3/ [6 April 2009]
[10] Age of Empires III, URL: http://www.microsoft.com/games/en-us/Games/Pages/AgeofEmpires
3.aspx [6 April 2009]
[11] Final Fantasy XII, URL: http://www.finalfantasy12.eu.com/site_en.html [6 April 2009]
[12] Resident Evil, URL: http://www.residentevil.com/index.php [6 April 2009]
[13] The Witcher, URL: http://www.thewitcher.com/ [6 April 2009]
[14] Unreal Tournament, URL: http://www.unrealtournament2003.com/ut2004/index.html, [6 April
2009]
[15] Crysis, URL: http://www.ea.com/games/crysis [6 April 2009]
[16] Farcry 2, URL: http://www.farcry2.com [6 April 2009]
[17] Call of Duty, URL: http://www.callofduty.com/ [6 April 2009]
[18] Half-Life, URL: http://half-life.com/ [6 April 2009]
[19] Halo 3, URL: http://halo.xbox.com/halo3/ [6 April 2009]
[20] E.W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:
269–271, 1959.
[21] P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of
minimum cost paths. IEEE Transactions on Systems Science and Cybernetics 4(2): 100–107, 1968.
[22] J.M.P. van Waveren and L.J.M. Rothkrantz. Automated path and route finding through arbitrary
complex 3D polygonal worlds. Science of Computer Programming, Robotics and Autonomous
Systems, Volume 54, Issue 6, 30 June 2006, Pages 442-452, Available online 17 April 2006.
[23] P. Tozour, Introduction to Bayesian Networks and Reasoning Under Uncertainty, AI Game
Programming Wisdom, Charles River Media, Inc., Hingham, MA, 2002
[24] Rich Caruana and Alexandru Niculescu-Mizil, An Empirical Comparison of Supervised Learning
Algorithms, Department of Computer Science, Cornell University, Ithaca, NY 14853 USA, URL:
http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf [5 Oct 2009]
[25] Hannah Blau and Amy McGovern, Categorizing Unsupervised Relational Learning Algorithms,
Department of Computer Science, University of Massachusetts, Amherst, Massachusetts 01003-9264,
URL: http://kdl.cs.umass.edu/papers/blau-mcgovern-srl2003.pdf [5 Oct 2009]
[26] Zhu, X., and Goldberg, A., Introduction to Semi-Supervised Learning, Synthesis Lectures on
Artificial Intelligence and Machine Learning, 2009, URL:
http://www.morganclaypool.com/doi/abs/10.2200/S00196ED1V01Y200906AIM006 [5 Oct 2009]
[27] Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore, Reinforcement Learning: A
Survey, Journal of Artificial Intelligence Research 4 (1996) 237-285, URL:
http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a.pdf [5 Oct 2009]
[28] V. N. Vapnik. Statistical learning theory. New York: Wiley, (1998) 339-371.
[29] V. Tresp. A Bayesian committee machine, Neural Computation, 12, 2000, URL:
http://wwwbrauer.informatik.tu-muenchen.de/~trespvol/papers/bcm6.pdf [7 Oct 2009].
[30] Jonathan Baxter, A Model of Inductive Bias Learning, Research School of Information Sciences
and Engineering, Australian National University, Canberra 0200, Australia, Journal of Artificial
Intelligence Research 12 (2000) 149–198, http://www-2.cs.cmu.edu/afs/
cs/project/jair/pub/volume12/baxter00a.pdf [7 Oct 2009]
[31] Y. Jin, B. Sendhoff. Pareto-based multi-objective machine learning: An overview and case
studies. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews,
38(3):397-415, 2008.
[32] Y. Jin (Editor). Multi-Objective Machine Learning. Springer, Berlin Heidelberg, 2006, URL:
http://books.google.co.uk/books?id=8MCHuwYmy5UC&printsec=frontcover&dq= multi-
objective+machine+learning#v=onepage&q=&f=false, [7 Oct 2009]
[33] World of Warcraft, URL: http://www.worldofwarcraft.com/index.xml [7 Oct 2009]
[34] EVE online, URL: http://www.eveonline.com/ [7 Oct 2009]
[35] AION online, URL: http://uk.aiononline.com/ [7 Oct 2009]
[36] Silkroad online, URL: http://www.joymax.com/silkroad/ [7 Oct 2009]
[37] SBOT, URL: http://www.bot-cave.net/main/ [7 Oct 2009]
[38] John McCarthy, Recursive Functions of Symbolic Expressions and Their Computation by
Machine, Part I, Massachusetts Institute of Technology, Cambridge, April 1960 http://www-
formal.stanford.edu/jmc/recursive.pdf [8 Oct 2009]
[39] Robert A. Kowalski, The early years of logic programming, January 1988 Volume 31 Number I,
Communications of the ACM http://www.doc.ic.ac.uk/~rak/papers/the early years.pdf [8 Oct 2009]
[40] Quake 4, URL: www.idsoftware.com/games/quake/quake4
[41] Alan C. Kay, The Early History of Smalltalk, URL: http://www.metaobject.com/
papers/Smallhistory.pdf [8 Oct 2009]
[42] Epic Games, URL: http://www.epicgames.com/ [8 Oct 2009]
[43] MATLAB, URL: http://www.mathworks.com/ [8 Oct 2009]
[44] Christopher John Cornish Hellaby Watkins, Learning from Delayed rewards, Kings College,
1989, URL: http://www.cs.rhul.ac.uk/~chrisw/new_thesis.pdf [20 Oct 2009]
[45] Christopher J.C.H Watkins, Peter Dayan, Technical Note, Q-learning, 1992, URL:
http://www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf [20 Oct 2009]
[46] Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction, MIT Press, Cambridge,
MA, 1998.
[47] Megan Smith, Stephen Lee-Urban, Héctor Muñoz-Avila, RETALIATE: Learning Winning
Policies in First-Person Shooter Games, Department of Computer Science & Engineering, Lehigh
University, Bethlehem, PA 18015-3084 USA
[48] B. Auslander, S. Lee-Urban, C. Hogg, and H. Munoz-Avila, “Recognizing the Enemy:
Combining Reinforcement Learning with Strategy Selection using Case-Based Reasoning,” in
Advances in Case-Based Reasoning: 9th European Conference, ECCBR 2008, Trier, Germany,
September, 2008, Proceedings, K.-D. Althoff, R. Bergmann, M. Minor, and A. Hanft, Eds. Springer,
2008.
[49] Michelle McPartland and Marcus Gallagher, Creating a Multi-Purpose First Person Shooter Bot
with Reinforcement Learning, 2008, IEEE Symposium on Computational Intelligence and Games
(CIG'08)
[51] Deep Blue, URL: http://www.research.ibm.com/deepblue/watch/html/c.shtml [20 Jan 2010]
[52] Timeline: Brief history of Artificial Intelligence, URL:
http://www.aaai.org/AITopics/pmwiki/pmwiki.php/AITopics/BriefHistory [20 Jan 2010]
[53] The History of Artificial Intelligence, URL: http://library.thinkquest.org/2705/history.html [20
Jan 2010]
[54] The Cathode Ray Amusement Tube Patent, URL: http://www.pong-story.com/2455992.pdf [20
Jan 2010]
[55] ZorbaTHUT, QuestHelper, World of Warcraft Add-on, URL:
http://wow.curse.com/downloads/wow-addons/details/quest-helper.aspx [20 Jan 2010]
[56] Pericles, ShockandAwe, World of Warcraft Add-on, URL:
http://wow.curse.com/downloads/wow-addons/details/shockandawe.aspx
[57] L. V. Allis, H. J. van den Herik, and M. P. H. Huntjens. Go-Moku Solved by New Search
Techniques. In Proceedings of the 1993 AAAI Fall Symposium on Games: Planning and
Learning, 1993.
[58] R. Gasser. Solving Nine Men’s Morris. Computational Intelligence 12(1): 24–41, 1996.
[59] G. Andrade, G. Ramalho, H. Santana, and V. Corruble, “Automatic computer game balancing: a
reinforcement learning approach,” in AAMAS ’05: Proceedings of the Fourth International Joint
Conference on Autonomous Agents and Multiagent Systems. New York, NY, USA: ACM, 2005, pp.
1111–1112.
[60] S. Bakkes, P. Spronck, and E. O. Postma, “Team: The team-oriented evolutionary adaptability
mechanism,” in ICEC, ser. Lecture Notes in Computer Science, M. Rauterberg, Ed., vol. 3166.
  • 8. viii Figure 3.4 Detail engage state diagram................................................................................................. 32 Figure 3.5 Detail chase state diagram................................................................................................... 33 Figure 3.6 Detail retreat state diagram.................................................................................................. 33 Figure 3.7 An overview framework of an java bot on UT 2004 game server ...................................... 37 Figure 3.8 Flow-chart diagram for Q-learning bot................................................................................ 38 Figure 4.1 QBot trying to reach to the weapon..................................................................................... 44 Figure 4.2 QBot trying to shoot enemy at sight.................................................................................... 45 Figure 4.3 Qbot trying to reach items when low health........................................................................ 45 Figure 4.4 Qbot trying to retreat from enemy after low health............................................................. 45 LIST OF TABLES Table 2.1 Q-learing mapping table, Q-table ......................................................................................... 18 Table 2.2 Standard notations for state machine diagrams..................................................................... 19 Table 3.1 State-action mapped table..................................................................................................... 35 Table 4.1 Experiment 1 setup table....................................................................................................... 47 Table 4.2 Experiment 2 setup table....................................................................................................... 48 Table 4.3 Experiment 3 setup table....................................................................................................... 49
  • 9. - 1 - CHAPTER 1 INTRODUCTION Apart from information and communication processing, playing computer games is the one of the most popular uses for computer technology. The computer game has dramatically developed from text based adventures all the way to three dimensional graphical interfaces with complex and dynamic environments. There are several uniquely identifiable systems put together to provide the specific game environments. These systems include graphic rendering, special sound effects and background music, user input, and AI. None of these systems alone can make a good game. However, all of these systems synergized and working together that become a worthwhile computer game [1]. Nowadays, research and development of advance Artificial Intelligence (AI) are blooming in both academic and commercial developers. Cheaper price for a better performance computer technologies encourage to develop more sophisticated AI than the past few decades. People are also spending more time in video games and even handheld devices such as mobile phones have video games to play. AI came and evolved the long way from 1940s to recent days. In May 1997, IBM Deep Blue defeated the human world chess champion, Kasparov [51]. However, achieving human likeliness AI is a long way to go. 1.1 Background and context The electronic interactive games are developed in 1950s. The Cathode Ray Tube Amusement Device is first patented on January 25, 1947 by Thomas T. Goldsmith Jr. and Estle Ray Mann [54]. During the time of 1950s-60s, most games are only able to be run on mainframe computers as the limited performance of hardware. In later 1970s the video games have been evolve from Arcade game machines to the personal computers and handheld devices such as mobile phones in these days. The types of games are varied from text based and classic board games such as chess to the sophisticated three dimensional graphic games such as Unreal Tournament 2004. Nature of the games is to compete with each other players. The idea of the substituting the opponent player with the computer player has come along with developing the games. Earliest AI in game was developed in Atari games Qwak and Pursuit (1974). In early days of AI non-player characters (NPC) developments, heuristic and brute force approaches are used. NPC are statically programmed to perform the specific tasks and actions. Recently, there are techniques have been developed and solved the challenges for AI such as path finding, movements, decision making, and learning. The field of Artificial Intelligence (AI) research was born in 1956 at Dartmouth conference by various scientists from different fields in mathematics, psychology, engineering, economics, and political science [53]. The promising of machine performing intelligently became since after the first electronic
  • 10. - 2 - computer technology was developed in 1941 [53]. Nowadays, AI has been used in wide range of various fields such as toys, computer games, machine controls, web search-engines, medical diagnosis, and forecasting financial stock markets. 1.2 Motivation People tend to lose interests in playing the static NPC opponent’s strategies and outcomes even though those games have good graphic, sound quality, and background story of the game. Heuristic and static coded AI NPC doesn’t have various outcomes. The actions are predictable for human players after sometime of playing the game. There were some AI cheating such as AI characters are much more powerful, more health points and knowing the exact location of the human players. Apart from the amazing graphic details and game play, game developers are now trying to deliver the good solo-gaming experience competing with the NPC. In order to achieve a worthwhile opponent, the opponent should be dynamic which evolve and adapt from learning the other opponent’s strategies, tactics, previous experiences, success, and failures. Dynamic environment are involved in new generations of the games. The environment entities can be destructible; the path ways can be blocked or destroyed. The NPC opponents need to adapt the dynamic environment and maximize their performance. 1.3 Aim The aim of this dissertation is to propose, implement, and train an artificial intelligent agent that can learn through the first person shooting game plays and evaluate its effectiveness. The agent will be developed in commercial first person shooting game environment Unreal Tournament 2004. 1.4 Objectives The objectives of this dissertation are as follow – 1. To study existing game AI technologies 2. To study Pogamut2 platform 3. To implement an Non-player Character (NPC) which learns to play the first-person shooting game 4. Train the NPC via game play experiences 5. Evaluate the effectiveness of the NPC with the sample bots
  • 11. - 3 - 1.5 Feasibility Since the AI agent is developed by using Java, and the Java being platform independent, the AI agent can be implemented on any platform. 1.6 Structure of the dissertation There are five main sections in this dissertation. The first section is introduction section Chapter 1 which briefly discussed about the AI and Games, motivation, aim, objectives, and feasibility. The Chapter 2 of the dissertation is about the literature review that discuss about the computer games and artificial intelligence involvements, First-Person Shooting game (FPS) in detail facts follow by more about AI Agents learning in FPS games, architecture of AI agents in FPS games and AI Agents learning developed by the researchers. The later section is theoretical information about Reinforcement Learning, specifically Q-Learning algorithm as it is going to be part of the development. The Chapter 3 of the dissertation is the practical section conducting analysis, proposal, design, and implementation of the java Q-learning bot. In this chapter analyzed on possible states, events, and actions for a java bot in first-person shooting game, Unreal Tournament 2004 game. The Chapter 4 of the dissertation is the evaluation of the learning AI agent with Q-learning algorithm. In the chapter consists of testing and three experiments carried out for the developed java learning bot. The Chapter 5 includes conclusion and future works of the dissertation that discussed about the conclusion of developments and further works that will be done in the future.
• 12. - 4 - CHAPTER 2 DEVELOPMENT OF COMPUTER GAMES AND USE OF ARTIFICIAL INTELLIGENT (AI) AGENTS

This chapter surveys the current mainstream games on the market, possible solutions to specific challenges faced by artificial intelligent agents in these games, implementation tools, and relevant technologies for the AI non-player character (NPC).

2.1 Computer games and artificial intelligent agents (AI)

Because computer games are processor intensive, the graphics and audio systems were usually emphasised over complex artificial intelligence in the early days. Nevertheless, AI plays an important role in most games: the player needs to compete with an opponent, which is either another human player or a computer-controlled one. Some players only notice how impressive the graphics look, but if the artificial intelligence were removed from a game, the game would become trivial and offer no challenge; it could be won as soon as it starts, and the in-game characters would neither move seamlessly nor behave interactively. Developing effective AI poses different challenges in different types of games. Brief information about these game genres, together with some of the challenges and possible working solutions, is given below.

- First-person shooting games (FPSs)
In this type of game, players see the environment through the eyes of the represented character and control that character. The typical rule set of most FPS games is Death Match, in which the objective is to score by killing opponents. Participants are spawned at random preset locations of the game world. Players have a basic status of health points and armour values, which range from 0% to 100%, and have to pick up weapons, ammunition, and power boosts in order to compete with their opponents. There are several other game rules, such as last man standing, capture the flag, and flag domination. In this genre, a player's opponents can be human players or AI. Success in the game relies mostly on fast decision making, teamwork, and knowledge of hidden power-ups and special weapons in the different maps. Reactive genetic behaviour AI [2] and teamwork strategies such as the "coordinated attack of squad bots" [3] are among the AI research interests here. Unreal Tournament 3 [14], Crysis [15], Far Cry 2 [16], Call of Duty 4 [17], Halo 3 [19], and Half-Life 2 [18] are some of the renowned FPS games.
  • 13. - 5 - - Third-person shooting game (TPSs) A little different from the FPS games described above, players see the controlled character from the perspective of viewing the character from the behind. Player can see the controlled character and environment more freely than the FPS games. In some games, such as MDK 2 [7], yet for specific aiming and shooting, the game allows first-person perspective view to the player. - Strategy games Unlike the first-person shooting games, players see the whole map or game world to see and manage units or characters and buildings to control and achieve the objectives. The real time strategy and turn based strategy games are two main types of strategy games. In this type of game, AI involves controlling large numbers of units, long term planning, and managing resources or spatial reasoning [5]. Some of the famous games in this genre are Command and Conquer [8], Warcraft III [9], and Age of Empires III [10]. - Role playing games (RPG) The game viewing perspective is either as FPS or TPS games. Player raises the character by achieving goals and quests according to the game’s story line. Some related games of this genre are Final Fantasy VII [11], the Witcher [13], and Resident Evil [12]. AI involvements in this type of games are the non-player characters regarding of giving the quests and performing as the opponents to the players. - Massive multiplayer online games (MMORPG) This game type is similar to the normal RPG games except players play online with other massive amount of players. AI in this game involves as none-player characters (NPC) and monsters for players to perform their quests and raise characters. Formal AI involved in these games is environmental game world creatures and their reactions with the players. There are several third-parties AI developed for some of this game genre is called Bot. Examples of this game genre are World of Warcraft [33], EVE online [34], and AION [35]. An example of third-parties Bot is SBOT [37] which is for Silkroad online game (SRO) [36]. The Bot take control of the player’s character and perform the tasks for the player. As personal experience in SRO, players have to kill millions of creatures repeatedly in order to raise the level of the characters. However, such bots are illegal and unauthorized to use by the game providers. In the game World of Warcraft, there are user interface add-ons to help the player to play the
• 14. - 6 - game better. For instance, QuestHelper [55] guides the player and intelligently suggests the nearest quests to perform and monsters to kill at the current location, while ShockandAwe [56] suggests which next move is the best to perform in the current situation.

2.2 Challenges and problems solved with AI

Nowadays, AI contributes to various aspects of modern computer games. AI techniques have recently solved most of the classic games such as Gomoku [57] and Nine Men's Morris [58]; another milestone is IBM Deep Blue's victory over Kasparov in 1997. Modern games, however, involve more aspects than the classic games: very limited reasoning time, dynamic environments to adapt to, incomplete knowledge of the game world, and the management of the game entities' resources. The topics most generally discussed in game AI research are unit movement, simulated perception, situation analysis, spatial reasoning, learning, group coordination, resource allocation, steering, flocking, target selection, and many more [4]; even some animation and audio systems use AI. Above all the problems mentioned, there are three main challenges that AI must address in most games: NPC movement, decision making, and learning.

2.2.1 NPC movements and path-finding

As seen in the game genre examples of section 2.1, apart from turn-based strategy games, the other game genres fundamentally require characters or units to move in either a static or a dynamically changing virtual world. Real-time strategy games and first/third-person shooting games contain large numbers of entities such as NPCs, world-inhabiting creatures, and vehicles, and those entities require correct and believable motion [6]. The game must provide paths for those entities to move around the game world. Currently, path finding and movement are achieved using a combination of scripting, Dijkstra's shortest-path algorithm [20], A*-based grid search [21] (an improved version of Dijkstra's algorithm), local reactive methods, and flocking. There is also research on movements generated from the visual perspective (the first-person camera view), geometric path planning, and navigation through polygonal worlds, which can be applied to robotic movement via video camera [21].

2.2.2 Decision making

Section 2.2.1 described how NPC movement is achieved, but how does an NPC know where the other player is, or how to pursue the objectives of the game? In order to achieve the main objective, an NPC needs to decide on and pursue sub-objectives and goals. One of the challenges in developing AI is how to make an NPC act and react like a human: the game will not be fun to play if the NPC knows everything the player does, so the NPC has to make decisions based on limited knowledge of the game world. For example, in Unreal Tournament an NPC has only a limited
• 15. - 7 - viewing angle and hearing range. An NPC must decide whether another player is hiding behind the wall or pursuing from behind. Several techniques exist for developing complex, human-like reasoning for AI: Bayesian networks are one technique that allows an NPC to perform human-like complex reasoning [23], and fuzzy logic, artificial neural networks, and evolutionary algorithms are also applied to NPC decision making.

2.2.3 Learning

NPC learning is vital in AI development, as learning and evolving are part of what intelligence means. If an NPC is pre-scripted with pre-defined logic and decision trees and there is no learning in progress, the outcome will always be the same. Players will figure out the AI's patterns and overcome them with fixed solutions whenever they face the AI opponent, and the game stops being fun once the same counter-strategy always works. There are several machine learning approaches, organised by their desired outcome: supervised learning [24], unsupervised learning [25], semi-supervised learning [26], reinforcement learning [27], transduction [28][29], inductive bias learning [30], and Pareto-based multi-objective learning [31][32].

2.3 Unreal Tournament 2004

First-person shooting games are popular in AI research because of their popularity among consumers, their applicability as a model of real-time decision making, their dynamic environments, and the fact that most commercial games offer enough resources to develop the game AI and environments. Most first-person shooting games have similar attributes; this section focuses on Unreal Tournament 2004 and its environment. Unreal Tournament 2004 is the successor of UT: GOTY [14] and Unreal Tournament 2003 [14], developed by Epic Games [42]. It is built on the famous award-winning Unreal Engine 2, which has been used by more than 30 well-known first-person shooting games, and it is bundled with the Unreal level editor and an AI script editor. Papers [2][47][48][49] used Unreal Tournament 2004 as a test-bed FPS game for AI research. Moreover, the game runs on several platforms, including Mac, Linux, and Windows. As in other FPS games, the player controls a virtual character from a first-person perspective in a three-dimensional environment. Each character in the game has the following attributes:

i. Health – The player has health points (hit points) ranging from 0 to 100. Each character starts the game with 100 health points. By consuming special health vials or a special health pack, a character can have a maximum of 199 health points. When the health points reach 0, the character dies.
• 16. - 8 - ii. Armour – The player can equip armour, which is added on top of the character's health. The armour value ranges from 0 to 100. When the character is hit, health points only start being deducted once the armour value has reached 0.

iii. Shield – The shield is a special, rare item in UT2004 that counts as armour.

iv. Weapon – There are 10 different weapons in UT2004, with different attributes and applications. When the character picks up a weapon, it is added to the character's weapon list together with a basic amount of ammunition.

v. Ammunition – Each weapon has its own ammunition (ammo). Ammo values range from 0 up to a maximum that depends on the weapon type. When the ammo value reaches 0, the character can no longer shoot with that weapon.

vi. Adrenaline pills – The player can collect adrenaline pills throughout the map. The value ranges from 0 to 100; when it reaches 100, the player can perform special abilities such as increased movement speed, invisibility, and health regeneration via special key combinations.

The game environment is a 3D environment with event-triggered elements such as elevators, which move the character between levels of the environment, and portals, which teleport characters from one room to another. The game maps provide navigation graphs for the NPC characters to move around the map, and an A* algorithm [21] or its hierarchical variant is used to navigate through the level by searching for a path in this graph. The disadvantage of the navigation graph is that it provides no information when the NPC is not following its edges; therefore a ray-casting sensor has to be used to detect obstacles, dangerous cliffs, etc. The actions available to characters are basic movements such as walking forward, backward, strafing left, strafing right, and jumping, plus improved movements such as dashing right, dashing left, and the double jump, in which the character jumps again in mid-air to reach specific locations. There are basically two weapon shooting modes, which vary from one weapon to another, and some weapons have more than two shooting options. For example, a rocket launcher's primary shooting mode launches a missile; if the primary fire is held down, extra missiles are loaded and the rocket launcher can fire two or more missiles at once. The secondary shooting mode launches missile bombs and, as with the primary mode, holding it down allows two or more bombs to be launched. On the following page, Figures 2.1 and 2.2 show in-game screenshots.
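The navigation-graph search mentioned above is the standard A* algorithm. A minimal, self-contained sketch of how such a search over a waypoint graph might look in Java is given below; this is not the actual UT2004 or Pogamut path-finder, and the NavPoint class, the Euclidean edge costs, and the straight-line heuristic are assumptions made purely for illustration.

```java
import java.util.*;

/** Minimal A* path search over a waypoint navigation graph (illustrative sketch only). */
public class NavGraphAStar {

    /** A navigation point with a 3D position and its outgoing edges. */
    public static class NavPoint {
        final String id;
        final double x, y, z;
        final List<NavPoint> neighbours = new ArrayList<>();

        public NavPoint(String id, double x, double y, double z) {
            this.id = id; this.x = x; this.y = y; this.z = z;
        }

        double distanceTo(NavPoint o) {
            double dx = x - o.x, dy = y - o.y, dz = z - o.z;
            return Math.sqrt(dx * dx + dy * dy + dz * dz);   // straight-line cost and heuristic
        }
    }

    private static final class Entry {
        final NavPoint node;
        final double f;                                      // f = g + h, fixed at insertion time
        Entry(NavPoint node, double f) { this.node = node; this.f = f; }
    }

    /** Returns the path from start to goal (inclusive), or an empty list if the goal is unreachable. */
    public static List<NavPoint> findPath(NavPoint start, NavPoint goal) {
        Map<NavPoint, Double> g = new HashMap<>();           // best known cost from the start node
        Map<NavPoint, NavPoint> cameFrom = new HashMap<>();
        Set<NavPoint> closed = new HashSet<>();
        PriorityQueue<Entry> open = new PriorityQueue<>(Comparator.comparingDouble(e -> e.f));

        g.put(start, 0.0);
        open.add(new Entry(start, start.distanceTo(goal)));

        while (!open.isEmpty()) {
            NavPoint current = open.poll().node;
            if (current == goal) return reconstruct(cameFrom, goal);
            if (!closed.add(current)) continue;              // stale queue entry, already expanded
            for (NavPoint next : current.neighbours) {
                double tentative = g.get(current) + current.distanceTo(next);
                if (tentative < g.getOrDefault(next, Double.POSITIVE_INFINITY)) {
                    g.put(next, tentative);
                    cameFrom.put(next, current);
                    open.add(new Entry(next, tentative + next.distanceTo(goal)));
                }
            }
        }
        return Collections.emptyList();
    }

    private static List<NavPoint> reconstruct(Map<NavPoint, NavPoint> cameFrom, NavPoint goal) {
        LinkedList<NavPoint> path = new LinkedList<>();
        for (NavPoint n = goal; n != null; n = cameFrom.get(n)) path.addFirst(n);
        return path;
    }
}
```

In practice the game engine supplies the graph and the bot only asks for a path between two nav points, but the search itself follows this shape.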
• 17. - 9 - Figure 2.1 Game play screen shot of the Unreal Tournament 2004

Figure 2.2 Game play screen shot of the Unreal Tournament 2004 from first-person view
• 18. - 10 - 2.4 Relevant first-person shooting games till date

Several first-person shooting games were developed before Unreal Tournament 2004, and several more famous FPS games have appeared since. The earliest first-person perspective game can be traced back to Maze War (1973) [64].

2.4.1 Wolfenstein

In 1993, id Software's Wolfenstein 3D set the standard for the 3D first-person shooter. The player navigates a maze-like 3D environment with the goal of escaping from the Nazi enemies within the maze. The enemy opponents are programmed statically: they patrol within the maze and try to shoot the player on sight. They chase after the player, but only in a limited way, and tend to lose track of the player. The opponent AI can open doors, but normally only tries to reach an enemy within its line of sight, and it stops chasing the player when the player moves out of sight or turns a corner. [62]

Figure 2.3 Wolfenstein 3D in game screen shot

2.4.2 Doom

In 1994, the legendary first-person game Doom was created by id Software. It had better graphics and new flying creatures, yet the in-game AI opponents were not much improved over those of the earlier Wolfenstein. The game is likewise set in maze-like maps, with the objective of taking out the enemies within the maze. [62]

Figure 2.4 Doom in game screen shot
  • 19. - 11 - 2.4.3 Duke Nukem 3D: Duke Nukem 3D with a better graphical engine was released by 3D Realms in 1995. There they have the ceilings and in some certain area, player can swim. However, the AI in this game hasn’t improved much as the former games. [62] Figure 2.5 Duke Nukem 3D in game screen shot 2.4.4 Quake: In 1996, a FPS game, Quake, was developed by id Software. It has noticeably improved in graphic and game play. Player can turn not only left and right but also up and down in first-person perspective. The opponents and items are model as real 3D objects which earlier games used 2D flat pictures. The AI opponents interact with player same as earlier games yet a bit improvement as they started initiate the attacks when they hear the sound from the player. [62] Figure 2.6 Quake in game screen shot 2.4.5 Quake II Quake II is released by id Software by following year in late 1997. The game has improved not only in term of game play graphic but also in AI opponents. The AI NPCs are able to evade the incoming missiles and better in pursuing the player when player tries to run away. [62]
  • 20. - 12 - Figure 2.7 Quake II in game screen shot 2.4.6 Unreal In 1998, a first person shooting game called Unreal is released by Epic. This is the game that first bundled with the in game bots or artificial players. Such artificial players can be substitute as human players in any game mode. Those bots are noticeable better than the previous games’ AI opponents. They can navigate through the game environment with the help of way points in the game map. With the way points, bots can even navigate and obtain the specific secret items and power ups in the game level in order to perform better attacks and defences. [62] Figure 2.8 Unreal 1998 in game screen shot 2.4.7 Half-life: In 1997, Valve software released the game called Half-life. The game is based on the story of military failed experiment and player has to fight against the military and the aliens from the result of fail experiment. The game additionally brings the new ideas of the AI behaviours. For example, non- player characters are at side of the player and help to perform the specific jobs. Scientists follow the player through the level and help out the player to open the security doors, and the security guards help the player to kill the enemies in sight. AI security guards do not care for the friendly forces in line of sight when shooting enemies and occasionally harms the friendly players while they try to take
  • 21. - 13 - out the enemy. They also don’t care for their life, never abandon the fight, and try to attack till the enemy dies or either parties defeated. [62] Figure 2.9 Half-life in game screen shot 2.4.8 Unreal Tournament: In 1999 the Unreal Tournament was released by Epic. The game is mostly for multiplayer game. However, in solo game play, player has to defeat the bots through the tournament. Even in multiplayer mode, opponent players can be substitute with the bots. Bots can perform well in the team play game types such as Capture the Flag (CTF), Assault, and Domination. Same as the earlier game in Unreal, the bots use the navigation way-points to get around the 3D game environment. [62] Figure 2.10 Unreal Tournament in game Screen shot
  • 22. - 14 - 2.4.9 Call of Duty (COD): Modern Warfare 2 In November 2009, The Call of Duty Modern Warfare 2 is released by Infinity Ward. The game is available in Microsoft Windows, PlayStation 3, and Xbox 360. The game is composed with three game-plays which are campaign, cooperative, and multiplayer. The game play consists of many aspects such as shooting enemies while driving the snowmobile which can be seen in figure 2.11. In campaign game-play mode which is usually known as solo-mode, the player takes the role of the various characters and carries out the various game objectives of the game’s story. For example, the player needs to arrive to particular check point, eliminate the specific enemy at the specific location, and defends the location from the enemies taken over. There are several campaigns that AI NPCs cooperate with the player and carry out the mission. In cooperative mode, two or more players can join and carry out the cooperative missions. In multiplayer mode, maximum of 18 players can join into the game environment as in two opposite teams (9 players versus 9 players) and achieve the goal depending on the game modes. Game modes are vary from Free for All which players eliminate opponents for game scores, team based game modes such as Search and Destroy, Demolition, Domination, Team Deathmatch, and Capture the flag. The developers used the “Dynamic AI,” which allows computer NPC opponents to act more independently. NPC opponents are actively seeking the players, breaking away from the set of definite behaviours and carry out the unexpected tactics. Therefore, the human players can’t predict the computer NPC as in definite position and attacks each time the game is played. Figure 2.11 Modern Warfare 2, screen shot taken when shooting enemy while driving the snowmobile [65]
• 23. - 15 - 2.5 Relevant techniques used for AI NPCs

As seen in section 2.4, the nature of first-person shooter games has changed over time, from simple point-and-shoot to complex actions such as driving a vehicle while confronting enemies at the same time. As the game play changes, the requirements for the AI NPC become more complex: the computer NPC can no longer simply patrol a specific area, but needs to pursue its own goals, perform specific tasks, and cooperate with other NPCs in the game. Traditionally, heuristic approaches such as state machines, rule-based systems, and AI scripts are used for AI NPCs in most commercial games. However, these are static, limited in expandability, and time-consuming when detailed parameters and tactics have to be developed. The following subsections discuss AI techniques that can be used to develop AI NPCs in games.

2.5.1 Finite state machines

A finite state machine (FSM) is a model of behaviour composed of a finite number of states, transitions between those states, and actions. A real-world example of a state machine is a light bulb, which is either on or off: when the light is off the machine is in the "off" state, and when the light is on it is in the "on" state.

Figure 2.12 A light bulb example for finite state machine

Figure 2.13 Simple FSM for NPC [62]
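A minimal Java sketch of the kind of four-state NPC machine shown in figure 2.13, and walked through in the text that follows, is given below. The state and event names mirror that example, while the enum-and-switch encoding itself is only one of several reasonable choices and is an assumption made for illustration.

```java
/** A tiny finite state machine for an NPC, modelled on figure 2.13 (illustrative only). */
public class NpcStateMachine {

    public enum State { GATHER_ITEMS, ATTACK, CHASE, RETREAT }
    public enum Event { SEE_ENEMY, LOST_ENEMY, HEALTH_LOW, HEALTH_RESTORED, ENEMY_DEAD }

    private State state = State.GATHER_ITEMS;   // default behaviour: roam the map and collect items

    /** A transition maps (current state, event) to the next state. */
    public State onEvent(Event event) {
        switch (state) {
            case GATHER_ITEMS:
                if (event == Event.SEE_ENEMY) state = State.ATTACK;
                break;
            case ATTACK:
                if (event == Event.HEALTH_LOW) state = State.RETREAT;
                else if (event == Event.LOST_ENEMY) state = State.CHASE;
                else if (event == Event.ENEMY_DEAD) state = State.GATHER_ITEMS;
                break;
            case CHASE:
                if (event == Event.SEE_ENEMY) state = State.ATTACK;
                else if (event == Event.LOST_ENEMY) state = State.GATHER_ITEMS;
                break;
            case RETREAT:
                if (event == Event.HEALTH_RESTORED) state = State.GATHER_ITEMS;
                break;
        }
        return state;
    }

    public State current() { return state; }
}
```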
• 24. - 16 - In the light bulb example there are only two states the bulb can be in, on or off. Normally there are more than two states, and the possible transitions between states are often limited; for instance, the bulb can only turn on and become bright when a person flips the switch on. Any system with a limited number of states can be implemented as a finite state machine. Although there are other ways of modelling how a human thinks and learns, finite state machines are often used for this kind of simulation. Finite state machines are frequently used to build the decision-making model of an AI NPC: the states correspond to different behaviours and decisions, there can be different states for different situations in the game environment, and the state transitions are usually triggered by certain events in that environment. A very simple NPC can be modelled as a finite state machine using four states, as shown in figure 2.13. The NPC can be in a state of gathering items around the game environment; when the NPC sees an enemy, it can move into the retreat, attack, or chase state; and while attacking the enemy, if the NPC's health points become low, it can decide to retreat or to gather items to restore its health to normal.

2.5.2 Machine learning

According to Arthur Samuel (1959), machine learning is "the field of study that gives computers the ability to learn without being explicitly programmed." According to Tom Mitchell, "a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." As mentioned in section 2.2.3, learning algorithms are categorised into supervised, unsupervised, and reinforcement learning. In supervised learning, data sets with known classes are given and an agent is trained to produce the desired outcomes. In unsupervised learning, data with unknown classes are given and an agent is trained to classify that data. In reinforcement learning, an agent is rewarded or punished based on the outcomes of the actions it takes in different states.

2.5.3 Reinforcement learning

Reinforcement learning is a learning technique that does not rely on labelled training examples. According to Sutton and Barto, "reinforcement learning is learning what to do--how to map situations to actions--so as to maximize a numerical reward signal" [46]. Think of an agent in an environment in which it can sense and act, and suppose it has no prior idea about the effects of its actions; it only receives rewards and punishments for the actions it performs. How should the agent choose its actions so as to maximise its rewards over the long run? In order to maximise reward, the agent needs to be able to predict how actions
• 25. - 17 - lead to rewards. The agent exists in an environment consisting of a set S of states. The agent decides which action a to perform from a set A of actions. An action is performed at time t; the action performed at time t is a_t, and the expected reward received is r_t = r(s_t, a_t).

Figure 2.14 Standard reinforcement learning model [46]

At each time t, the agent observes its state s_t ∈ S and the set of possible actions A(s_t). It chooses an action a ∈ A(s_t). As the action is performed, the agent observes the new state s_{t+1} and receives the reward r_t. The mapping from states to actions is called the policy, π. A popular approach to storing the learned values is the tabular approach, in which a value is stored for each state-action pair in table format.

2.5.4 Q-learning

Q-learning (Watkins, 1989) [44] is one of the algorithms derived from reinforcement learning. Agents developed with Q-learning are able to learn to act optimally in a Markovian domain by experiencing the consequences of their actions [45]. One-step Q-learning is defined by:

Q(s, a) ← Q(s, a) + α [ R(s) + γ max_a' Q(s', a') − Q(s, a) ]

α is the learning rate, where 0 ≤ α ≤ 1; it defines how strongly the agent reacts to the reward it is given. γ is the discount factor, where 0 ≤ γ ≤ 1; it multiplies the Q-value of the best action the agent can take in the next step. R(s) is the reward value given to the agent for the action performed in state s. Q-learning can be thought of as a mapping table, the Q-table shown in table 2.1. The rows represent all the states the environment could possibly be in, and the columns are the actions the agent can take. The Q(s, a) values in the table represent how favourable it is to take each action in a given state of the environment.
• 26. - 18 -
Q-Table    Action 1     Action 2     Action 3     Action 4
State 1    Q(s1, a1)    Q(s1, a2)    Q(s1, a3)    Q(s1, a4)
State 2    Q(s2, a1)    Q(s2, a2)    Q(s2, a3)    Q(s2, a4)
State 3    Q(s3, a1)    Q(s3, a2)    Q(s3, a3)    Q(s3, a4)
...        ...          ...          ...          ...

Table 2.1 Q-learning mapping table, Q-table

When the agent has to act, it chooses the best action it knows of for the current state of the environment. The procedural Q-learning algorithm, quoted from [46], is as follows:

Initialize Q(s, a) arbitrarily
Repeat (for each episode):
    Initialize s
    Repeat (for each step of episode):
        Choose a from s using a policy derived from Q (e.g., ε-greedy)
        Take action a, observe r, s'
        Q(s, a) ← Q(s, a) + α [ R(s) + γ max_a' Q(s', a') − Q(s, a) ]
        s ← s'
    Until s is terminal
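The same update can be written directly in Java. The sketch below stores Q(s, a) in a two-dimensional array indexed by discrete state and action identifiers and selects actions ε-greedily; the class and method names are not taken from any existing framework and are chosen here only for illustration.

```java
import java.util.Random;

/** Tabular Q-learning with an epsilon-greedy policy (illustrative sketch only). */
public class QLearner {

    private final double[][] q;        // q[state][action] = current estimate of Q(s, a)
    private final double alpha;        // learning rate, 0..1
    private final double gamma;        // discount factor, 0..1
    private final double epsilon;      // exploration probability, 0..1
    private final Random random = new Random();

    public QLearner(int numStates, int numActions, double alpha, double gamma, double epsilon) {
        this.q = new double[numStates][numActions];
        this.alpha = alpha;
        this.gamma = gamma;
        this.epsilon = epsilon;
    }

    /** Epsilon-greedy selection: explore with probability epsilon, otherwise exploit. */
    public int chooseAction(int state) {
        if (random.nextDouble() < epsilon) {
            return random.nextInt(q[state].length);
        }
        return bestAction(state);
    }

    /** One-step update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
    public void update(int state, int action, double reward, int nextState) {
        double bestNext = q[nextState][bestAction(nextState)];
        q[state][action] += alpha * (reward + gamma * bestNext - q[state][action]);
    }

    private int bestAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) best = a;
        }
        return best;
    }
}
```

A game loop would call chooseAction before each decision and update once the resulting reward and next state are observed.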
• 27. - 19 - 2.5.5 Hierarchical reinforcement learning

In hierarchical reinforcement learning, the basic reinforcement learning methods are augmented with prior knowledge about the high-level structure of behaviour [70]. Adding this prior knowledge in a hierarchy accelerates the search for good policies compared with the policy-search algorithms of flat reinforcement learning.

2.6 Related works

There are various developments and pieces of research on learning agents in FPS games. In the paper "Automatic computer game balancing: a reinforcement learning approach," the Q-learning algorithm is used to develop an adaptive agent for the fighting game Knock'em [59]; in order to adapt quickly to the opponent in the online environment, the agent is trained offline first. The paper "The Team-oriented Evolutionary Adaptability Mechanism (TEAM)" introduces an evolutionary approach to evolving a team-based strategy for the Capture the Flag game mode in Quake III [60], and states that the TEAM AI outperformed the original Quake III AI by adaptively learning risky tactics. A similar development can be found in "Cerberus: Applying supervised and reinforcement learning techniques to capture the flag games" [61], which uses a reinforcement learning framework for team behaviour and neural networks to control the fighting behaviour of the individual characters in the team. In "Reinforced Tactic Learning in Agent-Team Environments," the authors propose the RETALIATE algorithm, an online reinforcement Q-learning algorithm for generating team-based tactics in the FPS game Unreal Tournament 2004 [47]; according to the paper, the RETALIATE agents adapt well to dynamic environments and their capability to generate adaptive team tactics is significant. The RETALIATE agent is extended in "Recognizing the Enemy: Combining Reinforcement Learning with Strategy Selection using Case-Based Reasoning" [48], where the authors use CBR to make the original RETALIATE agent adapt faster to changes in the environment. In "Creating a Multi-Purpose First Person Shooter Bot with Reinforcement Learning" [49], the author states that a hierarchical or rule-based approach combined with the reinforcement learning Sarsa algorithm is a promising way to create rich and diverse agent behaviours in FPS games. In "Evolution of Intelligent Agent Behaviour in Computer Games," the author presents high-level behaviour optimisation and low-level missile-dodging behaviour optimisation using neural networks and genetic algorithms [2].
• 28. - 20 - 2.7 Agent architectures in related works

In first-person shooting games, the computer-controlled character (NPC) agents are called bots. Bots can sense the environment and perform appropriate actions in the same way as human players. The data flow between a bot and the environment can be seen in figure 2.15.

Figure 2.15 Bot and environment basic interaction diagram

2.7.1 Action selection mechanism (ASM)

Varieties of bots are designed to perform specific tasks such as patrolling, attacking, retreating, etc. The behaviour of a bot is governed by its action selection mechanism (ASM) [2]. The most common layers of the ASM architecture for a bot can be seen in figure 2.16.

Figure 2.16 Conceptual layers of bot behaviours [2]

Each layer works at a different level of abstraction, and each module covers a different aspect of the bot's behaviour and reasoning. The first layer, high-level decision making, is the top level of the bot's action planning and is responsible for tactics, winning strategies, teamwork, long-term planning, etc. The second layer is responsible for deciding on the combat behaviours or movements required to achieve the goal set by the first layer. The third layer contains the lowest-level conceptual blocks of the bot's behaviour, including weapon selection, dodging, aiming, steering, and navigation. In the weapon selection module, the bot has to choose the right weapon based on its effective distance and on what is most effective against the enemy's armour; for example, choosing a shotgun for close-range combat and a sniper weapon when the enemy is beyond the shotgun's reach. In some cases players can dodge an opponent's attacks, such as incoming missiles; in the dodging module the bot senses such delayed-hit attacks and avoids them. In the accuracy module, the bot has to calculate where to shoot in order to hit the
• 29. - 21 - moving enemy. In the steering and navigation modules, the bot has to avoid obstacles along the planned path through the game environment. For example, if the bot needs a health pack to restore its health points and the nearest health pack is on the second floor, which is reachable by elevator, the bot has to plan a path that reaches the health pack via the elevator.

2.7.2 Behaviour trees

Behaviour trees are a popular model for representing bot behaviour; they are also used in software development processes that need to simplify large amounts of natural-language requirements. The leaves of the tree represent the executable behaviours of the system, while the internal nodes represent behaviours that can be decomposed into smaller sub-behaviours; these internal nodes decide which sub-behaviour should be executed. Compared to FSMs, the number of transitions is reduced because of the hierarchical nature of the tree. A sample behaviour tree for a guard bot can be seen in figure 2.17.

Figure 2.17 A sample behaviour tree for a guard bot [2]

Behaviour trees can be evaluated in two ways: (i) top-down and (ii) bottom-up.

(i) Top-down
In the top-down approach, evaluation starts from the root node. The root node selects only the next node or leaf to be executed, and this is repeated at each node, selecting a further child node, until the last leaf executes its action. The advantage is less computation and faster speed, since there is only one path from the root node to a leaf; in addition it gives well-defined behaviour.

(ii) Bottom-up
In the bottom-up approach, computation starts from the leaves. Each leaf computes and proposes an action and passes it to the parent node above; the parent nodes choose the best of the proposed actions and pass the winning action upwards. The process repeats until it reaches the root node, and the winning action is then executed. In this approach the whole tree has to be evaluated, so it has the relative disadvantage of high computing cost. However, bottom-up behaviour trees can be modified in certain scenarios in order to enhance their performance [66].
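As a concrete illustration of the top-down evaluation just described, the sketch below builds a tiny selector-based tree loosely modelled on the guard-bot example of figure 2.17. The node classes and the retreat/attack/patrol actions are assumptions made for this sketch and are not part of any game or library.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/** Minimal top-down behaviour tree: a selector picks the first child that succeeds (illustrative only). */
public class BehaviourTreeDemo {

    interface Node { boolean tick(); }                        // returns true if the behaviour succeeded

    /** Leaf node: runs an action when its precondition holds. */
    static class Action implements Node {
        private final String name;
        private final BooleanSupplier precondition;
        Action(String name, BooleanSupplier precondition) { this.name = name; this.precondition = precondition; }
        public boolean tick() {
            if (!precondition.getAsBoolean()) return false;   // not applicable in the current situation
            System.out.println("executing: " + name);          // a real bot would issue game commands here
            return true;
        }
    }

    /** Internal node: tries children in priority order, stopping at the first success. */
    static class Selector implements Node {
        private final List<Node> children;
        Selector(List<Node> children) { this.children = children; }
        public boolean tick() {
            for (Node child : children) {
                if (child.tick()) return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        boolean enemyVisible = true;                          // stand-ins for real sensory input
        boolean healthLow = false;
        Node root = new Selector(List.of(
                new Action("retreat", () -> healthLow),
                new Action("attack enemy", () -> enemyVisible),
                new Action("patrol", () -> true)));           // fallback behaviour
        root.tick();                                          // prints "executing: attack enemy"
    }
}
```

Only one branch from the root to a leaf is evaluated per tick, which is exactly the low-cost property claimed for the top-down approach.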
• 30. - 22 - 2.8 AI implementation tools

In AI research and development, LISP and Prolog are the most commonly used languages. LISP stands for LISt Processor and was developed by John McCarthy in 1960 [38]; it is based on S-expressions and S-functions, which are written as parenthesised lists [38]. Prolog is a declarative logic programming language developed in 1971 by Alain Colmerauer and Phillipe Roussel [39]. Other languages such as Smalltalk [41], MATLAB, C++, and Java are also used to implement AI. The AI in most modern games uses a scripting language so that AI developers can work without needing the game developers' help. Most commercial games, such as Warcraft III [9], Unreal Tournament [14], and Quake 4 [40], are bundled with AI scripting and game map creators which let users customise the NPCs, the nature of the game play, and the game world; the bundled NPC characters are developed with the game's AI scripts. Other development kits for assisting AI research, such as Gamebots (JavaBots), were developed by Andrew N. Marshal and Gal Kaminka using the UT04 scripting language to control Unreal Tournament AI agents. Built on top of Gamebots, Pogamut 2 is a Netbeans IDE plug-in that allows AI researchers to develop and control Unreal Tournament 2004 AI agents in the virtual world using the platform-independent programming language Java.
• 31. - 23 - 2.8.1 JavaBots

Figure 2.18 JavaBot structure diagram [68]

Unreal Tournament is the three-dimensional first-person shooter multiplayer network game discussed in section 2.3. GameBots is a modification to the Unreal Tournament game that allows artificial intelligent agents to connect to the game server and control a character in the game. The GameBots project was started at the University of Southern California's Information Sciences Institute in order to turn the Unreal Tournament game into a domain for artificial intelligence research [67]. The core of GameBots is a modification to the Unreal Tournament game that allows the in-game characters to be controlled over the network: the game sends sensory information to the client programs through the network, and the client program decides which actions to perform and commands the character to carry out specific actions such as move, talk, shoot, etc. Figure 2.18 shows the different components assembled together to develop a Java-based artificial intelligent agent player in the Unreal Tournament game. The GameBots network API and the Unreal Tournament server form the GameBots infrastructure; the JavaBot API, the Java-based visualisation tool, and the Bot RunnerApp JavaBot interface come from the JavaBot package; and on top of the JavaBot infrastructure the Java-based AI agents are developed. The benefits of using JavaBot are that an AI developer can build agents without needing to know much about the specifics of the GameBots protocol, the API is easily accessible online, and there is a variety of example bots to refer to [68].
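Conceptually, a client bot built on this kind of infrastructure runs a sense-decide-act loop against the server. The sketch below shows that loop against a deliberately abstract interface; the GameWorld and BotCommands types and their methods are invented for this illustration and do not correspond to the actual GameBots or JavaBot API.

```java
/** Abstract sense-decide-act loop for a networked game bot (interfaces invented for illustration). */
public class BotLoop {

    /** Read-only view of what the server last reported about the world. */
    interface GameWorld {
        boolean enemyVisible();
        int health();
        boolean gameOver();
    }

    /** Commands the client sends back to the server to control the character. */
    interface BotCommands {
        void shootVisibleEnemy();
        void runToNearestHealthPack();
        void explore();
    }

    /** One decision per server update: a trivial reactive policy used only to show the control flow. */
    public void run(GameWorld world, BotCommands commands) {
        while (!world.gameOver()) {
            if (world.health() < 30) {
                commands.runToNearestHealthPack();
            } else if (world.enemyVisible()) {
                commands.shootVisibleEnemy();
            } else {
                commands.explore();
            }
            // In a real client the loop would block here until the next sensory message arrives.
        }
    }
}
```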
  • 32. - 24 - 2.8.2 Pogamut 2 Pogamut 2 is a platform designed to facilitate developing and debugging the JavaBots in Unreal Tournament [69]. Pogamut 2 uses the Unreal Tournament 2004 as the environment for the AI agents. The main objective of the Pogamut 2 is to simplify the “physical” part of the agent creation. Pogamut 2 acts as a plug in for Netbeans IDE which enables users to code, debug, and run the AI agents in IDE itself. Users have the advantages of simple one or two command to perform the complex tasks of the agents such as path-finding and retrieving information from the agent’s memory. The platform can also use the reactive planner for the agents. 2.8.3 Unified Modelling Language (UML) Unified Modelling Language is used in various fields such as all type of system and development phases in software modelling, business modelling, and general modelling of any construction that has both a static structure and a dynamic behaviour. The work is first started in 1994 by Grady Booch and James Rumbaugh at Rational Software Corporation. The version 1.0 of the UML was released in January 1997 [73]. The latest UML infrastructure can be seen in [72]. In UML 2, it described 13 official diagram types which are listed and described below [74], and the classification diagram can be seen in figure 2.19. i. Activity diagram An activity diagram is used to describe the work flow, business process, and procedural logic in parallel behaviour. ii. Class diagram A class diagram is used to describe the types of the objects in the system and the different kinds of static relationships among the objects. A class model consists of properties and operations of it. A class diagram also shows the constraints of the way of the objects connected. iii. Communication diagram A communication diagram is an interaction diagram which emphasizes the data links between the various participants.
  • 33. - 25 - iv. Component diagram A component diagram is used to break down the system into independently upgradeable and purchasable pieces. Reasons for dividing down the system into components can be a marketing decision or a technical decision. v. Composite structure diagram In order to break down the complex objects, a composite structure diagram is used to hierarchically decompose a class into an internal structure. vi. Deployment diagram Deployment diagrams show a system’s physical layout, revealing which software run on the physical hardware. vii. Interaction overview diagram Interaction overview diagrams are mix of activity diagrams and sequence diagrams. viii. Object diagram An object diagram is used to present the snapshot of the object in a system at a point in time. An object diagram is often called an instance diagram as it shows instances rather than classes. It can be used to show an example configuration of objects. ix. Package diagram A package diagram is used to group the classes together into higher-levels units with a hierarchical structure. A package can be inside another higher-level package, and it can contain a lower-level sub-packages and classes. x. Sequence diagram A sequence diagram is used to present the behaviour of a single scenario. It shows numbers of example objects and the messages which are passed between these objects within the use case. xi. State machine diagram A state machine diagram is used to describe the behaviour of a system. It also shows the life time behaviour of a single object.
  • 34. - 26 - xii. Timing diagram A timing diagram is used to present another form of interaction diagram yet it emphasize on timing constraints. xiii. Use case diagram A use case diagram is used to capture the functional requirements for a system. It describes the typical interactions between the user of a system and the system itself in which it provides a narrative view of how the system works. Figure 2.19 Classification of UML diagram types
  • 35. - 27 - 2.8.4 UML State Machine (UML statechart) State diagrams are often used in describing the behaviour of the systems. State diagrams also present the states an object can have and how events affect those states over time [73]. A state diagram consists of following components. State – A state is an entity’s behaviour stage in which it is assuring some condition, waiting for some external event, and performing some activity. Superstate – A superstate is a collection of sub-states which share common transitions and internal activities. Activity State – An activity state is a state that the object is doing some ongoing work. Concurrent State – A concurrent state is a state that concurrently working with other states at the same time. Event – An event is the occurrence of a stimulus which triggers a state transition. Transition – A transition is composed with a source state, a target state, an event, and an action. It is a relationship between two states that the first state performs specific actions and achieves to the second state. Self-transition – Self-transition occurs when the source state and target state are the same. Actions – An action is an executable activity performed at a moment. There are several types of actions as followed – (a) Entry action is performed when entering to the state. (b) Exit action is performed when existing from the state. (c) Input action is performed depending on condition and current state. (d) Transition action performs a certain transition. The standard notations for state machine diagrams can be seen in table 2.2.
• 36. - 28 - Table 2.2 Standard notations for state machine diagrams: state, composite state, fork, join, decision, initial state, final state, shallow history (H), and deep history (H*) (the table pairs each name with its diagram symbol, which is not reproduced in this text).

2.9 Summary

This chapter has shown that games have developed a very long way since their beginnings. The first single-player, stand-alone games developed five decades ago, with simple two-dimensional static graphics, have now been replaced by dynamic, high-definition, three-dimensional, multi-player, networked gaming environments. This has been the result of the players' varied requirements and expectations, together with the development and availability of the latest technology.
• 37. - 29 - CHAPTER 3 METHOD

The previous chapter showed that various reinforcement learning techniques can be applied to develop non-player characters (NPCs), or bots, for real-time first-person shooting games. This chapter proposes how to develop an NPC bot which learns to play the first-person shooting game and which chooses and performs the best possible actions learned from previous game-play experience using the Q-learning technique.

3.1 Analysis of the game rules

In this dissertation, the Unreal Tournament 2004 game is chosen as the environment for the NPC bots, and the chosen game mode is Death-match. The goal of the game is to reach a preset kill count within the time limit, or to have the highest kill score when the time limit expires. Killing an opponent adds to the player's kill score, being killed by an opponent gives a score to the killer, and suicide counts as a negative score for the player. The NPC bot therefore has to achieve a high kill score by learning the best policy for choosing the right actions.

3.2 Analysis of states and events

This section presents an analysis of states and events. As the Q-learning technique works on state-action pairs, the possible states of an NPC are stated as follows:

i. Spawn state – When first joining the game, or after dying, the player is spawned at a random point with full health and the basic weapon
ii. Engage state – Attack the enemy with the currently equipped weapon
iii. Collect state – The agent tries to reach an item it sees and picks it up
iv. Retreat state – Run away from the enemy to a safer location
v. Explore state – The agent explores the map looking for enemies, collectable items, and weapons
• 38. - 30 - The following events can occur and provide information from the game during play.
i. E = See an enemy
ii. !E = No enemy in sight
iii. ED = Enemy is dead
iv. D = Agent is dead
v. I = See a collectable item
vi. HL = Agent's health is low
Figure 3.1 State-event diagram
There are five possible states and six possible events for an NPC bot when it joins the game, as shown in figure 3.1. At the beginning of the game, the NPC bot is spawned at one of the random spawning points on the map. Explore state is the default state, in which the bot roams the map randomly and collects items whenever it sees them. If the bot is in the spawn state and sees an enemy, it engages the enemy opponent, which is the engage state. If the bot defeats the opponent, it chooses either the collect state, in which it collects the nearest collectable items, or the explore state. The low-health event triggers the bot to retreat from the engage state and look for health items in the environment. If the bot dies, it re-spawns at a random location, and this cycle continues until the game is over. A minimal sketch of these states and events as Java enumerations is given below.
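The following Java fragment sketches how the five states and six events could be represented as enumerations. The type and constant names are illustrative assumptions and are not taken from the actual implementation or the Pogamut API.

// Hypothetical enumerations for the bot's high-level states and game events.
enum BotState {
    SPAWN,    // just (re)spawned with a basic weapon and full health
    ENGAGE,   // attacking the enemy with the currently equipped weapon
    COLLECT,  // moving towards a visible item in order to pick it up
    RETREAT,  // running away from the enemy towards a safer location
    EXPLORE   // roaming the map looking for enemies, items and weapons
}

enum GameEvent {
    SEE_ENEMY,   // E  - an enemy comes into sight
    NO_ENEMY,    // !E - no enemy in sight
    ENEMY_DEAD,  // ED - the current opponent has been defeated
    AGENT_DEAD,  // D  - the agent has been killed
    SEE_ITEM,    // I  - a collectable item is visible
    HEALTH_LOW   // HL - the agent's health has dropped below the threshold
}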
• 39. - 31 - The state-event analysis above is transformed into a UML state machine diagram, which can be seen in figure 3.2. Figure 3.2 represents an overview of an NPC agent (bot) in the death-match game mode. When the game starts, the bot is spawned at a random spawn point on the game map. If the bot does not see any enemy after it is spawned, it goes to the explore state, in which it wanders around the map, looking for collectable items and weapons and searching for an enemy to defeat. Detailed behaviours and states can be seen in figure 3.3. If the bot sees an enemy, it goes to the engage state, in which it performs the attacking behaviours described in figure 3.4. When the bot hears an enemy, is hit by an enemy, or loses sight of the enemy it is currently fighting, it moves to the chase state, in which it goes to the enemy's last seen location and pursues the enemy. If the bot is hit severely and its health falls below the threshold level, it retreats and actively looks for health-restoring items. If the bot is defeated by an enemy and the game is not over, it is re-spawned at a random location on the map. If the game time limit is reached, or either participant reaches the preset winning score, the game finishes. Figure 3.2 Overview of the game's state diagram
• 40. - 32 - Figure 3.3 Detail Explore State diagram
Figure 3.3 presents the detailed states of the explore super-state. If the bot does not see any enemy, or the last enemy has been defeated, it goes to the exploring state. It actively searches for collectable items while moving around the map. If the bot sees a collectable item, it moves to the collect-item state. While searching for items, if the bot hears an enemy, it goes into chase mode. If the bot sees an enemy, it goes to the engage state and attacks the enemy.
Figure 3.4 Detail Engage State diagram
• 41. - 33 - Figure 3.4 shows the detailed sub-states of the engage super-state. It has nested states reflecting whether the bot is at normal health or low health. The enemy may have a better weapon than the agent. During the fight, the ammunition of the current weapon can run out and the agent's health can drop. If the current weapon is not loaded, or if there is a better weapon available, the bot changes to the better loaded weapon. The decision whether the bot should keep engaging the enemy or retreat is based on the previous experience stored in the Q-table.
Figure 3.5 Detail Chase State diagram
Figure 3.5 shows the detailed states of the chase scenario. If the bot is hit from behind, it does not see the enemy, so it looks around and searches for the enemy that is shooting at it. If the bot is fighting an enemy that retreats and goes out of sight, the bot has to find the enemy and defeat it before the enemy recovers health. If the bot is roaming the map for items and hears enemy footsteps, it should become alert and search for the enemy nearby. If the bot does not find any enemy after searching, it resumes the explore state to gather items.
Figure 3.6 Detail Retreat State diagram
• 42. - 34 - Figure 3.6 presents the detailed diagram of the bot's retreat state. When the bot's health is low, the priority items to look for are health-restoring items. If the bot still has the enemy in sight, it shoots and looks for health packs concurrently. When it sees health packs, it tries to reach and collect them. After its health is restored to a normal level, the bot resumes exploring or engaging, depending on whether an enemy is present.
3.3 Analysis of available actions There is a set of actions that an NPC bot can perform in the game. They are as follows (a short Java sketch of this action set is given after the list).
• Shoot – Shoot the enemy with the currently equipped weapon.
• Stop shoot – Stop shooting if the bot is shooting.
• Change weapon – Change the currently equipped weapon to another weapon in the inventory. This action is helpful if the current weapon is out of ammunition or the bot has a better weapon in the inventory.
• Run to item/player – The bot can run to an item or a player in order to collect the item or attack at close range.
• SetWalk – The default movement in the game is running, which makes footstep sounds. To avoid making footstep sounds, a bot can be set to walking mode.
• Stop – Stop the bot's movement.
• Jump – The bot jumps. It can also jump again at the highest point of the first jump, which is called a double jump.
• Turn to – Make the bot turn to face a certain player or location.
• Strafe – Strafe is a movement in which the bot keeps facing the same direction while moving left or right.
• Rotate – Rotate the bot by a specified angle.
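As with the states and events above, this action set can be written down as a simple enumeration. The sketch below is illustrative only; in the implementation the individual actions are carried out through the Pogamut body commands listed in section 3.6.

// Hypothetical enumeration of the primitive actions available to the bot.
enum BotAction {
    SHOOT,          // fire the currently equipped weapon at the target
    STOP_SHOOT,     // cease fire
    CHANGE_WEAPON,  // switch to another (better or loaded) weapon
    RUN_TO_TARGET,  // run towards an item or a player
    SET_WALK,       // walk silently instead of running
    STOP,           // stop moving
    JUMP,           // jump, or double jump at the apex of the first jump
    TURN_TO,        // face a given player or location
    STRAFE,         // side-step while keeping the same facing direction
    ROTATE          // rotate by a given angle
}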
• 43. - 35 - 3.4 Mapping States and Actions Sections 3.2 and 3.3 analysed the possible states and actions the bot can have in the death-match game. This section maps those states to actions. Several conditions normally occur together in the engage state. For example, the bot may see an enemy while its health is below the threshold level, its weapon is loaded, and it has a better weapon, and it must then decide whether to pursue the enemy or retreat. The bot learns from previous experience whether engaging the enemy is more successful than retreating. Table 3.1 lists some of the possible state-action combinations; this map becomes the Q-table, which records the reward values for each action the bot performs in a given state.
Table 3.1 State-action mapped table
Actions (columns): Shoot; Stop Shoot; Look around; Retreat; Change best weapon
States (rows):
See Enemy + Health > 50% + Weapon Loaded + Has better weapon
See Enemy + Health <= 50% + Weapon Loaded + Has better weapon
See Enemy + Health > 50% + Weapon not Loaded + Has better weapon
See Enemy + Health <= 50% + Weapon not Loaded + Has better weapon
See Enemy + Health > 50% + Weapon Loaded
See Enemy + Health <= 50% + Weapon Loaded
See Enemy + Health > 50% + Weapon not Loaded
See Enemy + Health <= 50% + Weapon not Loaded
No Enemy + Health <= 50% + Weapon not Loaded
No Enemy + Health > 50% + Weapon not Loaded
No Enemy + Health > 50%
...etc
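One way to hold table 3.1 in memory is a map from an encoded state description to an array of Q-values, one entry per action. The following Java fragment is only an assumed sketch of such a data structure, reusing the BotAction enumeration from section 3.3; it is not the exact initiateQTable() or updateQTable() implementation described in section 3.6.

import java.util.HashMap;
import java.util.Map;

// Hypothetical Q-table: each composite state from table 3.1 maps to one
// Q-value per action.
class QTable {
    private final Map<String, double[]> table = new HashMap<>();

    // Encode the state features used in table 3.1 into a single key.
    static String key(boolean seeEnemy, boolean healthAbove50,
                      boolean weaponLoaded, boolean hasBetterWeapon) {
        return seeEnemy + "|" + healthAbove50 + "|"
             + weaponLoaded + "|" + hasBetterWeapon;
    }

    double get(String state, BotAction action) {
        return row(state)[action.ordinal()];
    }

    void set(String state, BotAction action, double value) {
        row(state)[action.ordinal()] = value;
    }

    private double[] row(String state) {
        return table.computeIfAbsent(state,
                s -> new double[BotAction.values().length]);
    }
}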
• 44. - 36 - 3.5 Behaviours Section 3.3 listed the individual actions a bot can perform. This section discusses combinations of those actions as behaviours of the bot.
Offensive behaviour – In offensive behaviour, the bot mostly performs aggressive attacks and pursuit actions. It cares less about getting hit or taking cover from the opponent's attacks. This behaviour may occur when the bot is equipped with its best weapon, fully loaded, with full armour and full health.
Defensive behaviour – In defensive behaviour, the bot is careful about being hit by the opponent, looks for cover to avoid attacks, and searches for more armour and health items.
Hiding behaviour – In hiding behaviour, the bot tries to hide in one place, which is commonly called camping. It rarely moves from place to place and waits for an opponent to appear so it can attack. Alternatively, the bot may hide because it has low health and is trying to escape the enemy's sight while it recovers.
Gatherer behaviour – In gatherer behaviour, the bot mostly roams around the map and collects items. This behaviour can occur at the beginning of the game, when the bot has only the basic weapon.
Retreat behaviour – In retreating behaviour, the bot mostly tries to escape the enemy's sight and collect the nearest health-restoring items on the map.
• 45. - 37 - 3.6 Implementation 3.6.1 Proposed architecture
Figure 3.7 An overview framework of a Java bot on the UT 2004 game server (the Q-learning Java bot, built on the NetBeans + Pogamut interface and the JavaBot API, sends commands, movements, and actions to the Unreal Tournament 2004 game server and receives events and environment information from it)
The Q-learning bot is developed in Java. The project uses NetBeans 6.8 as the development environment, with the Pogamut 2.4 platform on top of it. Figure 3.7 shows the overview framework of a Java bot connected to the Unreal Tournament 2004 server. When the program starts, it initializes the state-action Q-table (table 3.1), or restores a previously recorded Q-table if one exists. Then the game starts. After the game starts, the bot chooses either a random action or the maximum-valued applicable action from the Q-table. After choosing the action, it executes it. Once the action has been performed, the bot checks the result of the action and the new state. The bot is given a reward or a punishment according to the result and updates the Q-table. If the bot is dead and the targeted game score has been reached, or the time is up, the game ends. If the bot is dead but the game continues, it is re-spawned and resumes choosing actions for the next state. The flow of this process is shown as a flow chart in figure 3.8, and a minimal Java sketch of the selection and update steps follows below. The implemented algorithm is as follows:
Initialize Q(s, a) table
Repeat (for each combat sequence):
    Check state s
    Repeat (for each step of combat sequence)
        Select action a based on s
        Execute a, observe state, receive reward r
        Calculate Q-value based on reward, action, state
    Until agent is dead or enemy defeated
Until game is over
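The "select" and "calculate Q-value" steps above correspond to epsilon-greedy action selection and the standard Q-learning update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. The fragment below is a minimal sketch of those two steps, building on the hypothetical QTable and BotAction sketches given earlier; the parameter values are those later used in experiment 1, and the class and method names are assumptions rather than the actual implementation.

import java.util.Random;

// Minimal sketch of epsilon-greedy selection and the Q-learning update.
class QLearner {
    private final QTable q = new QTable();
    private final Random rng = new Random();
    private final double alpha = 0.8;   // learning rate (experiment 1)
    private final double gamma = 0.9;   // discount factor (experiment 1)
    private double epsilon = 0.8;       // exploration rate (experiment 1)

    BotAction chooseAction(String state) {
        if (rng.nextDouble() < epsilon) {          // explore: random action
            BotAction[] all = BotAction.values();
            return all[rng.nextInt(all.length)];
        }
        return bestAction(state);                  // exploit: max-valued action
    }

    BotAction bestAction(String state) {
        BotAction best = BotAction.values()[0];
        for (BotAction a : BotAction.values()) {
            if (q.get(state, a) > q.get(state, best)) {
                best = a;
            }
        }
        return best;
    }

    // Called after the chosen action has been executed and the reward observed.
    void update(String state, BotAction action, double reward, String nextState) {
        double target = reward + gamma * q.get(nextState, bestAction(nextState));
        double oldValue = q.get(state, action);
        q.set(state, action, oldValue + alpha * (target - oldValue));
    }
}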
• 46. - 38 - Figure 3.8 Flow-chart diagram for the Q-learning bot (Start → initialize/restore state-action Q-table → game start → choose a random exploring action or the max-valued applicable action from the state-action table → execute action → observe state → calculate reward and update the state-action Q-table → if the game is over, end; otherwise, re-spawning if the bot is dead, return to choosing the next action)
• 47. - 39 - 3.6.2 Structure of the Q-learning bot There are six classes in the Q-learning bot development. They are as follows.
i. Main.java The Main.java class contains most of the agent's brain. It has sub-methods that perform the logical operations, transactions, and commands to operate the bot on the Unreal Tournament 2004 game server.
ii. Agent.java The Agent.java class is a wrapper class around the body, inventory, and map classes. All bots have to be derived from this class and use the methods of its components.
iii. AgentBody.java The AgentBody.java class takes care of performing commands and receiving messages from Unreal.
iv. AgentInventory.java AgentInventory.java takes care of the weapons the agent is carrying and provides useful methods such as switching weapons or determining suitable visible ammunition for the weapons the agent has in its inventory.
v. AgentMemory.java AgentMemory.java takes care of recently seen items, enemies, and game world information.
vi. GameMap.java GameMap.java provides the navigation information the bot needs to move around the map, such as the nearest navigation point or a path to a wanted item.
The Agent.java, AgentBody.java, AgentInventory.java, and GameMap.java classes are part of the Pogamut platform and were developed by Horatko and Jimmy [69].
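To show how these classes fit together, the following sketch illustrates one decision cycle of the bot's brain, in the spirit of the doLogic() method listed in the next part of this section. The class, interface, and method names here are illustrative assumptions, not the actual Pogamut or project code.

// Hypothetical sketch of one logic iteration of the bot's brain, tying
// together state sensing, action choice, execution, and the Q-table update.
class BotBrain {
    // Minimal stand-in for the information provided by the agent's body,
    // inventory, and memory classes.
    interface GameView {
        String encodeState();                   // cf. checkCurrentState()
        double rewardFor(BotAction lastAction); // cf. getReward()
        void execute(BotAction action);         // issue body commands
    }

    private final QLearner learner = new QLearner();
    private String previousState;
    private BotAction previousAction;

    // Called once per iteration, in the spirit of Main's doLogic() method.
    void doLogic(GameView view) {
        String state = view.encodeState();
        if (previousAction != null) {
            double reward = view.rewardFor(previousAction);
            learner.update(previousState, previousAction, reward, state);
        }
        previousAction = learner.chooseAction(state);
        previousState = state;
        view.execute(previousAction);
    }
}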
• 48. - 40 - There are several main methods and functions used to develop the Q-learning Java bot, and they are as follows.
• void initiateQTable( ); This method initialises the Q-table.
• private state checkCurrentState( ); This function checks and returns the current state of the bot.
• private action chooseAction(state s); This function chooses an action based on the current state.
• private void stateEngage ( ); This method performs a list of attacking actions.
• private void stateRetreat( ); This method makes the bot retreat from the enemy and move to the nearest health items.
• private void stateRunAroundItems( ); This method makes the bot run around the map collecting items.
• protected boolean hasBetterWeapon( ); This function returns true if the bot has a better weapon.
• protected boolean canRunAlongMedKit( ); This function returns true if the bot can go and gather a medical kit (health item).
• protected void postPrepareAgent( ); This method prepares the agent after the game starts. For this project, it reads the gatherable items and weapons from the map and lists them as collectable items for the bot.
• protected void stateStucked( ); This method makes the bot choose another random navigation path to pursue if it has repeatedly collided with walls more than five times.
• 49. - 41 - • protected void updateCurrentStatus( ); This method updates the bot's current health, armour, ammunition, enemy in sight, and loaded weapon.
• protected double getReward(State s, Action a, Result r); This function calculates and returns the Q reward value.
• protected void updateQTable( ); This method updates the Q-table.
• protected void recordQTable( ); This method records the final Q-table in text file format.
• public void shoot (Player target); This method commands the bot to shoot the target player.
• public void stopShoot( ); This method commands the bot to stop shooting.
• public void jump( ); This method commands the bot to jump.
• public void doubleJump( ); This method commands the bot to double jump.
• public boolean getSeeAnyReachableAmmo( ); This function returns true if the bot sees any reachable ammunition.
• public boolean getSeeAnyReachableWeapon( ); This function returns true if the bot sees any reachable weapon.
• public boolean getSeeAnyHealth ( ); This function returns true if the bot sees any reachable health item.
• 50. - 42 - • public void runToTarget (Player target); This method commands the bot to run to the given player.
• public void runToTarget (Item target); This method commands the bot to run to the given item to collect it.
• public int getAgentHealth ( ); This function returns the agent's health as an integer.
• public void changeWeapon(AddWeapon newWeapon); This method commands the agent to change weapon.
• public void changeToBestWeapon( ); This method commands the agent to change to the best weapon in its inventory.
• protected void doLogic( ); This method performs most of the logical operations for the bot during each iteration. It can be regarded as the bot's brain.
• protected void prePrepareAgent( ); This method prepares the agent's logic to run, for example initialising neural networks.
• protected void postPrepareAgent( ); This method prepares the bot's logic according to the information gathered from startCommunication, for example choosing a plan or parameters according to the game type.
• protected void shutdownAgent( ); This method cleans up at the end of the agent's simulation.
3.7 Summary This chapter discussed the analysis of the states, actions, and behaviours the bot can have in the death-match game environment. Furthermore, it presented the actual implementation and the methods used in the development of the project. The testing, evaluation, and in-game screenshots are discussed in chapter 4.
• 51. - 43 - CHAPTER 4 EVALUATION Testing the developed software is part of the software engineering development cycle. It is the process of validating whether the developed software meets the design requirements and performs as expected. The aim of the project is to develop an NPC agent that learns how to play the first-person shooting game. Unit testing and integration testing were performed in the testing phase.
4.1 Unit Testing Unit tests verify the functionality of specific code at the level of individual functions. There are several states and actions performed by the NPC agent (Qbot) in this project. The basic actions and functionalities were tested in the following categories.
• Shooting – the bot shoots at the given target as commanded in the test.
• Stop shooting – the bot stops shooting if it is currently shooting.
• Jumping – the bot jumps according to the command, either a normal jump or a double jump.
• Movement from navigation point to navigation point is performed correctly.
• Change weapon – the bot changes to the next weapon in the inventory as commanded.
• Change best weapon in inventory – the bot changes to the best available weapon in its inventory.
• Initialization of classes – the bot agent is initialized as expected.
4.2 Integration Testing In integration testing, the modules are combined into the whole system and tested. The following scenarios were tested, as they are the most common in first-person shooting games.
• Exploring, navigating, and collecting items around the map
• Engaging an enemy
• Collecting health packs when the NPC agent's health is low, even outside the retreat state
• Retreating and collecting health items as a priority
• Pursuing the enemy
• 52. - 44 - The following points describe the integration tests and some issues observed while performing them.
• If the Qbot sees a collectable item, it has to try to reach the item in sight, as shown in figure 4.1. The bot performed well in navigating the map. Whenever it saw a collectable item, it tried to reach it and collected it into its inventory. There is a minor issue that the bot still tries to collect ammunition a few times after the maximum amount of ammunition has been reached.
• If the bot sees an enemy it has a high probability of beating, it has to try to shoot the enemy in sight, as shown in figure 4.2. The bot performed the engage behaviour when it saw an enemy, and it tried to change to the best available weapon during the fight.
• If the bot's health is low, even when it is not in retreat mode, it has to try to restore its health, as shown in figure 4.3. The bot performs well if the health packs are on reachable ground. However, in situations where the health packs are on higher ground and the navigation to the items is not well defined, the bot stopped near the item while still trying to collect it. This issue could be solved with manual ray-casting movements, but that was outside the scope of this project and is left for future development.
• While engaging an enemy, if the NPC is low on health, it should retreat. In the test the bot tried to retreat from the enemy, as shown in figure 4.4. It tried to run away and collect health packs along the way. However, it still cannot easily escape the enemy's sight, depending on the map.
Figure 4.1 QBot trying to reach the weapon
• 53. - 45 - Figure 4.2 QBot trying to shoot an enemy in sight
Figure 4.3 QBot trying to reach health items when its health is low (white bar indicates total health lost percentage)
Figure 4.4 QBot trying to retreat from the enemy after its health becomes low (white bar indicates total health lost percentage)
• 54. - 46 - 4.3 Experiment In this section, the Qbot is evaluated against the Hunter bot, a sample heuristic bot included in the Pogamut2 platform. The Hunter bot collects weapons around the map, runs for health items when its health is low, and tries to kill anyone from a different team on sight. The logic behind the Hunter bot is as follows:
• If the bot sees an enemy and possesses a better weapon, it changes to the better weapon.
• If the bot sees an enemy, it goes to ENGAGE, in which it starts shooting, hunts the enemy, or changes weapon.
• If the bot is shooting and has lost its target, it stops shooting.
• If the bot is being shot, it turns around and tries to find the enemy.
• If the bot is hitting a wall, it checks the wall and tries to jump.
• If the bot is following an enemy, it runs to the enemy's last position.
• If the bot sees an item, it picks the most suitable item and runs for it.
• If the bot is hurt, it runs around a list of health items such as MedKits and HealthVials.
• If the bot has nothing to do, it runs around the items on the map.
The game type used for the evaluation is Death Match. In this game type, any player who reaches the targeted game score wins the game.
• 55. - 47 - 4.3.1 Experiment 1 The first experiment was set up as follows.
Table 4.1 Experiment 1 setup table
Map: DM-Flux2
Opponent: Hunter bot
Game mode: Death Match
If agent gets a kill: Reward = 1
If agent dies: Reward = -1
Health pack gathered: Reward = 0.3
New weapon gathered: Reward = 0.2
Discount factor: 0.9
Learning rate: 0.8
Exploration: 0.8
In this experiment, the bot does not have a learned policy at the beginning of the game; it chose actions randomly in different situations. The outcome was not very good in terms of game score: the Hunter bot won the games by reaching the targeted kill score first. The average kill score that the Qbot achieved over 10 game runs was 5.4, and its highest kill score in a single game was 7.
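Under this setup, the reward signal can be written down directly; the fragment below is a hedged sketch of how the getReward() function could map the events in table 4.1 to numeric values (the boolean flags are illustrative names, assumed to be derived from the game messages).

// Hypothetical reward function for experiment 1, using the values in table 4.1.
double getReward(boolean killedEnemy, boolean agentDied,
                 boolean pickedHealthPack, boolean pickedNewWeapon) {
    double reward = 0.0;
    if (killedEnemy)      reward += 1.0;  // agent scores a kill
    if (agentDied)        reward -= 1.0;  // agent dies (changed to -5 in experiment 3)
    if (pickedHealthPack) reward += 0.3;  // health pack gathered
    if (pickedNewWeapon)  reward += 0.2;  // new weapon gathered
    return reward;
}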
• 56. - 48 - 4.3.2 Experiment 2
Table 4.2 Experiment 2 setup table
Map: DM-Flux2
Opponent: Hunter bot
Game mode: Death Match
If agent gets a kill: Reward = 1
If agent dies: Reward = -1
Health pack gathered: Reward = 0.3
New weapon gathered: Reward = 0.2
Discount factor: 0.9
Learning rate: 0.8
Exploration: 0.0
In this second experiment, the reward values and the layout are the same as in experiment 1, and the Q-table trained in experiment 1 is reused. The only difference is that the exploration rate is set to 0, so the Qbot performs the maximum-valued applicable action in most situations. The outcome is interesting: the Qbot performs the best actions it has experienced in the earlier games, and the results improve noticeably. The Qbot won 5 out of 10 games, and its average kill score increased to 7.6. However, observing its behaviour during play, the Qbot rarely retreats from the enemy, since its death does not carry much punishment.
• 57. - 49 - 4.3.3 Experiment 3
Table 4.3 Experiment 3 setup table
Map: DM-Flux2
Opponent: Hunter bot
Game mode: Death Match
If agent gets a kill: Reward = 1
If agent dies: Reward = -5
Health pack gathered: Reward = 0.3
New weapon gathered: Reward = 0.2
Discount factor: 0.9
Learning rate: 0.8
Exploration: 0.5
In the third experiment, the punishment for death is increased to -5 and the exploration rate is set to 0.5. In addition, a dodging action is added to the action list. The remaining parameters are the same as in experiments 1 and 2. In this experiment, the winning rate increased to 7 out of 10 games. The bot's behaviour changed in that it tries to retreat from fights, since the punishment for death has been increased. Another factor may be the increased exploration, which makes the bot sometimes perform random actions and thereby explore different strategies than in experiment 2. The bot also performs dodging while engaging the enemy, as this gives it better survivability than standing still and shooting.
4.4 Summary In this chapter we have presented the various tests and experiments carried out on the proposed architecture and the variations observed in the results.
• 58. - 50 - CHAPTER 5 CONCLUSION AND FUTURE WORK The main objectives of this dissertation were to propose, implement, and evaluate an NPC which learns to play the first-person shooting game Unreal Tournament 2004. The bot was successfully developed with the Q-learning algorithm on the Pogamut2 platform in the NetBeans 6.8 Java development environment. From the experiments performed, it was observed that the bot successfully learned not only how to play the first-person shooting game but also tactics that let it outperform the opponent Hunter bot. According to the experiments, the bot's behaviour changed based on the reward values, and it can exhibit far more behavioural variation with less code than the heuristic bots. In the future, game developers could use this technique to find tactics for defeating opponents and to discover further strategies in real-time first-person shooting games. This work addressed high-level decision making; the lower levels, such as weapon selection, combat behaviours, and movement, could be implemented with unsupervised learning algorithms. More detailed actions, and combinations of actions or behaviours, could be added to this work for better outcomes. Further work will investigate different experimental setups and add more possible combined actions and states. Better and more challenging artificial intelligence players in games are coming in the near future.
• 59. - 51 - References:
[1] Charles Weddle, Artificial Intelligence and Computer Games, Florida State University, URL: http://www.charlesweddle.com/s-misc/doc/cis5935-weddle.pdf [28 Sep 2009]
[2] Rudolf Kadlec, Evolution of intelligent agent behaviour in computer games, Charles University, Prague, 2008
[3] O. Burkert, Unreal tournament twins, Bachelor's thesis, Charles University, Prague, 2006
[4] P. Tozour, The Evolution of Game AI, AI Game Programming Wisdom, Charles River Media, Inc., Hingham, MA, 2002
[5] Miles, C., Quiroz, J., Leigh, R., Louis, S.J., Co-Evolving Influence Map Tree Based Strategy Game Players, Dept. of Comput. Sci. & Eng., Nevada Univ., Reno, NV
[6] D. Nieuwenhuisen, A. Kamphuis, M.H. Overmars, High quality navigation in computer games, Science of Computer Programming 67 (2007) 91–104, Available online 6 April 2007.
[7] MDK2, URL: http://www.bioware.com/games/mdk2/ [6 April 2009]
[8] Command and Conquer, URL: http://www.commandandconquer.com/ [6 April 2009]
[9] Warcraft III, URL: http://us.blizzard.com/en-gb/games/war3/ [6 April 2009]
[10] Age of Empires III, URL: http://www.microsoft.com/games/en-us/Games/Pages/AgeofEmpires 3.aspx [6 April 2009]
[11] Final Fantasy XII, URL: http://www.finalfantasy12.eu.com/site_en.html [6 April 2009]
[12] Resident Evil, URL: http://www.residentevil.com/index.php [6 April 2009]
[13] The Witcher, URL: http://www.thewitcher.com/ [6 April 2009]
[14] Unreal Tournament, URL: http://www.unrealtournament2003.com/ut2004/index.html [6 April 2009]
[15] Crysis, URL: http://www.ea.com/games/crysis [6 April 2009]
[16] Farcry 2, URL: http://www.farcry2.com [6 April 2009]
[17] Call of Duty, URL: http://www.callofduty.com/ [6 April 2009]
[18] Half-Life, URL: http://half-life.com/ [6 April 2009]
[19] Halo 3, URL: http://halo.xbox.com/halo3/ [6 April 2009]
• 60. - 52 - [20] E.W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1: 269–271, 1959.
[21] P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics 4(2): 100–107, 1968.
[22] J.M.P. van Waveren and L.J.M. Rothkrantz. Automated path and route finding through arbitrary complex 3D polygonal worlds. Science of Computer Programming, Robotics and Autonomous Systems, Volume 54, Issue 6, 30 June 2006, Pages 442-452, Available online 17 April 2006.
[23] P. Tozour, Introduction to Bayesian Networks and Reasoning Under Uncertainty, AI Game Programming Wisdom, Charles River Media, Inc., Hingham, MA, 2002
[24] Rich Caruana and Alexandru Niculescu-Mizil, An Empirical Comparison of Supervised Learning Algorithms, Department of Computer Science, Cornell University, Ithaca, NY 14853 USA, URL: http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf [5 Oct 2009]
[25] Hannah Blau and Amy McGovern, Categorizing Unsupervised Relational Learning Algorithms, Department of Computer Science, University of Massachusetts, Amherst, Massachusetts 01003-9264, URL: http://kdl.cs.umass.edu/papers/blau-mcgovern-srl2003.pdf [5 Oct 2009]
[26] Zhu, X., and Goldberg, A., Introduction to Semi-Supervised Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, 2009, URL: http://www.morganclaypool.com/doi/abs/10.2200/S00196ED1V01Y200906AIM006 [5 Oct 2009]
[27] Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore, Reinforcement Learning: A Survey, Journal of Artificial Intelligence Research 4 (1996) 237-285, URL: http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a.pdf [5 Oct 2009]
[28] V. N. Vapnik. Statistical learning theory. New York: Wiley, (1998) 339-371.
[29] V. Tresp. A Bayesian committee machine, Neural Computation, 12, 2000, URL: http://wwwbrauer.informatik.tu-muenchen.de/~trespvol/papers/bcm6.pdf [7 Oct 2009]
[30] Jonathan Baxter, A Model of Inductive Bias Learning, Research School of Information Sciences and Engineering, Australian National University, Canberra 0200, Australia, Journal of Artificial Intelligence Research 12 (2000) 149–198, URL: http://www-2.cs.cmu.edu/afs/cs/project/jair/pub/volume12/baxter00a.pdf [7 Oct 2009]
[31] Y. Jin, B. Sendhoff. Pareto-based multi-objective machine learning: An overview and case studies. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(3):397-415, 2008.
• 61. - 53 - [32] Y. Jin (Editor). Multi-Objective Machine Learning. Springer, Berlin Heidelberg, 2006, URL: http://books.google.co.uk/books?id=8MCHuwYmy5UC&printsec=frontcover&dq=multi-objective+machine+learning#v=onepage&q=&f=false [7 Oct 2009]
[33] World of Warcraft, URL: http://www.worldofwarcraft.com/index.xml [7 Oct 2009]
[34] EVE online, URL: http://www.eveonline.com/ [7 Oct 2009]
[35] AION online, URL: http://uk.aiononline.com/ [7 Oct 2009]
[36] Silkroad online, URL: http://www.joymax.com/silkroad/ [7 Oct 2009]
[37] SBOT, URL: http://www.bot-cave.net/main/ [7 Oct 2009]
[38] John McCarthy, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I, Massachusetts Institute of Technology, Cambridge, April 1960, URL: http://www-formal.stanford.edu/jmc/recursive.pdf [8 Oct 2009]
[39] Robert A. Kowalski, The early years of logic programming, January 1988, Volume 31, Number 1, Communications of the ACM, URL: http://www.doc.ic.ac.uk/~rak/papers/the early years.pdf [8 Oct 2009]
[40] Quake 4, URL: www.idsoftware.com/games/quake/quake4
[41] Alan C. Kay, The Early History of Smalltalk, URL: http://www.metaobject.com/papers/Smallhistory.pdf [8 Oct 2009]
[42] Epic Games, URL: http://www.epicgames.com/ [8 Oct 2009]
[43] MATLAB, URL: http://www.mathworks.com/ [8 Oct 2009]
[44] Christopher John Cornish Hellaby Watkins, Learning from Delayed Rewards, King's College, 1989, URL: http://www.cs.rhul.ac.uk/~chrisw/new_thesis.pdf [20 Oct 2009]
[45] Christopher J.C.H. Watkins, Peter Dayan, Technical Note, Q-learning, 1992, URL: http://www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf [20 Oct 2009]
[46] Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[47] Megan Smith, Stephen Lee-Urban, Héctor Muñoz-Avila, RETALIATE: Learning Winning Policies in First-Person Shooter Games, Department of Computer Science & Engineering, Lehigh University, Bethlehem, PA 18015-3084 USA
• 62. - 54 - [48] B. Auslander, S. Lee-Urban, C. Hogg, and H. Munoz-Avila, “Recognizing the Enemy: Combining Reinforcement Learning with Strategy Selection using Case-Based Reasoning,” in Advances in Case-Based Reasoning: 9th European Conference, ECCBR 2008, Trier, Germany, September 2008, Proceedings, K.-D. Althoff, R. Bergmann, M. Minor, and A. Hanft, Eds. Springer, 2008.
[49] Michelle McPartland and Marcus Gallagher, Creating a Multi-Purpose First Person Shooter Bot with Reinforcement Learning, 2008, IEEE Symposium on Computational Intelligence and Games (CIG'08)
[51] Deep Blue, URL: http://www.research.ibm.com/deepblue/watch/html/c.shtml [20 Jan 2010]
[52] Timeline: Brief history of Artificial Intelligence, URL: http://www.aaai.org/AITopics/pmwiki/pmwiki.php/AITopics/BriefHistory [20 Jan 2010]
[53] The History of Artificial Intelligence, URL: http://library.thinkquest.org/2705/history.html [20 Jan 2010]
[54] The Cathode Ray Amusement Tube Patent, URL: http://www.pong-story.com/2455992.pdf [20 Jan 2010]
[55] ZorbaTHUT, QuestHelper, World of Warcraft Add-on, URL: http://wow.curse.com/downloads/wow-addons/details/quest-helper.aspx [20 Jan 2010]
[56] Pericles, ShockandAwe, World of Warcraft Add-on, URL: http://wow.curse.com/downloads/wow-addons/details/shockandawe.aspx
[57] L. V. Allis, H. J. van den Herik, and M. P. H. Huntjens. Go-Moku Solved by New Search Techniques. In Proceedings of the 1993 AAAI Fall Symposium on Games: Planning and Learning, 1993.
[58] R. Gasser. Solving Nine Men’s Morris. Computational Intelligence 12(1): 24–41, 1996.
[59] G. Andrade, G. Ramalho, H. Santana, and V. Corruble, “Automatic computer game balancing: a reinforcement learning approach,” in AAMAS ’05: Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems. New York, NY, USA: ACM, 2005, pp. 1111–1112.
[60] S. Bakkes, P. Spronck, and E. O. Postma, “Team: The team-oriented evolutionary adaptability mechanism,” in ICEC, ser. Lecture Notes in Computer Science, M. Rauterberg, Ed., vol. 3166.