What is GOAP, and Why Is It Not Already Mainstream?
Aakash Chotrani, Donghwi Jin
1 Introduction
It’s hard to ignore the potential of planning techniques for game AI, and over the years they
have drawn much attention from game developers and researchers. Goal Oriented Action
Planning, or GOAP for short, is an artificial intelligence system that allows agents to
plan a sequence of actions to satisfy a particular goal without having to maintain large and
complex state machines. GOAP offers dynamic problem solving, which gives the illusion that
the AI understands the environment and reaches the goal in a logical manner. Fundamentally,
the technique involves formulating a list of actions whose outcome will satisfy the goal.
2 Motivation behind GOAP
Most of us are familiar with finite state machines (FSMs), which are used in popular game
engines such as Unity3d. Although they are easy to implement and design due to their cyclic
nature, they also have several disadvantages. FSMs scale poorly, since they eventually
become very complex as the number of states increases; at some point, adding a new state
to an existing FSM becomes very difficult. FSMs also handle concurrency poorly: when
running multiple states in parallel, they often end up in a deadlock, forcing the programmer
to make changes until the states become compatible with each other. Finally, FSMs are
labor intensive. Determining the conditions for each transition from one state to another
takes a lot of time and resources, and eventually becomes a major source of behavioral bugs.
FSMs, like any other technology, have their place in the game development process.
However, scripts and co-routines are becoming more popular these days, and hierarchical
planners are increasingly making their way into games and middleware. Planning technology
has made notable progress in the past 10 years since being introduced to the game industry.
There are certainly many problems left to solve, but much progress has been made already.
There aren’t many games that use planners in comparison to other techniques, but the AI
in those games has been well received by players. The most successful games in which
planning has been implemented are mostly open-world games with emergent gameplay;
games with more linear, story-driven gameplay have received poorer reviews.
3 What is GOAP?
In GOAP, agents are supplied with a goal state of the world and a set of actions which
change the state of the world upon being performed by the agent. Agents follow a particular
sequence of actions to achieve the goal state of the world. The particular sequence not only
depends on the goal state of the world, but also on the current state of the world. Hence,
GOAP is a realistic and dynamic architecture which takes into account both the current state
of the world and the goal state of the world.
For example, suppose we have an agent, a robber, whose goal is to collect all the coins
and exit the world. In order to collect the coins, the agent has to choose which coin to
collect next, which depends on the distance between the agent and the coin: the closer a
coin is to the agent, the higher its priority to be collected first. After collecting all the
coins, the agent has to get a key in order to open the door and exit the world.
Hence the agent is supplied with:
Goal:
ExitTheWorld
Actions:
CollectAllCoinsAction
CollectKeyAction
OpenDoorAction
ExitAction
A GOAP planner chooses the optimal sequence of actions to achieve the goal state of the
world. It uses the preconditions and effects of each action to determine which actions are
available to the agent in a given state of the world.
3.1 Actions
Anything the agent performs is considered an action. It could be picking up a coin,
opening the door, moving to a target, playing an animation or sound, and so on. Actions are
independent of one another, and we do not need to worry about an action’s connection to
other actions while it is being performed.
Each action has a cost associated with it. Actions with lower costs are preferred over
actions with higher costs, since we want the final sequence of actions toward the goal state
of the world to be optimal. In a given world state there could be more than one sequence of
actions available to achieve the goal state. The planner decides which sequence to choose
based on the total cost of the actions. For example, assume there is a world where the
agent can collect one diamond instead of collecting all the coins and exit the world:
CollectDiamondAction cost: 10
CollectCoinsAction cost: 3
CollectKeyAction cost: 2
OpenDoorAction cost: 1
We have more than one sequence of actions available to achieve the goal state of the world:
CollectDiamondAction -> CollectKeyAction -> OpenDoorAction
    = ExitWorld (total: 10 + 2 + 1 = 13)
CollectCoinsAction -> CollectKeyAction -> OpenDoorAction
    = ExitWorld (total: 3 + 2 + 1 = 6)
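The cost comparison above can be sketched in a few lines of Python (a sketch; the helper function and the hard-coded cost table are illustrative):

```python
# Costs from the example above (illustrative values).
costs = {
    "CollectDiamondAction": 10,
    "CollectCoinsAction": 3,
    "CollectKeyAction": 2,
    "OpenDoorAction": 1,
}

def plan_cost(plan):
    """Total cost of a candidate action sequence."""
    return sum(costs[action] for action in plan)

candidates = [
    ["CollectDiamondAction", "CollectKeyAction", "OpenDoorAction"],  # 13
    ["CollectCoinsAction", "CollectKeyAction", "OpenDoorAction"],    # 6
]

# The planner keeps the cheapest sequence.
best = min(candidates, key=plan_cost)
print(best, plan_cost(best))  # the coin sequence wins at cost 6
```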
Each action is composed of preconditions and effects. Preconditions are conditions required
to execute the action, and effects are the result the action produces after it has successfully
been executed. An example set of actions and their preconditions and effects may be:
CollectCoinAction
Precondition: hasCoin = false
Effect: hasCoin = true
CollectKeyAction
Precondition: hasCoin = true, hasKey = false
Effect: hasKey = true
OpenDoorAction
Precondition: hasKey = true
Effect: exitWorld = true
Depending on the action, there could be multiple preconditions or effects associated with a
single action.
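A minimal way to model such actions in code (a sketch; the class and field names are our own, with dictionaries standing in for world-state properties):

```python
class Action:
    def __init__(self, name, cost, preconditions, effects):
        self.name = name
        self.cost = cost
        self.preconditions = preconditions  # required world-state values
        self.effects = effects              # values written on success

    def is_applicable(self, state):
        # Every precondition must match the current world state.
        return all(state.get(k) == v for k, v in self.preconditions.items())

    def apply(self, state):
        # Return a new state with the action's effects applied.
        new_state = dict(state)
        new_state.update(self.effects)
        return new_state

collect_key = Action("CollectKeyAction", cost=2,
                     preconditions={"hasCoin": True, "hasKey": False},
                     effects={"hasKey": True})

state = {"hasCoin": True, "hasKey": False, "exitWorld": False}
assert collect_key.is_applicable(state)
state = collect_key.apply(state)  # state["hasKey"] is now True
```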
In summary, a goal is described by relevant world state properties. An action is
described in terms of its effect on the world and its world state preconditions. A plan is a
sequence of actions which is formulated by a component called the planner.
Figure 1 The relationship between the AI agent and the planner, shown in the diagram
above.
The input to the planner includes a current world state, a previously selected goal,
and a set of actions. The goal's properties are modulated with the current state of the world
to produce the goal state of the world. A plan is then formulated by searching the state space
graph for an optimally ordered sequence of actions which produce the goal state of the world
from the current state of the world.
4 Our Attempt at GOAP
We created a simple demo application of GOAP, built around a world that consisted of a
player, several coins, a key shop, and an exit door. The player’s ultimate goal was to leave
the world through the exit door, which was initially locked. In order to open the door, the
player had to purchase a key from the key shop for five coins. The player initially did not
own any coins; he had to collect coins from around the world by moving to each coin and
then picking it up. Hence, the player’s general plan was to pick up enough coins to buy a
key, and then use the purchased key to leave the world through the exit door.
Figure 2 This is a screenshot of our demo application, which displays the player’s
current action and plan as they are carried out.
A planner was responsible for formulating a specific plan, or a sequence of actions,
that the player would follow in order to achieve the goal. In order to create a plan for the
player, the planner goes through three steps: recognize all available actions, search for the
optimal sequence of actions which achieves the goal, and relay the information to the player.
Before any of this can begin, however, the programmer has to supply the entire set of
actions that is available to the player, each with its own preconditions and effects. In our
demo, the entire set of actions was:
PickCoinAction
Precondition: coinsOwned < 5
Effect: coinsOwned += 1
PickKeyAction
Precondition: coinsOwned >= 5, hasKey = false
Effect: hasKey = true
OpenDoorAction
Precondition: hasKey = true, exitWorld = false
Effect: exitWorld = true
It is from this entire set of actions that the planner chooses which actions are available to
the player in the current state of the world. For instance, in the initial state of the world,
the player’s parameters are given as coinsOwned = 0, hasKey = false, and
exitWorld = false. Of the three given actions, the player’s parameters satisfy the
preconditions of PickCoinAction only. Hence, in this state of the world, the only
action available to the player is PickCoinAction.
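This availability check can be sketched with predicate preconditions (illustrative Python; the lambdas mirror the conditions listed above):

```python
# Preconditions as predicates over the player's parameters (illustrative).
preconditions = {
    "PickCoinAction": lambda s: s["coinsOwned"] < 5,
    "PickKeyAction":  lambda s: s["coinsOwned"] >= 5 and not s["hasKey"],
    "OpenDoorAction": lambda s: s["hasKey"] and not s["exitWorld"],
}

def available_actions(state):
    """Actions whose preconditions the current state satisfies."""
    return [name for name, pred in preconditions.items() if pred(state)]

initial = {"coinsOwned": 0, "hasKey": False, "exitWorld": False}
print(available_actions(initial))  # ['PickCoinAction']
```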
Figure 3 This figure illustrates an example of a tree structure of actions. A path from
the root node to one of the terminal leaf nodes would represent a sequence of actions for the
player to take.
After the planner recognizes all actions currently available to the player, it then
creates a tree structure, much like Figure 3 above, where each node contains an action to be
performed by the player. At each node, the player’s parameters are updated according to the
action taken at that node, and every action that can be performed with the newly updated
parameters becomes a child of the node. Once a node’s action allows the player to exit the
world, the node becomes terminal and creates no more children. This way, a possible action
sequence for the player can be represented by a path between the topmost parent node and
one of the terminal leaf nodes of the constructed tree. Each terminal leaf node also stores
the total cost of the path from the topmost parent node to itself, which the planner uses to
pick the optimal sequence of actions that achieves the player’s goal. Pseudocode for the two
functions that handle this part of the algorithm is given below:
Listing 1 Function to build a tree structure of actions
bool BuildTree(Node parent, List<Node> pathList,
               Set<Action> availableActions, State goal)
{
    foundPath = false
    foreach action in availableActions:
        if parent.mState satisfies action.mPreconditions:
            currentState = ApplyEffects(parent.mState, action)
            Node node = new Node(parent,
                                 parent.mCost + action.mCost,
                                 currentState, action)
            if currentState satisfies goal:
                pathList.Add(node)
                foundPath = true
            else:
                // Recurse with the remaining actions
                Set<Action> subset = availableActions.Without(action)
                bool found = BuildTree(node, pathList, subset, goal)
                if found:
                    foundPath = true
    return foundPath
}
Listing 2 Function to choose the optimal sequence of actions and return it as a plan
List<Action> CreatePlan(GameObject agent, List<Action> availableActions,
                        State worldState, State goal)
{
    // Filter out actions that can never be performed in this world
    foreach action in availableActions:
        if not action.CheckProceduralPrecondition(agent):
            availableActions.remove(action)

    List<Node> pathList = new List<Node>()
    Node start = new Node(null, 0, worldState, null)
    bool success = BuildTree(start, pathList,
                             availableActions, goal)

    // If no plan found, return null
    if not success:
        return null

    // Pick the terminal node with the lowest total path cost
    Node optimal = null
    foreach terminalNode in pathList:
        if optimal == null or terminalNode.mCost < optimal.mCost:
            optimal = terminalNode

    // Walk back up to the root to recover the action sequence
    List<Action> actionList = new List<Action>()
    while optimal != null and optimal.mAction != null:
        actionList.push_front(optimal.mAction)
        optimal = optimal.mParent
    return actionList
}
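For reference, here is a runnable Python translation of the two listings (a sketch: the names and the uniform costs are our own, and unlike the pseudocode an action here stays available as long as its precondition holds, so PickCoinAction can repeat):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Action:
    name: str
    cost: int
    precondition: Callable  # state -> bool
    effect: Callable        # state -> new state

@dataclass
class Node:
    parent: Optional["Node"]
    cost: int
    state: dict
    action: Optional[Action]

def build_tree(parent, path_list, actions, goal):
    """Expand every applicable action from this node; collect goal nodes."""
    found_path = False
    for action in actions:
        if not action.precondition(parent.state):
            continue
        state = action.effect(parent.state)
        node = Node(parent, parent.cost + action.cost, state, action)
        if goal(state):
            path_list.append(node)  # a complete path to the goal
            found_path = True
        elif build_tree(node, path_list, actions, goal):
            found_path = True
    return found_path

def create_plan(actions, world_state, goal):
    """Return the cheapest action sequence reaching the goal, or None."""
    path_list = []
    start = Node(None, 0, world_state, None)
    if not build_tree(start, path_list, actions, goal):
        return None
    best = min(path_list, key=lambda n: n.cost)
    plan = []
    while best.action is not None:  # walk back up to the root
        plan.insert(0, best.action.name)
        best = best.parent
    return plan

actions = [
    Action("PickCoinAction", 1,
           lambda s: s["coinsOwned"] < 5,
           lambda s: {**s, "coinsOwned": s["coinsOwned"] + 1}),
    Action("PickKeyAction", 1,
           lambda s: s["coinsOwned"] >= 5 and not s["hasKey"],
           lambda s: {**s, "hasKey": True}),
    Action("OpenDoorAction", 1,
           lambda s: s["hasKey"] and not s["exitWorld"],
           lambda s: {**s, "exitWorld": True}),
]
initial = {"coinsOwned": 0, "hasKey": False, "exitWorld": False}
plan = create_plan(actions, initial, goal=lambda s: s["exitWorld"])
# plan: five PickCoinActions, then PickKeyAction, then OpenDoorAction
```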
After a plan has been formulated by the planner, the plan is sent to the player, and
the player begins to execute the sequence of actions in the plan. Each execution of an action
by the player can either succeed or fail. Upon success, the player updates its parameters, and
proceeds to the next action in the sequence; upon failure, the player relays the information to
the planner, and the planner re-evaluates the world state, makes a new plan, and returns it to
the player. If at any time the planner fails to find a plan that achieves the player’s goal,
the player receives no plan and remains idle until a change in the state of the world allows
the planner to come up with a working plan again.
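The execute-or-replan loop described above can be sketched as follows (a sketch with hypothetical helpers: plan_fn stands in for the planner, and try_action for the game-side execution of a single action, which may fail):

```python
def run_agent(plan_fn, try_action, max_replans=10):
    """Execute a plan action by action; on any failure, ask the planner again."""
    for _ in range(max_replans):
        plan = plan_fn()
        if plan is None:
            return "idle"   # no plan: wait for the world to change
        for action in plan:
            if not try_action(action):
                break       # action failed: replan from the new world state
        else:
            return "done"   # every action in the plan succeeded
    return "idle"

# Example: the second action execution fails once, forcing one replan.
attempts = {"count": 0}
def flaky(action):
    attempts["count"] += 1
    return attempts["count"] != 2  # fail exactly on the 2nd execution

result = run_agent(lambda: ["a", "b", "c"], flaky)
print(result)  # "done" after one replan
```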
4.1 Possible Improvements
The planner in our demo used a tree data structure to search for all possible paths to the goal
state of the world, and then chose the path with the lowest cost. The method used was
somewhat like a breadth-first search (BFS) that did not stop upon finding a path to the goal,
but proceeded to find all possible paths in order to compare their costs and return the
optimal one. This method of searching for the optimal path is inefficient, since we are
interested in the total cost of the path rather than the number of nodes in it; BFS excels in
the latter case, and should not be used in the former. Our search algorithm can be improved
by using an informed search algorithm that guarantees an optimal path, such as A* with an
admissible heuristic.
5 Advantages and Disadvantages of GOAP
From what we have discussed so far, GOAP, and planners in general for that matter, seem
very promising. However, bigger games that base their AI architecture primarily on
planning systems are rare, even after Jeff Orkin’s presentation on the AI of F.E.A.R. at GDC
in 2006. In this section, we assess the advantages and disadvantages of GOAP, and analyze
why it has not overtaken variations of FSMs and behavior trees as the major AI architecture
in the game industry.
The biggest advantage of GOAP over other architectures is that GOAP creates a
dynamic problem-solving environment which greatly contributes to making the AI behavior
in-game look more realistic. Because GOAP actions have preconditions and effects that alter
the world state, agents under instruction from the planner can react to the same problem in
different ways, depending on the resources given to them and the conditions imposed on
them. Specifically, GOAP can
respond to unexpected changes in the world very well, since all it needs to do is come up
with a new plan; this is a clear advantage over behavior trees, which are known for their
notoriously bad handling of interruptions.
Another advantage of GOAP is that the designer can provide agents with all the
actions that he would like to see them perform without having to couple specific actions to
goals. All the designer needs to specify is whether an action needs any preconditions to be
met in order to be performed, and what effects the action would have on the world state once
it has been performed. Figure 4 illustrates how this open relationship between actions and
goals results in a much smaller complexity compared to those of other designs which would
be necessary to achieve the same end result.
Figure 4 In GOAP, the designer can provide actions and goals without having to
associate certain actions with certain goals, since the agent will figure out by itself which
actions are necessary to achieve certain goals.
However, GOAP does have its share of problems. The biggest problem associated
with using GOAP as the main architecture of a game’s AI system comes from the fact that it
is extremely hard to design correctly. Planning systems work best when designers want the
AI to achieve something, and do not particularly care about how it is done as long as it looks
realistic. However, when the designer is also interested in the process by which the agent
achieves its goal, planning systems suddenly become very difficult to work with. For
instance, when a designer wants an agent to display a certain behavior, he cannot simply
tell the agent to demonstrate that behavior. Instead, the designer must first
come up with a set of actions the agent can perform, along with preconditions and effects for
each action. Then, the designer must tune the parameters of the world, and then give a goal
to the planner so that the planner would think the sequence of behaviors desired by the
designer is the optimal choice to achieve the goal given to it by the designer.
Another problem of GOAP, which follows from the aforementioned one, is that unless a
GOAP system is done very well, it is simply better to use behavior trees to achieve the same
behavior. Although one of the biggest advantages of GOAP is the realistic behavior its
agents display through dynamic problem solving, if the design of the GOAP system itself is
poor, one can achieve a similar level of realism by simply using behavior trees. Behavior
trees have a huge performance advantage over planners, since they do not rely on dynamic
problem solving at runtime. While a planner needs to re-assess the world state each time it
creates a plan, or whenever the world state changes significantly enough for the planner to
care, a behavior tree is built ahead of time, is ready to go at runtime, and can output
similarly realistic behaviors without spending many resources at runtime. Hence, if a GOAP
system is not designed well, it loses the advantages it has, and is left only with its downsides
in performance.
6 Conclusion
GOAP is a type of planning architecture which assigns a goal to an agent, provides it with
all available actions that the agent can perform, and then lets the agent figure out which
sequence of actions optimally achieves the goal given to it. When designed correctly, GOAP
can be a powerful AI architecture for a game, outputting realistic behavior across any setting
the designer might throw at the agents. However, if a designer is more interested in
displaying specific types of behaviors through their agents, it is extremely difficult to design
a perfect GOAP system to suit the designer’s needs; most of the time, the advantages of an
imperfect GOAP system are negligible compared to its disadvantages, which is why games
prefer behavior trees and various types of FSMs for their AI architecture. If your game values
realistic behavior of agents more than specific types of behaviors displayed by agents, a
GOAP system might be the right choice for your game, and you should give it a try.
7 References
7.1 Online references
[AIGameDev] Alex J. Champandard. Planning in Games: An Overview of Lessons Learned.
http://aigamedev.com/open/review/planning-in-games/
[MIT Media Lab] Jeff Orkin. Goal-Oriented Action Planning.
http://alumni.media.mit.edu/~jorkin/goap.html