1"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
This"work"is"licensed"under"a"CreaBve"Commons"AEribuBonGNonCommercialGShareAlike"4.0"InternaBonal"License."
Chapter(5((
Delibera.on(with(Nondeterminis.c(Domain(
Models(
Dana S. Nau and Vikas Shivashankar
University of Maryland
2"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Introduc.on(
!  World seldom predictable
●  corresponding deliberation models as a result always going to be incomplete
!  Results in:
●  Action failures
●  Unexpected side effects of actions
●  Exogenous events
!  So far, been working with deterministic action models
●  Each action, when applied in a particular state, results in only one state
●  Formally: γ(s,a) returns a single state
●  Doesn’t adequately support inherent uncertainty in domains
!  Nondeterministic models provide more flexibility:
●  An action, when applied in a state, may result in one among several possible
states
●  γ(s,a) returns a set of states
!  Nondeterministic models allow modeling uncertainty in planning domains
3"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Why(Model(Uncertainty?((
!  We’ve seen ways to handle these situations using deterministic models
●  Generate plans for the nominal case
●  Execute, and monitor
●  Detect failure, and recover
4"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Why(Model(Uncertainty?(
Answer: nondeterministic models have several advantages
!  More accurate modeling
!  Plan for uncertainty ahead of time, instead of during execution
!  No nominal case in certain environments:
●  Think of throwing a dice/tossing a coin
●  Online payments where choice of payment left to user
!  However, comes at a cost:
●  More complicated, both conceptually and computationally
●  Since you need to take all different possibilities into account
5"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
Figure 5.1: A simple nondeterministic planning domain model
Definition 5.1. (Planning Domain) A nondeterministic planning do-
main ⌃ is the tuple (S, A, ), where S is the finite set of states, A is the
finite set of actions, and : S ⇥ A ! 2S is the state transition function.
Search(Spaces(in(Nondeterminis.c(Planning(
!  Search space of deterministic planning
modeled as a graph
●  Nodes are states, edges are actions
!  For planning with nondeterministic domains,
search space no longer a graph
●  Instead its now an AND/OR graph
!  AND/OR graph has following elements:
●  OR branches: which action
to apply in a state?
●  AND branches: which state does the
action lead to?
!  Have control over which action to apply (OR
branches)
!  Don’t have control over resulting state (AND
branches)
A simple nondeterministic model of a
harbor management facility
6"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
Nondeterminis.c(Planning(Domains(
!  3-tuple (S, A, γ)
●  S – finite set of states
●  A – finite set of actions
●  γ: S × A → 2S
!  Search space of a simple harbor
management domain
●  Only one state variable:
▸  pos(item)
●  Nodes
represent
possible values
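As a concrete illustration, here is a minimal Python sketch of how such a domain could be encoded, with a dictionary for γ. The state names (on_ship, at_harbor, parking1, ...) and the transitions listed are an approximation of the harbor example read off the figures, not necessarily the book's exact domain.

# A minimal sketch of a nondeterministic domain Sigma = (S, A, gamma).
# gamma maps a (state, action) pair to the SET of possible successor states.
GAMMA = {
    ("on_ship",   "unload"):  {"at_harbor"},                         # deterministic
    ("at_harbor", "park"):    {"parking1", "parking2", "transit1"},  # nondeterministic
    ("parking1",  "deliver"): {"gate1", "gate2", "transit2"},
    ("parking2",  "back"):    {"at_harbor"},
    ("transit2",  "move"):    {"gate1", "gate2"},
}

S = {s for (s, _) in GAMMA} | {s2 for succ in GAMMA.values() for s2 in succ}
A = {a for (_, a) in GAMMA}

def gamma(s, a):
    """gamma(s, a): set of possible successor states (empty if a is inapplicable in s)."""
    return GAMMA.get((s, a), set())

def applicable(s):
    """Applicable(s) = {a in A | gamma(s, a) is nonempty}."""
    return {a for a in A if gamma(s, a)}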
7"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
Ac.ons(in(Nondeterminis.c(Planning(Domains(
!  An action a applicable in state s iff γ(s,a) ≠ ∅
!  Applicable(s) is set of all actions applicable in s
●  Applicable(s) = {a ∈ A | γ(s, a) ≠ ∅}
!  Five actions in example
●  Two deterministic:
▸  unload, back
●  Three nondeterministic:
▸  park move, deliver
8"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
Ac.ons(in(Nondeterminis.c(Planning(Domains(
!  park stores items in storage areas parking1 or
parking2
●  Nondeterminism used to model possibility of
▸  storing item in parking1
▸  storing item in parking2
▸  having to temporarily
move item in transit1
if space is unavailable
●  Once space is available: move action
9"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
Plans(in(Nondeterminis.c(Domains(
!  Structure of plans must be different from
the deterministic case
●  Previously, sequence of actions
!  Doesn’t work here
●  Why?
10"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
Plans(in(Nondeterminis.c(Domains(
!  Need the notion of a conditional plan
●  plans that account for various
possibilities in a given state
!  Can sense the actual action outcome
among the possible ones, and act
according to the conditional
structure of plan
!  A possible representation:
●  a policy:
partial function
that maps
states to actions
!  If a policy π maps a state s to an action a
●  that means we should perform a
whenever we are in state s
11"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
Policies:(A(Representa.on(of(Plans(in(
Nondeterminis.c(Planning(
!  Example policy π1 for the harbor management
problem:
●  π1 (pos(item)=on"ship) = unload"
●  π1(pos(item)=at"harbor) = park"
●  π1(pos(item)=parking1) = deliver"
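Since a policy is just a partial function from states to actions, one simple way to represent it is a dictionary. A small sketch, using the state names from the domain sketch above:

# Policy pi1, represented as a partial function (dict) from states to actions.
pi1 = {
    "on_ship":   "unload",
    "at_harbor": "park",
    "parking1":  "deliver",
}

def next_action(pi, s):
    """Return pi(s), or None if the policy is undefined at s (s is a leaf)."""
    return pi.get(s)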
12"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
Policies:(A(Representa.on(of(Plans(in(
Nondeterminis.c(Planning(
●  π1 (pos(item)=on"ship) = unload"
●  π1(pos(item)=at"harbor) = park"
●  π1(pos(item)=parking1) = deliver"
13"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5!  In deterministic planning, can compute states
reachable by sequence of actions using γ
●  s ∪ γ (s, a1)∪ γ (γ (s,a1), a2) ∪ ...
!  Need few extra definitions to do similar
checks in nondeterministic planning
!  Reachable States: (s,π)
●  All states that can be produced by
starting at s and executing π
!  Example: (pos(item)=on"ship,π1)
●  π1 (pos(item)=on"ship) = unload"
●  π1(pos(item)=at"harbor) = park"
●  π1(pos(item)=parking1) = deliver"
Defini.ons(Over(Policies(
14"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
Defini.ons(Over(Policies(
!  In deterministic planning, can compute states
reachable by sequence of actions using γ
●  s ∪ γ (s, a1)∪ γ (γ (s,a1), a2) ∪ ...
!  Need few extra definitions to do similar
checks in nondeterministic planning
!  Reachable States: (s,π)
●  All states that can be produced by
starting at s and executing π
!  Example: (pos(item)=on"ship,π1)
●  π1 (pos(item)=on"ship) = unload"
●  π1(pos(item)=at"harbor) = park"
●  π1(pos(item)=parking1) = deliver"
15"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
Defini.ons(Over(Policies(
!  Need to also check whether plan reaches goal
●  Requires calculating final states of policy
!  leaves (s,π): set of final states reached by
policy π starting from state s
!  leaves(s, π) = {s′ | s′ ∈ ︎ (s, π) and
s′ not in Dom(π)}
!  Example:
●  leaves (pos(item)=on"ship,"π1)
●  π1 (pos(item)=on"ship) = unload"
●  π1(pos(item)=at"harbor) = park"
●  π1(pos(item)=parking1) = deliver"
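A straightforward way to compute γ̂(s,π) and leaves(s,π) is a breadth-first traversal that only follows the actions prescribed by the policy. A minimal sketch, reusing the gamma() helper and the dict policy representation assumed above:

from collections import deque

def reachable_states(s, pi):
    """gamma-hat(s, pi): all states reachable from s when acting according to pi."""
    seen, queue = {s}, deque([s])
    while queue:
        cur = queue.popleft()
        if cur not in pi:                # policy undefined here: cur is a leaf
            continue
        for nxt in gamma(cur, pi[cur]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def leaves(s, pi):
    """leaves(s, pi) = {s' in gamma-hat(s, pi) | s' not in Dom(pi)}."""
    return {s2 for s2 in reachable_states(s, pi) if s2 not in pi}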
16"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Policies:(A(Representa.on(of(Plans(in(
Nondeterminis.c(Planning(
!  Reachability graph, Graph(s,π)
●  Graph of all possible state transitions if we
execute π starting at s
●  Graph(s,π) = { γ︎(s,π), E |
s′ ∈ γ︎(s, π), s′′ ∈ π(s′), and (s′,s′′) ∈ E}
●  π1 (pos(item)=on"ship) = unload"
●  π1(pos(item)=at"harbor) = park"
●  π1(pos(item)=parking1) = deliver"
17"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Policies:(A(Representa.on(of(Plans(in(
Nondeterminis.c(Planning(
●  π2"(pos(item)=on"ship)"="unload"
●  π2(pos(item)=at"harbor)"="park"
●  π2(pos(item)=parking1)"="deliver"
●  π2(pos(item)=parking2)"="back"
●  π2(pos(item)=transit1)"="move"
●  π2(pos(item)=transit2)"="move;""
18"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Policies:(A(Representa.on(of(Plans(in(
Nondeterminis.c(Planning(
●  π3"(pos(item)=on"ship)"="unload"
●  π3(pos(item)=at"harbor)"="park"
●  π3(pos(item)=parking1)"="deliver"
●  π3(pos(item)=parking2)"="deliver"
●  π3(pos(item)=transit1)"="move"
●  π3(pos(item)=transit2)"="move"
●  π3(pos(item)=transit3)"="move""
19"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Planning(Problems(and(Solu.ons(
!  Let Σ = (S,A,γ) be a planning domain
!  A planning problem P is a 3-tuple P = (Σ,s0,Sg)
●  s0 ∈ S is the initial state
●  Sg ⊆ S is set of goal states
!  Note: previous book had set of initial states S0
●  Allowed uncertainty about initial state
●  Current definition is equivalent
▸  Can easily translate one to the other
•  How?
20"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Planning(Problems(and(Solu.ons(
!  Let Σ = (S,A,γ) be a planning domain
!  A planning problem P is a 3-tuple P = (Σ,s0,Sg)
●  s0 ∈ S is the initial state
●  Sg ⊆ S is set of goal states
!  Note: previous book had set of initial states S0
●  Allowed uncertainty about initial state
●  Current definition is equivalent
▸  Can easily translate one to the other
•  How?
▸  Introduce a new start action such that γ (s0, start) = S0
!  Solutions: not as straightforward to define as Deterministic Planning
●  Based on actual action outcomes, might or might not achieve goal
●  Can define different criteria of success – many types of solutions
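The translation mentioned above is easy to sketch: add one artificial initial state and one artificial action whose possible outcomes are exactly the original initial states. A minimal illustration (the names dummy_s0 and start are only for illustration):

def add_start_action(gamma_table, S0):
    """Turn a problem with a set of initial states S0 into one with a single
    initial state, by adding gamma(dummy_s0, start) = S0."""
    new_gamma = dict(gamma_table)
    new_gamma[("dummy_s0", "start")] = set(S0)
    return "dummy_s0", new_gamma

# Usage: s0, GAMMA2 = add_start_action(GAMMA, {"on_ship", "at_harbor"})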
21"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Type(1:(Solu.on(
Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a solution iff
leaves (s0,π) ∩ Sg ≠ ∅
!  A policy that may lead to a goal
●  In other words: at least one sequence of
nondeterministic outcomes leads to a goal state
!  Example:
●  s0 = {pos(item)"="on_ship}
●  Sg = {pos(item)"="gate1,"pos(item)"="gate2}
!  Policy π1 is a solution
●  π1 (pos(item)=on"ship) = unload"
●  π1(pos(item)=at"harbor) = park"
●  π1(pos(item)=parking1) = deliver"
!  Reason: At least one of the paths in
reachability graph of π1 leads to a state in Sg
22"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Type(2:(Safe(Solu.on(
Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a safe solution
iff
∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅)
Safe solution: a solution
in which a goal state is
reachable from every state
in the reachability graph
!  Is π1 a safe solution?
Condition for solutionNeeds to hold for
all reachable
states
23"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Type(2:(Safe(Solu.on(
Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a safe solution
iff
∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅)
Safe solution: a solution
in which a goal state is
reachable from every state
in the reachability graph
!  Is π1 a safe solution?
●  No
Condition for solutionNeeds to hold for
all reachable
states
24"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Type(2:(Safe(Solu.on(
Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a safe solution
iff
∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅)
Safe solution: a solution
in which a goal state is
reachable from every state
in the reachability graph
!  Is π2 a safe solution?
Condition for solutionNeeds to hold for
all reachable
states
25"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Type(2:(Safe(Solu.on(
Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a safe solution
iff
∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅)
Safe solution: a solution
in which a goal state is
reachable from every state
in the reachability graph
!  Is π2 a safe solution?
●  Yes
Condition for solutionNeeds to hold for
all reachable
states
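Given the reachable_states() and leaves() helpers sketched earlier, both criteria can be checked directly. A minimal sketch:

def is_solution(pi, s0, goals):
    """Type 1: at least one leaf of pi from s0 is a goal state."""
    return bool(leaves(s0, pi) & goals)

def is_safe_solution(pi, s0, goals):
    """Type 2: from every state reachable under pi, some leaf is a goal state."""
    return all(leaves(s, pi) & goals for s in reachable_states(s0, pi))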
26"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Type(2a:(Cyclic(Safe(Solu.ons(
Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a cyclic safe
solution iff
(1)  leaves(s0, π) ⊆ Sg ∧
(2)  (∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅))
(3)  Graph(s0, π) is cyclic
Meaning of Conditions:
(1)  No non-solution leaves
(2)  Safe solution
(3)  Reachability graph is cyclic
Cyclic Safe solution: a
safe solution with cycles
!  π2 is a cyclic safe solution
How does having cycles affect level of safety?
27"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Type(2a:(Cyclic(Safe(Solu.ons(
Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a cyclic safe
solution iff
(1)  leaves(s0, π) ⊆ Sg ∧
(2)  (∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅))
(3)  Graph(s0, π) is cyclic
Meaning of Conditions:
(1)  No non-solution leaves
(2)  Safe solution
(3)  Reachability graph is cyclic
Cyclic Safe solution: a
safe solution with cycles
!  π2 is a cyclic safe solution
How does having cycles affect level of safety?
!  could go though cycle infinitely many times
!  If execution gets out of loop eventually,
guaranteed to reach goal state
28"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Type(2b:(Acyclic(Safe(Solu.ons(
Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a acyclic safe
solution iff
(1) leaves(s0, π) ⊆ Sg ∧
(2) Graph(s0, π) is cyclic
Meaning of Conditions:
(1)  No non-solution leaves
(2)  Reachability graph is acyclic
Acyclic Safe Solution: a
safe solution without cycles
!  π3 is an acyclic safe solution
!  Acyclic policy completely safe
●  No matter what happens, guaranteed to
eventually reach the goal
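Checking condition (2) amounts to cycle detection on the reachability graph. A minimal sketch using a depth-first search over the edges induced by π and γ, reusing gamma() and leaves() from above:

def has_cycle(pi, s0):
    """True iff Graph(s0, pi) contains a cycle (DFS; gray = on the current path)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(s):
        color[s] = GRAY
        for nxt in (gamma(s, pi[s]) if s in pi else set()):
            c = color.get(nxt, WHITE)
            if c == GRAY or (c == WHITE and dfs(nxt)):   # back edge, or cycle found deeper
                return True
        color[s] = BLACK
        return False

    return dfs(s0)

def is_acyclic_safe_solution(pi, s0, goals):
    """Conditions (1) and (2): all leaves are goals, and Graph(s0, pi) is acyclic."""
    return leaves(s0, pi) <= goals and not has_cycle(pi, s0)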
29"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Unsafe(Solu.ons(
Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is an unsafe
solution iff
(1)  (leaves(s0, π) ∩ Sg ≠ ∅)
(2)  ((∃s ∈ leaves(s0, π) | s is not in Sg) ∨ (∃s ∈ γ︎(s0,π) | leaves(s,π)=∅))
Either there is a non-solution
leaf state
Or you get caught in

an infinite loop
Both of these are bad events
30"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Summary(of(Solu.on(Types(
Section 5.3 173
Figure 5.6: Di↵erent Kinds of Solutions: A Class Diagram
nondeterminism probabilistic
solutions weak solutions -
unsafe solutions - improper solutions
safe solutions strong cyclic solutions proper solutions
cyclic safe solutions - -
acyclic safe solutions strong solutions -
!  Unsafe Solutions aren’t of much interest to us
●  Do not guarantee achievement of goal
!  Acyclic Safe Solutions are the best – complete assurance that we’ll get to the goal
!  Cyclic Safe Solutions also good, but provide a weaker degree of assurance
●  We can get into loops
●  However, assuming that we don’t stay in the loop forever, guaranteed to
achieve the goal
31"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
SOLVING(NONDETERMINISTIC(
PLANNING(PROBLEMS(
32"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
AND/OR(Graph(Search(Algorithms(
!  Nondeterministic planning search
spaces represented as AND/OR
graphs
●  nodes: states
●  OR branches: actions applicable
in a state (consider 1)
●  AND branches: successor states
from an state-action pair
(consider ALL)
!  Reachability graph of a solution
policy includes one action at each OR
branch and all of the action’s
outcomes at each AND branch
!  First set of planning algorithms will
do AND/OR graph search
●  Simple extensions of ForwardG
Search"from Chapter 2
ship
hbr
par1
tr1
par2
park
tr2
g2
g1
del
tr3
g1
hbr
del
back
par1
par2
move
unload
33"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRSolu.on:(Algorithm(to(find(Solu.ons(
Chapter 2
Forward-search (⌃, s0, g)
s s0; ⇡ hi
loop
if s0 satisfies g then return ⇡
A0 {a 2 A | a is applicable in s}
if A0 = ? then return failure
nondeterministically choose a 2 A0
s (s, a); ⇡ ⇡.a
A nondeterministic forward-search planning algorithm.
iscuss properties that are shared by all algorithms that do a
of the same search space, even though those algorithms may
es of that tree in di↵erent orders. The rest of this section
of those algorithms.
olution to a planning problem may require a huge computa-
r an arbitrary CSV planning problem the task is PSPACE-
]. To reduce the computational e↵ort, several of the search
his section incorporate heuristic techniques for selecting which
174 Chapter 5
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initialization
loop
if s 2 Sg then return ⇡ // goal test
A0 Applicable(s)
if A0 = ? then return failure // dead-end test
nondeterministically choose a 2 A0 // branching
nondeterministically choose s0 2 (s, a)// progression
if s0 2 Visited then return failure // loop check
⇡(s) a; Visited Visited [ {s0}; s s0
174 Chapter 5
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initialization
loop
if s 2 Sg then return ⇡ // goal test
A0 Applicable(s)
if A0 = ? then return failure // dead-end test
nondeterministically choose a 2 A0 // branching
nondeterministically choose s0 2 (s, a)// progression
if s0 2 Visited then return failure // loop check
⇡(s) a; Visited Visited [ {s0}; s s0
Additional nondeterministic
choice to decide which action

outcome to plan for next
Cycle-checking
Identical Algorithms except:
Deterministic Planning algorithm

from Chapter 2
Nondeterministic Planning

algorithm
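In the pseudocode, "nondeterministically choose" stands for a choice point that is explored by backtracking (as noted in the properties slide below). A minimal Python sketch of Find-Solution with explicit backtracking over both choice points, reusing the gamma() and applicable() helpers assumed earlier:

def find_solution(s0, goals):
    """Find-Solution with the two nondeterministic choices implemented by
    depth-first backtracking. Returns a policy (dict) with at least one
    execution trace from s0 to a goal, or None if there is none."""
    def dfs(s, visited, pi):
        if s in goals:                          # goal test
            return pi
        for a in applicable(s):                 # branching: choose an action
            for s2 in gamma(s, a):              # progression: choose an outcome
                if s2 in visited:               # loop check
                    continue
                result = dfs(s2, visited | {s2}, {**pi, s: a})
                if result is not None:
                    return result
        return None                             # dead end: backtrack
    return dfs(s0, {s0}, {})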
34"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRSolu.on:(Algorithm(to(find(Solu.ons(
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initia
loop
if s 2 Sg then return ⇡ // goal
A0 Applicable(s)
if A0 = ? then return failure // dead-
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progr
if s0 2 Visited then return failure // loop
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the foll
the di↵erence in algorithms from deterministic dom
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initial
loop
if s 2 Sg then return ⇡ // goal t
A0 Applicable(s)
if A0 = ? then return failure // dead-e
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progre
if s0 2 Visited then return failure // loop c
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the follo
the di↵erence in algorithms from deterministic doma
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
We first present a very simple algorithm that finds
ship
164 Chapter 5
Policy:
35"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRSolu.on:(Algorithm(to(find(Solu.ons(
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initia
loop
if s 2 Sg then return ⇡ // goal
A0 Applicable(s)
if A0 = ? then return failure // dead-
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progr
if s0 2 Visited then return failure // loop
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the foll
the di↵erence in algorithms from deterministic dom
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initial
loop
if s 2 Sg then return ⇡ // goal t
A0 Applicable(s)
if A0 = ? then return failure // dead-e
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progre
if s0 2 Visited then return failure // loop c
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the follo
the di↵erence in algorithms from deterministic doma
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
We first present a very simple algorithm that finds
ship
hbr
unload
164 Chapter 5
Policy:
ship: unload
36"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRSolu.on:(Algorithm(to(find(Solu.ons(
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initia
loop
if s 2 Sg then return ⇡ // goal
A0 Applicable(s)
if A0 = ? then return failure // dead-
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progr
if s0 2 Visited then return failure // loop
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the foll
the di↵erence in algorithms from deterministic dom
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initial
loop
if s 2 Sg then return ⇡ // goal t
A0 Applicable(s)
if A0 = ? then return failure // dead-e
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progre
if s0 2 Visited then return failure // loop c
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the follo
the di↵erence in algorithms from deterministic doma
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
We first present a very simple algorithm that finds
ship
hbr
par1
tr1
par2
park
unload
164 Chapter 5
Policy:
ship: unload
hbr: park
Assume this

outcome is
chosen
37"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRSolu.on:(Algorithm(to(find(Solu.ons(
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initia
loop
if s 2 Sg then return ⇡ // goal
A0 Applicable(s)
if A0 = ? then return failure // dead-
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progr
if s0 2 Visited then return failure // loop
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the foll
the di↵erence in algorithms from deterministic dom
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initial
loop
if s 2 Sg then return ⇡ // goal t
A0 Applicable(s)
if A0 = ? then return failure // dead-e
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progre
if s0 2 Visited then return failure // loop c
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the follo
the di↵erence in algorithms from deterministic doma
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
We first present a very simple algorithm that finds
ship
hbr
par1
tr1
par2
park
unload
164 Chapter 5
Policy:
ship: unload
hbr: park
par1: deliver
g1
g2
del
tr2
Assume this

outcome is
chosen
38"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRSolu.on:(Algorithm(to(find(Solu.ons(
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initia
loop
if s 2 Sg then return ⇡ // goal
A0 Applicable(s)
if A0 = ? then return failure // dead-
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progr
if s0 2 Visited then return failure // loop
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the foll
the di↵erence in algorithms from deterministic dom
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initial
loop
if s 2 Sg then return ⇡ // goal t
A0 Applicable(s)
if A0 = ? then return failure // dead-e
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progre
if s0 2 Visited then return failure // loop c
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the follo
the di↵erence in algorithms from deterministic doma
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
We first present a very simple algorithm that finds
ship
hbr
par1
tr1
par2
park
unload
164 Chapter 5
Policy:
ship: unload
hbr: park
par1: deliver
tr2: move
g1
g2
del
tr2
g1 g2
move
Assume this

outcome is
chosen
39"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRSolu.on:(Algorithm(to(find(Solu.ons(
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initia
loop
if s 2 Sg then return ⇡ // goal
A0 Applicable(s)
if A0 = ? then return failure // dead-
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progr
if s0 2 Visited then return failure // loop
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the foll
the di↵erence in algorithms from deterministic dom
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initial
loop
if s 2 Sg then return ⇡ // goal t
A0 Applicable(s)
if A0 = ? then return failure // dead-e
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progre
if s0 2 Visited then return failure // loop c
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the follo
the di↵erence in algorithms from deterministic doma
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
We first present a very simple algorithm that finds
ship
hbr
par1
tr1
par2
park
unload
164 Chapter 5
Policy:
ship: unload
hbr: park
par1: deliver
tr2: move
g1
g2
del
tr2
g1 g2
move
Reached a

goal state.

Terminate here.
40"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRSolu.on:(Algorithm(to(find(Solu.ons(
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initia
loop
if s 2 Sg then return ⇡ // goal
A0 Applicable(s)
if A0 = ? then return failure // dead-
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progr
if s0 2 Visited then return failure // loop
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the foll
the di↵erence in algorithms from deterministic dom
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
174
Find-Solution (⌃, s0, Sg)
⇡ ?; s s0; Visited {s0} // initial
loop
if s 2 Sg then return ⇡ // goal t
A0 Applicable(s)
if A0 = ? then return failure // dead-e
nondeterministically choose a 2 A0 // branc
nondeterministically choose s0 2 (s, a)// progre
if s0 2 Visited then return failure // loop c
⇡(s) a; Visited Visited [ {s0}; s s0
Figure 5.7: Planning for Solutions by For
graphs to find solutions. The main goal of the follo
the di↵erence in algorithms from deterministic doma
mainly a didactic rather than practical objective.
5.3.1 Planning for Solutions by Forward
We first present a very simple algorithm that finds
ship
hbr
par1
tr1
par2
park
unload
164 Chapter 5
Policy:
ship: unload
hbr: park
par1: deliver
tr2: move
g1
g2
del
tr2
g1 g2
move
This policy

is returned
41"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRSolu.on:(Proper.es(
!  Finds a solution if one exists
!  However, in most cases it will find unsafe solutions
●  Because it only considers one outcome for each action
!  Nondeterministic choice implemented using backtracking
●  Two levels of backtracking
▸  Choosing an action
▸  Choosing an effect of that action
●  Each sequence of choices corresponds to an execution trace of FindGSoluBon"
42"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRSafeRSolu.on(
Section 5.3 175
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal reached by all leaves
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure // nonterminating loop
then return failure
nondeterministically choose a 2 Applicable(s) // select an action
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡)) // expand
return failure
Figure 5.8: Planning for Safe Solutions by Forward-search.
Keeps track of

unexpanded states,

much like A*
Uses FindGSoluBon"to see

if a Solution exists. If no

Solution, then no

Safe-Solution.
Only nondeterministic choice is action.

Adds ALL possible successor states to

Frontier. Not a choice since Safe-Solution

needs to guard against all eventualities.
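A minimal Python sketch of this control flow. Note that the pseudocode's "nondeterministically choose a" is a backtracking choice point; for brevity this sketch resolves it greedily (first applicable action, no backtracking), so unlike the real algorithm it may fail even when a safe solution exists. It reuses gamma(), applicable() and find_solution() from the earlier sketches.

def find_safe_solution_greedy(s0, goals):
    """Greedy (non-backtracking) sketch of Find-Safe-Solution's control flow."""
    pi = {}
    frontier = {s0}
    while frontier:
        if frontier <= goals:                   # goal reached by all leaves
            return pi
        for s in list(frontier):
            if s in goals:                      # goal leaves stay in the frontier
                continue
            frontier.discard(s)
            if find_solution(s, goals) is None: # no path to a goal from s
                return None
            acts = sorted(applicable(s))
            if not acts:
                return None                     # dead end
            a = acts[0]                         # greedy stand-in for "nondeterministically choose"
            pi[s] = a
            frontier |= gamma(s, a) - set(pi)   # expand with ALL outcomes not already in Dom(pi)
    return None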
43"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
ship
Policy:
Frontier: ship
44"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
ship
hbr
unload
Policy:
ship: unload
Frontier: hbr
45"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
Frontier: par2,

tr1,par1
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park
Unlike FindGSoluBon, need to

solve for all successor states.

All are added to Frontier.
46"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
Frontier: par2,

tr1,g1,g2,tr2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver
g1
g2
del
tr2
47"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
Frontier: par2,

tr1,g1,g2,tr2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver
g1
g2
del
tr2
g1 and g2 are goal states.So 

FSS doesn’t solve for it further.
48"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
Frontier: par2,

tr1,g1,g2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move
g1
g2
del
tr2
g1 g2
move
49"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
Frontier: par2,

tr1,g1,g2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move
g1
g2
del
tr2
g1 g2
move
50"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
Frontier:tr1,

g1,g2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move

par2: back
g1
g2
del
tr2
g1 g2
move
hbr
back
51"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
Frontier:tr1,

g1,g2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move

par2: back
g1
g2
del
tr2
g1 g2
move
hbr
back
52"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
Frontier:

g1,g2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move

par2: back

tr1: move
g1
g2
del
tr2
g1 g2
move
par1 par2
hbr
back
53"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
Frontier:

g1,g2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move

par2: back

tr1: move
g1
g2
del
tr2
g1 g2
move
par1 par2
satisfies
hbr
back
54"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRSafeRSolu.on(
Find-Safe-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Find-Solution(⌃, s, Sg) = failure //
then return failure
nondeterministically choose a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ ( (s, a)  Dom(⇡))
return failure
Figure 5.8: Planning for Safe Solutions by For
resulting from applying a to s. The interpretation of
choice of the state among the elements of the frontie
creates several copies of a, one for each applicable act
these copies has been made, the algorithm makes ano
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move

par2: back

tr1: move
g1
g2
del
tr2
g1 g2
move
par1 par2
This policy

is returned
hbr
back
55"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Proper.es(of(FindRSafeRSolu.on(
!  Guaranteed to find safe solution, if one exists
!  Uses FindGSoluBon"as a subroutine to detect nonterminating loops
56"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRAcyclicRSolu.on(
176 Chapter
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal reached by all leave
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? // loop checking
then return failure
choose nondeterministically a 2 Applicable(s) // select an action
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a) // expand
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by Forward-search.
Cycle check: makes sure

that action applied in previous

iteration didn’t lead to a state

already considered by π
Similar to

FindRSafeRSolu.on except:
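The corresponding greedy sketch differs from find_safe_solution_greedy only in the loop check and in adding all successors to the frontier. The same caveat applies: the real algorithm backtracks over the action choice, this sketch does not, and its loop check is made slightly stronger than the pseudocode's so that the sketch always terminates.

def find_acyclic_solution_greedy(s0, goals):
    """Greedy (non-backtracking) sketch of Find-Acyclic-Solution's control flow."""
    pi = {}
    frontier = {s0}
    while frontier:
        if frontier <= goals:                   # goal reached by all leaves
            return pi
        for s in list(frontier):
            if s in goals:
                continue
            frontier.discard(s)
            # Loop check: fail if this state, or any other pending state, is already in Dom(pi).
            if s in pi or (frontier & set(pi)):
                return None
            acts = sorted(applicable(s))
            if not acts:
                return None                     # dead end
            a = acts[0]                         # greedy stand-in for "nondeterministically choose"
            pi[s] = a
            frontier |= gamma(s, a)             # expand with ALL outcomes
    return None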
57"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
ship
Policy:
Frontier: ship
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
58"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
ship
hbr
unload
Policy:
ship: unload
Frontier: hbr
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
59"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
Frontier: par2,

tr1,par1
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park
Unlike FindGSoluBon, need to

solve for all successor states.

All are added to Frontier.
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
60"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
Frontier: par2,

tr1,g1,g2,tr2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver
g1
g2
del
tr2
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
61"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
Frontier: par2,

tr1,g1,g2,tr2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver
g1
g2
del
tr2
g1 and g2 are goal states.So 

FSS doesn’t solve for it further.
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
62"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
Frontier: par2,

tr1,g1,g2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move
g1
g2
del
tr2
g1 g2
move
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
63"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
Frontier: par2,

tr1,g1,g2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move
g1
g2
del
tr2
g1 g2
move
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
64"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
Frontier:tr1,

g1,g2,tr3
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move

par2: deliver
g1
g2
del
tr2
g1 g2
move
tr3
del
g1
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
Note: doesn’t

consider back(
because it 

creates

a cycle
65"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
Frontier:tr1,

g1,g2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move

par2: deliver
tr3: move
g1
g2
del
tr2
g1 g2
move
tr3
del
g1
g2
move
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
66"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
Frontier:

g1,g2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move

par2: deliver
tr3: move

par1: move
g1
g2
del
tr2
g1 g2
move
tr3
del
g1
g2
move
par1 par2
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
67"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
Frontier:

g1,g2
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move

par2: deliver
tr3: move

tr1: move
g1
g2
del
tr2
g1 g2
move
tr3
del
g1
g2
move
par1 par2
satisfies
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
68"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
FindRAcyclicRSoln(
ship
hbr
par1
tr1
par2
park
unload
Policy:
ship: unload
hbr: park

par1: deliver

tr2: move

par2: deliver
tr3: move

tr1: move
g1
g2
del
tr2
g1 g2
move
tr3
del
g1
g2
move
par1 par2
This policy

is returned
Find-Acyclic-Solution (⌃, s0, Sg)
⇡ ?
Frontier {s0}
while Frontier 6= ? do
if Frontier ✓ Sg then return ⇡ // goal
for every s 2 Frontier do
remove s from Frontier
if Frontier  Dom(⇡) 6= ? //
then return failure
choose nondeterministically a 2 Applicable(s)
⇡ ⇡ [ (s, a)
Frontier Frontier [ (s, a)
return failure
Figure 5.9: Planning for Safe Acyclic Solutions by
While exploring the frontier, it calls Find-Soluti
whether the current policy contains cycles without p
tion, i.e., whether it gets in a state where no action i
there is no path to the goal. Also Find-Safe-Solution
terministic selection among the applicable actions.
69"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Proper.es(of(FindRAcyclicRSolu.on(
!  Guarantees finding Acyclic Safe Solutions, if one exists
!  Checks for cycles by seeing if any node in FronBer"is already in the domain of π
70"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Guided(Planning(For(Safe(Solu.ons(
!  Main motivation: finding possibly unsafe solutions much easier than finding safe
solutions
●  FindGSoluBon"ignores AND/OR graph structure and just looks for a policy that
might achieve the goal
●  FindGSafeGSoluBon needs to plan for all possible outcomes of actions
!  We’ll now see an algorithm that computes safe solutions by starting from possibly
unsafe solutions
71"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
GuidedRFindRSafeRSolu.on(
192 Chapter 5
Guided-Find-Safe-Solution (⌃,s0,Sg)
if s0 2 Sg then return(?)
if Applicable(s0) = ? then return(failure)
⇡ ?
loop
Q leaves(s0, ⇡)  Sg
if Q = ? then return(⇡)
select arbitrarily s 2 Q
⇡0 Find-Solution(⌃, s, Sg)
if ⇡0 6= failure then do
⇡ ⇡ [ {(s, a) 2 ⇡0 | s 62 Dom(⇡)}
else for every s0 and a such that s 2 (s0, a) do
⇡ ⇡  {(s0, a)}
make a not applicable in s0
Figure 5.17: Guided Planning for a Safe Solution
Look at all the leaves of π. 

Safe solution requires a goal state

to be reachable from every node.

So plan from each non-solution leaf.
Incorporate solution π’ found

into overall policy π
If solution not found from
s, goals unreachable from
s. Remove all elements of
π that could result in s.
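A Python sketch of this loop, under some simplifying assumptions: a banned set of (state, action) pairs emulates "make a not applicable in s′"; a local copy of Find-Solution respects that set; and one extra guard (not in the pseudocode) makes the sketch give up when even s0 has no remaining path to the goal. It reuses A, gamma() and leaves() from the earlier sketches.

def guided_find_safe_solution(s0, goals):
    """Sketch of Guided-Find-Safe-Solution with action elimination via a banned set."""
    banned = set()

    def gamma_b(s, a):                      # gamma with banned (state, action) pairs removed
        return set() if (s, a) in banned else gamma(s, a)

    def applicable_b(s):
        return {a for a in A if gamma_b(s, a)}

    def find_solution_b(start):             # Find-Solution restricted to non-banned actions
        def dfs(cur, visited, sub):
            if cur in goals:
                return sub
            for a in applicable_b(cur):
                for nxt in gamma_b(cur, a):
                    if nxt in visited:
                        continue
                    r = dfs(nxt, visited | {nxt}, {**sub, cur: a})
                    if r is not None:
                        return r
            return None
        return dfs(start, {start}, {})

    if s0 in goals:
        return {}
    if not applicable_b(s0):
        return None
    pi = {}
    while True:
        q = leaves(s0, pi) - goals          # non-goal leaves of the current policy
        if not q:
            return pi
        s = next(iter(q))                   # select an arbitrary bad leaf
        sub = find_solution_b(s)
        if sub is not None:
            pi.update({st: a for st, a in sub.items() if st not in pi})
        elif s == s0:
            return None                     # guard: no remaining way to reach the goal from s0
        else:                               # goals unreachable from s: cut every way of reaching s
            for s1, a in [(s1, a) for s1, a in pi.items() if s in gamma_b(s1, a)]:
                del pi[s1]
                banned.add((s1, a))         # make a not applicable in s1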
72"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
164 Chapter 5
Figure 5.1: A simple nondeterministic planning domain model
EXAMPLE(
73"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Finding(Safe(Solu.ons(by(Determiniza.on(
!  Main idea underlying GuidedGFindGSafeGSoluBon:"
●  Can use (possibly) unsafe solutions (using FindGSoluBon) to guide the search
towards a safe solution
!  Advantageous because we can temporarily focus on only one of the action’s
outcomes
●  Searching for paths rather than trees
!  Determinization carries same idea even further
!  I’ll explain how determinization works, and then how it compares with FindG
SoluBon"
74"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Determiniza.on(Techniques(
!  High-Level Approach:
●  Transform nondeterministic model to a
deterministic one
▸  Each nondeterministic action translates to
several deterministic actions, one for each
possible successor state
●  Use CSV planners to solve these problems
●  Stitch solutions together into a policy
!  Advantages:
●  Deterministic planning problems efficiently
solvable
●  Allows us to leverage all of the nice features
CSV planners bring in
▸  Heuristics, landmarks, etc
hbr
par1
tr1
par2
park
hbr
par1
tr1
par2
park1
park2
park3
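A minimal sketch of the transformation: each nondeterministic (state, action) entry with k possible outcomes becomes k deterministic entries. The action_1, action_2, ... naming scheme is just for illustration.

def determinize(gamma_table):
    """mk-deterministic(Sigma): split every nondeterministic outcome set into
    separate deterministic actions. Returns a table mapping (state, action_i)
    to a single successor state."""
    det = {}
    for (s, a), outcomes in gamma_table.items():
        if len(outcomes) == 1:
            det[(s, a)] = next(iter(outcomes))
        else:
            for i, s2 in enumerate(sorted(outcomes), start=1):
                det[(s, f"{a}_{i}")] = s2     # e.g. park -> park_1, park_2, park_3
    return det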
75"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FindRSafeRSolu.onRbyRDeterminiza.on(
Find-Safe-Solution-by-Determinization (⌃,s0,Sg)
if s0 2 Sg then return(?)
if Applicable(s0) = ? then return(failure)
⇡ ?
⌃d mk-deterministic(⌃) // determinization
loop
Q leaves(s0, ⇡)  Sg
if Q = ? then do
⇡ ⇡  {(s, a) 2 ⇡ | s 62 b(s0, ⇡)} // clean policy
return(⇡)
select s 2 Q
p0 Forward-search (⌃d, s, Sg) // classical planner
if p0 6= fail then do
⇡0 Plan2policy(p0, s) // plan2policy transformatio
⇡ ⇡ [ {(s, a) 2 ⇡0 | s 62 Dom(⇡)}
else for every s0 and a such that s 2 (s0, a) do
⇡ ⇡  {(s0, a)}
make the actions in the determinization of a // action elimination
not applicable in s0
Compute determinization of domain
If no non-solution leaf
states, we’re done. Need to
clean up policy to remove
unreachable states
Invoke CSV planner on
deterministic model
Transform deterministic

plan into policy
Action elimination
76"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Plan2Policy(
⌃d rather than the nondeterministic domain ⌃.
Plan2policy(p = ha1, . . . , ani,s)
⇡ ?
loop for i from 1 to n do
⇡ ⇡ [ (s, ai)
s d(s, ai)
return ⇡
Figure 5.19: Transformation of a sequential plan into a corresponding pol
5.6 Online approaches with nondeterminist
models
In Chapter 1 (see Section 1.2, and specifically Section 1.6.2) we introdu
the idea of interleaving planning and acting. One motivation is that, giv
a complete plan that is generated o↵-line, its execution seldom works
Relatively straightforward: transforms a sequential solution into a policy representation.
Note: p needs to be an acyclic plan. To ensure this, Forward-Search (see the previous slide) needs to return an acyclic plan.
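A direct Python transcription of Plan2policy, for illustration only, assuming the determinized transition function is a dict from (state, action) to the single successor state (as in the earlier determinization sketch):

def plan2policy(plan, s, gamma_d):
    """Turn a sequential plan (a list of determinized actions applied from
    state s) into a policy mapping each visited state to an action."""
    policy = {}
    for a in plan:
        policy[s] = a
        s = gamma_d[(s, a)]        # deterministic: exactly one successor
    return policy

# Example with hypothetical determinized actions of the harbor domain:
gamma_d = {("ship", "unload"): "hbr", ("hbr", "park1"): "par1"}
print(plan2policy(["unload", "park1"], "ship", gamma_d))
# {'ship': 'unload', 'hbr': 'park1'}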
77"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Ac.on(Elimina.on(
if p0 6= fail then do
⇡0 Plan2policy(p0, s) // plan2poli
⇡ ⇡ [ {(s, a) 2 ⇡0 | s 62 Dom(⇡)}
else for every s0 and a such that s 2 (s0, a) do
⇡ ⇡  {(s0, a)}
make the actions in the determinization of a // action eli
not applicable in s0
Figure 5.18: Planning for Safe Solutions by Determinization
Fragment of Find-Safe-Solution-by-Determinization that has to do with action elimination.
Triggered if there is no deterministic solution from s.
Informally it does the following:
•  Update π to ensure s is never reached
•  Ensure that no future call to Forward-Search returns a solution going through s
78"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Proper.es(of(FindRSafeRSolu.onRbyR
Determiniza.on(
!  Finds safe solutions
!  Any CSV planner can be plugged in
!  Determinization needs to be done carefully
●  Could potentially lead to an exponential blowup in the number of actions
79"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Online(Approaches(with(Nondeterminis.c(Models(
!  Interleaving planning and acting is
important
●  Planning models are approximate –
execution seldom works out as planned
●  Large problems mean long planning
time – need to interleave the two
!  This motivation even more stronger in
nondeterministic domains
●  Long time needed to generate safe
solutions when there are lots of state
variables, actions etc
!  Therefore interleaving planning and acting
helps reduce complexity
●  Instead of coming up with complete
policy, generate partial policy that tells
us the next few actions to perform
[Figure 5.20: Off-line vs. Run-Time Search Spaces]
80"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Issues(With(Interleaving(Planning(and(Ac.ng(
!  Need to identify good actions without exploring entire search space
●  Can be done using heuristic estimates
!  Handling Dead-ends:
●  When lookahead is not enough, can get trapped in dead ends
▸  By planning fully, we would have found out about the dead-end
▸  E.g. if robot goes down a steep incline out of which it cannot come back
up
●  Not a problem in safely explorable domains
▸  Goal states reachable from all situations
!  Despite these issues, interleaving planning and acting an essential alternative to
purely offline planning
81"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
Ac.ng(Procedure:(RunRLookahead(
198 Chapter
Run-Lookahead(⌃, s0, Sg)
s s0
while s /2 Sg and Applicable(s) 6= ? do
⇡ Lookahead(s, ✓)
apply partial plan ⇡
s observe current state
Figure 5.21: Interleaving planning and execution by look-ahead
There are different ways in which the generated plan can be partial, and different ways in which planning and acting can be interleaved. Indeed, the procedure Run-Lookahead is parametric along two dimensions.
The first parametric dimension is in the call to the look-ahead planning step, i.e., Lookahead(s, θ). The parameter θ determines the way in which the generated plan π is partial. For instance, it can be partial because the lookahead is bounded, i.e., the forward search is performed for a bounded number of steps.
This is where the planner is invoked. θ is a context-dependent parameter that restricts the search for a solution and hence determines how π is partial:
•  θ could be a bound on the search depth
•  θ could be a limitation on planning time
•  θ could also limit the number of action outcomes considered
•  Special case: considering only ONE outcome == Find-Solution
!  Two ways to perform lookahead (see the Run-Lookahead sketch below):
●  Lookahead with a bounded number of steps: handle all action outcomes, but only up to a certain depth
●  Lookahead by determinization: solve the problem fully, but the result is possibly unsafe due to determinization
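A minimal Python rendering of Run-Lookahead, for illustration only: lookahead, execute, observe, and applicable are assumed callbacks (the θ-bounded planner, the execution platform, state observation, and the applicability test), and the partial plan is replanned as soon as it stops matching the observed state.

def run_lookahead(s0, goals, applicable, lookahead, execute, observe, theta):
    """Interleave planning and acting: repeatedly generate a partial plan
    with Lookahead(s, theta), apply it, observe the outcome, and replan."""
    s = s0
    while s not in goals and applicable(s):
        partial_plan = lookahead(s, theta)      # theta: e.g. a bound on depth or time
        if not partial_plan:
            return s                            # planner found nothing useful
        for a in partial_plan:
            if s in goals or a not in applicable(s):
                break                           # plan no longer fits reality: replan
            execute(a)
            s = observe()                       # nondeterminism: see what actually happened
    return s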
82"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15"
FFRReplan:(Lookahead(by(Determiniza.on(
Section 5.6
FF-Replan (⌃, s, Sg)
while s /2 Sg and Applicable(s) 6= ? do
if ⇡d undefined for s then do
⇡d Forward-search (⌃d, s, Sg)
apply action ⇡d(s)
s observe resulting state
Figure 5.22: Online determinization planning and acting algorithm.
… lookahead and a partial number of outcomes, in any arbitrary way.
The second parametric dimension is in the application of the partial plan that has been generated, i.e., apply the partial plan π. Independently of the lookahead, we can still execute π in a partial way. Suppose, for instance, that we have generated a sequential plan of length n; we can decide to apply only m ≤ n steps.
Run Forward-Search on a determinized version of the problem.
Then start executing the (possibly unsafe) policy until we cannot execute it anymore.
Properties:
•  If the domain is safely explorable, then FF-Replan will get to a goal state.
•  If the domain has dead ends, then there are no guarantees.
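In the same illustrative style, here is a Python sketch of FF-Replan, assuming a determinized transition dict gamma_d, a classical planner callback, and execute/observe callbacks for acting. The partial policy pi_d is filled in lazily and the agent replans whenever it observes a state the policy does not cover; this is a sketch of the idea, not the original FF-Replan implementation.

def ff_replan(gamma_d, s0, goals, classical_plan, applicable, execute, observe):
    """Online determinization planning and acting, in the spirit of Figure 5.22."""
    pi_d = {}                                  # partial policy over the determinized domain
    s = s0
    while s not in goals and applicable(s):
        if s not in pi_d:                      # policy undefined here: (re)plan
            plan = classical_plan(gamma_d, s, goals)
            if plan is None:
                return s                       # dead end: no guarantee without safe explorability
            state = s
            for a_d in plan:                   # record the plan as a policy fragment
                pi_d[state] = a_d
                state = gamma_d[(state, a_d)]
        execute(pi_d[s])                       # executing a determinized copy means
        s = observe()                          #   executing its underlying action
    return s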

Chapter05

  • 2. 2"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Introduc.on( !  World seldom predictable ●  corresponding deliberation models as a result always going to be incomplete !  Results in: ●  Action failures ●  Unexpected side effects of actions ●  Exogenous events !  So far, been working with deterministic action models ●  Each action, when applied in a particular state, results in only one state ●  Formally: γ(s,a) returns a single state ●  Doesn’t adequately support inherent uncertainty in domains !  Nondeterministic models provide more flexibility: ●  An action, when applied in a state, may result in one among several possible states ●  γ(s,a) returns a set of states !  Nondeterministic models allow modeling uncertainty in planning domains
  • 3. 3"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Why(Model(Uncertainty?(( !  We’ve seen ways to handle these situations using deterministic models ●  Generate plans for the nominal case ●  Execute, and monitor ●  Detect failure, and recover
  • 4. 4"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Why(Model(Uncertainty?( Answer: nondeterministic models have several advantages !  More accurate modeling !  Plan for uncertainty ahead of time, instead of during execution !  No nominal case in certain environments: ●  Think of throwing a dice/tossing a coin ●  Online payments where choice of payment left to user !  However, comes at a cost: ●  More complicated, both conceptually and computationally ●  Since you need to take all different possibilities into account
  • 5. 5"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 Figure 5.1: A simple nondeterministic planning domain model Definition 5.1. (Planning Domain) A nondeterministic planning do- main ⌃ is the tuple (S, A, ), where S is the finite set of states, A is the finite set of actions, and : S ⇥ A ! 2S is the state transition function. Search(Spaces(in(Nondeterminis.c(Planning( !  Search space of deterministic planning modeled as a graph ●  Nodes are states, edges are actions !  For planning with nondeterministic domains, search space no longer a graph ●  Instead its now an AND/OR graph !  AND/OR graph has following elements: ●  OR branches: which action to apply in a state? ●  AND branches: which state does the action lead to? !  Have control over which action to apply (OR branches) !  Don’t have control over resulting state (AND branches) A simple nondeterministic model of a harbor management facility
  • 6. 6"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 Nondeterminis.c(Planning(Domains( !  3-tuple (S, A, γ) ●  S – finite set of states ●  A – finite set of actions ●  γ: S × A → 2S !  Search space of a simple harbor management domain ●  Only one state variable: ▸  pos(item) ●  Nodes represent possible values
  • 7. 7"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 Ac.ons(in(Nondeterminis.c(Planning(Domains( !  An action a applicable in state s iff γ(s,a) ≠ ∅ !  Applicable(s) is set of all actions applicable in s ●  Applicable(s) = {a ∈ A | γ(s, a) ≠ ∅} !  Five actions in example ●  Two deterministic: ▸  unload, back ●  Three nondeterministic: ▸  park move, deliver
  • 8. 8"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 Ac.ons(in(Nondeterminis.c(Planning(Domains( !  park stores items in storage areas parking1 or parking2 ●  Nondeterminism used to model possibility of ▸  storing item in parking1 ▸  storing item in parking2 ▸  having to temporarily move item in transit1 if space is unavailable ●  Once space is available: move action
  • 9. 9"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 Plans(in(Nondeterminis.c(Domains( !  Structure of plans must be different from the deterministic case ●  Previously, sequence of actions !  Doesn’t work here ●  Why?
  • 10. 10"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 Plans(in(Nondeterminis.c(Domains( !  Need the notion of a conditional plan ●  plans that account for various possibilities in a given state !  Can sense the actual action outcome among the possible ones, and act according to the conditional structure of plan !  A possible representation: ●  a policy: partial function that maps states to actions !  If a policy π maps a state s to an action a ●  that means we should perform a whenever we are in state s
  • 11. 11"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 Policies:(A(Representa.on(of(Plans(in( Nondeterminis.c(Planning( !  Example policy π1 for the harbor management problem: ●  π1 (pos(item)=on"ship) = unload" ●  π1(pos(item)=at"harbor) = park" ●  π1(pos(item)=parking1) = deliver"
  • 12. 12"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 Policies:(A(Representa.on(of(Plans(in( Nondeterminis.c(Planning( ●  π1 (pos(item)=on"ship) = unload" ●  π1(pos(item)=at"harbor) = park" ●  π1(pos(item)=parking1) = deliver"
  • 13. 13"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5!  In deterministic planning, can compute states reachable by sequence of actions using γ ●  s ∪ γ (s, a1)∪ γ (γ (s,a1), a2) ∪ ... !  Need few extra definitions to do similar checks in nondeterministic planning !  Reachable States: (s,π) ●  All states that can be produced by starting at s and executing π !  Example: (pos(item)=on"ship,π1) ●  π1 (pos(item)=on"ship) = unload" ●  π1(pos(item)=at"harbor) = park" ●  π1(pos(item)=parking1) = deliver" Defini.ons(Over(Policies(
  • 14. 14"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 Defini.ons(Over(Policies( !  In deterministic planning, can compute states reachable by sequence of actions using γ ●  s ∪ γ (s, a1)∪ γ (γ (s,a1), a2) ∪ ... !  Need few extra definitions to do similar checks in nondeterministic planning !  Reachable States: (s,π) ●  All states that can be produced by starting at s and executing π !  Example: (pos(item)=on"ship,π1) ●  π1 (pos(item)=on"ship) = unload" ●  π1(pos(item)=at"harbor) = park" ●  π1(pos(item)=parking1) = deliver"
  • 15. 15"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 Defini.ons(Over(Policies( !  Need to also check whether plan reaches goal ●  Requires calculating final states of policy !  leaves (s,π): set of final states reached by policy π starting from state s !  leaves(s, π) = {s′ | s′ ∈ ︎ (s, π) and s′ not in Dom(π)} !  Example: ●  leaves (pos(item)=on"ship,"π1) ●  π1 (pos(item)=on"ship) = unload" ●  π1(pos(item)=at"harbor) = park" ●  π1(pos(item)=parking1) = deliver"
  • 16. 16"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Policies:(A(Representa.on(of(Plans(in( Nondeterminis.c(Planning( !  Reachability graph, Graph(s,π) ●  Graph of all possible state transitions if we execute π starting at s ●  Graph(s,π) = { γ︎(s,π), E | s′ ∈ γ︎(s, π), s′′ ∈ π(s′), and (s′,s′′) ∈ E} ●  π1 (pos(item)=on"ship) = unload" ●  π1(pos(item)=at"harbor) = park" ●  π1(pos(item)=parking1) = deliver"
  • 17. 17"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Policies:(A(Representa.on(of(Plans(in( Nondeterminis.c(Planning( ●  π2"(pos(item)=on"ship)"="unload" ●  π2(pos(item)=at"harbor)"="park" ●  π2(pos(item)=parking1)"="deliver" ●  π2(pos(item)=parking2)"="back" ●  π2(pos(item)=transit1)"="move" ●  π2(pos(item)=transit2)"="move;""
  • 18. 18"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Policies:(A(Representa.on(of(Plans(in( Nondeterminis.c(Planning( ●  π3"(pos(item)=on"ship)"="unload" ●  π3(pos(item)=at"harbor)"="park" ●  π3(pos(item)=parking1)"="deliver" ●  π3(pos(item)=parking2)"="deliver" ●  π3(pos(item)=transit1)"="move" ●  π3(pos(item)=transit2)"="move" ●  π3(pos(item)=transit3)"="move""
  • 19. 19"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Planning(Problems(and(Solu.ons( !  Let Σ = (S,A,γ) be a planning domain !  A planning problem P is a 3-tuple P = (Σ,s0,Sg) ●  s0 ∈ S is the initial state ●  Sg ⊆ S is set of goal states !  Note: previous book had set of initial states S0 ●  Allowed uncertainty about initial state ●  Current definition is equivalent ▸  Can easily translate one to the other •  How?
  • 20. 20"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Planning(Problems(and(Solu.ons( !  Let Σ = (S,A,γ) be a planning domain !  A planning problem P is a 3-tuple P = (Σ,s0,Sg) ●  s0 ∈ S is the initial state ●  Sg ⊆ S is set of goal states !  Note: previous book had set of initial states S0 ●  Allowed uncertainty about initial state ●  Current definition is equivalent ▸  Can easily translate one to the other •  How? ▸  Introduce a new start action such that γ (s0, start) = S0 !  Solutions: not as straightforward to define as Deterministic Planning ●  Based on actual action outcomes, might or might not achieve goal ●  Can define different criteria of success – many types of solutions
  • 21. 21"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Type(1:(Solu.on( Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a solution iff leaves (s0,π) ∩ Sg ≠ ∅ !  A policy that may lead to a goal ●  In other words: at least one sequence of nondeterministic outcomes leads to a goal state !  Example: ●  s0 = {pos(item)"="on_ship} ●  Sg = {pos(item)"="gate1,"pos(item)"="gate2} !  Policy π1 is a solution ●  π1 (pos(item)=on"ship) = unload" ●  π1(pos(item)=at"harbor) = park" ●  π1(pos(item)=parking1) = deliver" !  Reason: At least one of the paths in reachability graph of π1 leads to a state in Sg
  • 22. 22"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Type(2:(Safe(Solu.on( Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a safe solution iff ∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅) Safe solution: a solution in which a goal state is reachable from every state in the reachability graph !  Is π1 a safe solution? Condition for solutionNeeds to hold for all reachable states
  • 23. 23"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Type(2:(Safe(Solu.on( Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a safe solution iff ∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅) Safe solution: a solution in which a goal state is reachable from every state in the reachability graph !  Is π1 a safe solution? ●  No Condition for solutionNeeds to hold for all reachable states
  • 24. 24"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Type(2:(Safe(Solu.on( Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a safe solution iff ∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅) Safe solution: a solution in which a goal state is reachable from every state in the reachability graph !  Is π2 a safe solution? Condition for solutionNeeds to hold for all reachable states
  • 25. 25"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Type(2:(Safe(Solu.on( Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a safe solution iff ∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅) Safe solution: a solution in which a goal state is reachable from every state in the reachability graph !  Is π2 a safe solution? ●  Yes Condition for solutionNeeds to hold for all reachable states
  • 26. 26"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Type(2a:(Cyclic(Safe(Solu.ons( Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a cyclic safe solution iff (1)  leaves(s0, π) ⊆ Sg ∧ (2)  (∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅)) (3)  Graph(s0, π) is cyclic Meaning of Conditions: (1)  No non-solution leaves (2)  Safe solution (3)  Reachability graph is cyclic Cyclic Safe solution: a safe solution with cycles !  π2 is a cyclic safe solution How does having cycles affect level of safety?
  • 27. 27"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Type(2a:(Cyclic(Safe(Solu.ons( Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a cyclic safe solution iff (1)  leaves(s0, π) ⊆ Sg ∧ (2)  (∀s ∈ γ ︎(s0, π)(leaves(s, π) ∩ Sg ≠ ∅)) (3)  Graph(s0, π) is cyclic Meaning of Conditions: (1)  No non-solution leaves (2)  Safe solution (3)  Reachability graph is cyclic Cyclic Safe solution: a safe solution with cycles !  π2 is a cyclic safe solution How does having cycles affect level of safety? !  could go though cycle infinitely many times !  If execution gets out of loop eventually, guaranteed to reach goal state
  • 28. 28"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Type(2b:(Acyclic(Safe(Solu.ons( Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is a acyclic safe solution iff (1) leaves(s0, π) ⊆ Sg ∧ (2) Graph(s0, π) is cyclic Meaning of Conditions: (1)  No non-solution leaves (2)  Reachability graph is acyclic Acyclic Safe Solution: a safe solution without cycles !  π3 is an acyclic safe solution !  Acyclic policy completely safe ●  No matter what happens, guaranteed to eventually reach the goal
  • 29. 29"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Unsafe(Solu.ons( Let P = (Σ,s0,Sg) be a planning problem. Let π be a policy for Σ. π is an unsafe solution iff (1)  (leaves(s0, π) ∩ Sg ≠ ∅) (2)  ((∃s ∈ leaves(s0, π) | s is not in Sg) ∨ (∃s ∈ γ︎(s0,π) | leaves(s,π)=∅)) Either there is a non-solution leaf state Or you get caught in
 an infinite loop Both of these are bad events
  • 30. 30"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Summary(of(Solu.on(Types( Section 5.3 173 Figure 5.6: Di↵erent Kinds of Solutions: A Class Diagram nondeterminism probabilistic solutions weak solutions - unsafe solutions - improper solutions safe solutions strong cyclic solutions proper solutions cyclic safe solutions - - acyclic safe solutions strong solutions - !  Unsafe Solutions aren’t of much interest to us ●  Do not guarantee achievement of goal !  Acyclic Safe Solutions are the best – complete assurance that we’ll get to the goal !  Cyclic Safe Solutions also good, but provide a weaker degree of assurance ●  We can get into loops ●  However, assuming that we don’t stay in the loop forever, guaranteed to achieve the goal
  • 32. 32"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" AND/OR(Graph(Search(Algorithms( !  Nondeterministic planning search spaces represented as AND/OR graphs ●  nodes: states ●  OR branches: actions applicable in a state (consider 1) ●  AND branches: successor states from an state-action pair (consider ALL) !  Reachability graph of a solution policy includes one action at each OR branch and all of the action’s outcomes at each AND branch !  First set of planning algorithms will do AND/OR graph search ●  Simple extensions of ForwardG Search"from Chapter 2 ship hbr par1 tr1 par2 park tr2 g2 g1 del tr3 g1 hbr del back par1 par2 move unload
  • 33. 33"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRSolu.on:(Algorithm(to(find(Solu.ons( Chapter 2 Forward-search (⌃, s0, g) s s0; ⇡ hi loop if s0 satisfies g then return ⇡ A0 {a 2 A | a is applicable in s} if A0 = ? then return failure nondeterministically choose a 2 A0 s (s, a); ⇡ ⇡.a A nondeterministic forward-search planning algorithm. iscuss properties that are shared by all algorithms that do a of the same search space, even though those algorithms may es of that tree in di↵erent orders. The rest of this section of those algorithms. olution to a planning problem may require a huge computa- r an arbitrary CSV planning problem the task is PSPACE- ]. To reduce the computational e↵ort, several of the search his section incorporate heuristic techniques for selecting which 174 Chapter 5 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initialization loop if s 2 Sg then return ⇡ // goal test A0 Applicable(s) if A0 = ? then return failure // dead-end test nondeterministically choose a 2 A0 // branching nondeterministically choose s0 2 (s, a)// progression if s0 2 Visited then return failure // loop check ⇡(s) a; Visited Visited [ {s0}; s s0 174 Chapter 5 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initialization loop if s 2 Sg then return ⇡ // goal test A0 Applicable(s) if A0 = ? then return failure // dead-end test nondeterministically choose a 2 A0 // branching nondeterministically choose s0 2 (s, a)// progression if s0 2 Visited then return failure // loop check ⇡(s) a; Visited Visited [ {s0}; s s0 Additional nondeterministic choice to decide which action
 outcome to plan for next Cycle-checking Identical Algorithms except: Deterministic Planning algorithm
 from Chapter 2 Nondeterministic Planning
 algorithm
  • 34. 34"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRSolu.on:(Algorithm(to(find(Solu.ons( 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initia loop if s 2 Sg then return ⇡ // goal A0 Applicable(s) if A0 = ? then return failure // dead- nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progr if s0 2 Visited then return failure // loop ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the foll the di↵erence in algorithms from deterministic dom mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initial loop if s 2 Sg then return ⇡ // goal t A0 Applicable(s) if A0 = ? then return failure // dead-e nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progre if s0 2 Visited then return failure // loop c ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the follo the di↵erence in algorithms from deterministic doma mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward We first present a very simple algorithm that finds ship 164 Chapter 5 Policy:
  • 35. 35"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRSolu.on:(Algorithm(to(find(Solu.ons( 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initia loop if s 2 Sg then return ⇡ // goal A0 Applicable(s) if A0 = ? then return failure // dead- nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progr if s0 2 Visited then return failure // loop ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the foll the di↵erence in algorithms from deterministic dom mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initial loop if s 2 Sg then return ⇡ // goal t A0 Applicable(s) if A0 = ? then return failure // dead-e nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progre if s0 2 Visited then return failure // loop c ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the follo the di↵erence in algorithms from deterministic doma mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward We first present a very simple algorithm that finds ship hbr unload 164 Chapter 5 Policy: ship: unload
  • 36. 36"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRSolu.on:(Algorithm(to(find(Solu.ons( 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initia loop if s 2 Sg then return ⇡ // goal A0 Applicable(s) if A0 = ? then return failure // dead- nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progr if s0 2 Visited then return failure // loop ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the foll the di↵erence in algorithms from deterministic dom mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initial loop if s 2 Sg then return ⇡ // goal t A0 Applicable(s) if A0 = ? then return failure // dead-e nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progre if s0 2 Visited then return failure // loop c ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the follo the di↵erence in algorithms from deterministic doma mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward We first present a very simple algorithm that finds ship hbr par1 tr1 par2 park unload 164 Chapter 5 Policy: ship: unload hbr: park Assume this
 outcome is chosen
  • 37. 37"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRSolu.on:(Algorithm(to(find(Solu.ons( 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initia loop if s 2 Sg then return ⇡ // goal A0 Applicable(s) if A0 = ? then return failure // dead- nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progr if s0 2 Visited then return failure // loop ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the foll the di↵erence in algorithms from deterministic dom mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initial loop if s 2 Sg then return ⇡ // goal t A0 Applicable(s) if A0 = ? then return failure // dead-e nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progre if s0 2 Visited then return failure // loop c ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the follo the di↵erence in algorithms from deterministic doma mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward We first present a very simple algorithm that finds ship hbr par1 tr1 par2 park unload 164 Chapter 5 Policy: ship: unload hbr: park par1: deliver g1 g2 del tr2 Assume this
 outcome is chosen
  • 38. 38"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRSolu.on:(Algorithm(to(find(Solu.ons( 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initia loop if s 2 Sg then return ⇡ // goal A0 Applicable(s) if A0 = ? then return failure // dead- nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progr if s0 2 Visited then return failure // loop ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the foll the di↵erence in algorithms from deterministic dom mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initial loop if s 2 Sg then return ⇡ // goal t A0 Applicable(s) if A0 = ? then return failure // dead-e nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progre if s0 2 Visited then return failure // loop c ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the follo the di↵erence in algorithms from deterministic doma mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward We first present a very simple algorithm that finds ship hbr par1 tr1 par2 park unload 164 Chapter 5 Policy: ship: unload hbr: park par1: deliver tr2: move g1 g2 del tr2 g1 g2 move Assume this
 outcome is chosen
  • 39. 39"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRSolu.on:(Algorithm(to(find(Solu.ons( 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initia loop if s 2 Sg then return ⇡ // goal A0 Applicable(s) if A0 = ? then return failure // dead- nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progr if s0 2 Visited then return failure // loop ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the foll the di↵erence in algorithms from deterministic dom mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initial loop if s 2 Sg then return ⇡ // goal t A0 Applicable(s) if A0 = ? then return failure // dead-e nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progre if s0 2 Visited then return failure // loop c ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the follo the di↵erence in algorithms from deterministic doma mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward We first present a very simple algorithm that finds ship hbr par1 tr1 par2 park unload 164 Chapter 5 Policy: ship: unload hbr: park par1: deliver tr2: move g1 g2 del tr2 g1 g2 move Reached a
 goal state.
 Terminate here.
  • 40. 40"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRSolu.on:(Algorithm(to(find(Solu.ons( 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initia loop if s 2 Sg then return ⇡ // goal A0 Applicable(s) if A0 = ? then return failure // dead- nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progr if s0 2 Visited then return failure // loop ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the foll the di↵erence in algorithms from deterministic dom mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward 174 Find-Solution (⌃, s0, Sg) ⇡ ?; s s0; Visited {s0} // initial loop if s 2 Sg then return ⇡ // goal t A0 Applicable(s) if A0 = ? then return failure // dead-e nondeterministically choose a 2 A0 // branc nondeterministically choose s0 2 (s, a)// progre if s0 2 Visited then return failure // loop c ⇡(s) a; Visited Visited [ {s0}; s s0 Figure 5.7: Planning for Solutions by For graphs to find solutions. The main goal of the follo the di↵erence in algorithms from deterministic doma mainly a didactic rather than practical objective. 5.3.1 Planning for Solutions by Forward We first present a very simple algorithm that finds ship hbr par1 tr1 par2 park unload 164 Chapter 5 Policy: ship: unload hbr: park par1: deliver tr2: move g1 g2 del tr2 g1 g2 move This policy
 is returned
  • 41. 41"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRSolu.on:(Proper.es( !  Finds a solution if one exists !  However, in most cases it will find unsafe solutions ●  Because it only considers one outcome for each action !  Nondeterministic choice implemented using backtracking ●  Two levels of backtracking ▸  Choosing an action ▸  Choosing an effect of that action ●  Each sequence of choices corresponds to an execution trace of FindGSoluBon"
  • 42. 42"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRSafeRSolu.on( Section 5.3 175 Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal reached by all leaves for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // nonterminating loop then return failure nondeterministically choose a 2 Applicable(s) // select an action ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) // expand return failure Figure 5.8: Planning for Safe Solutions by Forward-search. Keeps track of
 unexpanded states,
 much like A* Uses FindGSoluBon"to see
 if a Solution exists. If no
 Solution, then no
 Safe-Solution. Only nondeterministic choice is action.
 Adds ALL possible successor states to
 Frontier. Not a choice since Safe-Solution
 needs to guard against all eventualities.
  • 43. 43"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano ship Policy: Frontier: ship
  • 44. 44"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano ship hbr unload Policy: ship: unload Frontier: hbr
  • 45. 45"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano Frontier: par2,
 tr1,par1 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park Unlike FindGSoluBon, need to
 solve for all successor states.
 All are added to Frontier.
  • 46. 46"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano Frontier: par2,
 tr1,g1,g2,tr2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver g1 g2 del tr2
  • 47. 47"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano Frontier: par2,
 tr1,g1,g2,tr2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver g1 g2 del tr2 g1 and g2 are goal states.So 
 FSS doesn’t solve for it further.
  • 48. 48"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano Frontier: par2,
 tr1,g1,g2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move g1 g2 del tr2 g1 g2 move
  • 49. 49"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano Frontier: par2,
 tr1,g1,g2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move g1 g2 del tr2 g1 g2 move
  • 50. 50"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano Frontier:tr1,
 g1,g2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move
 par2: back g1 g2 del tr2 g1 g2 move hbr back
  • 51. 51"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano Frontier:tr1,
 g1,g2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move
 par2: back g1 g2 del tr2 g1 g2 move hbr back
  • 52. 52"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano Frontier:
 g1,g2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move
 par2: back
 tr1: move g1 g2 del tr2 g1 g2 move par1 par2 hbr back
  • 53. 53"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano Frontier:
 g1,g2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move
 par2: back
 tr1: move g1 g2 del tr2 g1 g2 move par1 par2 satisfies hbr back
  • 54. 54"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRSafeRSolu.on( Find-Safe-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Find-Solution(⌃, s, Sg) = failure // then return failure nondeterministically choose a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ ( (s, a) Dom(⇡)) return failure Figure 5.8: Planning for Safe Solutions by For resulting from applying a to s. The interpretation of choice of the state among the elements of the frontie creates several copies of a, one for each applicable act these copies has been made, the algorithm makes ano ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move
 par2: back
 tr1: move g1 g2 del tr2 g1 g2 move par1 par2 This policy
 is returned hbr back
  • 55. 55"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Proper.es(of(FindRSafeRSolu.on( !  Guaranteed to find safe solution, if one exists !  Uses FindGSoluBon"as a subroutine to detect nonterminating loops
  • 56. 56"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRAcyclicRSolu.on( 176 Chapter Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal reached by all leave for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // loop checking then return failure choose nondeterministically a 2 Applicable(s) // select an action ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) // expand return failure Figure 5.9: Planning for Safe Acyclic Solutions by Forward-search. Cycle check: makes sure
 that action applied in previous
 iteration didn’t lead to a state
 already considered by π Similar to
 FindRSafeRSolu.on except:
  • 57. 57"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( ship Policy: Frontier: ship Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions.
  • 58. 58"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( ship hbr unload Policy: ship: unload Frontier: hbr Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions.
  • 59. 59"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( Frontier: par2,
 tr1,par1 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park Unlike FindGSoluBon, need to
 solve for all successor states.
 All are added to Frontier. Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions.
  • 60. 60"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( Frontier: par2,
 tr1,g1,g2,tr2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver g1 g2 del tr2 Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions.
  • 61. 61"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( Frontier: par2,
 tr1,g1,g2,tr2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver g1 g2 del tr2 g1 and g2 are goal states.So 
 FSS doesn’t solve for it further. Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions.
  • 62. 62"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( Frontier: par2,
 tr1,g1,g2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move g1 g2 del tr2 g1 g2 move Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions.
  • 63. 63"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( Frontier: par2,
 tr1,g1,g2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move g1 g2 del tr2 g1 g2 move Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions.
  • 64. 64"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( Frontier:tr1,
 g1,g2,tr3 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move
 par2: deliver g1 g2 del tr2 g1 g2 move tr3 del g1 Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions. Note: doesn’t
 consider back( because it 
 creates
 a cycle
  • 65. 65"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( Frontier:tr1,
 g1,g2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move
 par2: deliver tr3: move g1 g2 del tr2 g1 g2 move tr3 del g1 g2 move Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions.
  • 66. 66"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( Frontier:
 g1,g2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move
 par2: deliver tr3: move
 par1: move g1 g2 del tr2 g1 g2 move tr3 del g1 g2 move par1 par2 Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions.
  • 67. 67"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( Frontier:
 g1,g2 ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move
 par2: deliver tr3: move
 tr1: move g1 g2 del tr2 g1 g2 move tr3 del g1 g2 move par1 par2 satisfies Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions.
  • 68. 68"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" 164 Chapter 5 FindRAcyclicRSoln( ship hbr par1 tr1 par2 park unload Policy: ship: unload hbr: park
 par1: deliver
 tr2: move
 par2: deliver tr3: move
 tr1: move g1 g2 del tr2 g1 g2 move tr3 del g1 g2 move par1 par2 This policy
 is returned Find-Acyclic-Solution (⌃, s0, Sg) ⇡ ? Frontier {s0} while Frontier 6= ? do if Frontier ✓ Sg then return ⇡ // goal for every s 2 Frontier do remove s from Frontier if Frontier Dom(⇡) 6= ? // then return failure choose nondeterministically a 2 Applicable(s) ⇡ ⇡ [ (s, a) Frontier Frontier [ (s, a) return failure Figure 5.9: Planning for Safe Acyclic Solutions by While exploring the frontier, it calls Find-Soluti whether the current policy contains cycles without p tion, i.e., whether it gets in a state where no action i there is no path to the goal. Also Find-Safe-Solution terministic selection among the applicable actions.
  • 69. 69"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Proper.es(of(FindRAcyclicRSolu.on( !  Guarantees finding Acyclic Safe Solutions, if one exists !  Checks for cycles by seeing if any node in FronBer"is already in the domain of π
  • 70. 70"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Guided(Planning(For(Safe(Solu.ons( !  Main motivation: finding possibly unsafe solutions much easier than finding safe solutions ●  FindGSoluBon"ignores AND/OR graph structure and just looks for a policy that might achieve the goal ●  FindGSafeGSoluBon needs to plan for all possible outcomes of actions !  We’ll now see an algorithm that computes safe solutions by starting from possibly unsafe solutions
  • 71. 71"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" GuidedRFindRSafeRSolu.on( 192 Chapter 5 Guided-Find-Safe-Solution (⌃,s0,Sg) if s0 2 Sg then return(?) if Applicable(s0) = ? then return(failure) ⇡ ? loop Q leaves(s0, ⇡) Sg if Q = ? then return(⇡) select arbitrarily s 2 Q ⇡0 Find-Solution(⌃, s, Sg) if ⇡0 6= failure then do ⇡ ⇡ [ {(s, a) 2 ⇡0 | s 62 Dom(⇡)} else for every s0 and a such that s 2 (s0, a) do ⇡ ⇡ {(s0, a)} make a not applicable in s0 Figure 5.17: Guided Planning for a Safe Solution Look at all the leaves of π. 
 Safe solution requires a goal state
 to be reachable from every node.
 So plan from each non-solution leaf. Incorporate solution π’ found
 into overall policy π If solution not found from s, goals unreachable from s. Remove all elements of π that could result in s.
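In Python, the same loop might look like the sketch below. It is an illustration, not the book's code: find_solution stands for any possibly-unsafe planner with this calling convention, predecessors(s) is an assumed helper enumerating every (state, action) pair that can produce s, and "make a not applicable in s′" is modeled by a banned set consulted through usable().

  def policy_leaves(gamma, s0, policy):
      """States reachable from s0 under `policy` for which the policy has no action."""
      seen, stack, leaves = set(), [s0], set()
      while stack:
          x = stack.pop()
          if x in seen:
              continue
          seen.add(x)
          if x in policy:
              stack.extend(gamma(x, policy[x]))
          else:
              leaves.add(x)
      return leaves

  def guided_find_safe_solution(applicable, gamma, predecessors, find_solution, s0, goals):
      """Repeatedly run a cheap (possibly unsafe) planner from each open leaf of the
      current policy; when it fails from a state, eliminate the actions leading there."""
      if s0 in goals:
          return {}
      banned = set()                                       # (state, action) pairs ruled out
      usable = lambda s: [a for a in applicable(s) if (s, a) not in banned]
      if not usable(s0):
          return None
      policy = {}
      while True:
          open_leaves = policy_leaves(gamma, s0, policy) - goals
          if not open_leaves:
              return policy                                # every leaf is a goal: safe
          s = next(iter(open_leaves))
          sub = find_solution(usable, gamma, s, goals)     # possibly unsafe sub-policy, or None
          if sub is not None:
              for x, a in sub.items():
                  policy.setdefault(x, a)                  # keep choices already made
          elif s == s0:
              return None                                  # nothing reaches the goal from s0
          else:                                            # goals unreachable from s:
              for x, a in predecessors(s):                 # every (x, a) with s in gamma(x, a)
                  if policy.get(x) == a:
                      del policy[x]                        # retract entries that could reach s
                  banned.add((x, a))                       # and never use that action there again

Passing usable rather than applicable into find_solution is what makes the "make a not applicable" step stick: later searches simply never see the eliminated actions.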
  • 73. 73"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Finding(Safe(Solu.ons(by(Determiniza.on( !  Main idea underlying GuidedGFindGSafeGSoluBon:" ●  Can use (possibly) unsafe solutions (using FindGSoluBon) to guide the search towards a safe solution !  Advantageous because we can temporarily focus on only one of the action’s outcomes ●  Searching for paths rather than trees !  Determinization carries same idea even further !  I’ll explain how determinization works, and then how it compares with FindG SoluBon"
  • 74. 74"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Determiniza.on(Techniques( !  High-Level Approach: ●  Transform nondeterministic model to a deterministic one ▸  Each nondeterministic action translates to several deterministic actions, one for each possible successor state ●  Use CSV planners to solve these problems ●  Stitch solutions together into a policy !  Advantages: ●  Deterministic planning problems efficiently solvable ●  Allows us to leverage all of the nice features CSV planners bring in ▸  Heuristics, landmarks, etc hbr par1 tr1 par2 park hbr par1 tr1 par2 park1 park2 park3
  • 75. 75"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FindRSafeRSolu.onRbyRDeterminiza.on( Find-Safe-Solution-by-Determinization (⌃,s0,Sg) if s0 2 Sg then return(?) if Applicable(s0) = ? then return(failure) ⇡ ? ⌃d mk-deterministic(⌃) // determinization loop Q leaves(s0, ⇡) Sg if Q = ? then do ⇡ ⇡ {(s, a) 2 ⇡ | s 62 b(s0, ⇡)} // clean policy return(⇡) select s 2 Q p0 Forward-search (⌃d, s, Sg) // classical planner if p0 6= fail then do ⇡0 Plan2policy(p0, s) // plan2policy transformatio ⇡ ⇡ [ {(s, a) 2 ⇡0 | s 62 Dom(⇡)} else for every s0 and a such that s 2 (s0, a) do ⇡ ⇡ {(s0, a)} make the actions in the determinization of a // action elimination not applicable in s0 Compute determinization of domain If no non-solution leaf states, we’re done. Need to clean up policy to remove unreachable states Invoke CSV planner on deterministic model Transform deterministic
 plan into policy Action elimination
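The "clean policy" line prunes entries that can no longer be reached from s0 when following π (they accumulate when earlier sub-plans are invalidated by action elimination). A minimal sketch of that step, with the same assumed interface as the earlier sketches:

  def clean_policy(gamma, s0, policy, goals):
      """Keep only policy entries for states reachable from s0 when following the policy."""
      reachable, stack = set(), [s0]
      while stack:
          s = stack.pop()
          if s in reachable or s in goals:
              continue
          reachable.add(s)
          if s in policy:
              stack.extend(gamma(s, policy[s]))
      return {s: a for s, a in policy.items() if s in reachable}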
  • 76. 76"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Plan2Policy( ⌃d rather than the nondeterministic domain ⌃. Plan2policy(p = ha1, . . . , ani,s) ⇡ ? loop for i from 1 to n do ⇡ ⇡ [ (s, ai) s d(s, ai) return ⇡ Figure 5.19: Transformation of a sequential plan into a corresponding pol 5.6 Online approaches with nondeterminist models In Chapter 1 (see Section 1.2, and specifically Section 1.6.2) we introdu the idea of interleaving planning and acting. One motivation is that, giv a complete plan that is generated o↵-line, its execution seldom works Relatively straightforward: transforms solution into a policy representation Note: p needs to be an acyclic plan To ensure this, Forward-Search (see previous slide) needs to return an acyclic plan
  • 77. 77"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Ac.on(Elimina.on( if p0 6= fail then do ⇡0 Plan2policy(p0, s) // plan2poli ⇡ ⇡ [ {(s, a) 2 ⇡0 | s 62 Dom(⇡)} else for every s0 and a such that s 2 (s0, a) do ⇡ ⇡ {(s0, a)} make the actions in the determinization of a // action eli not applicable in s0 Figure 5.18: Planning for Safe Solutions by Determinization Fragment of FindGSafeGSoluBonGbyGDeterminizaBon
 that has to do with action elimination Triggered if no deterministic solution from s
 Informally it does the following: •  Update π to ensure s is never reached •  Ensure that no deterministic solution found in a future call to ForwardG Search"returns a solution going through s
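In code, action elimination is just bookkeeping over the policy and the determinized action set. A sketch, reusing the hypothetical predecessors(s) helper and the (a, i) naming from the determinization sketch:

  def eliminate_actions(policy, banned_d, predecessors, gamma, s):
      """No plan exists from s in the determinized model: make s unreachable for good."""
      for s_prev, a in predecessors(s):            # every (state, action) with s in gamma(s_prev, a)
          if policy.get(s_prev) == a:
              del policy[s_prev]                   # update pi so s is no longer reached
          for i in range(len(gamma(s_prev, a))):   # ban every determinized copy of a in s_prev so
              banned_d.add((s_prev, (a, i)))       # future Forward-search calls cannot go through s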
  • 78. 78"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Proper.es(of(FindRSafeRSolu.onRbyR Determiniza.on( !  Finds safe solutions !  Any CSV planner can be plugged in !  Determinization needs to be done carefully ●  Could potentially lead to an exponential blowup in the number of actions
  • 79. 79"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Online(Approaches(with(Nondeterminis.c(Models( !  Interleaving planning and acting is important ●  Planning models are approximate – execution seldom works out as planned ●  Large problems mean long planning time – need to interleave the two !  This motivation even more stronger in nondeterministic domains ●  Long time needed to generate safe solutions when there are lots of state variables, actions etc !  Therefore interleaving planning and acting helps reduce complexity ●  Instead of coming up with complete policy, generate partial policy that tells us the next few actions to perform 196 Figure 5.20: O↵-line vs. Run Time Search Spaces acting and planning then we reduce significantly the sear indeed to find a partial policy, e.g., the next few ”good” or some of them, and repeat these two interleaved plannin Offline vs Runtime
 Search Spaces
  • 80. 80"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Issues(With(Interleaving(Planning(and(Ac.ng( !  Need to identify good actions without exploring entire search space ●  Can be done using heuristic estimates !  Handling Dead-ends: ●  When lookahead is not enough, can get trapped in dead ends ▸  By planning fully, we would have found out about the dead-end ▸  E.g. if robot goes down a steep incline out of which it cannot come back up ●  Not a problem in safely explorable domains ▸  Goal states reachable from all situations !  Despite these issues, interleaving planning and acting an essential alternative to purely offline planning
  • 81. 81"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" Ac.ng(Procedure:(RunRLookahead( 198 Chapter Run-Lookahead(⌃, s0, Sg) s s0 while s /2 Sg and Applicable(s) 6= ? do ⇡ Lookahead(s, ✓) apply partial plan ⇡ s observe current state Figure 5.21: Interleaving planning and execution by look-ahead There are di↵erent ways in which the generated plan can be partia and di↵erent ways in planning and acting can be interleaved. Indeed th procedure Run-Lookahead is parametric along two dimensions: The first parametric dimension is in the call to the look-ahead plannin step, i.e., Lookahead(s, ✓). The parameter ✓ determines the way in which th generated plan ⇡ is partial. For instance, it can be partial since the lookahea is bounded, i.e., the forward search is performed for a bounded number o This is where the planner is invoked. θ is a context-dependent parameter that restricts the search for a solution and hence determines how π is partial •  θ could be a bound on the search depth •  θ could be limitation on planning time •  θ could also limit the number of action outcomes considered •  Special case: only ONE outcome == FindGSoluBon( !  Two ways to perform lookahead: ●  Lookahead with a bounded number of steps: handle all action outcomes, but only upto a certain depth ●  Lookahead by determinization: solve the problem fully, but possibly unsafe due to determinization
  • 82. 82"Dana"Nau"and"Vikas"Shivashankar:"Lecture"slides"for!Automated!Planning!and!Ac0ng" Updated"4/16/15" FFRReplan:(Lookahead(by(Determiniza.on( Section 5.6 FF-Replan (⌃, s, Sg) while s /2 Sg and Applicable(s) 6= ? do if ⇡d undefined for s then do ⇡d Forward-search (⌃d, s, Sg) apply action ⇡d(s) s observe resulting state Figure 5.22: Online determinization planning and acting algorithm. lookahead and partial numebr of outcomes, in any arbitrary way. The second parametric dimension is in the application of the partial p that has been generated, i.e., apply the partial plan ⇡. Independently of lookahead, we can still execute ⇡ in a partial way. Suppose for instance t we have generated a sequential plan of length n, we can decide to ap m  n steps. Run Forward-Search on
 a determinized version of
 the problem. Then start executing
 the (possibly unsafe) policy
 until we cannot execute 
 it anymore Properties: •  If the domain is safely-explorable,
 then FFGReplan will get to a goal state. •  If the domain has dead-ends, then
 no guarantees.
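A sketch of FF-Replan under the same assumptions as the earlier snippets: forward_search stands in for any CSV planner run on the determinized model (sigma_d, gamma_d), and perform/observe stand in for the execution platform.

  def ff_replan(forward_search, sigma_d, gamma_d, perform, observe, applicable, s, goals):
      """FF-Replan: plan on the determinized model, execute, and replan from scratch
      whenever the observed state falls outside the current plan."""
      policy_d = {}
      while s not in goals and applicable(s):
          if s not in policy_d:                          # pi_d undefined for s: replan
              plan = forward_search(sigma_d, s, goals)   # classical planner on the determinization
              if not plan:
                  return False                           # stuck; with dead ends there is no guarantee
              policy_d, x = {}, s
              for a in plan:                             # Plan2policy inline (slide 76)
                  policy_d[x] = a
                  x = gamma_d(x, a)
          perform(policy_d[s])                           # the environment picks the actual outcome
          s = observe()
      return s in goals

In practice the executed action would be the underlying nondeterministic action that the determinized name (a, i) was split from; the environment, not the index i, decides which outcome actually occurs, which is exactly why replanning is needed.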