Dana Nau and Vikas Shivashankar: Lecture slides for Automated Planning and Acting (updated 5/10/15)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Chapter 6
Deliberation with Probabilistic Domain Models

Dana S. Nau and Vikas Shivashankar
University of Maryland
Probabilistic Planning Domain
●  Actions have multiple possible outcomes
Ø  Each outcome has a probability
●  Several possible action representations
Ø  Bayes nets, probabilistic operators, …
●  Book doesn’t commit to any representation
Ø  Only deals with the underlying semantics
●  Σ = (S,A,γ,Pr,cost)
Ø  S = set of states
Ø  A = set of actions
Ø  γ : S × A → 2S
Ø  Pr(s′ | s, a) = probability of going to state s′ if we perform a in s
•  Require Pr(s′ | s, a) > 0 for every s′ in γ(s,a)
Ø  cost: S × A → +
•  cost(s,a) = cost of action a in state s
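To make the semantics concrete, here is a minimal Python sketch of the tuple Σ = (S,A,γ,Pr,cost). The class name and the dictionary conventions (gamma[s][a] is a set of successor states, prob[(s,a,s2)] and cost[(s,a)] are numbers) are illustrative assumptions, not the book's notation — the book deliberately commits to no representation.

```python
# A minimal sketch of Sigma = (S, A, gamma, Pr, cost); illustrative only.
class SSPDomain:
    def __init__(self, states, actions, gamma, prob, cost):
        self.states = states    # S: set of states
        self.actions = actions  # A: set of actions
        self.gamma = gamma      # gamma[s][a] -> set of possible next states
        self.prob = prob        # prob[(s, a, s2)] = Pr(s2 | s, a)
        self.cost = cost        # cost[(s, a)] = cost of doing a in s

    def applicable(self, s):
        """Actions that have at least one outcome in state s."""
        return list(self.gamma.get(s, {}))

    def check(self):
        """Pr(s2|s,a) > 0 on gamma(s,a), and outcomes sum to 1."""
        for s, acts in self.gamma.items():
            for a, outcomes in acts.items():
                ps = [self.prob[(s, a, s2)] for s2 in outcomes]
                assert all(p > 0 for p in ps)
                assert abs(sum(ps) - 1.0) < 1e-9
```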
Notation from Chapter 5
●  Policy π : Sπ → A
Ø  Sπ ⊆ S
Ø  ∀s ∈ Sπ , π(s) ∈ Applicable(s)
●  γ̂(s,π) = {s and all descendants of s reachable by π}
Ø  the transitive closure of γ with π
●  Graph(s,π) = rooted graph induced by π
Ø  {nodes} = γ̂(s,π)
Ø  {edges} = ∪a∈Applicable(s){(s,s′) | s′ ∈ γ(s,a)}
Ø  root = s
●  leaves(s,π) = {states in γ̂(s,π) that aren’t in Sπ}
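A small sketch of this notation in Python, reusing the dictionary conventions above: gamma_hat computes the reachable set γ̂(s,π) by graph closure, and leaves keeps the reachable states where π is undefined. Names are illustrative, and π is assumed to be a dict from states to actions.

```python
def gamma_hat(s, pi, gamma):
    """States reachable from s by following pi (including s itself)."""
    seen, frontier = {s}, [s]
    while frontier:
        u = frontier.pop()
        if u in pi:                    # pi specifies an action at u
            for v in gamma[u][pi[u]]:  # every possible outcome of pi(u)
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
    return seen

def leaves(s, pi, gamma):
    """Reachable states at which pi is undefined."""
    return {u for u in gamma_hat(s, pi, gamma) if u not in pi}
```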
Stochastic Systems
●  Stochastic shortest path (SSP) problem: a triple (Σ, s0, Sg)
●  Solution for (Σ, s0, Sg):
Ø  policy π such that s0 ∈ Sπ and leaves(s0,π) ⋂ Sg ≠ ∅
●  π is closed if π doesn’t stop at non-goal states unless no action is applicable
Ø  for every state s in γ̂(s0,π), either
•  s ∈ Sπ (i.e., π specifies an action at s)
•  s ∈ Sg (i.e., s is a goal state), or
•  Applicable(s) = ∅ (i.e., there are no applicable actions at s)
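Using gamma_hat and leaves from the previous sketch, the two definitions on this slide translate directly; applicable is passed in as a callable. Illustrative only.

```python
def is_solution(pi, s0, goals, gamma):
    """pi is a solution if s0 is in its domain and some leaf is a goal."""
    return s0 in pi and bool(leaves(s0, pi, gamma) & goals)

def is_closed(pi, s0, goals, gamma, applicable):
    """Closed: every reachable state has an action, is a goal, or is stuck."""
    return all(s in pi or s in goals or not applicable(s)
               for s in gamma_hat(s0, pi, gamma))
```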
Policies
●  Robot r1 starts at location l1
Ø  s0 = s1 in the diagram
●  Objective is to get r1 to location l4
Ø  Sg = {s4}
●  π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4))}
Ø  Solution?
Ø  Closed?
  
move(r1,l2,l1)	
  
2	
  
Policies	
  
Goal	
  
Start	
  
Histories
●  History: a sequence of states, starting at s0
σ = 〈s0, s1, s2, s3, …, sh〉
or (not in book): σ = 〈s0, s1, s2, s3, …〉
●  Let H(π) = {all histories that can be produced by following π from s0 to a state in leaves(s0, π)}
●  If σ ∈ H(π) then Pr (σ | π) = ∏i ≥ 0 Pr (si+1 | si, π(si))
Ø  Thus ∑σ ∈ H(π) Pr (σ | π) = 1
●  Probability that π will stop at a goal state:
Ø  Pr (Sg | π) = ∑ {Pr (σ | π) | σ ∈ H(π) and σ ends at a state in Sg}
Unsafe Solutions
●  A solution π is unsafe if
Ø  0 < Pr (Sg | π) < 1
●  Example:
π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4))}
●  H(π1) contains two histories:
Ø  σ1 = 〈s1, s2, s3, s4〉  Pr (σ1 | π1) = 1 × .8 × 1 = 0.8
Ø  σ2 = 〈s1, s2, s5〉  Pr (σ2 | π1) = 1 × .2 = 0.2
●  Pr (Sg | π1) = 0.8
(In this slide’s diagram, s5 is an explicit dead end: a non-goal state with no applicable action.)
Unsafe Solutions
●  A solution π is unsafe if
Ø  0 < Pr (Sg | π) < 1
●  Example:
π2 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l6)), (s6, move(r1,l6,l5))}
●  H(π2) contains two histories:
Ø  σ1 = 〈s1, s2, s3, s4〉  Pr (σ1 | π2) = 1 × .8 × 1 = 0.8
Ø  σ3 = 〈s1, s2, s5, s6, s5, s6, … 〉  Pr (σ3 | π2) = 1 × .2 × 1 × 1 × 1 × … = 0.2
●  Pr (Sg | π2) = 0.8
(Implicit dead end: under π2 the robot cycles between s5 and s6 forever. This slide’s diagram adds state s6, where at(r1,l6).)
Safe Solutions
●  A solution π is safe if
Ø  Pr (Sg | π) = 1
●  An acyclic safe solution:
π3 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
●  H(π3) contains two histories:
Ø  σ1 = 〈s1, s2, s3, s4〉  Pr (σ1 | π3) = 1 × .8 × 1 = 0.8
Ø  σ4 = 〈s1, s2, s5, s4〉  Pr (σ4 | π3) = 1 × .2 × 1 = 0.2
●  Pr (Sg | π3) = 0.8 + 0.2 = 1
Safe Solutions
●  A solution π is safe if
Ø  Pr (Sg | π) = 1
●  A cyclic safe solution:
π4 = {(s1, move(r1,l1,l4))}
●  H(π4) contains infinitely many histories:
Ø  σ5 = 〈s1, s4〉  Pr (σ5 | π4) = 0.5
Ø  σ6 = 〈s1, s1, s4〉  Pr (σ6 | π4) = 0.5 × 0.5 = 0.25
Ø  σ7 = 〈s1, s1, s1, s4〉  Pr (σ7 | π4) = 0.5 × 0.5 × 0.5 = 0.125
• • •
Ø  σ∞ = 〈s1, s1, s1, s1, … 〉  Pr (σ∞ | π4) = 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × … = 0
Pr (Sg | π4) = .5 + .25 + .125 + … = 1
Example
●  Example:
π = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, move(r1,l4,l1)), (s5, move(r1,l5,l1))}
●  What is Pr (Sg | π)?
Expected Cost
●  cost(s,a) = cost of using a in s
●  Example:
Ø  cost(s,a) = 1 for each
“horizontal” action
Ø  cost(s,a) = 100 for each
“vertical” action
●  Cost of a history:
Ø  Let σ = 〈s0, s1, … 〉 ∈ H(π)
Ø  cost(σ | π) = ∑i ≥ 0 cost(si,π(si))
●  Let π be a safe solution
●  Expected cost of following π to a goal:
Ø  Vπ(s) = 0 if s is a goal
Ø  Vπ(s) = cost(s,π(s)) + ∑ ︎s′∈γ(s,π(s)) Pr (sʹ′|s, π(s)) Vπ(sʹ′) otherwise
●  If s = s0 then
Ø  Vπ(s0) = ∑σ ∈ H(π) cost(σ | π) Pr(σ | s0, π)
Example
π3 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}

H(π3) contains two histories:
σ1 = 〈s1, s2, s3, s4〉  Pr (σ1 | π3) = 0.8  cost(σ1 | π3) = 100 + 1 + 100 = 201
σ4 = 〈s1, s2, s5, s4〉  Pr (σ4 | π3) = 0.2  cost(σ4 | π3) = 100 + 1 + 100 = 201

Vπ3(s1) = 201×0.8 + 201×0.2 = 201
Safe Solutions
π4 = {(s1, move(r1,l1,l4))}
●  H(π4) contains infinitely many histories:
Ø  σ5 = 〈s1, s4〉  Pr (σ5 | π4) = 0.5  cost (σ5 | π4) = 1
Ø  σ6 = 〈s1, s1, s4〉  Pr (σ6 | π4) = 0.25  cost (σ6 | π4) = 2
Ø  σ7 = 〈s1, s1, s1, s4〉  Pr (σ7 | π4) = 0.125  cost (σ7 | π4) = 3
• • •
●  Vπ4(s1) = 1×.5 + 2×.25 + 3×.125 + 4×.0625 + … = 2
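A one-line numeric check of this series — with success probability ½ per attempt and cost 1 per attempt, the expected total cost is 2:

```python
# sum over n >= 1 of n * 0.5**n converges to 2
print(sum(n * 0.5**n for n in range(1, 200)))   # -> 2.0 (to float precision)
```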
Planning as Optimization
●  Let π and π′ be safe solutions
●  π dominates π′ if Vπ(s) ≤ Vπ′(s) for every state where both π and π′ are defined
Ø  i.e., Vπ(s) ≤ Vπ′(s) for every s in Sπ ∩ Sπ′
●  π is optimal if π dominates every safe solution π′
●  V*(s) = min{Vπ(s) | π is a safe solution for which π(s) is defined}
= expected cost of getting from s to a goal using an optimal safe solution
●  Optimality principle (also called Bellman’s theorem):
Ø  V*(s) = 0, if s is a goal
Ø  V*(s) = mina∈Applicable(s){cost(s,a) + ∑s′ ∈ γ(s,a) Pr (s′|s,a) V*(s′)}, otherwise
Policy Iteration
●  Let (Σ,s0,Sg) be a safe SSP (i.e., Sg is reachable from every state)
●  Let π be a safe solution that is defined at every state in S
●  Let s be a state, and let a ∈ Applicable(s)
Ø  Cost-to-go: expected cost at s if we start with a, and use π afterward
Ø  Qπ(s,a) = cost(s,a) + ∑s′ ∈ γ(s,a) Pr (sʹ′|s,a) Vπ(sʹ′)
●  For every s, let π′(s) ∈ argmina∈Applicable(s) Qπ(s,a)
Ø  Then π′ is a safe solution and dominates π
●  PI(Σ,s0,Sg,π0)
  π ← π0
  loop
    compute Vπ (n equations and n unknowns, where n = |S|)
    for every non-goal state s do
      π′(s) ← any action in argmina∈Applicable(s) Qπ(s,a)
    if π′ = π then return π
    π ← π′
●  Converges in a finite number of iterations
Tie-breaking rule: if π(s) ∈ argmina∈Applicable(s) Qπ(s,a), then use π′(s) = π(s) (see the sketch below)
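A Python sketch of PI that evaluates Vπ with the iterative policy_value sketch from the Expected Cost slide rather than by solving the n×n linear system, and applies the tie-breaking rule above. It assumes π0 is a safe solution defined at every non-goal state; all names are illustrative.

```python
def policy_iteration(pi0, goals, gamma, prob, cost, applicable):
    pi = dict(pi0)
    while True:
        V = policy_value(pi, goals, gamma, prob, cost)
        def q(s, a):   # cost-to-go Q_pi(s, a)
            return cost[(s, a)] + sum(prob[(s, a, s2)] * V[s2]
                                      for s2 in gamma[s][a])
        new_pi = {}
        for s in pi:
            best = min(applicable(s), key=lambda a: q(s, a))
            # tie-breaking rule: keep pi(s) if it is still a minimizer
            new_pi[s] = pi[s] if q(s, pi[s]) <= q(s, best) + 1e-12 else best
        if new_pi == pi:
            return pi, V
        pi = new_pi
```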
Example
Start with
π0 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
Vπ(s4) = 0
Vπ(s3) = 100 + Vπ(s4) = 100
Vπ(s5) = 100 + Vπ(s4) = 100
Vπ(s2) = 1 + (0.8 Vπ(s3) + 0.2 Vπ(s5)) = 101
Vπ(s1) = 100 + Vπ(s2) = 201
Q(s1,move(r1,l1,l2)) = 100 + 101 = 201
Q(s1,move(r1,l1,l4)) = 1 + ½ × 201 + ½ × 0 = 101.5
argmin = move(r1,l1,l4)
Q(s2,move(r1,l2,l3)) = 1 + (0.8 × 100 + 0.2 × 100) = 101
Q(s2,move(r1,l2,l1)) = 100 + 201 = 301
argmin = move(r1,l2,l3)
Q(s3,move(r1,l3,l4)) = 100 + 0 = 100
Q(s3,move(r1,l3,l2)) = 100 + 101 = 201
argmin = move(r1,l3,l4)
Q(s5,move(r1,l5,l4)) = 100 + 0 = 100
Q(s5,move(r1,l5,l2)) = 100 + 101 = 201
argmin = move(r1,l5,l4)
Example
π = {(s1, move(r1,l1,l4)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
Vπ(s4) = 0
Vπ(s3) = 100 + Vπ(s4) = 100
Vπ(s5) = 100 + Vπ(s4) = 100
Vπ(s2) = 1 + (0.8 Vπ(s3) + 0.2 Vπ(s5)) = 101
Vπ(s1) = 1 + ½ Vπ(s1) + ½ Vπ(s4) = 2
Q(s1,move(r1,l1,l2)) = 100 + 101 = 201
Q(s1,move(r1,l1,l4)) = 1 + ½ × 2 + ½ × 0 = 2
argmin = move(r1,l1,l4)
Q(s2,move(r1,l2,l3)) = 1 + (0.8 × 100 + 0.2 × 100) = 101
Q(s2,move(r1,l2,l1)) = 100 + 2 = 102
argmin = move(r1,l2,l3)
Q(s3,move(r1,l3,l4)) = 100 + 0 = 100
Q(s3,move(r1,l3,l2)) = 100 + 101 = 201
argmin = move(r1,l3,l4)
Q(s5,move(r1,l5,l4)) = 100 + 0 = 100
Q(s5,move(r1,l5,l2)) = 100 + 101 = 201
argmin = move(r1,l5,l4)
Value Iteration (Synchronous Version)
●  Let (Σ,s0,Sg) be a safe SSP
●  Start with an arbitrary cost V(s) for each s and a small η > 0
VI(Σ,s0,Sg,V)
  π ← ∅
  loop
    Vold ← V
    for every non-goal state s do
      for every a ∈ Applicable(s) do
        Q(s,a) ← cost(s,a) + ∑s′ ∈ S Pr (s′ | s,a) Vold(s′)
      V(s) ← mina∈Applicable(s) Q(s,a)
      π(s) ← argmina∈Applicable(s) Q(s,a)
    if maxs ∈ S ∖ Sg |V(s) – Vold(s)| < η then exit the loop
●  |V(s) – Vold(s)| is the residual of s
●  maxs ∈ S ∖ Sg |V(s) – Vold(s)| is the residual
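A sketch of the synchronous sweep in Python; each sweep computes Q from the previous sweep's values Vold, exactly as above. Initial values here are all 0 (the slide allows any initialization, with goals fixed at 0); names are illustrative.

```python
def value_iteration(states, goals, gamma, prob, cost, applicable, eta=0.2):
    V = {s: 0.0 for s in states}   # arbitrary start; goal values stay at 0
    while True:
        Vold = dict(V)
        pi = {}
        for s in states:
            if s in goals:
                continue
            qs = {a: cost[(s, a)] + sum(prob[(s, a, s2)] * Vold[s2]
                                        for s2 in gamma[s][a])
                  for a in applicable(s)}
            pi[s] = min(qs, key=qs.get)
            V[s] = qs[pi[s]]
        if max(abs(V[s] - Vold[s]) for s in states if s not in goals) < eta:
            return pi, V
```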
Example
●  aij = the action that moves from si to sj
Ø  e.g., a12 = move(r1,l1,l2))	
  
●  η = 0.2
●  V(s) = 0 for all s
Q(s1, a12) = 100 + 0 = 100
Q(s1, a14) = 1 + (½×0 + ½×0) = 1
min = 1
Q(s2, a21) = 100 + 0 = 100
Q(s2, a23) = 1 + (½×0 + ½×0) = 1
min = 1
Q(s3, a32) = 1 + 0 = 1
Q(s3, a34) = 100 + 0 = 100
min = 1
Q(s5, a52) = 1 + 0 = 1
Q(s5, a54) = 100 + 0 = 100
min = 1
residual = max(1–0, 1–0, 1–0, 1–0) = 1
Example
●  V(s1) = 1; V(s2) = 1; V(s3) = 1; V(s4) = 0; V(s5) = 1
Q(s1, a12) = 100 + 1 = 101
Q(s1, a14) = 1 + (½×1 + ½×0) = 1½
min = 1½
Q(s2, a21) = 100 + 1 = 101
Q(s2, a23) = 1 + (½×1 + ½×1) = 2
min = 2
Q(s3, a32) = 1 + 1 = 2
Q(s3, a34) = 100 + 0 = 100
min = 2
Q(s5, a52) = 1 + 1 = 2
Q(s5, a54) = 100 + 0 = 100
min = 2
residual = max(1½–1, 2–1, 2–1, 2–1) = 1
Example
●  V(s1) = 1½; V(s2) = 2; V(s3) = 2; V(s4) = 0; V(s5) = 2
Q(s1, a12) = 100 + 2 = 102
Q(s1, a14) = 1 + (½×1½ + ½×0) = 1¾
min = 1¾
Q(s2, a21) = 100 + 1½ = 101½
Q(s2, a23) = 1 + (½×2 + ½×2) = 3
min = 3
Q(s3, a32) = 1 + 2 = 3
Q(s3, a34) = 100 + 0 = 100
min = 3
Q(s5, a52) = 1 + 2 = 3
Q(s5, a54) = 100 + 0 = 100
min = 3
residual = max(1¾–1½, 3–2, 3–2, 3–2) = 1
Example
●  V(s1) = 1¾; V(s2) = 3; V(s3) = 3; V(s4) = 0; V(s5) = 3
Q(s1, a12) = 100 + 3 = 103
Q(s1, a14) = 1 + (½×1¾ + ½×0) = 1⅞
min = 1⅞
Q(s2, a21) = 100 + 1¾ = 101¾
Q(s2, a23) = 1 + (½×3 + ½×3) = 4
min = 4
Q(s3, a32) = 1 + 3 = 4
Q(s3, a34) = 100 + 0 = 100
min = 4
Q(s5, a52) = 1 + 3 = 4
Q(s5, a54) = 100 + 0 = 100
min = 4
residual = max(1⅞–1¾, 4–3, 4–3, 4–3) = 1
●  How long before residual < η = 0.2?
●  How long if the “vertical” actions cost
10 instead of 100?
Discussion
●  Policy iteration computes an entire policy in each iteration,
and computes values based on that policy
Ø  More work per iteration, because it needs to solve a set of simultaneous
equations
Ø  Usually converges in a smaller number of iterations
●  Value iteration computes new values in each iteration,
and chooses a policy based on those values
Ø  In general, the values are not the values that one would get from the chosen
policy or any other policy
Ø  Less work per iteration, because it doesn’t need to solve a set of equations
Ø  Usually takes more iterations to converge
●  What I showed you was the synchronous version of Value Iteration
Ø  For each s, compute new values of Q and V using Vold
Ø  Asynchronous version: compute new values of Q and V using V
•  New values may depend on which nodes have already been updated
Value Iteration
Start with an arbitrary cost V(s) for each s, and a small η > 0

●  Synchronous version:
VI(Σ,s0,Sg,V)
  π ← ∅
  loop
    Vold ← V
    for every s ∈ S ∖ Sg do
      for every a ∈ Applicable(s) do
        Q(s,a) ← cost(s,a) + ∑s′ ∈ S Pr (s′|s,a) Vold(s′)
      V(s) ← mina∈Applicable(s) Q(s,a)
      π(s) ← argmina∈Applicable(s) Q(s,a)
    if maxs ∈ S ∖ Sg |V(s) – Vold(s)| < η then return π
●  maxs ∈ S ∖ Sg |V(s) – Vold(s)| is the residual
●  |V(s) – Vold(s)| is the residual of s

●  Asynchronous version:
VI(Σ,s0,Sg,V)
  π ← ∅
  loop
    r ← 0 // the residual
    for every s ∈ S ∖ Sg do
      r ← max(r, Bellman-Update(s,V,π))
    if r < η then return π

Bellman-Update(s,V,π)
  vold ← V(s)
  for every a ∈ Applicable(s) do
    Q(s,a) ← cost(s,a) + ∑s′∈S Pr (s′|s,a) V(s′)
  V(s) ← mina∈Applicable(s) Q(s,a)
  π(s) ← argmina∈Applicable(s) Q(s,a)
  return |V(s) – vold|
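The asynchronous version in Python: bellman_update writes into V in place, so later updates in a sweep already see the new values of earlier ones. Same illustrative conventions as the previous sketches.

```python
def bellman_update(s, V, pi, gamma, prob, cost, applicable):
    v_old = V[s]
    qs = {a: cost[(s, a)] + sum(prob[(s, a, s2)] * V[s2]
                                for s2 in gamma[s][a])
          for a in applicable(s)}
    pi[s] = min(qs, key=qs.get)
    V[s] = qs[pi[s]]
    return abs(V[s] - v_old)   # residual of s

def async_value_iteration(states, goals, gamma, prob, cost, applicable, eta=0.2):
    V = {s: 0.0 for s in states}
    pi = {}
    while True:
        r = 0.0   # the residual
        for s in states:
            if s not in goals:
                r = max(r, bellman_update(s, V, pi, gamma, prob, cost, applicable))
        if r < eta:
            return pi, V
```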
Discussion (Continued)
●  For both, the number of iterations is polynomial in the number of states
Ø  But the number of states is usually quite large
Ø  In each iteration, need to examine the entire state space
●  Thus, these algorithms can take huge amounts of time and space
●  Use search techniques to avoid searching the entire space
AO* (requires Σ to be acyclic)
AO*(Σ,s0,Sg,h)
  π ← ∅; V(s0) ← h(s0)
  Envelope ← {s0} // all generated states
  loop
    if leaves(s0,π) ⊆ Sg then return π
    select s ∈ leaves(s0,π) ∖ Sg
    for all a ∈ Applicable(s)
      for all s′ ∈ γ(s,a) ∖ Envelope do
        V(s′) ← h(s′); add s′ to Envelope
    AO-Update(s,V,π)
  return π

AO-Update(s,V,π)
  Z ← {s} // set of nodes that need updating
  while Z ≠ ∅ do
    select any s ∈ Z such that γ(s,π(s)) ∩ Z = ∅
    remove s from Z
    Bellman-Update(s,V,π)
    Z ← Z ∪ {s′ ∈ Sπ | s ∈ γ(s′,π(s′))}

●  h is the heuristic function
Ø  Must have h(s) = 0 for every s in Sg
Ø  Example: h(s) = 0 for all s

Bellman-Update(s,V,π)
  vold ← V(s)
  for every a ∈ Applicable(s) do
    Q(s,a) ← cost(s,a) + ∑s′∈S Pr (s′|s,a) V(s′)
  V(s) ← mina∈Applicable(s) Q(s,a)
  π(s) ← argmina∈Applicable(s) Q(s,a)
  return |V(s) – vold|
[Figure: AO* search example on a small acyclic domain (Start = s1; states s1–s6; action costs c = 1, 100, 10, 20, 1; outcome probabilities 0.2/0.8 and 0.5/0.5; Dom(π) highlighted)]
LAO* (can handle cycles)

Bellman-Update(s,V,π)
  vold ← V(s)
  for every a ∈ Applicable(s) do
    Q(s,a) ← cost(s,a) + ∑s′∈S Pr (s′|s,a) V(s′)
  V(s) ← mina∈Applicable(s) Q(s,a)
  π(s) ← argmina∈Applicable(s) Q(s,a)
  return |V(s) – vold|
LAO*(Σ,s0,Sg,h)
  π ← ∅; V(s0) ← h(s0)
  Envelope ← {s0} // all generated states
  loop
    if leaves(s0,π) ⊆ Sg then return π
    select s ∈ leaves(s0,π) ∖ Sg
    for all a ∈ Applicable(s)
      for all s′ ∈ γ(s,a) ∖ Envelope do
        V(s′) ← h(s′); add s′ to Envelope
    LAO-Update(s,V,π)
  return π

LAO-Update(s,V,π)
  Z ← {s} ∪ {s′ ∈ γ̂(s0,π) | s ∈ γ̂(s′,π)} // all ancestors of s that we can reach from s0 using π
  for every s ∈ Z do Bellman-Update(s,V,π)
  leavesold ← leaves(s0,π)
  rmax ← η + 1
  loop until leaves(s0,π) ⊈ leavesold or rmax ≤ η
    rmax ← max{Bellman-Update(s,V,π) | s ∈ Sπ}
Planning and Acting
●  Run-Lookahead(Σ,s0,Sg)
Ø  s ← s0
Ø  while s ∉ Sg and Applicable(s) ≠ ∅ do
•  a ← Lookahead(s,θ)
•  perform action a
•  s ← observe resulting state
●  One possibility: use FF-Replan from Chapter 5 (a Python sketch of the acting loop follows the figure below)
●  Problem: FF-Replan doesn’t know about probabilities of outcomes
Ø  May choose actions that are likely to produce bad outcomes
Ø  e.g., a14 in the example above
FF-Replan(Σ, s, Sg)
  while s ∉ Sg and Applicable(s) ≠ ∅ do
    if πd undefined for s then do
      πd ← Forward-search(Σd, s, Sg)
    apply action πd(s)
    s ← observe resulting state
Figure 5.22: Online determinization planning and acting algorithm.
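A sketch of Run-Lookahead as a plain acting loop; lookahead (e.g. an FF-Replan-style planner) and execute (the actor's execution platform) are hypothetical callables supplied by the caller.

```python
def run_lookahead(s0, goals, applicable, lookahead, execute, theta=None):
    s = s0
    while s not in goals and applicable(s):
        a = lookahead(s, theta)   # plan (or replan) from the current state
        s = execute(a)            # perform a, observe the resulting state
    return s
```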
[Figure: example domain for acting (Start = s1; states s1–s6; action costs c = 1, 100, 10, 1, 1000; outcome probabilities 0.2/0.8 and 0.9/0.1)]
Improving on FF-Replan
●  RFF algorithm:
Ø  Don’t just generate one outcome
Ø  Generate all “likely” outcomes and plan for them too
•  Pr(s | s0, π) ≥ θ
●  If θ ≤ 0.9 then RFF will notice the problem
Multi-Arm Bandit
●  Statistical model of sequential experiments
Ø  Name comes from a traditional slot machine
(one-armed bandit)
●  Multiple actions
Ø  Each action provides a reward from a
probability distribution associated with
that specific action
Ø  Objective: maximize the expected utility of a sequence of actions
●  Exploitation vs exploration dilemma:
Ø  Exploitation: choosing an action that you already know about, because
you think it’s likely to give you a high reward
Ø  Exploration: choosing an action that you don’t know much about, in
hopes that maybe it will produce a better reward than the actions you
already know about
UCB (Upper Confidence Bound) Algorithm
●  Let
Ø  xi = average reward you’ve gotten from arm i
Ø  ti = number of times you’ve tried arm i
Ø  t = ∑i ti
●  loop
Ø  if there are one or more arms that have not been played
Ø  then play one of them
Ø  else play the arm i that has the highest value of xi + 2√((log t)/ti)
UCT Algorithm
●  UCT (with a few corrections)
●  Recursive UCB computation to compute Q(s,a)
●  Anytime algorithm
Ø  Call repeatedly until time runs out
●  At end, choose action argmina Q(s,a)
Kluge for use in unsafe domains
●  Modification for domains in which some states are unsafe
Ø  Avoid unsafe plans by refusing to choose actions that lead to dead ends:
if Applicable(s) = ∅ then return ∞
●  Problem: it’s too cautious
Ø  Will return ∞ if there are no safe plans
UCT as an Acting Procedure
●  Suppose that
Ø  You don’t know Pr
Ø  You can restart your actor as many times as you want
●  Can modify UCT to be an acting procedure
Ø  Use it to explore the environment: execute a; observe s′

More Related Content

What's hot

Ilya Shkredov – Subsets of Z/pZ with small Wiener norm and arithmetic progres...
Ilya Shkredov – Subsets of Z/pZ with small Wiener norm and arithmetic progres...Ilya Shkredov – Subsets of Z/pZ with small Wiener norm and arithmetic progres...
Ilya Shkredov – Subsets of Z/pZ with small Wiener norm and arithmetic progres...Yandex
 
A Proof of the Generalized Riemann Hypothesis
A Proof of the Generalized Riemann HypothesisA Proof of the Generalized Riemann Hypothesis
A Proof of the Generalized Riemann HypothesisCharaf Ech-Chatbi
 
A Proof of the Generalized Riemann Hypothesis
A Proof of the Generalized Riemann HypothesisA Proof of the Generalized Riemann Hypothesis
A Proof of the Generalized Riemann HypothesisCharaf Ech-Chatbi
 
Playing Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement LearningPlaying Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement Learning郁凱 黃
 
Test s velocity_15_5_4
Test s velocity_15_5_4Test s velocity_15_5_4
Test s velocity_15_5_4Kunihiko Saito
 
Some contributions to PCA for time series
Some contributions to PCA for time seriesSome contributions to PCA for time series
Some contributions to PCA for time seriesIsabel Magalhães
 
ICPC Asia::Tokyo 2014 Problem J – Exhibition
ICPC Asia::Tokyo 2014 Problem J – ExhibitionICPC Asia::Tokyo 2014 Problem J – Exhibition
ICPC Asia::Tokyo 2014 Problem J – Exhibitionirrrrr
 
ICPC 2015, Tsukuba : Unofficial Commentary
ICPC 2015, Tsukuba: Unofficial CommentaryICPC 2015, Tsukuba: Unofficial Commentary
ICPC 2015, Tsukuba : Unofficial Commentaryirrrrr
 
Sep logic slide
Sep logic slideSep logic slide
Sep logic sliderainoftime
 
Programming workshop
Programming workshopProgramming workshop
Programming workshopSandeep Joshi
 
Introduction to Polyhedral Compilation
Introduction to Polyhedral CompilationIntroduction to Polyhedral Compilation
Introduction to Polyhedral CompilationAkihiro Hayashi
 
Dr. Pablo Diaz Benito (University of the Witwatersrand) TITLE: "Novel Charges...
Dr. Pablo Diaz Benito (University of the Witwatersrand) TITLE: "Novel Charges...Dr. Pablo Diaz Benito (University of the Witwatersrand) TITLE: "Novel Charges...
Dr. Pablo Diaz Benito (University of the Witwatersrand) TITLE: "Novel Charges...Rene Kotze
 
Tensorizing Neural Network
Tensorizing Neural NetworkTensorizing Neural Network
Tensorizing Neural NetworkRuochun Tzeng
 
Theory of Automata and formal languages unit 2
Theory of Automata and formal languages unit 2Theory of Automata and formal languages unit 2
Theory of Automata and formal languages unit 2Abhimanyu Mishra
 
Linear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsLinear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsAlexander Litvinenko
 

What's hot (20)

Chapter03b
Chapter03bChapter03b
Chapter03b
 
Ilya Shkredov – Subsets of Z/pZ with small Wiener norm and arithmetic progres...
Ilya Shkredov – Subsets of Z/pZ with small Wiener norm and arithmetic progres...Ilya Shkredov – Subsets of Z/pZ with small Wiener norm and arithmetic progres...
Ilya Shkredov – Subsets of Z/pZ with small Wiener norm and arithmetic progres...
 
A Proof of the Generalized Riemann Hypothesis
A Proof of the Generalized Riemann HypothesisA Proof of the Generalized Riemann Hypothesis
A Proof of the Generalized Riemann Hypothesis
 
A Proof of the Generalized Riemann Hypothesis
A Proof of the Generalized Riemann HypothesisA Proof of the Generalized Riemann Hypothesis
A Proof of the Generalized Riemann Hypothesis
 
Link analysis
Link analysisLink analysis
Link analysis
 
Pda
PdaPda
Pda
 
Playing Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement LearningPlaying Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement Learning
 
Test s velocity_15_5_4
Test s velocity_15_5_4Test s velocity_15_5_4
Test s velocity_15_5_4
 
Some contributions to PCA for time series
Some contributions to PCA for time seriesSome contributions to PCA for time series
Some contributions to PCA for time series
 
ICPC Asia::Tokyo 2014 Problem J – Exhibition
ICPC Asia::Tokyo 2014 Problem J – ExhibitionICPC Asia::Tokyo 2014 Problem J – Exhibition
ICPC Asia::Tokyo 2014 Problem J – Exhibition
 
ICPC 2015, Tsukuba : Unofficial Commentary
ICPC 2015, Tsukuba: Unofficial CommentaryICPC 2015, Tsukuba: Unofficial Commentary
ICPC 2015, Tsukuba : Unofficial Commentary
 
qlp
qlpqlp
qlp
 
Test (S) on R
Test (S) on RTest (S) on R
Test (S) on R
 
Sep logic slide
Sep logic slideSep logic slide
Sep logic slide
 
Programming workshop
Programming workshopProgramming workshop
Programming workshop
 
Introduction to Polyhedral Compilation
Introduction to Polyhedral CompilationIntroduction to Polyhedral Compilation
Introduction to Polyhedral Compilation
 
Dr. Pablo Diaz Benito (University of the Witwatersrand) TITLE: "Novel Charges...
Dr. Pablo Diaz Benito (University of the Witwatersrand) TITLE: "Novel Charges...Dr. Pablo Diaz Benito (University of the Witwatersrand) TITLE: "Novel Charges...
Dr. Pablo Diaz Benito (University of the Witwatersrand) TITLE: "Novel Charges...
 
Tensorizing Neural Network
Tensorizing Neural NetworkTensorizing Neural Network
Tensorizing Neural Network
 
Theory of Automata and formal languages unit 2
Theory of Automata and formal languages unit 2Theory of Automata and formal languages unit 2
Theory of Automata and formal languages unit 2
 
Linear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsLinear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficients
 

Viewers also liked

EDU 305 Entire Course 2015 version
EDU 305 Entire Course 2015 versionEDU 305 Entire Course 2015 version
EDU 305 Entire Course 2015 versionarvinmehlschau
 
São Conrado, Residencial Golf Village Estrada do Joá, São Conrado
São Conrado, Residencial Golf Village Estrada do Joá, São ConradoSão Conrado, Residencial Golf Village Estrada do Joá, São Conrado
São Conrado, Residencial Golf Village Estrada do Joá, São ConradoLancamentosrj
 
Proyecto personal
Proyecto personalProyecto personal
Proyecto personalkarla1517
 
Will You Take The Challenge?
Will You Take The Challenge?Will You Take The Challenge?
Will You Take The Challenge?David Allen
 
Slic-tite PTFE Paste from Project Sales Corp
Slic-tite PTFE Paste from Project Sales CorpSlic-tite PTFE Paste from Project Sales Corp
Slic-tite PTFE Paste from Project Sales CorpProject Sales Corp
 
presentacion
presentacionpresentacion
presentacionFlor Hana
 
Accessing aplia dre 098 a likin
Accessing aplia dre 098 a likinAccessing aplia dre 098 a likin
Accessing aplia dre 098 a likinaclikin
 
Guías Clínicas para la Evaluación y Tratamiento del Trastorno Bipolar
Guías Clínicas para la Evaluación y Tratamiento del Trastorno BipolarGuías Clínicas para la Evaluación y Tratamiento del Trastorno Bipolar
Guías Clínicas para la Evaluación y Tratamiento del Trastorno BipolarDianellys Juarbe
 
Elementos básicos de las funciones del cuerpo y cambios que deben ser reporta...
Elementos básicos de las funciones del cuerpo y cambios que deben ser reporta...Elementos básicos de las funciones del cuerpo y cambios que deben ser reporta...
Elementos básicos de las funciones del cuerpo y cambios que deben ser reporta...Dianellys Juarbe
 
Cdma system
Cdma systemCdma system
Cdma systemtrimba
 
Presentation on International Marketing (samsung company)
Presentation on International Marketing (samsung company)Presentation on International Marketing (samsung company)
Presentation on International Marketing (samsung company)Md. Sourav Hossain
 
PowerPoint y las Presentaciones Electrónicas
PowerPoint y las Presentaciones ElectrónicasPowerPoint y las Presentaciones Electrónicas
PowerPoint y las Presentaciones ElectrónicasDianellys Juarbe
 

Viewers also liked (15)

ALMOBTY
ALMOBTYALMOBTY
ALMOBTY
 
EDU 305 Entire Course 2015 version
EDU 305 Entire Course 2015 versionEDU 305 Entire Course 2015 version
EDU 305 Entire Course 2015 version
 
Blogs y la sala de clases
Blogs y la sala de clasesBlogs y la sala de clases
Blogs y la sala de clases
 
São Conrado, Residencial Golf Village Estrada do Joá, São Conrado
São Conrado, Residencial Golf Village Estrada do Joá, São ConradoSão Conrado, Residencial Golf Village Estrada do Joá, São Conrado
São Conrado, Residencial Golf Village Estrada do Joá, São Conrado
 
Proyecto personal
Proyecto personalProyecto personal
Proyecto personal
 
Red State
Red StateRed State
Red State
 
Will You Take The Challenge?
Will You Take The Challenge?Will You Take The Challenge?
Will You Take The Challenge?
 
Slic-tite PTFE Paste from Project Sales Corp
Slic-tite PTFE Paste from Project Sales CorpSlic-tite PTFE Paste from Project Sales Corp
Slic-tite PTFE Paste from Project Sales Corp
 
presentacion
presentacionpresentacion
presentacion
 
Accessing aplia dre 098 a likin
Accessing aplia dre 098 a likinAccessing aplia dre 098 a likin
Accessing aplia dre 098 a likin
 
Guías Clínicas para la Evaluación y Tratamiento del Trastorno Bipolar
Guías Clínicas para la Evaluación y Tratamiento del Trastorno BipolarGuías Clínicas para la Evaluación y Tratamiento del Trastorno Bipolar
Guías Clínicas para la Evaluación y Tratamiento del Trastorno Bipolar
 
Elementos básicos de las funciones del cuerpo y cambios que deben ser reporta...
Elementos básicos de las funciones del cuerpo y cambios que deben ser reporta...Elementos básicos de las funciones del cuerpo y cambios que deben ser reporta...
Elementos básicos de las funciones del cuerpo y cambios que deben ser reporta...
 
Cdma system
Cdma systemCdma system
Cdma system
 
Presentation on International Marketing (samsung company)
Presentation on International Marketing (samsung company)Presentation on International Marketing (samsung company)
Presentation on International Marketing (samsung company)
 
PowerPoint y las Presentaciones Electrónicas
PowerPoint y las Presentaciones ElectrónicasPowerPoint y las Presentaciones Electrónicas
PowerPoint y las Presentaciones Electrónicas
 

Similar to Automated Planning with Probabilistic Models

2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache SparkDB Tsai
 
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...Yandex
 
Metrics for generativemodels
Metrics for generativemodelsMetrics for generativemodels
Metrics for generativemodelsDai-Hai Nguyen
 
Nonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares MethodNonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares MethodTasuku Soma
 
cps170_bayes_nets.ppt
cps170_bayes_nets.pptcps170_bayes_nets.ppt
cps170_bayes_nets.pptFaizAbaas
 
Pushdown automata
Pushdown automataPushdown automata
Pushdown automataeugenesri
 
Pushdown automata
Pushdown automataPushdown automata
Pushdown automataparmeet834
 
PushdownAutomata.ppt
PushdownAutomata.pptPushdownAutomata.ppt
PushdownAutomata.pptRSRS39
 
Complex differentiation contains analytic function.pptx
Complex differentiation contains analytic function.pptxComplex differentiation contains analytic function.pptx
Complex differentiation contains analytic function.pptxjyotidighole2
 
Algorithm Design and Complexity - Course 11
Algorithm Design and Complexity - Course 11Algorithm Design and Complexity - Course 11
Algorithm Design and Complexity - Course 11Traian Rebedea
 
On maximal and variational Fourier restriction
On maximal and variational Fourier restrictionOn maximal and variational Fourier restriction
On maximal and variational Fourier restrictionVjekoslavKovac1
 
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...jfrchicanog
 
Quantum factorization.pdf
Quantum factorization.pdfQuantum factorization.pdf
Quantum factorization.pdfssuser8b461f
 
Policy Gradient Theorem
Policy Gradient TheoremPolicy Gradient Theorem
Policy Gradient TheoremAshwin Rao
 
On Steiner Dominating Sets and Steiner Domination Polynomials of Paths
On Steiner Dominating Sets and Steiner Domination Polynomials of PathsOn Steiner Dominating Sets and Steiner Domination Polynomials of Paths
On Steiner Dominating Sets and Steiner Domination Polynomials of PathsIJERA Editor
 
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...Alexander Litvinenko
 

Similar to Automated Planning with Probabilistic Models (20)

Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark
 
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...
Yuri Boykov — Combinatorial optimization for higher-order segmentation functi...
 
Metrics for generativemodels
Metrics for generativemodelsMetrics for generativemodels
Metrics for generativemodels
 
Nonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares MethodNonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares Method
 
cps170_bayes_nets.ppt
cps170_bayes_nets.pptcps170_bayes_nets.ppt
cps170_bayes_nets.ppt
 
Prova global 2 correção
Prova global 2 correçãoProva global 2 correção
Prova global 2 correção
 
Pushdown automata
Pushdown automataPushdown automata
Pushdown automata
 
Pushdown automata
Pushdown automataPushdown automata
Pushdown automata
 
PushdownAutomata.ppt
PushdownAutomata.pptPushdownAutomata.ppt
PushdownAutomata.ppt
 
Complex differentiation contains analytic function.pptx
Complex differentiation contains analytic function.pptxComplex differentiation contains analytic function.pptx
Complex differentiation contains analytic function.pptx
 
Algorithm Design and Complexity - Course 11
Algorithm Design and Complexity - Course 11Algorithm Design and Complexity - Course 11
Algorithm Design and Complexity - Course 11
 
On maximal and variational Fourier restriction
On maximal and variational Fourier restrictionOn maximal and variational Fourier restriction
On maximal and variational Fourier restriction
 
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
 
Quantum factorization.pdf
Quantum factorization.pdfQuantum factorization.pdf
Quantum factorization.pdf
 
Nbvtalkatbzaonencryptionpuzzles
NbvtalkatbzaonencryptionpuzzlesNbvtalkatbzaonencryptionpuzzles
Nbvtalkatbzaonencryptionpuzzles
 
Nbvtalkatbzaonencryptionpuzzles
NbvtalkatbzaonencryptionpuzzlesNbvtalkatbzaonencryptionpuzzles
Nbvtalkatbzaonencryptionpuzzles
 
Policy Gradient Theorem
Policy Gradient TheoremPolicy Gradient Theorem
Policy Gradient Theorem
 
On Steiner Dominating Sets and Steiner Domination Polynomials of Paths
On Steiner Dominating Sets and Steiner Domination Polynomials of PathsOn Steiner Dominating Sets and Steiner Domination Polynomials of Paths
On Steiner Dominating Sets and Steiner Domination Polynomials of Paths
 
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
 

More from Tianlu Wang

14 pro resolution
14 pro resolution14 pro resolution
14 pro resolutionTianlu Wang
 
13 propositional calculus
13 propositional calculus13 propositional calculus
13 propositional calculusTianlu Wang
 
12 adversal search
12 adversal search12 adversal search
12 adversal searchTianlu Wang
 
11 alternative search
11 alternative search11 alternative search
11 alternative searchTianlu Wang
 
21 situation calculus
21 situation calculus21 situation calculus
21 situation calculusTianlu Wang
 
20 bayes learning
20 bayes learning20 bayes learning
20 bayes learningTianlu Wang
 
19 uncertain evidence
19 uncertain evidence19 uncertain evidence
19 uncertain evidenceTianlu Wang
 
18 common knowledge
18 common knowledge18 common knowledge
18 common knowledgeTianlu Wang
 
17 2 expert systems
17 2 expert systems17 2 expert systems
17 2 expert systemsTianlu Wang
 
17 1 knowledge-based system
17 1 knowledge-based system17 1 knowledge-based system
17 1 knowledge-based systemTianlu Wang
 
16 2 predicate resolution
16 2 predicate resolution16 2 predicate resolution
16 2 predicate resolutionTianlu Wang
 
16 1 predicate resolution
16 1 predicate resolution16 1 predicate resolution
16 1 predicate resolutionTianlu Wang
 
09 heuristic search
09 heuristic search09 heuristic search
09 heuristic searchTianlu Wang
 
08 uninformed search
08 uninformed search08 uninformed search
08 uninformed searchTianlu Wang
 

More from Tianlu Wang (20)

L7 er2
L7 er2L7 er2
L7 er2
 
L8 design1
L8 design1L8 design1
L8 design1
 
L9 design2
L9 design2L9 design2
L9 design2
 
14 pro resolution
14 pro resolution14 pro resolution
14 pro resolution
 
13 propositional calculus
13 propositional calculus13 propositional calculus
13 propositional calculus
 
12 adversal search
12 adversal search12 adversal search
12 adversal search
 
11 alternative search
11 alternative search11 alternative search
11 alternative search
 
10 2 sum
10 2 sum10 2 sum
10 2 sum
 
22 planning
22 planning22 planning
22 planning
 
21 situation calculus
21 situation calculus21 situation calculus
21 situation calculus
 
20 bayes learning
20 bayes learning20 bayes learning
20 bayes learning
 
19 uncertain evidence
19 uncertain evidence19 uncertain evidence
19 uncertain evidence
 
18 common knowledge
18 common knowledge18 common knowledge
18 common knowledge
 
17 2 expert systems
17 2 expert systems17 2 expert systems
17 2 expert systems
 
17 1 knowledge-based system
17 1 knowledge-based system17 1 knowledge-based system
17 1 knowledge-based system
 
16 2 predicate resolution
16 2 predicate resolution16 2 predicate resolution
16 2 predicate resolution
 
16 1 predicate resolution
16 1 predicate resolution16 1 predicate resolution
16 1 predicate resolution
 
15 predicate
15 predicate15 predicate
15 predicate
 
09 heuristic search
09 heuristic search09 heuristic search
09 heuristic search
 
08 uninformed search
08 uninformed search08 uninformed search
08 uninformed search
 

Recently uploaded

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Automated Planning with Probabilistic Models

• 7. Unsafe Solutions
●  A solution π is unsafe if
Ø  0 < Pr(Sg | π) < 1
●  Example: π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4))}
●  H(π1) contains two histories:
Ø  σ1 = 〈s1, s2, s3, s4〉,  Pr(σ1 | π1) = 1 × 0.8 × 1 = 0.8
Ø  σ2 = 〈s1, s2, s5〉,  Pr(σ2 | π1) = 1 × 0.2 = 0.2
●  Pr(Sg | π1) = 0.8
Ø  s5 is an explicit dead end: π1 stops there without reaching the goal
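The history probabilities above can be checked mechanically. Below is a minimal sketch that enumerates the histories of an acyclic policy and sums the probabilities of the goal-ending ones; the dictionary encoding (Pr, pi1, goals) is illustrative, since the book does not commit to any particular representation.

```python
# Transition table for the example: Pr[(s, a)] maps each outcome state
# of action a in state s to its probability. All names are illustrative.
Pr = {
    ("s1", "move(r1,l1,l2)"): {"s2": 1.0},
    ("s2", "move(r1,l2,l3)"): {"s3": 0.8, "s5": 0.2},
    ("s3", "move(r1,l3,l4)"): {"s4": 1.0},
}
pi1 = {"s1": "move(r1,l1,l2)", "s2": "move(r1,l2,l3)", "s3": "move(r1,l3,l4)"}
goals = {"s4"}

def histories(s, pi):
    """Yield (history, probability) pairs; terminates only for acyclic policies."""
    if s not in pi:                          # leaf: goal state or dead end
        yield [s], 1.0
        return
    for s2, p in Pr[(s, pi[s])].items():
        for rest, q in histories(s2, pi):
            yield [s] + rest, p * q

pr_goal = sum(p for h, p in histories("s1", pi1) if h[-1] in goals)
print(pr_goal)   # 0.8 < 1, so pi1 is an unsafe solution
```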
• 8. Unsafe Solutions (continued)
●  A solution π is unsafe if
Ø  0 < Pr(Sg | π) < 1
●  Example: π2 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l6)), (s6, move(r1,l6,l5))}
●  H(π2) contains two histories:
Ø  σ1 = 〈s1, s2, s3, s4〉,  Pr(σ1 | π2) = 1 × 0.8 × 1 = 0.8
Ø  σ3 = 〈s1, s2, s5, s6, s5, s6, …〉,  Pr(σ3 | π2) = 1 × 0.2 × 1 × 1 × 1 × … = 0.2
●  Pr(Sg | π2) = 0.8
Ø  {s5, s6} form an implicit dead end: π2 keeps cycling between them and can never reach the goal
• 9. Safe Solutions
●  A solution π is safe if
Ø  Pr(Sg | π) = 1
●  An acyclic safe solution: π3 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
●  H(π3) contains two histories:
Ø  σ1 = 〈s1, s2, s3, s4〉,  Pr(σ1 | π3) = 1 × 0.8 × 1 = 0.8
Ø  σ4 = 〈s1, s2, s5, s4〉,  Pr(σ4 | π3) = 1 × 0.2 × 1 = 0.2
●  Pr(Sg | π3) = 0.8 + 0.2 = 1
• 10. Safe Solutions (continued)
●  A solution π is safe if
Ø  Pr(Sg | π) = 1
●  A cyclic safe solution: π4 = {(s1, move(r1,l1,l4))}
Ø  move(r1,l1,l4) reaches s4 with probability 0.5 and stays at s1 with probability 0.5
●  H(π4) contains infinitely many histories:
Ø  σ5 = 〈s1, s4〉,  Pr(σ5 | π4) = 0.5
Ø  σ6 = 〈s1, s1, s4〉,  Pr(σ6 | π4) = 0.5 × 0.5 = 0.25
Ø  σ7 = 〈s1, s1, s1, s4〉,  Pr(σ7 | π4) = 0.5 × 0.5 × 0.5 = 0.125
Ø  …
Ø  σ∞ = 〈s1, s1, s1, s1, …〉,  Pr(σ∞ | π4) = 0.5 × 0.5 × 0.5 × 0.5 × … = 0
●  Pr(Sg | π4) = 0.5 + 0.25 + 0.125 + … = 1
• 11. Example
●  π = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, move(r1,l4,l1)), (s5, move(r1,l5,l1))}
●  What is Pr(Sg | π)?
• 12. Expected Cost
●  cost(s,a) = cost of using a in s
●  Example:
Ø  cost(s,a) = 1 for each “horizontal” action
Ø  cost(s,a) = 100 for each “vertical” action
●  Cost of a history: if σ = 〈s0, s1, …〉 ∈ H(π), then
Ø  cost(σ | π) = ∑i ≥ 0 cost(si, π(si))
●  Let π be a safe solution. Expected cost of following π to a goal:
Ø  Vπ(s) = 0, if s is a goal
Ø  Vπ(s) = cost(s, π(s)) + ∑s′ ∈ γ(s,π(s)) Pr(s′ | s, π(s)) Vπ(s′), otherwise
●  At the initial state, this is the probability-weighted cost over all histories:
Ø  Vπ(s0) = ∑σ ∈ H(π) Pr(σ | s0, π) cost(σ | π)
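Since the recursive definition of Vπ is a system of linear equations (one per state), it can be solved directly. A minimal policy-evaluation sketch, assuming the dict-based encoding from the earlier snippet and that pi is defined at every non-goal state; evaluate_policy and the other names are illustrative.

```python
import numpy as np

def evaluate_policy(states, goals, Pr, cost, pi):
    """Solve V(s) = cost(s,pi(s)) + sum_{s'} Pr(s'|s,pi(s)) V(s') for all s."""
    idx = {s: i for i, s in enumerate(states)}
    A = np.eye(len(states))      # coefficient matrix of the unknowns V(s)
    b = np.zeros(len(states))    # constant terms
    for s in states:
        if s in goals:
            continue             # goal rows stay V(s) = 0
        a = pi[s]
        b[idx[s]] = cost[(s, a)]
        for s2, p in Pr[(s, a)].items():
            A[idx[s], idx[s2]] -= p
    return dict(zip(states, np.linalg.solve(A, b)))
```

Because this solves the equations rather than unrolling histories, it handles cyclic safe solutions as well; on the example domain it should reproduce Vπ4(s1) = 2, the value that slide 14 derives as an infinite series.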
• 13. Example
●  π3 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
●  H(π3) contains two histories:
Ø  σ1 = 〈s1, s2, s3, s4〉,  Pr(σ1 | π3) = 0.8,  cost(σ1 | π3) = 100 + 1 + 100 = 201
Ø  σ4 = 〈s1, s2, s5, s4〉,  Pr(σ4 | π3) = 0.2,  cost(σ4 | π3) = 100 + 1 + 100 = 201
●  Vπ3(s1) = 201 × 0.8 + 201 × 0.2 = 201
• 14. Safe Solutions (continued)
●  π4 = {(s1, move(r1,l1,l4))}
●  H(π4) contains infinitely many histories:
Ø  σ5 = 〈s1, s4〉,  Pr(σ5 | π4) = 0.5,  cost(σ5 | π4) = 1
Ø  σ6 = 〈s1, s1, s4〉,  Pr(σ6 | π4) = 0.25,  cost(σ6 | π4) = 2
Ø  σ7 = 〈s1, s1, s1, s4〉,  Pr(σ7 | π4) = 0.125,  cost(σ7 | π4) = 3
Ø  …
●  Vπ4(s1) = 1×0.5 + 2×0.25 + 3×0.125 + 4×0.0625 + … = ∑k ≥ 1 k(½)^k = 2
• 15. Planning as Optimization
●  Let π and π′ be safe solutions
●  π dominates π′ if Vπ(s) ≤ Vπ′(s) for every state where both π and π′ are defined
Ø  i.e., Vπ(s) ≤ Vπ′(s) for every s ∈ Sπ ∩ Sπ′
●  π is optimal if π dominates every safe solution π′
●  V*(s) = min{Vπ(s) | π is a safe solution for which π(s) is defined}
Ø  = expected cost of getting from s to a goal using an optimal safe solution
●  Optimality principle (also called Bellman’s theorem):
Ø  V*(s) = 0, if s is a goal
Ø  V*(s) = mina ∈ Applicable(s) {cost(s,a) + ∑s′ ∈ γ(s,a) Pr(s′ | s, a) V*(s′)}, otherwise
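The optimality principle says an optimal action at s can be found by a one-step backup over current value estimates. A sketch of that backup, reusing the illustrative encoding above (q_value and bellman_backup are made-up helper names, not the book's):

```python
def q_value(s, a, V, Pr, cost):
    """Cost of taking a in s, plus the expected estimated cost afterward."""
    return cost[(s, a)] + sum(p * V[s2] for s2, p in Pr[(s, a)].items())

def bellman_backup(s, V, Applicable, Pr, cost):
    """Return the minimizing action at s and its backed-up value."""
    best = min(Applicable[s], key=lambda a: q_value(s, a, V, Pr, cost))
    return best, q_value(s, best, V, Pr, cost)
```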
• 16. Policy Iteration
●  Let (Σ, s0, Sg) be a safe SSP (i.e., Sg is reachable from every state)
●  Let π be a safe solution that is defined at every state in S
●  Let s be a state, and let a ∈ Applicable(s)
Ø  Cost-to-go: the expected cost at s if we start with a and use π afterward
Ø  Qπ(s,a) = cost(s,a) + ∑s′ ∈ γ(s,a) Pr(s′ | s, a) Vπ(s′)
●  For every s, let π′(s) ∈ argmina ∈ Applicable(s) Qπ(s,a)
Ø  Then π′ is a safe solution and dominates π
●  PI(Σ, s0, Sg, π0)
      π ← π0
      loop
          compute Vπ  (n equations and n unknowns, where n = |S|)
          for every non-goal state s do
              π′(s) ← any action in argmina ∈ Applicable(s) Qπ(s,a)
          if π′ = π then return π
          π ← π′
●  Tie-breaking rule: if π(s) ∈ argmina ∈ Applicable(s) Qπ(s,a), then use π′(s) = π(s)
●  Converges in a finite number of iterations
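A policy-iteration sketch built from the evaluate_policy and q_value helpers above, with the tie-breaking rule implemented by keeping the current action whenever it already achieves the minimum (up to floating-point equality). All names remain illustrative.

```python
def policy_iteration(states, goals, Applicable, Pr, cost, pi0):
    pi = dict(pi0)            # pi0: a safe solution defined at every non-goal state
    while True:
        V = evaluate_policy(states, goals, Pr, cost, pi)   # n equations, n unknowns
        new_pi = {}
        for s in states:
            if s in goals:
                continue
            best = min(Applicable[s], key=lambda a: q_value(s, a, V, Pr, cost))
            # tie-breaking rule: keep pi(s) if it is already a minimizer
            if q_value(s, pi[s], V, Pr, cost) <= q_value(s, best, V, Pr, cost):
                best = pi[s]
            new_pi[s] = best
        if new_pi == pi:      # no action changed: pi dominates all safe solutions
            return pi, V
        pi = new_pi
```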
• 17. Example
●  Start with π0 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
●  Policy evaluation:
Ø  Vπ(s4) = 0
Ø  Vπ(s3) = 100 + Vπ(s4) = 100
Ø  Vπ(s5) = 100 + Vπ(s4) = 100
Ø  Vπ(s2) = 1 + (0.8 Vπ(s3) + 0.2 Vπ(s5)) = 101
Ø  Vπ(s1) = 100 + Vπ(s2) = 201
●  Policy improvement:
Ø  Q(s1, move(r1,l1,l2)) = 100 + 101 = 201;  Q(s1, move(r1,l1,l4)) = 1 + ½ × 201 + ½ × 0 = 101.5 → argmin = move(r1,l1,l4)
Ø  Q(s2, move(r1,l2,l3)) = 1 + (0.8 × 100 + 0.2 × 100) = 101;  Q(s2, move(r1,l2,l1)) = 100 + 201 = 301 → argmin = move(r1,l2,l3)
Ø  Q(s3, move(r1,l3,l4)) = 100 + 0 = 100;  Q(s3, move(r1,l3,l2)) = 100 + 101 = 201 → argmin = move(r1,l3,l4)
Ø  Q(s5, move(r1,l5,l4)) = 100 + 0 = 100;  Q(s5, move(r1,l5,l2)) = 100 + 101 = 201 → argmin = move(r1,l5,l4)
• 18. Example (continued)
●  π = {(s1, move(r1,l1,l4)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
●  Policy evaluation:
Ø  Vπ(s4) = 0
Ø  Vπ(s3) = 100 + Vπ(s4) = 100
Ø  Vπ(s5) = 100 + Vπ(s4) = 100
Ø  Vπ(s2) = 1 + (0.8 Vπ(s3) + 0.2 Vπ(s5)) = 101
Ø  Vπ(s1) = 1 + ½ Vπ(s1) + ½ Vπ(s4) = 2
●  Policy improvement:
Ø  Q(s1, move(r1,l1,l2)) = 100 + 101 = 201;  Q(s1, move(r1,l1,l4)) = 1 + ½ × 2 + ½ × 0 = 2 → argmin = move(r1,l1,l4)
Ø  Q(s2, move(r1,l2,l3)) = 1 + (0.8 × 100 + 0.2 × 100) = 101;  Q(s2, move(r1,l2,l1)) = 100 + 2 = 102 → argmin = move(r1,l2,l3)
Ø  Q(s3, move(r1,l3,l4)) = 100 + 0 = 100;  Q(s3, move(r1,l3,l2)) = 100 + 101 = 201 → argmin = move(r1,l3,l4)
Ø  Q(s5, move(r1,l5,l4)) = 100 + 0 = 100;  Q(s5, move(r1,l5,l2)) = 100 + 101 = 201 → argmin = move(r1,l5,l4)
●  The argmin at every state equals π(s), so π′ = π and PI returns π
• 19. Value Iteration (Synchronous Version)
●  Let (Σ, s0, Sg) be a safe SSP
●  Start with an arbitrary cost V(s) for each s and a small η > 0
●  VI(Σ, s0, Sg, V)
      π ← ∅
      loop
          Vold ← V
          for every non-goal state s do
              for every a ∈ Applicable(s) do
                  Q(s,a) ← cost(s,a) + ∑s′ ∈ S Pr(s′ | s, a) Vold(s′)
              V(s) ← mina ∈ Applicable(s) Q(s,a)
              π(s) ← argmina ∈ Applicable(s) Q(s,a)
          if maxs ∈ S∖Sg |V(s) – Vold(s)| < η then exit the loop and return π
●  |V(s) – Vold(s)| is the residual of s
●  maxs ∈ S∖Sg |V(s) – Vold(s)| is the residual
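A synchronous value-iteration sketch in the same illustrative encoding; eta plays the role of η, and the initial V(s) = 0 stands in for "an arbitrary cost":

```python
def value_iteration(states, goals, Applicable, Pr, cost, eta=1e-3):
    V = {s: 0.0 for s in states}            # arbitrary initial costs
    while True:
        V_old = dict(V)                     # freeze the values for this sweep
        pi = {}
        for s in states:
            if s in goals:
                continue
            q = {a: cost[(s, a)] + sum(p * V_old[s2]
                                       for s2, p in Pr[(s, a)].items())
                 for a in Applicable[s]}
            pi[s] = min(q, key=q.get)       # argmin_a Q(s,a)
            V[s] = q[pi[s]]                 # min_a Q(s,a)
        residual = max(abs(V[s] - V_old[s]) for s in states if s not in goals)
        if residual < eta:
            return pi, V
```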
• 20. Example
●  aij = the action that moves from si to sj
Ø  e.g., a12 = move(r1,l1,l2)
●  η = 0.2; initially V(s) = 0 for all s
●  First iteration:
Ø  Q(s1, a12) = 100 + 0 = 100;  Q(s1, a14) = 1 + (½×0 + ½×0) = 1 → min = 1
Ø  Q(s2, a21) = 100 + 0 = 100;  Q(s2, a23) = 1 + (½×0 + ½×0) = 1 → min = 1
Ø  Q(s3, a32) = 1 + 0 = 1;  Q(s3, a34) = 100 + 0 = 100 → min = 1
Ø  Q(s5, a52) = 1 + 0 = 1;  Q(s5, a54) = 100 + 0 = 100 → min = 1
●  residual = max(1–0, 1–0, 1–0, 1–0) = 1
• 21. Example (continued)
●  V(s1) = 1; V(s2) = 1; V(s3) = 1; V(s4) = 0; V(s5) = 1
Ø  Q(s1, a12) = 100 + 1 = 101;  Q(s1, a14) = 1 + (½×1 + ½×0) = 1½ → min = 1½
Ø  Q(s2, a21) = 100 + 1 = 101;  Q(s2, a23) = 1 + (½×1 + ½×1) = 2 → min = 2
Ø  Q(s3, a32) = 1 + 1 = 2;  Q(s3, a34) = 100 + 0 = 100 → min = 2
Ø  Q(s5, a52) = 1 + 1 = 2;  Q(s5, a54) = 100 + 0 = 100 → min = 2
●  residual = max(1½–1, 2–1, 2–1, 2–1) = 1
• 22. Example (continued)
●  V(s1) = 1½; V(s2) = 2; V(s3) = 2; V(s4) = 0; V(s5) = 2
Ø  Q(s1, a12) = 100 + 2 = 102;  Q(s1, a14) = 1 + (½×1½ + ½×0) = 1¾ → min = 1¾
Ø  Q(s2, a21) = 100 + 1½ = 101½;  Q(s2, a23) = 1 + (½×2 + ½×2) = 3 → min = 3
Ø  Q(s3, a32) = 1 + 2 = 3;  Q(s3, a34) = 100 + 0 = 100 → min = 3
Ø  Q(s5, a52) = 1 + 2 = 3;  Q(s5, a54) = 100 + 0 = 100 → min = 3
●  residual = max(1¾–1½, 3–2, 3–2, 3–2) = 1
• 23. Example (continued)
●  V(s1) = 1¾; V(s2) = 3; V(s3) = 3; V(s4) = 0; V(s5) = 3
Ø  Q(s1, a12) = 100 + 3 = 103;  Q(s1, a14) = 1 + (½×1¾ + ½×0) = 1⅞ → min = 1⅞
Ø  Q(s2, a21) = 100 + 1¾ = 101¾;  Q(s2, a23) = 1 + (½×3 + ½×3) = 4 → min = 4
Ø  Q(s3, a32) = 1 + 3 = 4;  Q(s3, a34) = 100 + 0 = 100 → min = 4
Ø  Q(s5, a52) = 1 + 3 = 4;  Q(s5, a54) = 100 + 0 = 100 → min = 4
●  residual = max(1⅞–1¾, 4–3, 4–3, 4–3) = 1
●  How long before residual < η = 0.2?
●  How long if the “vertical” actions cost 10 instead of 100?
• 24. Discussion
●  Policy iteration computes an entire policy in each iteration, and computes values based on that policy
Ø  More work per iteration, because it needs to solve a set of simultaneous equations
Ø  Usually converges in a smaller number of iterations
●  Value iteration computes new values in each iteration, and chooses a policy based on those values
Ø  In general, the values are not the values that one would get from the chosen policy or any other policy
Ø  Less work per iteration, because it doesn’t need to solve a set of equations
Ø  Usually takes more iterations to converge
●  What was shown above is the synchronous version of value iteration
Ø  For each s, compute new values of Q and V using Vold
●  Asynchronous version: compute new values of Q and V using V itself
Ø  New values may depend on which nodes have already been updated
• 25. Value Iteration
●  Start with an arbitrary cost V(s) for each s, and a small η > 0
●  Synchronous version:
      VI(Σ, s0, Sg, V)
          π ← ∅
          loop
              Vold ← V
              for every s ∈ S∖Sg do
                  for every a ∈ Applicable(s) do
                      Q(s,a) ← cost(s,a) + ∑s′ ∈ S Pr(s′ | s, a) Vold(s′)
                  V(s) ← mina ∈ Applicable(s) Q(s,a)
                  π(s) ← argmina ∈ Applicable(s) Q(s,a)
              if maxs ∈ S∖Sg |V(s) – Vold(s)| < η then return π
●  Asynchronous version:
      VI(Σ, s0, Sg, V)
          π ← ∅
          loop
              r ← 0    // the residual
              for every s ∈ S∖Sg do
                  r ← max(r, Bellman-Update(s, V, π))
              if r < η then return π
      Bellman-Update(s, V, π)
          vold ← V(s)
          for every a ∈ Applicable(s) do
              Q(s,a) ← cost(s,a) + ∑s′ ∈ S Pr(s′ | s, a) V(s′)
          V(s) ← mina ∈ Applicable(s) Q(s,a)
          π(s) ← argmina ∈ Applicable(s) Q(s,a)
          return |V(s) – vold|
●  |V(s) – Vold(s)| is the residual of s; maxs ∈ S∖Sg |V(s) – Vold(s)| is the residual
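The asynchronous version in the same sketch style: Bellman-Update reads the current V in place (Gauss–Seidel style) rather than a frozen copy, so updates within a sweep can build on each other. Names remain illustrative.

```python
def bellman_update(s, V, pi, Applicable, Pr, cost):
    """Update V(s) and pi(s) in place; return the residual of s."""
    v_old = V[s]
    q = {a: cost[(s, a)] + sum(p * V[s2] for s2, p in Pr[(s, a)].items())
         for a in Applicable[s]}
    pi[s] = min(q, key=q.get)
    V[s] = q[pi[s]]
    return abs(V[s] - v_old)

def async_value_iteration(states, goals, Applicable, Pr, cost, eta=1e-3):
    V = {s: 0.0 for s in states}
    pi = {}
    while True:
        r = 0.0                              # the residual
        for s in states:
            if s not in goals:
                r = max(r, bellman_update(s, V, pi, Applicable, Pr, cost))
        if r < eta:
            return pi, V
```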
• 26. Discussion (Continued)
●  For both algorithms, the number of iterations is polynomial in the number of states
Ø  But the number of states is usually quite large
Ø  Each iteration needs to examine the entire state space
●  Thus these algorithms can take huge amounts of time and space
●  Remedy: use search techniques to avoid searching the entire space
• 27. AO* (requires Σ to be acyclic)
●  h is the heuristic function
Ø  Must have h(s) = 0 for every s ∈ Sg
Ø  Example: h(s) = 0 for all s
●  AO*(Σ, s0, Sg, h)
      π ← ∅; V(s0) ← h(s0)
      Envelope ← {s0}    // all generated states
      loop
          if leaves(s0,π) ⊆ Sg then return π
          select s ∈ leaves(s0,π) ∖ Sg
          for all a ∈ Applicable(s) do
              for all s′ ∈ γ(s,a) ∖ Envelope do
                  V(s′) ← h(s′); add s′ to Envelope
          AO-Update(s, V, π)
●  AO-Update(s, V, π)
      Z ← {s}    // set of nodes that need updating
      while Z ≠ ∅ do
          select any s ∈ Z such that γ(s, π(s)) ∩ Z = ∅
          remove s from Z
          Bellman-Update(s, V, π)
          Z ← Z ∪ {s′ ∈ Sπ | s ∈ γ(s′, π(s′))}
●  Bellman-Update is the same subroutine as on slide 25
[Figure: acyclic example graph — states s1, s2, s3, s4, s6; action costs c = 1, 100, 10, 20, 1; outcome probabilities 0.2/0.8 and 0.5/0.5; Dom(π) marked]
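A sketch of AO-Update using the bellman_update helper above. It relies on the acyclicity that AO* requires: nodes are updated bottom-up (a node is picked only when none of its policy successors still awaits updating), and each update re-queues the node's policy ancestors. Illustrative names throughout.

```python
def ao_update(s, V, pi, Applicable, Pr, cost):
    Z = {s}                                  # nodes that need updating
    while Z:
        # pick a node none of whose policy successors is still in Z
        x = next(x for x in Z
                 if x not in pi or not set(Pr[(x, pi[x])]) & Z)
        Z.remove(x)
        bellman_update(x, V, pi, Applicable, Pr, cost)
        # re-queue every policy ancestor: nodes whose chosen action can reach x
        Z |= {y for y in pi if x in Pr[(y, pi[y])]}
```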
• 28. LAO* (can handle cycles)
●  LAO*(Σ, s0, Sg, h)
      π ← ∅; V(s0) ← h(s0)
      Envelope ← {s0}    // all generated states
      loop
          if leaves(s0,π) ⊆ Sg then return π
          select s ∈ leaves(s0,π) ∖ Sg
          for all a ∈ Applicable(s) do
              for all s′ ∈ γ(s,a) ∖ Envelope do
                  V(s′) ← h(s′); add s′ to Envelope
          LAO-Update(s, V, π)
●  LAO-Update(s, V, π)
      Z ← {s} ∪ {s′ ∈ γ̂(s0,π) | s ∈ γ̂(s′,π)}    // s plus all of its ancestors reachable from s0 using π
      for every s ∈ Z do Bellman-Update(s, V, π)
      leavesold ← leaves(s0,π)
      rmax ← η + 1
      loop until leaves(s0,π) ⊈ leavesold or rmax ≤ η
          rmax ← max{Bellman-Update(s, V, π) | s ∈ Sπ}
●  Bellman-Update is the same subroutine as on slide 25
[Figure: example graph with Dom(π) marked]
• 29. Planning and Acting
●  Run-Lookahead(Σ, s0, Sg)
      s ← s0
      while s ∉ Sg and Applicable(s) ≠ ∅ do
          a ← Lookahead(s, θ)
          perform action a
          s ← observe resulting state
●  One possibility for Lookahead: FF-Replan from Chapter 5 (Figure 5.22):
      FF-Replan(Σ, s, Sg)
          while s ∉ Sg and Applicable(s) ≠ ∅ do
              if πd is undefined for s then
                  πd ← Forward-search(Σd, s, Sg)
              apply action πd(s)
              s ← observe resulting state
●  Problem: FF-Replan doesn’t know the probabilities of the outcomes
Ø  May choose actions that are likely to produce bad outcomes
Ø  e.g., a14 in the figure below
[Figure: example graph — states s1, s2, s3, s4, s6; action costs c = 1, 100, 10, 1, 1000; outcome probabilities 0.2/0.8 and 0.9/0.1]
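A sketch of the Run-Lookahead acting loop. Here lookahead stands for whatever planning procedure is plugged in (e.g., an FF-Replan-style determinization planner) and env.execute for the execution platform; both interfaces are assumptions of this sketch, not the book's API.

```python
def run_lookahead(env, s0, goals, applicable, lookahead):
    s = s0
    while s not in goals and applicable(s):
        a = lookahead(s)        # plan (or replan) from the current state
        s = env.execute(a)      # perform a, observe the resulting state
    return s
```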
• 30. Improving on FF-Replan
●  RFF algorithm:
Ø  Don’t generate just one outcome per action
Ø  Generate all “likely” outcomes and plan for them too
•  i.e., plan for every state s with Pr(s | s0, π) ≥ θ
●  The states planned for form a subset of Dom(π)
●  In the example below, if θ ≤ 0.9 then RFF will notice the problem with a14
[Figure: same example graph as on slide 29]
• 31. Multi-Arm Bandit
●  Statistical model of sequential experiments
Ø  The name comes from a traditional slot machine (one-armed bandit)
●  Multiple actions (arms)
Ø  Each action provides a reward drawn from a probability distribution associated with that specific action
Ø  Objective: maximize the expected utility of a sequence of actions
●  Exploitation vs. exploration dilemma:
Ø  Exploitation: choosing an action you already know about, because you think it’s likely to give a high reward
Ø  Exploration: choosing an action you don’t know much about, in hopes that it will produce a better reward than the actions you already know about
• 32. UCB (Upper Confidence Bound) Algorithm
●  Let
Ø  x̄i = average reward received so far from arm i
Ø  ti = number of times arm i has been tried
Ø  t = ∑i ti
●  loop
Ø  if there are one or more arms that have not been played, then play one of them
Ø  else play the arm i that has the highest value of x̄i + 2√((log t) / ti)
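A direct sketch of the selection rule above, with the per-arm statistics kept in plain dictionaries (names illustrative):

```python
import math
import random

def ucb_choose(arms, avg_reward, tries):
    """Pick an arm: untried arms first, else the highest upper confidence bound."""
    untried = [i for i in arms if tries[i] == 0]
    if untried:
        return random.choice(untried)
    t = sum(tries[i] for i in arms)
    return max(arms, key=lambda i: avg_reward[i]
                                   + 2 * math.sqrt(math.log(t) / tries[i]))
```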
• 33. UCT Algorithm
●  UCT (with a few corrections)
●  Recursive UCB computation to estimate Q(s,a)
●  Anytime algorithm:
Ø  call it repeatedly until time runs out
●  At the end, choose the action argmina Q(s,a)
[Figure: example graph — states s1, s2, s3, s4, s6; action costs c = 1, 100, 10, 20, 1; outcome probabilities 0.2/0.8 and 0.5/0.5]
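The transcript does not include the UCT pseudocode itself, so below is a rollout-style sketch of the idea: each trial descends from the current state, choosing actions by a UCB rule (with the exploration bonus subtracted, since costs are minimized here) and sampling outcomes from Pr, then updating running averages Q(s,a) on the way back up. The depth cap and all names are assumptions of this sketch.

```python
import math
import random
from collections import defaultdict

def sample_outcome(s, a, Pr):
    """Draw s' with probability Pr(s'|s,a)."""
    r, acc = random.random(), 0.0
    for s2, p in Pr[(s, a)].items():
        acc += p
        if r <= acc:
            return s2
    return s2                                  # guard against rounding error

def uct_rollout(s, goals, Applicable, Pr, cost, Q, N, depth=50):
    """One UCT trial from s; updates Q and the visit counts N in place."""
    if s in goals or depth == 0 or not Applicable[s]:
        return 0.0
    untried = [a for a in Applicable[s] if N[(s, a)] == 0]
    if untried:
        a = random.choice(untried)             # play untried actions first
    else:
        n_s = sum(N[(s, b)] for b in Applicable[s])
        a = min(Applicable[s], key=lambda b: Q[(s, b)]
                - 2 * math.sqrt(math.log(n_s) / N[(s, b)]))
    s2 = sample_outcome(s, a, Pr)
    c = cost[(s, a)] + uct_rollout(s2, goals, Applicable, Pr, cost, Q, N, depth - 1)
    N[(s, a)] += 1
    Q[(s, a)] += (c - Q[(s, a)]) / N[(s, a)]   # running average of observed cost
    return c

# Anytime use: Q, N = defaultdict(float), defaultdict(int); call
# uct_rollout(s0, ...) until time runs out, then pick
# min(Applicable[s0], key=lambda a: Q[(s0, a)]).
```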
• 34. Kluge for use in unsafe domains
●  Modification for domains in which some states are unsafe:
Ø  avoid unsafe plans by refusing to choose actions that lead to dead ends
Ø  i.e., at the start of the recursive computation add:
          if Applicable(s) = ∅ then return ∞
●  Problem: it’s too cautious
Ø  will return ∞ if there are no safe plans
[Figure: example graph — states s1, s2, s3, s4, s6; action costs c = 1, 100, 10, 1; outcome probabilities 0.2/0.8 and 0.5/0.5]
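Applied to the rollout sketch above, the kluge amounts to replacing the dead-end base case (which returned cost 0) with an infinite cost, so that actions whose sampled outcomes hit dead ends accumulate unbounded Q estimates and stop being selected. A hypothetical replacement base case:

```python
def rollout_base_value(s, goals, Applicable):
    """Base-case value for uct_rollout with the dead-end kluge applied."""
    if s in goals:
        return 0.0
    if not Applicable[s]:
        return float("inf")    # refuse actions that lead to dead ends
    return None                # not a base case: continue the rollout
```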
• 35. UCT as an Acting Procedure
●  Suppose that
Ø  you don’t know Pr
Ø  you can restart your actor as many times as you want
●  Then UCT can be modified to be an acting procedure
Ø  use it to explore the environment
Ø  instead of sampling s′ from Pr, execute a and observe the resulting state s′
[Figure: same example graph as on slide 33]