This document discusses probabilistic planning domains and their solutions. It introduces the idea that actions in a probabilistic planning domain have multiple possible outcomes, each with a probability. Solutions to stochastic shortest path problems are classified as safe, reaching the goal with probability 1, or unsafe, reaching it with probability strictly between 0 and 1. Safe policies may be acyclic or cyclic, while unsafe policies can get stuck in implicit or explicit dead ends, failing to reach the goal with some non-zero probability. Examples of different types of policies illustrate safe versus unsafe solutions.
1. Chapter 4: Deliberation with Probabilistic Domain Models
Dana S. Nau and Vikas Shivashankar
University of Maryland
Lecture slides for Automated Planning and Acting. Updated 4/24/15.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
2. Probabilistic Planning Domain
● Actions have multiple possible outcomes
Ø Each outcome has a probability
● Several possible action representations
Ø Bayes nets, probabilistic operators, …
● Book doesn’t commit to any representation
Ø Only deals with the underlying semantics
● Σ = (S,A,γ,Pr,cost)
Ø S = set of states
Ø A = set of actions
Ø γ : S × A → 2^S
Ø Pr(s′ | s, a) = probability of going to state s′ if we perform a in s
• Require Pr(s′ | s, a) > 0 for every s′ in γ(s,a)
Ø cost : S × A → ℝ+
• cost(s,a) = cost of action a in state s
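The tuple Σ = (S, A, γ, Pr, cost) above can be encoded directly as plain Python dictionaries. A minimal sketch, using a hypothetical two-state domain of our own (the names `gamma`, `Pr`, `cost` are not from the book):

```python
# Σ = (S, A, γ, Pr, cost) as plain dicts; the domain itself is hypothetical.
S = {"s1", "s2"}
A = {"go", "stay"}

# γ : S × A → 2^S, the set of possible successor states
gamma = {
    ("s1", "go"): {"s1", "s2"},   # "go" may fail and leave us in s1
    ("s1", "stay"): {"s1"},
    ("s2", "stay"): {"s2"},
}

# Pr(s' | s, a): must be > 0 exactly for the states in γ(s, a)
Pr = {
    ("s1", "go"): {"s2": 0.8, "s1": 0.2},
    ("s1", "stay"): {"s1": 1.0},
    ("s2", "stay"): {"s2": 1.0},
}

# cost : S × A → ℝ+
cost = {("s1", "go"): 1.0, ("s1", "stay"): 0.5, ("s2", "stay"): 0.0}

# sanity check: each distribution sums to 1 and its support matches γ(s,a)
for sa, dist in Pr.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
    assert set(dist) == gamma[sa]
```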
3. Notation from Chapter 5
● Policy π : Sπ → A
Ø Sπ ⊆ S
Ø ∀s ∈ Sπ , π(s) ∈ Applicable(s)
● γ̂(s,π) = {s and all descendants of s reachable by π}
Ø the transitive closure of γ under π
● Graph(s,π) = rooted graph induced by π
Ø {nodes} = γ̂(s,π)
Ø {edges} = {(s′,s′′) | s′ ∈ γ̂(s,π) ∩ Sπ, s′′ ∈ γ(s′,π(s′))}
Ø root = s
● leaves(s,π) = {states in γ̂(s,π) that aren't in Sπ}
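The definitions of γ̂(s,π) and leaves(s,π) can be sketched as graph search over the possible outcomes of π. The toy transition function below is hypothetical:

```python
# Sketch of γ̂(s,π) and leaves(s,π); the toy domain is hypothetical.
def gamma_hat(s, pi, gamma):
    """Transitive closure of γ under π: s and all its descendants under π."""
    reached, frontier = {s}, [s]
    while frontier:
        u = frontier.pop()
        if u in pi:                       # π specifies an action at u
            for v in gamma[(u, pi[u])]:   # every possible outcome of π(u)
                if v not in reached:
                    reached.add(v)
                    frontier.append(v)
    return reached

def leaves(s, pi, gamma):
    """Reachable states where π is undefined."""
    return {u for u in gamma_hat(s, pi, gamma) if u not in pi}

# a1 from s1 may lead to s2 or s3; a2 from s2 leads to s4
gamma = {("s1", "a1"): {"s2", "s3"}, ("s2", "a2"): {"s4"}}
pi = {"s1": "a1", "s2": "a2"}
print(gamma_hat("s1", pi, gamma))  # {'s1', 's2', 's3', 's4'}
print(leaves("s1", pi, gamma))     # {'s3', 's4'}
```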
4. Stochastic Systems
● Stochastic shortest path (SSP) problem: a triple (Σ, s0, Sg)
● Solution for (Σ, s0, Sg):
Ø policy π such that s0 ∈ Sπ and leaves(s0,π) ⋂ Sg ≠ ∅
● π is closed if π doesn’t stop at non-goal states unless no action is applicable
Ø for every state s in γ̂(s0,π), at least one of the following holds:
• s ∈ Sπ (i.e., π specifies an action at s)
• s ∈ Sg (i.e., s is a goal state)
• Applicable(s) = ∅ (i.e., there are no applicable actions at s)
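The closedness test translates directly into code. A sketch on a hypothetical domain (`applicable` maps each state to its applicable actions; s5 plays the role of a state with no applicable action):

```python
# Sketch: π is closed iff every state reachable under π is in Sπ,
# is a goal, or has no applicable action. The domain is hypothetical.
def is_closed(s0, pi, gamma, Sg, applicable):
    reached, frontier = {s0}, [s0]
    while frontier:                       # states reachable by following π
        u = frontier.pop()
        if u in pi:
            for v in gamma[(u, pi[u])]:
                if v not in reached:
                    reached.add(v)
                    frontier.append(v)
    return all(u in pi or u in Sg or not applicable.get(u)
               for u in reached)

gamma = {("s1", "a1"): {"s2", "s5"}, ("s2", "a2"): {"g"}}
applicable = {"s1": ["a1"], "s2": ["a2"], "s5": []}   # s5: no actions
pi = {"s1": "a1", "s2": "a2"}
print(is_closed("s1", pi, gamma, {"g"}, applicable))  # True
```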
5. Policies
● Robot r1 starts at location l1
Ø s0 = s1 in the diagram
● Objective is to get r1 to location l4
Ø Sg = {s4}
● π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4))}
Ø Solution?
Ø Closed?
[Figure: state-transition diagram with Start at s1 and Goal at s4; it includes the action move(r1,l2,l1).]
6. Histories
● History: a sequence of states, starting at s0
σ = 〈s0, s1, s2, s3, …, sh〉
or (not in book): σ = 〈s0, s1, s2, s3, …〉
● Let H(π) = {all histories that can be produced by following π from s0 to a state in leaves(s0,π)}
● If σ ∈ H(π) then Pr(σ | π) = ∏i ≥ 0 Pr(si+1 | si, π(si))
Ø Thus ∑σ ∈ H(π) Pr(σ | π) = 1
● Probability that π will stop at a goal state:
Ø Pr(Sg | π) = ∑ {Pr(σ | π) | σ ∈ H(π) and σ ends at a state in Sg}
[Figure: same state-transition diagram, Start and Goal marked; it includes the action move(r1,l2,l1).]
7. Unsafe Solutions
● A solution π is unsafe if
Ø 0 < Pr(Sg | π) < 1
● Example:
π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4))}
● H(π1) contains two histories:
Ø σ1 = 〈s1, s2, s3, s4〉; Pr(σ1 | π1) = 1 × 0.8 × 1 = 0.8
Ø σ2 = 〈s1, s2, s5〉; Pr(σ2 | π1) = 1 × 0.2 = 0.2
● Pr(Sg | π1) = 0.8
[Figure: same diagram; s5 is an explicit dead end.]
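For an acyclic policy such as π1, H(π) can be enumerated recursively and Pr(Sg | π) summed from it. The transition probabilities below follow the slide (the second move succeeds with probability 0.8, otherwise the robot ends in the dead end s5); the dictionary encoding is our own sketch:

```python
# Enumerate H(π1) and compute Pr(Sg | π1) for the slide's example.
def histories(s, pi, Pr):
    """All (history, probability) pairs produced by following π from s."""
    if s not in pi:                       # leaf: π stops here
        return [((s,), 1.0)]
    result = []
    for s2, p in Pr[(s, pi[s])].items():
        for tail, q in histories(s2, pi, Pr):
            result.append(((s,) + tail, p * q))
    return result

Pr = {("s1", "a12"): {"s2": 1.0},
      ("s2", "a23"): {"s3": 0.8, "s5": 0.2},   # 0.2: dead end s5
      ("s3", "a34"): {"s4": 1.0}}
pi1 = {"s1": "a12", "s2": "a23", "s3": "a34"}

H = histories("s1", pi1, Pr)
assert abs(sum(p for _, p in H) - 1.0) < 1e-9     # ∑σ Pr(σ | π) = 1
p_goal = sum(p for sigma, p in H if sigma[-1] == "s4")
print(p_goal)  # 0.8: π1 is an unsafe solution
```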
11. Example
● π = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, move(r1,l4,l1)), (s5, move(r1,l5,l1))}
● What is Pr(Sg | π)?
[Figure: same diagram, Start and Goal marked; it includes the action move(r1,l2,l1).]
12. Expected Cost
● cost(s,a) = cost of using a in s
● Example:
Ø cost(s,a) = 1 for each "horizontal" action
Ø cost(s,a) = 100 for each "vertical" action
● Cost of a history:
Ø Let σ = 〈s0, s1, …〉 ∈ H(π)
Ø cost(σ | π) = ∑i ≥ 0 cost(si, π(si))
● Let π be a safe solution
● Expected cost of following π to a goal:
Ø Vπ(s) = 0 if s is a goal
Ø Vπ(s) = cost(s,π(s)) + ∑s′ ∈ γ(s,π(s)) Pr(s′ | s, π(s)) Vπ(s′) otherwise
● If s = s0 then
Ø Vπ(s0) = ∑σ ∈ H(π) cost(σ | π) Pr(σ | s0, π)
[Figure: Start/Goal diagram with label r = –100, plus a schematic of π(s) leading from s to successors s1, s2, …, sn.]
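For an acyclic safe policy, the recursive definition of Vπ above translates directly into memoized code (a cyclic policy would instead require solving the linear system). A sketch on a hypothetical two-state domain where action `a` risks a costly detour through s2:

```python
# Sketch of Vπ for an acyclic safe policy; the toy domain is hypothetical.
def V(s, pi, Pr, cost, goals, memo=None):
    memo = {} if memo is None else memo
    if s in goals:
        return 0.0                        # Vπ(s) = 0 if s is a goal
    if s not in memo:
        a = pi[s]
        # Vπ(s) = cost(s,π(s)) + Σ Pr(s'|s,π(s)) Vπ(s')
        memo[s] = cost[(s, a)] + sum(
            p * V(s2, pi, Pr, cost, goals, memo)
            for s2, p in Pr[(s, a)].items())
    return memo[s]

Pr = {("s1", "a"): {"g": 0.5, "s2": 0.5}, ("s2", "b"): {"g": 1.0}}
cost = {("s1", "a"): 1.0, ("s2", "b"): 100.0}
pi = {"s1": "a", "s2": "b"}
print(V("s1", pi, Pr, cost, {"g"}))  # 1 + 0.5 × 100 = 51.0
```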
15. Planning as Optimization
● Let π and π′ be safe solutions
● π dominates π′ if Vπ(s) ≤ Vπ′(s) for every state where both π and π′ are defined
Ø i.e., Vπ(s) ≤ Vπ′(s) for every s in Sπ ∩ Sπ′
● π is optimal if π dominates every safe solution π′
● V*(s) = min{Vπ(s) | π is a safe solution for which π(s) is defined}
= expected cost of getting from s to a goal using an optimal safe solution
● Optimality principle (also called Bellman's theorem):
Ø V*(s) = 0, if s is a goal
Ø V*(s) = mina∈Applicable(s){cost(s,a) + ∑s′ ∈ γ(s,a) Pr(s′ | s,a) V*(s′)}, otherwise
[Figure: schematic of π(s) leading from s to successors s1, s2, …, sn.]
16. Policy Iteration
● Let (Σ,s0,Sg) be a safe SSP (i.e., Sg is reachable from every state)
● Let π be a safe solution that is defined at every state in S
● Let s be a state, and let a ∈ Applicable(s)
Ø Cost-to-go: expected cost at s if we start with a, and use π afterward
Ø Qπ(s,a) = cost(s,a) + ∑s′ ∈ γ(s,a) Pr(s′ | s,a) Vπ(s′)
● For every s, let π′(s) ∈ argmina∈Applicable(s) Qπ(s,a)
Ø Then π′ is a safe solution and dominates π
● PI(Σ,s0,Sg,π0)
π ← π0
loop
compute Vπ (n equations and n unknowns, where n = |S|)
for every non-goal state s do
π′(s) ← any action in argmina∈Applicable(s) Qπ(s,a)
if π′ = π then return π
π ← π′
● Converges in a finite number of iterations
● Tie-breaking rule: if π(s) ∈ argmina∈Applicable(s) Qπ(s,a), then use π′(s) = π(s)
[Figure: schematic of π(s) leading from s to successors s1, s2, …, sn.]
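A sketch of PI on a hypothetical three-state SSP. One simplification: instead of solving the n×n linear system for Vπ exactly, this sketch evaluates the policy iteratively until the values stabilize; the tie-breaking rule from the slide is applied in the improvement step. All names (`applicable`, `policy_iteration`, the actions `slow`/`risky`) are our own:

```python
# Sketch of policy iteration; evaluation is iterative, not an exact solve.
def policy_iteration(S, goals, applicable, Pr, cost, pi0, eps=1e-9):
    pi = dict(pi0)
    while True:
        # policy evaluation: compute Vπ (approximately)
        V = {s: 0.0 for s in S}
        for _ in range(10_000):
            delta = 0.0
            for s in S:
                if s in goals:
                    continue
                a = pi[s]
                v = cost[(s, a)] + sum(p * V[s2]
                                       for s2, p in Pr[(s, a)].items())
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < eps:
                break
        # policy improvement, tie-breaking in favor of the current π(s)
        def Q(s, a):
            return cost[(s, a)] + sum(p * V[s2]
                                      for s2, p in Pr[(s, a)].items())
        new_pi = {s: min(applicable[s],
                         key=lambda a, s=s: (Q(s, a), a != pi[s]))
                  for s in S if s not in goals}
        if new_pi == pi:
            return pi, V
        pi = new_pi

S = {"s1", "s2", "g"}
applicable = {"s1": ["slow", "risky"], "s2": ["go"]}
Pr = {("s1", "slow"): {"g": 1.0},
      ("s1", "risky"): {"g": 0.5, "s2": 0.5},
      ("s2", "go"): {"g": 1.0}}
cost = {("s1", "slow"): 10.0, ("s1", "risky"): 1.0, ("s2", "go"): 5.0}
pi, V = policy_iteration(S, {"g"}, applicable, Pr, cost,
                         {"s1": "slow", "s2": "go"})
print(pi["s1"])  # risky: expected cost 1 + 0.5×5 = 3.5 < 10
```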
19. Value Iteration (Synchronous Version)
● Let (Σ,s0,Sg) be a safe SSP
● Start with an arbitrary cost V(s) for each s and a small η > 0
● VI(Σ,s0,Sg,V)
π ← ∅
loop
Vold ← V
for every non-goal state s do
for every a ∈ Applicable(s) do
Q(s,a) ← cost(s,a) + ∑s′ ∈ S Pr(s′ | s,a) Vold(s′)
V(s) ← mina∈Applicable(s) Q(s,a)
if maxs ∈ S ∖ Sg |V(s) – Vold(s)| < η then exit the loop
π(s) ← argmina∈Applicable(s) Q(s,a), for every non-goal state s
● |V(s) – Vold(s)| is the residual of s
● maxs ∈ S ∖ Sg |V(s) – Vold(s)| is the residual
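The synchronous algorithm can be sketched as follows; each sweep computes V from Vold, as in the pseudocode. The three-state domain and all names are hypothetical:

```python
# Sketch of synchronous value iteration on a hypothetical SSP.
def value_iteration(S, goals, applicable, Pr, cost, eta=1e-4):
    V = {s: 0.0 for s in S}               # arbitrary initial costs
    while True:
        Vold = dict(V)                    # synchronous: read only Vold
        pi = {}
        for s in S:
            if s in goals:
                continue
            Q = {a: cost[(s, a)] + sum(p * Vold[s2]
                                       for s2, p in Pr[(s, a)].items())
                 for a in applicable[s]}
            pi[s] = min(Q, key=Q.get)
            V[s] = Q[pi[s]]
        residual = max(abs(V[s] - Vold[s]) for s in S if s not in goals)
        if residual < eta:
            return pi, V

S = {"s1", "s2", "g"}
applicable = {"s1": ["slow", "risky"], "s2": ["go"]}
Pr = {("s1", "slow"): {"g": 1.0},
      ("s1", "risky"): {"g": 0.5, "s2": 0.5},
      ("s2", "go"): {"g": 1.0}}
cost = {("s1", "slow"): 10.0, ("s1", "risky"): 1.0, ("s2", "go"): 5.0}
pi, V = value_iteration(S, {"g"}, applicable, Pr, cost)
print(pi["s1"], V["s1"])  # risky 3.5
```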
20. Example
● aij = the action that moves from si to sj
Ø e.g., a12 = move(r1,l1,l2)
● η = 0.2
● V(s) = 0 for all s
Q(s1, a12) = 100 + 0 = 100;  Q(s1, a14) = 1 + (½×0 + ½×0) = 1;  min = 1
Q(s2, a21) = 100 + 0 = 100;  Q(s2, a23) = 1 + (½×0 + ½×0) = 1;  min = 1
Q(s3, a32) = 1 + 0 = 1;  Q(s3, a34) = 100 + 0 = 100;  min = 1
Q(s5, a52) = 1 + 0 = 1;  Q(s5, a54) = 100 + 0 = 100;  min = 1
residual = max(1–0, 1–0, 1–0, 1–0) = 1
[Figure: Start/Goal diagram for the example.]
23. Example
● V(s1) = 1¾; V(s2) = 3; V(s3) = 3; V(s4) = 0; V(s5) = 3
Q(s1, a12) = 100 + 3 = 103;  Q(s1, a14) = 1 + (½×1¾ + ½×0) = 1⅞;  min = 1⅞
Q(s2, a21) = 100 + 1¾ = 101¾;  Q(s2, a23) = 1 + (½×3 + ½×3) = 4;  min = 4
Q(s3, a32) = 1 + 3 = 4;  Q(s3, a34) = 100 + 0 = 100;  min = 4
Q(s5, a52) = 1 + 3 = 4;  Q(s5, a54) = 100 + 0 = 100;  min = 4
residual = max(1⅞–1¾, 4–3, 4–3, 4–3) = 1
● How long before residual < η = 0.2?
● How long if the "vertical" actions cost 10 instead of 100?
[Figure: Start/Goal diagram for the example.]
24. Discussion
● Policy iteration computes an entire policy in each iteration, and computes values based on that policy
Ø More work per iteration, because it needs to solve a set of simultaneous equations
Ø Usually converges in a smaller number of iterations
● Value iteration computes new values in each iteration, and chooses a policy based on those values
Ø In general, the values are not the values that one would get from the chosen policy or any other policy
Ø Less work per iteration, because it doesn't need to solve a set of equations
Ø Usually takes more iterations to converge
● What I showed you was the synchronous version of Value Iteration
Ø For each s, compute new values of Q and V using Vold
● Asynchronous version: compute new values of Q and V using V
Ø New values may depend on which nodes have already been updated
25. Value Iteration
Start with an arbitrary cost V(s) for each s, and a small η > 0.
● Synchronous version:
VI(Σ,s0,Sg,V)
π ← ∅
loop
Vold ← V
for every s ∈ S ∖ Sg do
for every a ∈ Applicable(s) do
Q(s,a) ← cost(s,a) + ∑s′ ∈ S Pr(s′ | s,a) Vold(s′)
V(s) ← mina∈Applicable(s) Q(s,a)
π(s) ← argmina∈Applicable(s) Q(s,a)
if maxs ∈ S ∖ Sg |V(s) – Vold(s)| < η then return π
● |V(s) – Vold(s)| is the residual of s
● maxs ∈ S ∖ Sg |V(s) – Vold(s)| is the residual
● Asynchronous version:
VI(Σ,s0,Sg,V)
π ← ∅
loop
r ← 0 // the residual
for every s ∈ S ∖ Sg do
r ← max(r, Bellman-Update(s,V,π))
if r < η then return π
Bellman-Update(s,V,π)
vold ← V(s)
for every a ∈ Applicable(s) do
Q(s,a) ← cost(s,a) + ∑s′ ∈ S Pr(s′ | s,a) V(s′)
V(s) ← mina∈Applicable(s) Q(s,a)
π(s) ← argmina∈Applicable(s) Q(s,a)
return |V(s) – vold|
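Bellman-Update and the asynchronous loop can be sketched on the same kind of toy domain as before (hypothetical states and actions); unlike the synchronous version, each update immediately reads the freshly updated V:

```python
# Sketch of Bellman-Update and asynchronous VI; the domain is hypothetical.
def bellman_update(s, V, pi, applicable, Pr, cost):
    vold = V[s]
    Q = {a: cost[(s, a)] + sum(p * V[s2]        # reads the current V
                               for s2, p in Pr[(s, a)].items())
         for a in applicable[s]}
    pi[s] = min(Q, key=Q.get)
    V[s] = Q[pi[s]]
    return abs(V[s] - vold)                     # the residual of s

def async_vi(S, goals, applicable, Pr, cost, eta=1e-4):
    V = {s: 0.0 for s in S}
    pi = {}
    while True:
        r = 0.0                                 # the residual
        for s in sorted(S - goals):             # update order matters here
            r = max(r, bellman_update(s, V, pi, applicable, Pr, cost))
        if r < eta:
            return pi, V

applicable = {"s1": ["slow", "risky"], "s2": ["go"]}
Pr = {("s1", "slow"): {"g": 1.0},
      ("s1", "risky"): {"g": 0.5, "s2": 0.5},
      ("s2", "go"): {"g": 1.0}}
cost = {("s1", "slow"): 10.0, ("s1", "risky"): 1.0, ("s2", "go"): 5.0}
pi, V = async_vi({"s1", "s2", "g"}, {"g"}, applicable, Pr, cost)
print(V["s1"])  # 3.5
```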
26. Discussion (Continued)
● For both, the number of iterations is polynomial in the number of states
Ø But the number of states is usually quite large
Ø In each iteration, need to examine the entire state space
● Thus, these algorithms can take huge amounts of time and space
● Use search techniques to avoid searching the entire space
27. AO* (requires Σ to be acyclic)
AO*(Σ,s0,Sg,h)
π ← ∅; V ← ∅
Envelope ← {s0} // all generated states
loop
if leaves(s0,π) ⊆ Sg then return π
select s ∈ leaves(s0,π) ∖ Sg
for all a ∈ Applicable(s)
for all s′ ∈ γ(s,a) ∖ Envelope do
V(s′) ← h(s′); add s′ to Envelope
AO-Update(s,V,π)
return π
AO-Update(s,V,π)
Z ← {s} // ancestors of updated nodes
while Z ≠ ∅ do
select any s ∈ Z such that γ(s,π(s)) ∩ Z = ∅
remove s from Z
Bellman-Update(s,V,π)
Z ← Z ∪ {s′ ∈ Sπ | s ∈ γ(s′,π(s′))}
Bellman-Update(s,V,π)
vold ← V(s)
for every a ∈ Applicable(s) do
Q(s,a) ← cost(s,a) + ∑s′ ∈ S Pr(s′ | s,a) V(s′)
V(s) ← mina∈Applicable(s) Q(s,a)
π(s) ← argmina∈Applicable(s) Q(s,a)
return |V(s) – vold|
● h is the heuristic function
Ø Must have h(s) = 0 for every s in Sg
Ø Example: h(s) = 0 for all s
[Figure: acyclic example with Start and Goal; states s1–s6; action costs c = 1, 100, 10, 20, 1; outcome probabilities 0.2/0.8 and 0.5/0.5.]
28. LAO* (can handle cycles)
LAO*(Σ,s0,Sg,h)
π ← ∅; V ← ∅
Envelope ← {s0} // all generated states
loop
if leaves(s0,π) ⊆ Sg then return π
select s ∈ leaves(s0,π) ∖ Sg
for all a ∈ Applicable(s)
for all s′ ∈ γ(s,a) ∖ Envelope do
V(s′) ← h(s′); add s′ to Envelope
LAO-Update(s,V,π)
return π
LAO-Update(s,V,π)
Z ← {s} ∪ {s′ ∈ Sπ | s ∈ γ(s′,π)}
for every s ∈ Z do Bellman-Update(s,V,π)
loop
residual ← max{Bellman-Update(s,V,π) | s ∈ Sπ}
if leaves(s0,π) changed or residual ≤ η then break
Bellman-Update(s,V,π)
vold ← V(s)
for every a ∈ Applicable(s) do
Q(s,a) ← cost(s,a) + ∑s′ ∈ S Pr(s′ | s,a) V(s′)
V(s) ← mina∈Applicable(s) Q(s,a)
π(s) ← argmina∈Applicable(s) Q(s,a)
return |V(s) – vold|
[Figure: Start/Goal diagram for the example.]