George F Luger
ARTIFICIAL INTELLIGENCE 6th edition
Structures and Strategies for Complex Problem Solving
Machine Learning: Probabilistic
Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009
13.0 Stochastic and Dynamic Models of
Learning
13.1 Hidden Markov Models (HMMs)
13.2 Dynamic Bayesian Networks and
Learning
13.3 Stochastic Extensions to Reinforcement
Learning
13.4 Epilogue and References
13.5 Exercises
D E F I N I T I O N
HIDDEN MARKOV MODEL
A graphical model is called a hidden Markov model (HMM) if it is a
Markov model whose states are not directly observable but are hidden
by a further stochastic system interpreting their output. More formally,
given a set of states S = s1, s2, ..., sn, and given a set of state transition
probabilities A = a11, a12, ..., a1n, a21, a22, ..., ann, there is a set of
observation likelihoods, O = pi(ot), each expressing the probability of
an observation ot (at time t) being generated by a state st.
Figure 13.1 A state transition diagram of a hidden Markov model of two
states designed for the coin flipping problem. The a_ij are
determined by the elements of the 2 x 2 transition matrix, A.
[Figure: states S1 and S2, each emitting heads with probability p(H) = b_i and
tails with p(T) = 1 - b_i; transitions a11, a12 = 1 - a11, a21 = 1 - a22, and a22.]
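To make the definition and the two-state model of Figure 13.1 concrete, the following minimal Python sketch stores a transition matrix A and the coin biases b_i, and samples a sequence of observations. The specific numeric values are illustrative assumptions, not parameters given in the text.

import random

# Two-coin HMM of Figure 13.1; the numeric values below are illustrative assumptions.
states = ["S1", "S2"]
A = {"S1": {"S1": 0.7, "S2": 0.3},        # a11 and a12 = 1 - a11
     "S2": {"S1": 0.4, "S2": 0.6}}        # a21 = 1 - a22 and a22
bias = {"S1": 0.9, "S2": 0.2}             # b1, b2: p(H) for each hidden coin

def sample(T, state="S1"):
    # Generate T observations (H or T); the hidden state itself is never revealed.
    observations = []
    for _ in range(T):
        observations.append("H" if random.random() < bias[state] else "T")
        # Move to the next hidden state according to the transition matrix A.
        state = "S1" if random.random() < A[state]["S1"] else "S2"
    return observations

print(sample(10))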
Figure 13.2 The state transition diagram for a three state hidden Markov
model of coin flipping. Each coin/state, S_i, has its bias, b_i.
[Figure: states S1, S2, and S3 with transition probabilities a_ij between each
pair of states; each state emits heads with p(H) = b_i and tails with p(T) = 1 - b_i.]
Figure 13.3 a. The hidden, S, and observable, O, states of the AR-HMM, where
p(O_t | S_t, O_t-1).
b. The values of the hidden S_t of the example AR-HMM: safe, unsafe,
and faulty.
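The dependency noted in the caption, p(O_t | S_t, O_t-1), implies (as a hedged sketch; the text does not spell this out) that the joint distribution of an auto-regressive HMM over T time steps factors as:

P(S_{1:T}, O_{1:T}) = P(S_1) \, P(O_1 \mid S_1) \prod_{t=2}^{T} P(S_t \mid S_{t-1}) \, P(O_t \mid S_t, O_{t-1})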
Figure 13.4 A selection of the real-time data across multiple time
periods, with one time slice expanded, for the AR-HMM.
Figure 13.5 The time-series data of Figure 13.4 processed by a fast
Fourier transform into the frequency domain. This was
the data submitted to the AR-HMM for each time period.
Figure 13.6 An auto-regressive factorial HMM, where the observable state O_t
at time t is dependent on multiple subprocesses (S_t), S_t^i, and on O_t-1.
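As a hedged reading of this caption (the exact factorization is not given in the text), each of the m hidden subprocesses evolves as its own Markov chain, while the single observation stream depends on all of them and on the previous observation:

P(S_t^i \mid S_{t-1}^i) \quad \text{for each subprocess } i = 1, \ldots, m, \qquad
P(O_t \mid S_t^1, \ldots, S_t^m, O_{t-1})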
Figure 13.7 A PFSM representing a set of phonetically related English
words. The probability of each word occurring is below that
word. Adapted from Jurafsky and Martin (2008).
[Figure: a probabilistic finite state machine running from Start (#) through the
phones of each word to End (#); branch probabilities shown include .48, .52, .89,
.11, .36, and .64; word priors: neat .00013, need .00056, new .001, knee .000024.]
Figure 13.8 A trace of the Viterbi algorithm on several of the paths of Figure 13.7. Rows
report the maximum value for Viterbi on each word for each input value (top row).
Adapted from Jurafsky and Martin (2008).

word (prior)      paths     #      n                          iy                         # end
neat (.00013)     2 paths   1.0    1.0 x .00013 = .00013      .00013 x 1.0 = .00013      .00013 x .52 = .000067
need (.00056)     2 paths   1.0    1.0 x .00056 = .00056      .00056 x 1.0 = .00056      .00056 x .11 = .000062
new (.001)        2 paths   1.0    1.0 x .001 = .001          .001 x .36 = .00036        .00036 x 1.0 = .00036
knee (.000024)    1 path    1.0    1.0 x .000024 = .000024    .000024 x 1.0 = .000024    .000024 x 1.0 = .000024

Start = 1.0; total best: .00036
function Viterbi(Observations of length T, Probabilistic FSM)
begin
   number := number of states in FSM;
   create probability matrix viterbi[R = number + 2, C = T + 2];
   viterbi[0, 0] := 1.0;
   for each time step (observation) t from 0 to T do
      for each state si from i = 0 to number do
         for each transition from si to sj in the Probabilistic FSM do
            begin
               new-count := viterbi[si, t] x path[si, sj] x p(sj | si);
               if ((viterbi[sj, t + 1] = 0) or (new-count > viterbi[sj, t + 1]))
                  then begin
                     viterbi[sj, t + 1] := new-count;
                     append back-pointer [sj, t + 1] to back-pointer list
                  end
            end;
   return viterbi[R, C] and the back-pointer list
end.
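The pseudocode above can be made concrete. The following Python sketch implements the same dynamic-programming idea for a standard HMM with transition matrix A, emission matrix B, and initial distribution pi; it is a minimal illustration, not the author's word-lattice implementation, and the example numbers reuse the illustrative two-coin parameters introduced earlier.

import numpy as np

def viterbi(obs, A, B, pi):
    # obs: list of observation indices (length T)
    # A:   (N, N) transition matrix, A[i, j] = p(s_j | s_i)
    # B:   (N, M) emission matrix, B[i, k] = p(o_k | s_i)
    # pi:  (N,) initial state distribution
    N, T = A.shape[0], len(obs)
    prob = np.zeros((N, T))                 # best path probability ending in state i at time t
    back = np.zeros((N, T), dtype=int)      # back-pointers for recovering the best path

    prob[:, 0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = prob[:, t - 1] * A[:, j] * B[j, obs[t]]
            back[j, t] = int(np.argmax(scores))
            prob[j, t] = scores[back[j, t]]

    # Follow the back-pointers from the most probable final state.
    path = [int(np.argmax(prob[:, T - 1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[path[-1], t]))
    return list(reversed(path)), float(prob[:, T - 1].max())

A  = np.array([[0.7, 0.3], [0.4, 0.6]])     # illustrative transition matrix
B  = np.array([[0.9, 0.1], [0.2, 0.8]])     # emission probabilities; columns are H, T
pi = np.array([0.5, 0.5])
print(viterbi([0, 0, 1, 1], A, B, pi))      # observation sequence H H T T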
Figure 13.9 A DBN example of two time slices. The set Q of random
variables is hidden, the set O is observed; t indicates time.
Figure 13.10 A Bayesian belief net for the burglar alarm, earthquake,
burglary example.
Figure 13.11 A Markov random field reflecting the potential functions of the
random variables in the BBN of Figure 13.10, together with the
two observations about the system.
Figure 13.12. A learnable node L is added to the Markov random field of Figure
13.11. The Markov random field iterates across three time periods.
For simplicity, the EM iteration is only indicated at time 1.
D E F I N I T I O N
A MARKOV DECISION PROCESS, or MDP
A Markov Decision Process is a tuple <S, A, P, R> where:
S is a set of states, and
A is a set of actions.
pa(st, st+1) = p(st+1 | st, at = a) is the probability that if the agent executes
action a ∈ A from state st at time t, it results in state st+1 at time t+1. Since
the probability, pa ∈ P, is defined over the entire state-space of actions, it is
often represented with a transition matrix.
R(s) is the reward received by the agent when in state s.
D E F I N I T I O N
A PARTIALLY OBSERVABLE MARKOV DECISION PROCESS, or POMDP
A Partially Observable Markov Decision Process is a tuple <S, A, O, P, R> where:
S is a set of states, and
A is a set of actions.
O is the set of observations denoting what the agent can see about its world.
Since the agent cannot directly observe its current state, the observations are
probabilistically related to the underlying actual state of the world.
pa(st, o, st+1) = p(st+1, ot = o | st, at = a) is the probability that when the agent
executes action a from state st at time t, it results in an observation o that
leads to an underlying state st+1 at time t+1.
R(st, a, st+1) is the reward received by the agent when it executes action a in
state st and transitions to state st+1.
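As a minimal sketch of how the tuple <S, A, O, P, R> might be stored (the state, action, and observation names below are hypothetical and not from the text), the key difference from an MDP is that each action outcome also carries an observation probability:

from dataclasses import dataclass

@dataclass
class POMDP:
    S: list   # states
    A: list   # actions
    O: list   # observations
    P: dict   # P[(s, a)][(o, s_next)] = p(s_next, o_t = o | s, a)
    R: dict   # R[(s, a, s_next)] = reward

# Hypothetical two-state example: the agent sees only a light, never its true state.
pomdp = POMDP(
    S=["ok", "faulty"],
    A=["run", "repair"],
    O=["green_light", "red_light"],
    # P for one (state, action) pair; the remaining pairs would be filled in the same way.
    P={("ok", "run"): {("green_light", "ok"): 0.85, ("red_light", "ok"): 0.05,
                       ("green_light", "faulty"): 0.02, ("red_light", "faulty"): 0.08}},
    R={("ok", "run", "ok"): 1.0, ("ok", "run", "faulty"): -5.0},
)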
st      st+1     at          pa(st, st+1)       Ra(st, st+1)
high    high     search      a                  R_search
high    low      search      1 - a              R_search
low     high     search      1 - b              -3
low     low      search      b                  R_search
high    high     wait        1                  R_wait
high    low      wait        0                  R_wait
low     high     wait        0                  R_wait
low     low      wait        1                  R_wait
low     high     recharge    1                  0
low     low      recharge    0                  0

Table 13.1. Transition probabilities and expected rewards for the finite MDP of
the recycling robot example. The table contains all possible combinations of the
current state, st, next state, st+1, the actions, and the rewards possible from
the current state.
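Read directly from Table 13.1, the recycling robot's transition model can be encoded as a small Python table. The table itself leaves a, b, R_search, and R_wait symbolic, so the numeric values below are placeholders supplied only so the snippet runs.

# Placeholder values; Table 13.1 leaves a, b, R_search, and R_wait symbolic.
a, b = 0.9, 0.4
R_search, R_wait = 2.0, 1.0

# P[(state, action)] maps each next state to (transition probability, expected reward),
# one entry per row of Table 13.1.
P = {
    ("high", "search"):   {"high": (a,     R_search), "low": (1 - a, R_search)},
    ("low",  "search"):   {"high": (1 - b, -3),       "low": (b,     R_search)},
    ("high", "wait"):     {"high": (1,     R_wait),   "low": (0,     R_wait)},
    ("low",  "wait"):     {"high": (0,     R_wait),   "low": (1,     R_wait)},
    ("low",  "recharge"): {"high": (1,     0),        "low": (0,     0)},
}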
Figure 13.13. The transition graph for the recycling robot. The state nodes are
the large circles and the action nodes are the small black nodes.
