HMM Tutorial (c) Tapas Kanungo

          Hidden Markov Models

             Tapas Kanungo
     Center for Automation Research
         University of Maryland
     Web: www.cfar.umd.edu/~kanungo
      Email: kanungo@cfar.umd.edu
                   Outline

1. Markov models
2. Hidden Markov models
3. Forward-backward algorithm
4. Viterbi algorithm
5. Baum-Welch estimation algorithm
                      Markov Models

    Observable states:
        1, 2, ..., N

    Observed sequence:
        q_1, q_2, ..., q_t, ..., q_T

    First-order Markov assumption:
        P(q_t = j | q_{t-1} = i, q_{t-2} = k, ...) = P(q_t = j | q_{t-1} = i)

    Stationarity:
        P(q_t = j | q_{t-1} = i) = P(q_{t+l} = j | q_{t+l-1} = i)
                 Markov Models

    State transition matrix A:

            | a_11  a_12  ...  a_1j  ...  a_1N |
            | a_21  a_22  ...  a_2j  ...  a_2N |
            |  .     .          .          .   |
        A = | a_i1  a_i2  ...  a_ij  ...  a_iN |
            |  .     .          .          .   |
            | a_N1  a_N2  ...  a_Nj  ...  a_NN |

    where
        a_ij = P(q_t = j | q_{t-1} = i),   1 ≤ i, j ≤ N

    Constraints on a_ij:

        a_ij ≥ 0,   for all i, j
        sum_{j=1}^{N} a_ij = 1,   for all i
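The two constraints say that each row of A is a probability distribution (A is row-stochastic). A minimal Python sketch checking them; the helper name is invented for illustration, and the matrix shown is the weather example used later in the tutorial:

```python
def is_stochastic(A, tol=1e-9):
    """Check a_ij >= 0 for all i, j and that every row of A sums to 1."""
    return all(
        all(a >= 0.0 for a in row) and abs(sum(row) - 1.0) < tol
        for row in A
    )

# The 3-state weather transition matrix from the example that follows.
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]
```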
       Markov Models: Example

    States:
    1. Rainy (R)
    2. Cloudy (C)
    3. Sunny (S)

    State transition probability matrix:

            | 0.4  0.3  0.3 |
        A = | 0.2  0.6  0.2 |
            | 0.1  0.1  0.8 |

    Compute the probability of
    observing SSRRSCS given that today is S.
       Markov Models: Example

Basic conditional probability rule:

    P(A, B) = P(A|B) P(B)

The Markov chain rule:

    P(q_1, q_2, ..., q_T)
      = P(q_T | q_1, q_2, ..., q_{T-1}) P(q_1, q_2, ..., q_{T-1})
      = P(q_T | q_{T-1}) P(q_1, q_2, ..., q_{T-1})
      = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) P(q_1, q_2, ..., q_{T-2})
      = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) ... P(q_2 | q_1) P(q_1)
       Markov Models: Example

    Observation sequence O:

        O = (S, S, S, R, R, S, C, S)

    Using the chain rule we get:

        P(O | model)
          = P(S) P(S|S) P(S|S) P(R|S) P(R|R) P(S|R) P(C|S) P(S|C)
          = π_3 a_33 a_33 a_31 a_11 a_13 a_32 a_23
          = 1 · (0.8)^2 · 0.1 · 0.4 · 0.3 · 0.1 · 0.2
          = 1.536 × 10^-4

    The prior probability π_i = P(q_1 = i).
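This computation can be sketched in a few lines of Python. The 0-indexed state encoding and the function name are choices of this sketch, not from the tutorial:

```python
A = [[0.4, 0.3, 0.3],   # rows/cols: 0 = Rainy, 1 = Cloudy, 2 = Sunny
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]

def chain_prob(states, A, pi):
    """P(q_1, ..., q_T) = pi[q_1] * prod_t a_{q_{t-1} q_t} (Markov chain rule)."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

R, C, S = 0, 1, 2
pi = [0.0, 0.0, 1.0]          # "today is sunny": P(q_1 = S) = 1
O = [S, S, S, R, R, S, C, S]
p = chain_prob(O, A, pi)      # 1 * 0.8^2 * 0.1 * 0.4 * 0.3 * 0.1 * 0.2 ≈ 1.536e-4
```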
        Markov Models: Example

    What is the probability that the sequence remains in state i for
    exactly d time units?

        p_i(d) = P(q_1 = i, q_2 = i, ..., q_d = i, q_{d+1} ≠ i | q_1 = i)
               = (a_ii)^{d-1} (1 - a_ii)

    Exponential Markov chain duration density.

    What is the expected value of the duration d in state i?

        d̄_i = sum_{d=1}^{∞} d p_i(d)
            = sum_{d=1}^{∞} d (a_ii)^{d-1} (1 - a_ii)
            = (1 - a_ii) sum_{d=1}^{∞} d (a_ii)^{d-1}
            = (1 - a_ii) d/da_ii [ sum_{d=1}^{∞} (a_ii)^d ]
            = (1 - a_ii) d/da_ii [ a_ii / (1 - a_ii) ]
            = 1 / (1 - a_ii)
       Markov Models: Example

    Avg. number of consecutive sunny days:
        1 / (1 - a_33) = 1 / (1 - 0.8) = 5
    Avg. number of consecutive cloudy days = 2.5
    Avg. number of consecutive rainy days = 1.67
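A quick numerical check of the closed form 1/(1 - a_ii), truncating the infinite sum; the truncation point is an arbitrary choice of this sketch:

```python
def expected_duration(a_ii, max_d=5000):
    """Truncated sum of d * p_i(d) = d * a_ii^(d-1) * (1 - a_ii)."""
    return sum(d * a_ii ** (d - 1) * (1 - a_ii) for d in range(1, max_d + 1))

# Matches 1 / (1 - a_ii): 5 sunny days, 2.5 cloudy, about 1.67 rainy.
sunny = expected_duration(0.8)
cloudy = expected_duration(0.6)
rainy = expected_duration(0.4)
```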
         Hidden Markov Models

    States are not observable
    Observations are probabilistic functions of state
    State transitions are still probabilistic
            Urn and Ball Model

    N urns containing colored balls
    M distinct colors of balls
    Each urn has a possibly different distribution of colors

    Sequence generation algorithm:
    1. Pick an initial urn according to some random process.
    2. Randomly pick a ball from the urn and then replace it.
    3. Select another urn according to a random selection process
       associated with the current urn.
    4. Repeat steps 2 and 3
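The generation steps above can be sketched as follows (all names are illustrative; urns play the role of hidden states and ball colours the role of observation symbols):

```python
import random

def draw(probs, rng):
    """Sample an index from a discrete probability distribution."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def generate(pi, A, B, T, seed=0):
    """Steps 1-4 above: pick an urn, draw a ball (with replacement), move on."""
    rng = random.Random(seed)
    state = draw(pi, rng)                 # step 1: initial urn
    obs = []
    for _ in range(T):
        obs.append(draw(B[state], rng))   # step 2: ball colour from this urn
        state = draw(A[state], rng)       # step 3: next urn
    return obs                            # steps 2-3 repeated T times

seq = generate([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.3, 0.7]], T=10)
```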
                  The Trellis

    [Figure: the trellis. States 1..N on the vertical axis, time
    1..T on the horizontal axis; observation o_t is emitted at
    each time step t.]
        Elements of Hidden Markov Models

    N: the number of hidden states
    Q: the set of states, Q = {1, 2, ..., N}
    M: the number of symbols
    V: the set of symbols, V = {1, 2, ..., M}
    A: the state-transition probability matrix
           a_ij = P(q_{t+1} = j | q_t = i),   1 ≤ i, j ≤ N
    B: the observation probability distribution
           b_j(k) = P(o_t = k | q_t = j),   1 ≤ k ≤ M
    π: the initial state distribution
           π_i = P(q_1 = i),   1 ≤ i ≤ N
    λ: the entire model, λ = (A, B, π)
          Three Basic Problems

1. Given observation O = (o_1, o_2, ..., o_T) and model λ = (A, B, π),
   efficiently compute P(O|λ).
     Hidden states complicate the evaluation.
     Given two models λ_1 and λ_2, this can be used
     to choose the better one.
2. Given observation O = (o_1, o_2, ..., o_T) and model λ,
   find the optimal state sequence q = (q_1, q_2, ..., q_T).
     The optimality criterion has to be decided (e.g.
     maximum likelihood).
     "Explanation" for the data.
3. Given O = (o_1, o_2, ..., o_T), estimate model parame-
   ters λ = (A, B, π) that maximize P(O|λ).
             Solution to Problem 1

    Problem: Compute P(o_1, o_2, ..., o_T | λ)

    Algorithm:

    Let q = (q_1, q_2, ..., q_T) be a state sequence.

    Assume the observations are independent given the states:

        P(O|q, λ) = prod_{t=1}^{T} P(o_t | q_t, λ)
                  = b_{q_1}(o_1) b_{q_2}(o_2) ... b_{q_T}(o_T)

    The probability of a particular state sequence is:

        P(q|λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T}

    Also, P(O, q|λ) = P(O|q, λ) P(q|λ).

    Enumerate paths and sum probabilities:

        P(O|λ) = sum_q P(O|q, λ) P(q|λ)

    There are N^T state sequences, each requiring O(T) calculations.

    Complexity: O(T N^T) calculations.
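The enumeration can be transcribed directly; it is only feasible for tiny T since it visits all N^T paths. The toy model numbers below are invented for illustration:

```python
from itertools import product

def brute_force(O, pi, A, B):
    """P(O|lambda) = sum over all N^T paths q of P(O|q) * P(q)."""
    N, T = len(pi), len(O)
    total = 0.0
    for q in product(range(N), repeat=T):
        p = pi[q[0]] * B[q[0]][O[0]]              # pi_{q1} b_{q1}(o_1)
        for t in range(1, T):
            p *= A[q[t-1]][q[t]] * B[q[t]][O[t]]  # a_{q_{t-1} q_t} b_{q_t}(o_t)
        total += p
    return total

# Toy 2-state model (illustrative numbers):
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
p = brute_force([0, 1], pi, A, B)   # 0.209
```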
        Forward Procedure: Intuition

    [Figure: state k at time t+1 collects probability mass from
    every state i at time t, weighted by the transition
    probabilities a_1k, ..., a_Nk.]
               Forward Algorithm

    Define the forward variable α_t(i) as:

        α_t(i) = P(o_1, o_2, ..., o_t, q_t = i | λ)

    α_t(i) is the probability of observing the partial
    sequence o_1, o_2, ..., o_t such that the state at time t is i.

    Induction:

    1. Initialization:
           α_1(i) = π_i b_i(o_1)

    2. Induction:
           α_{t+1}(j) = [ sum_{i=1}^{N} α_t(i) a_ij ] b_j(o_{t+1})

    3. Termination:
           P(O|λ) = sum_{i=1}^{N} α_T(i)

    Complexity: O(N^2 T).
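The three steps map line by line onto code (a sketch; the toy model numbers are invented to illustrate the call):

```python
def forward(O, pi, A, B):
    """Return P(O|lambda) in O(N^2 T) time via the forward variables."""
    N = len(pi)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]        # 1. initialization
    for o in O[1:]:                                       # 2. induction
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return sum(alpha)                                     # 3. termination

# Illustrative 2-state model:
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
p = forward([0, 1], pi, A, B)   # 0.209
```

This gives the same value as the brute-force path enumeration, but with O(N^2 T) work instead of O(T N^T).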
                    Example

Consider the following coin-tossing experiment:

              State 1   State 2   State 3
     P(H)       0.5       0.75      0.25
     P(T)       0.5       0.25      0.75

   state-transition probabilities equal to 1/3
   initial state probabilities equal to 1/3

1. You observe O = (H, H, H, H, T, H, T, T, T, T). What
   state sequence, q, is most likely? What is the
   joint probability, P(O, q|λ), of the observation se-
   quence and the state sequence?
2. What is the probability that the observation se-
   quence came entirely from state 1?
3. Consider the observation sequence
       Õ = (H, T, T, H, T, H, H, T, T, H).
   How would your answers to parts 1 and 2 change?
4. If the state transition probabilities were:

            | 0.9   0.45  0.45 |
       A' = | 0.05  0.1   0.45 |
            | 0.05  0.45  0.1  |

   how would the new model λ' change your answers
   to parts 1-3?
                  Backward Algorithm

    [Figure: state i at time t passes probability mass to every
    state j at time t+1, weighted by a_i1, ..., a_iN; states 1..N
    vs. time 1..T, with observations o_1, ..., o_T.]
               Backward Algorithm

    Define the backward variable β_t(i) as:

        β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ)

    β_t(i) is the probability of observing the partial
    sequence o_{t+1}, o_{t+2}, ..., o_T given that the state
    at time t is i.

    Induction:

    1. Initialization:
           β_T(i) = 1

    2. Induction:
           β_t(i) = sum_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j),
               1 ≤ i ≤ N,   t = T-1, ..., 1
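A sketch of the backward pass. The final line combining β_1(i) with π_i b_i(o_1) to recover P(O|λ) is a standard identity stated here as an assumption, since the slide stops at the induction step; the toy model is illustrative:

```python
def backward_likelihood(O, pi, A, B):
    """Compute P(O|lambda) from the backward variables beta_t(i)."""
    N, T = len(pi), len(O)
    beta = [1.0] * N                          # 1. initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):            # 2. induction, t = T-1, ..., 1
        beta = [sum(A[i][j] * B[j][O[t + 1]] * beta[j] for j in range(N))
                for i in range(N)]
    # Assumed termination: P(O|lambda) = sum_i pi_i b_i(o_1) beta_1(i)
    return sum(pi[i] * B[i][O[0]] * beta[i] for i in range(N))

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
p = backward_likelihood([0, 1], pi, A, B)   # 0.209, agreeing with the forward pass
```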
                 Solution to Problem 2

    Choose the most likely path:
    find the path (q_1, q_2, ..., q_T) that maximizes the
    likelihood

        P(q_1, q_2, ..., q_T | O, λ)

    Solution by dynamic programming. Define:

        δ_t(i) = max_{q_1, q_2, ..., q_{t-1}} P(q_1, q_2, ..., q_t = i, o_1, o_2, ..., o_t | λ)

    δ_t(i) is the highest probability of a path ending in state i
    at time t.

    By induction we have:

        δ_{t+1}(j) = [ max_i δ_t(i) a_ij ] b_j(o_{t+1})
                  Viterbi Algorithm

    [Figure: trellis for the Viterbi algorithm. The best path into
    state k at time t+1 maximizes δ_t(i) a_ik over the predecessor
    states i = 1, ..., N.]
                Viterbi Algorithm

    Initialization:
        δ_1(i) = π_i b_i(o_1),   1 ≤ i ≤ N
        ψ_1(i) = 0

    Recursion:
        δ_t(j) = [ max_{1≤i≤N} δ_{t-1}(i) a_ij ] b_j(o_t)
        ψ_t(j) = argmax_{1≤i≤N} δ_{t-1}(i) a_ij
            2 ≤ t ≤ T,   1 ≤ j ≤ N

    Termination:
        P* = max_{1≤i≤N} δ_T(i)
        q*_T = argmax_{1≤i≤N} δ_T(i)

    Path (state sequence) backtracking:
        q*_t = ψ_{t+1}(q*_{t+1}),   t = T-1, T-2, ..., 1
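The four stages above can be sketched as follows (names are illustrative; the toy model is invented):

```python
def viterbi(O, pi, A, B):
    """Return the most likely state path q* and P* = max_q P(O, q | lambda)."""
    N = len(pi)
    delta = [pi[i] * B[i][O[0]] for i in range(N)]        # initialization
    psi = []
    for o in O[1:]:                                       # recursion
        best = [max(range(N), key=lambda i: delta[i] * A[i][j]) for j in range(N)]
        delta = [delta[best[j]] * A[best[j]][j] * B[j][o] for j in range(N)]
        psi.append(best)
    q = [max(range(N), key=lambda i: delta[i])]           # termination
    p_star = delta[q[0]]
    for best in reversed(psi):                            # backtracking
        q.append(best[q[-1]])
    q.reverse()
    return q, p_star

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
path, p_star = viterbi([0, 1], pi, A, B)   # path [0, 1], p_star 0.1296
```

Note the recursion is the forward induction with the sum replaced by a max, plus backpointers ψ for recovering the path.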
            Solution to Problem 3

    Estimate λ = (A, B, π) to maximize P(O|λ).
    No analytic solution because of the complexity; an
    iterative solution is used.
    Baum-Welch Algorithm:
    1. Let the initial model be λ_0.
    2. Compute a new λ based on λ_0 and the observation
       O.
    3. If log P(O|λ) - log P(O|λ_0) < DELTA, stop.
    4. Else set λ_0 ← λ and go to step 2.
         Baum-Welch: Preliminaries

    Define $\xi_t(i, j)$ as the probability of being in state
    $i$ at time $t$ and in state $j$ at time $t + 1$:

        $\xi_t(i, j) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}
                     = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$

    Define $\gamma_t(i)$ as the probability of being in state $i$ at
    time $t$, given the observation sequence:

        $\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$

    $\sum_{t=1}^{T} \gamma_t(i)$ is the expected number of times state $i$
    is visited.

    $\sum_{t=1}^{T-1} \xi_t(i, j)$ is the expected number of transitions
    from state $i$ to state $j$.
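Given forward and backward tables $\alpha$ and $\beta$ from the earlier slides, $\xi$ and $\gamma$ can be computed as follows (a minimal NumPy sketch; the array names are my own):

```python
import numpy as np

def xi_gamma(alpha, beta, A, B, obs):
    """xi[t, i, j] = P(q_t = i, q_{t+1} = j | O); gamma[t, i] = P(q_t = i | O).

    alpha, beta: (T, N) forward/backward tables; A: (N, N); B: (N, M).
    """
    T, N = alpha.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # numerator: alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        num = alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]
        xi[t] = num / num.sum()       # denominator equals P(O | lambda)
    gamma = xi.sum(axis=2)            # gamma_t(i) = sum_j xi_t(i, j)
    return xi, gamma
```

Each `xi[t]` is a proper joint distribution over state pairs, so each `gamma[t]` sums to one; this is a useful sanity check when debugging an implementation.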
          Baum-Welch: Update Rules

    $\bar{\pi}_i$ = expected frequency in state $i$ at time $t = 1$
    $= \gamma_1(i)$.

    $\bar{a}_{ij}$ = (expected number of transitions from state $i$
    to state $j$) / (expected number of transitions from
    state $i$):
                $\bar{a}_{ij} = \dfrac{\sum_t \xi_t(i, j)}{\sum_t \gamma_t(i)}$

    $\bar{b}_j(k)$ = (expected number of times in state $j$ and
    observing symbol $k$) / (expected number of times
    in state $j$):
                $\bar{b}_j(k) = \dfrac{\sum_{t : o_t = k} \gamma_t(j)}{\sum_t \gamma_t(j)}$
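The three update rules can be sketched in NumPy as a single reestimation step (a minimal single-sequence sketch; it assumes `xi` of shape (T-1, N, N) and `gamma` of shape (T-1, N) as produced from $\xi$, and the names are my own):

```python
import numpy as np

def reestimate(xi, gamma, obs, M):
    """One Baum-Welch update. Returns (pi_bar, a_bar, b_bar) for M symbols.

    Note: gamma here runs over t = 1..T-1, matching its definition from xi.
    """
    pi_bar = gamma[0]                                    # expected frequency at t = 1
    a_bar = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]  # transitions i->j / out of i
    N = gamma.shape[1]
    b_bar = np.zeros((N, M))
    for k in range(M):
        mask = np.array(obs[:len(gamma)]) == k           # times t with o_t = k
        b_bar[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return pi_bar, a_bar, b_bar
```

By construction each row of `a_bar` and `b_bar` sums to one, so the reestimated model is again a valid HMM, which is what makes the iteration of Problem 3 well defined.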
                  Properties

    Covariance of the estimated parameters
    Convergence rates
              Types of HMM

    Continuous density
    Ergodic
    State duration
          Implementation Issues

    Scaling
    Initial parameters
    Multiple observation sequences
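Scaling is the most common of these issues in practice: the raw $\alpha_t(i)$ values underflow for long sequences. One standard remedy, not detailed on this slide, is to normalize each $\alpha_t$ and accumulate the log of the scale factors (a minimal sketch; names are my own):

```python
import numpy as np

def forward_scaled(pi, A, B, obs):
    """Scaled forward pass: returns log P(O | lambda) without underflow."""
    T = len(obs)
    alpha = pi * B[:, obs[0]]
    log_prob = 0.0
    for t in range(T):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        c = alpha.sum()           # scale factor c_t
        alpha = alpha / c         # renormalize so alpha_t sums to 1
        log_prob += np.log(c)     # log P(O | lambda) = sum_t log c_t
    return log_prob
```

The product of the scale factors equals $P(O \mid \lambda)$, so summing their logs recovers the log-likelihood exactly while every intermediate quantity stays well inside floating-point range.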
           Comparison of HMMs

    What is a natural distance function?
    If the distance between $\lambda_1$ and $\lambda_2$ is large, does it
    mean that the models are really different?
