4. Sequence learning
• Many interesting applications involve patterns that are correlated
with previous data.
• Sliding-window-based algorithms
• Original (non-sequential) machine learning algorithms can be applied easily
• How should the window length be chosen?
• Too small gives poor performance
• Too big is computationally infeasible
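The sliding-window idea above can be sketched as follows — a minimal Python sketch (the function name and window length are illustrative, not from the slides): each fixed-length window of past values becomes one feature vector for an ordinary, non-sequential learner.

```python
# Sliding-window framing: turn a sequence into fixed-length
# (features, target) pairs so an ordinary ML algorithm can be applied.
def sliding_windows(seq, window):
    """Each window of `window` past values predicts the next value."""
    pairs = []
    for t in range(window, len(seq)):
        pairs.append((seq[t - window:t], seq[t]))
    return pairs

data = [1, 2, 3, 4, 5, 6]
print(sliding_windows(data, 3))
# each pair: previous 3 values -> next value
```

Note the trade-off from the slide: a larger `window` multiplies the feature dimension, while a smaller one discards context.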
11. Problem definition
• Evaluation problem
• Given an HMM, find the probability of an observation sequence.
• Decoding problem
• Given an HMM, find the sequence of hidden states that most probably
generated an observation sequence.
• Learning problem
• Given an observation sequence and an HMM with initial parameters,
adjust the parameters so that the HMM assigns the observation sequence
as high a probability as possible.
• If the corresponding sequence of hidden states is also provided, the
HMM can be trained as in supervised learning.
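The evaluation problem is classically solved with the forward algorithm. A minimal sketch with toy parameters — the names `pi`, `A`, `B` (initial, transition, and emission probabilities) are assumptions for illustration, not from the slides:

```python
# Forward algorithm for the evaluation problem: P(O | HMM).
# pi: initial state probabilities, A: transition matrix,
# B: emission matrix (rows = states, columns = observation symbols).
def forward(obs, pi, A, B):
    n = len(pi)
    # alpha[i] = P(o_1..o_t, q_t = S_i); initialise at t = 1
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # propagate one time step: sum over predecessor states
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)  # marginalise over the final state

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(forward([0, 1, 0], pi, A, B))
```

The same dynamic-programming trellis, with max in place of sum, gives the Viterbi algorithm for the decoding problem.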
17. Unsupervised learning
Baum-Welch algorithm
Baum-Welch is a maximum-likelihood (EM-style) algorithm;
it does not guarantee a global maximum.
ξt(i,j) = P(qt = Si, qt+1 = Sj | O, HMM0)
— the probability of being in state Si at time t and in state Sj at
time t+1, given HMM0 and the observation sequence O.
γt(i) = P(qt = Si | O, HMM0) = Σj ξt(i,j)
— the probability of being in state Si at time t.
Re-estimation:
T(Si,Sj) = Σt ξt(i,j) / Σt γt(i)
E(Ok,Sj) = Σ{t : Ot = Ok} γt(j) / Σt γt(j)
Re-estimation turns HMM0 into HMM1; solve the evaluation problem for the
observation sequence under HMM1, and repeat with HMM1 as the new HMM0
while the probability keeps improving.
18. Calculate
Begin
• Calculate alpha (by the forward algorithm)
and beta (by the backward algorithm)
• Expectation step:
• Calculate epsilon ξ (the transition probability
from state i to state j at time t)
• Calculate gamma γ (the sum of the transition
probabilities from state i at time t)
• Maximization step: re-estimate the HMM parameters
• Convergence?
• No: repeat from the beginning with the new parameters
• Yes: finish
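The loop above can be sketched as a single Baum-Welch iteration for one observation sequence — a simplified sketch, assuming the parameter names `pi`, `A`, `B` introduced here (not from the slides) and no numerical scaling, which a real implementation would need for long sequences:

```python
# One E-step / M-step iteration of Baum-Welch (single sequence, no scaling).
def forward_backward(obs, pi, A, B):
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[0.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]   # forward initialisation
        beta[T - 1][i] = 1.0                 # backward initialisation
    for t in range(1, T):                    # forward recursion
        for j in range(n):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):           # backward recursion
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t+1]] * beta[t+1][j] for j in range(n))
    return alpha, beta

def baum_welch_step(obs, pi, A, B):
    n, T = len(pi), len(obs)
    alpha, beta = forward_backward(obs, pi, A, B)
    prob = sum(alpha[T-1][i] for i in range(n))          # P(O | current HMM)
    # E-step: xi_t(i,j) and gamma_t(i) as defined on the slide
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t+1]] * beta[t+1][j] / prob
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(n)] for t in range(T)]
    # M-step: re-estimate transitions T(Si,Sj) and emissions E(Ok,Sj)
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
             for i in range(n)]
    m = len(B[0])
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(m)]
             for j in range(n)]
    new_pi = gamma[0][:]
    return new_pi, new_A, new_B, prob

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
new_pi, new_A, new_B, p = baum_welch_step([0, 1, 0, 0, 1], pi, A, B)
```

Iterating `baum_welch_step` and stopping when `prob` no longer improves realises the convergence check in the flow; since this is EM, the likelihood never decreases, but it may stall at a local maximum, as the slide warns.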