# Hidden Markov Models


An introduction of HMM and related tasks.

Published in: Education, Technology

### Hidden Markov Models

1. **Machine Learning: Hidden Markov Models**. Vu H. Pham, phvu@fit.hcmus.edu.vn, Department of Computer Science. December 6th, 2010.
2. **Contents**: Introduction; Markov Chain; Hidden Markov Models.
3. **Introduction**
   - Markov processes were first proposed by the Russian mathematician Andrei Markov, who used these processes to investigate Pushkin's poetry.
   - Nowadays, the Markov property and HMMs are widely used in many domains: Natural Language Processing, Speech Recognition, Bioinformatics, Image/video processing, and more.
4. **Markov Chain** (slides 4 through 7 build this up incrementally)
   - Has N states, called s1, s2, ..., sN.
   - There are discrete timesteps, t = 0, 1, ...
   - On the t'th timestep the system is in exactly one of the available states; call it qt ∈ {s1, s2, ..., sN}.
   - Between each timestep, the next state is chosen randomly; the current state determines the probability distribution for the next state. This is often notated with arcs between states.
   - Running example with N = 3: at t = 0 the current state is q0 = s3; at t = 1 it is q1 = s2. Writing p(si | sj) for p(qt+1 = si | qt = sj), the arcs of the diagram are:
     p(s1 | s1) = 0, p(s2 | s1) = 0, p(s3 | s1) = 1;
     p(s1 | s2) = 1/2, p(s2 | s2) = 1/2, p(s3 | s2) = 0;
     p(s1 | s3) = 1/3, p(s2 | s3) = 2/3, p(s3 | s3) = 0.
5. **Markov Property** (slides 8 through 11, same 3-state example)
   - qt+1 is conditionally independent of {qt-1, qt-2, ..., q0} given qt.
   - In other words: p(qt+1 | qt, qt-1, ..., q0) = p(qt+1 | qt).
   - That is, the state at timestep t+1 depends only on the state at timestep t.
   - How do we represent the joint distribution of (q0, q1, q2, ...) using graphical models? As a chain: q0 → q1 → q2 → q3 → ...
6. **Markov chain** (slides 12 through 15)
   - So, the chain of {qt}, drawn as q0 → q1 → q2 → q3, is called a Markov chain.
   - Each qt takes a value from the finite state-space {s1, s2, s3}.
   - Each qt is observed at a discrete timestep t.
   - {qt} satisfies the Markov property: p(qt+1 | qt, qt-1, ..., q0) = p(qt+1 | qt).
   - The transition from qt to qt+1 is calculated from the transition probability matrix (rows: current state, columns: next state):

     |    | s1  | s2  | s3 |
     |----|-----|-----|----|
     | s1 | 0   | 0   | 1  |
     | s2 | 1/2 | 1/2 | 0  |
     | s3 | 1/3 | 2/3 | 0  |
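The transition matrix above is all that is needed to simulate the chain. Below is a minimal sketch in plain Python (not from the slides): states s1, s2, s3 are mapped to indices 0, 1, 2, and `random.choices` draws the next state from the current state's row.

```python
import random

# Transition matrix from the slide; states s1, s2, s3 are indices 0, 1, 2.
# Row = current state, column = next state.
T = [[0.0, 0.0, 1.0],   # s1 always moves to s3
     [0.5, 0.5, 0.0],   # s2 moves to s1 or stays at s2, each with prob 1/2
     [1/3, 2/3, 0.0]]   # s3 moves to s1 (prob 1/3) or s2 (prob 2/3)

def sample_chain(T, q0, steps, rng=random.Random(0)):
    """Sample a trajectory q0, q1, ..., q_steps from the Markov chain."""
    traj = [q0]
    for _ in range(steps):
        # The next state depends only on the last state (Markov property).
        traj.append(rng.choices(range(len(T)), weights=T[traj[-1]])[0])
    return traj

traj = sample_chain(T, q0=2, steps=10)  # start at s3, matching the slides (q0 = s3)
```

Because zero-weight states can never be drawn, every consecutive pair in the sampled trajectory corresponds to a nonzero entry of the matrix.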
7. **Markov Chain, an important property** (slides 16 and 17)
   - In a Markov chain, the joint distribution is
     p(q0, q1, ..., qm) = p(q0) ∏_{j=1..m} p(qj | qj-1).
   - Why? By the chain rule,
     p(q0, q1, ..., qm) = p(q0) ∏_{j=1..m} p(qj | qj-1, previous states)
     = p(q0) ∏_{j=1..m} p(qj | qj-1),
     where the second step is due to the Markov property.
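This factorization turns directly into code. A minimal sketch (the function name and toy setup are illustrative, not from the slides), using the 3-state example chain started in s3:

```python
def chain_joint(p0, T, states):
    """p(q0, ..., qm) = p(q0) * prod_j p(q_j | q_{j-1}), the Markov factorization."""
    p = p0[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= T[prev][cur]
    return p

# The 3-state example chain (indices 0..2 for s1..s3), deterministically started in s3:
T = [[0.0, 0.0, 1.0],
     [0.5, 0.5, 0.0],
     [1/3, 2/3, 0.0]]
p0 = [0.0, 0.0, 1.0]
p = chain_joint(p0, T, [2, 1, 0])  # p(s3, s2, s1) = 1 * 2/3 * 1/2 = 1/3
```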
8. **Markov Chain example: weather** (slides 18 through 22)
   - The state-space of weather is {rain, cloud, wind}, with transition matrix (rows: today, columns: tomorrow):

     |       | Rain | Cloud | Wind |
     |-------|------|-------|------|
     | Rain  | 1/2  | 0     | 1/2  |
     | Cloud | 1/3  | 0     | 2/3  |
     | Wind  | 0    | 1     | 0    |

   - Markov assumption: the weather on the (t+1)'th day depends only on the t'th day.
   - We have observed the weather over a week, which forms a Markov chain:
     Day 0: rain, Day 1: wind, Day 2: rain, Day 3: rain, Day 4: cloud.
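With the weather table, the Markov factorization gives the probability of any run of days. The sketch below is illustrative (the function name and the particular 4-day sequence are hypothetical, not from the slides); rows index today's weather and columns tomorrow's, with rain = 0, cloud = 1, wind = 2:

```python
# Weather chain from the slide; rain = 0, cloud = 1, wind = 2.
# Row = today's weather, column = tomorrow's.
W = [[1/2, 0.0, 1/2],
     [1/3, 0.0, 2/3],
     [0.0, 1.0, 0.0]]

def sequence_prob(T, days):
    """Probability of a weather sequence, conditioned on its first day."""
    p = 1.0
    for today, tomorrow in zip(days, days[1:]):
        p *= T[today][tomorrow]
    return p

# A hypothetical 4-day run: rain -> wind -> cloud -> rain
p = sequence_prob(W, [0, 2, 1, 0])  # = 1/2 * 1 * 1/3 = 1/6
```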
9. **Contents**: Introduction; Markov Chain; Hidden Markov Models.
10. **Modeling pairs of sequences**
    - In many applications, we have to model pairs of sequences.
    - Examples:
      - POS tagging in Natural Language Processing (assign each word in a sentence to Noun, Adj, Verb, ...)
      - Speech recognition (map acoustic sequences to sequences of words)
      - Computational biology (recover gene boundaries in DNA sequences)
      - Video tracking (estimate the underlying model states from the observation sequences)
      - And many others...
11. **Probabilistic models for sequence pairs**
    - We have two sequences of random variables: X1, X2, ..., Xm and S1, S2, ..., Sm.
    - Intuitively, in a practical system, each Xi corresponds to an observation and each Si corresponds to a state that generated the observation.
    - Let each Si be in {1, 2, ..., k} and each Xi be in {1, 2, ..., o}.
    - How do we model the joint distribution p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)?
12. **Hidden Markov Models (HMMs)**
    - In HMMs, we assume that
      p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)
      = p(S1 = s1) ∏_{j=2..m} p(Sj = sj | Sj-1 = sj-1) ∏_{j=1..m} p(Xj = xj | Sj = sj).
    - This factorization follows from the independence assumptions in HMMs, which we derive in the next slides.
13. **Independence Assumptions in HMMs [1]**
    - Recall: p(ABC) = p(A | BC) p(BC) = p(A | BC) p(B | C) p(C).
    - By the chain rule, the following equality is exact:
      p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)
      = p(S1 = s1, ..., Sm = sm) × p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm).
    - Assumption 1: the state sequence forms a Markov chain:
      p(S1 = s1, ..., Sm = sm) = p(S1 = s1) ∏_{j=2..m} p(Sj = sj | Sj-1 = sj-1).
14. **Independence Assumptions in HMMs [2]**
    - By the chain rule, the following equality is exact:
      p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm)
      = ∏_{j=1..m} p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., Xj-1 = xj-1).
    - Assumption 2: each observation depends only on the underlying state:
      p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., Xj-1 = xj-1) = p(Xj = xj | Sj = sj).
    - These two assumptions are often called the independence assumptions in HMMs.
15. **The model form for HMMs**
    - The model takes the following form:
      p(x1, ..., xm, s1, ..., sm; θ) = π(s1) ∏_{j=2..m} t(sj | sj-1) ∏_{j=1..m} e(xj | sj).
    - Parameters in the model:
      - Initial probabilities π(s) for s ∈ {1, 2, ..., k}
      - Transition probabilities t(s | s′) for s, s′ ∈ {1, 2, ..., k}
      - Emission probabilities e(x | s) for s ∈ {1, 2, ..., k} and x ∈ {1, 2, ..., o}
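The model form translates line by line into code. A minimal, illustrative sketch (the function name and the toy parameters in the usage example are assumptions, not from the slides):

```python
def hmm_joint(pi, t, e, states, obs):
    """p(x_1..m, s_1..m; theta) = pi(s_1) * prod_j t(s_j | s_{j-1}) * prod_j e(x_j | s_j)."""
    p = pi[states[0]]                          # initial probability pi(s_1)
    for prev, cur in zip(states, states[1:]):
        p *= t[prev][cur]                      # transition factors t(s_j | s_{j-1})
    for s, x in zip(states, obs):
        p *= e[s][x]                           # emission factors e(x_j | s_j)
    return p

# Toy parameters (assumed for illustration): 2 states, 2 events.
pi_toy = [1.0, 0.0]
t_toy = [[0.5, 0.5], [0.5, 0.5]]
e_toy = [[1.0, 0.0], [0.0, 1.0]]   # state i deterministically emits event i
p_toy = hmm_joint(pi_toy, t_toy, e_toy, states=[0, 1], obs=[0, 1])  # 1 * 0.5 * 1 * 1 = 0.5
```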
16. **Six components of HMMs**
    - Discrete timesteps: 1, 2, ...
    - Finite state space {si}
    - Events {xi}
    - Vector of initial probabilities {πi}
    - Matrix of transition probabilities T = {tij} = {p(sj | si)}
    - Matrix of emission probabilities E = {eij} = {p(xj | si)}
    - The observations at the discrete timesteps form an observation sequence {o1, o2, ..., ot}, where each oi ∈ {x1, x2, ..., xo}.
17. **Six components of HMMs (cont.)**
    - Given a specific HMM and an observation sequence, the corresponding sequence of states is generally not deterministic.
    - Example: given the observation sequence {x1, x3, x3, x2}, the corresponding states can be any of the following sequences: {s1, s1, s2, s2}, {s1, s2, s3, s2}, {s1, s1, s1, s2}, ...
18. **Here's an HMM** (with concrete numbers)

    Transition matrix T:

    |    | s1  | s2  | s3  |
    |----|-----|-----|-----|
    | s1 | 0.5 | 0.5 | 0   |
    | s2 | 0.4 | 0   | 0.6 |
    | s3 | 0.2 | 0.8 | 0   |

    Emission matrix E:

    |    | x1  | x2  | x3  |
    |----|-----|-----|-----|
    | s1 | 0.3 | 0   | 0.7 |
    | s2 | 0   | 0.1 | 0.9 |
    | s3 | 0.2 | 0   | 0.8 |

    Initial probabilities π: π(s1) = 0.3, π(s2) = 0.3, π(s3) = 0.4.
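These concrete numbers can be encoded directly and used to generate data from the model. A minimal sketch in plain Python (the sampling helper is illustrative, not from the slides); states s1..s3 and events x1..x3 are mapped to indices 0..2:

```python
import random

# The HMM from the slide, as nested lists.
T = [[0.5, 0.5, 0.0],
     [0.4, 0.0, 0.6],
     [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7],
     [0.0, 0.1, 0.9],
     [0.2, 0.0, 0.8]]
pi = [0.3, 0.3, 0.4]

def sample_hmm(pi, T, E, steps, rng=random.Random(1)):
    """Walk the HMM forward, returning (state sequence, observation sequence)."""
    states, obs = [], []
    s = rng.choices(range(len(pi)), weights=pi)[0]   # draw the initial state from pi
    for _ in range(steps):
        states.append(s)
        obs.append(rng.choices(range(len(E[s])), weights=E[s])[0])  # emit from state s
        s = rng.choices(range(len(T[s])), weights=T[s])[0]          # move to next state
    return states, obs

states, obs = sample_hmm(pi, T, E, steps=5)
```

Only the `obs` sequence would be visible to an observer; the `states` sequence is the hidden part that the tasks on the next slides try to recover.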
19. **Three famous HMM tasks** (slides 33 through 36). Given an HMM Φ = (T, E, π), the three famous HMM tasks are:
    - Probability of an observation sequence (state estimation):
      Given Φ and an observation sequence O = {o1, o2, ..., ot}, compute p(O|Φ), or equivalently p(st = Si | O). That is, calculate the probability of observing the sequence O over all possible state sequences.
    - Most likely explanation (inference):
      Given Φ and an observation sequence O, compute Q* = argmaxQ p(Q|O). That is, calculate the best corresponding state sequence, given an observation sequence.
    - Learning the HMM:
      Given an observation sequence O (or a set of them) and the corresponding state sequence(s), estimate the parameters of the HMM Φ = (T, E, π): the transition matrix, the emission matrix, and the initial probabilities.
20. **Three famous HMM tasks: algorithms**

    | Problem | Algorithm | Complexity |
    |---------|-----------|------------|
    | State estimation: calculating p(O|Φ) | Forward-Backward | O(TN²) |
    | Inference: calculating Q* = argmaxQ p(Q|O) | Viterbi decoding | O(TN²) |
    | Learning: calculating Φ* = argmaxΦ p(O|Φ) | Baum-Welch (EM) | O(TN²) |

    Here T is the number of timesteps and N is the number of states.
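The first two algorithms in the table are short enough to sketch. The following is an illustrative plain-Python implementation of the forward pass (for p(O|Φ)) and Viterbi decoding (for Q*), both O(TN²) as stated; the parameter layout (nested lists indexed by state, observations as integer indices) and the toy model at the end are assumptions, not from the slides:

```python
def forward(pi, T, E, obs):
    """Forward algorithm: p(O | model), summing over all state sequences."""
    n = len(pi)
    alpha = [pi[s] * E[s][obs[0]] for s in range(n)]
    for x in obs[1:]:
        # alpha[s] = p(observations so far, current state = s)
        alpha = [sum(alpha[sp] * T[sp][s] for sp in range(n)) * E[s][x]
                 for s in range(n)]
    return sum(alpha)

def viterbi(pi, T, E, obs):
    """Viterbi decoding: the most likely state sequence Q* for O."""
    n = len(pi)
    delta = [pi[s] * E[s][obs[0]] for s in range(n)]
    back = []
    for x in obs[1:]:
        # For each state s, remember the best predecessor and its score.
        prev = [max(range(n), key=lambda sp: delta[sp] * T[sp][s]) for s in range(n)]
        delta = [delta[prev[s]] * T[prev[s]][s] * E[s][x] for s in range(n)]
        back.append(prev)
    path = [max(range(n), key=lambda s: delta[s])]
    for prev in reversed(back):       # trace the backpointers
        path.append(prev[path[-1]])
    return path[::-1]

# Toy two-state model (assumed for illustration, not from the slides)
pi = [0.6, 0.4]
T = [[0.7, 0.3], [0.4, 0.6]]
E = [[0.9, 0.1], [0.2, 0.8]]
```

Note that the forward pass alone suffices for p(O|Φ); the backward pass is only needed for per-timestep state posteriors and for Baum-Welch re-estimation.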