Hidden Markov Models

An introduction to HMMs and related tasks.



  1. MACHINE LEARNING: Hidden Markov Models. VU H. Pham, phvu@fit.hcmus.edu.vn, Department of Computer Science. December 6th, 2010
  2. Contents
     • Introduction
     • Markov Chain
     • Hidden Markov Models
  3. Introduction
     • Markov processes were first proposed by the Russian mathematician Andrei Markov, who used them to investigate the letter sequences of Pushkin's verse.
     • Nowadays, the Markov property and HMMs are widely used in many domains:
       – Natural Language Processing
       – Speech Recognition
       – Bioinformatics
       – Image/video processing
       – ...
  4. Markov Chain
     • Has N states, called s1, s2, ..., sN
     • There are discrete timesteps, t = 0, t = 1, ...
     • On the t-th timestep the system is in exactly one of the available states; call it q_t ∈ {s1, s2, ..., sN}
     [State diagram: states s1, s2, s3; current state highlighted. N = 3, t = 0, q_t = q_0 = s3]
  5. Markov Chain
     • Has N states, called s1, s2, ..., sN
     • There are discrete timesteps, t = 0, t = 1, ...
     • On the t-th timestep the system is in exactly one of the available states; call it q_t ∈ {s1, s2, ..., sN}
     • Between each timestep, the next state is chosen randomly.
     [State diagram: current state now s2. N = 3, t = 1, q_t = q_1 = s2]
  6. Markov Chain
     • Has N states, called s1, s2, ..., sN
     • There are discrete timesteps, t = 0, t = 1, ...
     • On the t-th timestep the system is in exactly one of the available states; call it q_t ∈ {s1, s2, ..., sN}
     • Between each timestep, the next state is chosen randomly.
     • The current state determines the probability distribution for the next state, e.g. p(q_{t+1} = s1 | q_t = s1) = 0; in shorthand:
       p(s1 | s1) = 0     p(s2 | s1) = 0     p(s3 | s1) = 1
       p(s1 | s2) = 1/2   p(s2 | s2) = 1/2   p(s3 | s2) = 0
       p(s1 | s3) = 1/3   p(s2 | s3) = 2/3   p(s3 | s3) = 0
     [N = 3, t = 1, q_t = q_1 = s2]
  7. Markov Chain
     • Has N states, called s1, s2, ..., sN
     • There are discrete timesteps, t = 0, t = 1, ...
     • On the t-th timestep the system is in exactly one of the available states; call it q_t ∈ {s1, s2, ..., sN}
     • Between each timestep, the next state is chosen randomly.
     • The current state determines the probability distribution for the next state.
       – Often notated with arcs between states
     [State diagram with labeled arcs: s2→s1 = 1/2, s2→s2 = 1/2, s1→s3 = 1, s3→s1 = 1/3, s3→s2 = 2/3. N = 3, t = 1, q_t = q_1 = s2]
  8. Markov Property
     • q_{t+1} is conditionally independent of {q_{t-1}, q_{t-2}, ..., q_0} given q_t.
     • In other words:
       p(q_{t+1} | q_t, q_{t-1}, ..., q_0) = p(q_{t+1} | q_t)
     [State diagram and transition probabilities as on slide 7. N = 3, t = 1, q_t = q_1 = s2]
  9. Markov Property
     • q_{t+1} is conditionally independent of {q_{t-1}, q_{t-2}, ..., q_0} given q_t.
     • In other words:
       p(q_{t+1} | q_t, q_{t-1}, ..., q_0) = p(q_{t+1} | q_t)
       The state at timestep t+1 depends only on the state at timestep t.
     [State diagram and transition probabilities as on slide 7. N = 3, t = 1, q_t = q_1 = s2]
  10. Markov Property
     • q_{t+1} is conditionally independent of {q_{t-1}, q_{t-2}, ..., q_0} given q_t.
     • In other words:
       p(q_{t+1} | q_t, q_{t-1}, ..., q_0) = p(q_{t+1} | q_t)
       The state at timestep t+1 depends only on the state at timestep t.
     • How to represent the joint distribution of (q0, q1, q2, ...) using graphical models?
     [State diagram and transition probabilities as on slide 7. N = 3, t = 1, q_t = q_1 = s2]
  11. Markov Property
     • q_{t+1} is conditionally independent of {q_{t-1}, q_{t-2}, ..., q_0} given q_t.
     • In other words:
       p(q_{t+1} | q_t, q_{t-1}, ..., q_0) = p(q_{t+1} | q_t)
       The state at timestep t+1 depends only on the state at timestep t.
     • How to represent the joint distribution of (q0, q1, q2, ...) using graphical models?
     [Graphical model: chain q0 → q1 → q2 → q3. State diagram as on slide 7; N = 3, t = 1, q_t = q_1 = s2]
  12. Markov chain
     • So, the chain {q_t} is called a Markov chain.
     [Graphical model: q0 → q1 → q2 → q3]
  13. Markov chain
     • So, the chain {q_t} is called a Markov chain.
     • Each q_t takes a value from the finite state-space {s1, s2, s3}
     • Each q_t is observed at a discrete timestep t
     • {q_t} satisfies the Markov property:
       p(q_{t+1} | q_t, q_{t-1}, ..., q_0) = p(q_{t+1} | q_t)
     [Graphical model: q0 → q1 → q2 → q3]
  14. Markov chain
     • So, the chain {q_t} is called a Markov chain.
     • Each q_t takes a value from the finite state-space {s1, s2, s3}
     • Each q_t is observed at a discrete timestep t
     • {q_t} satisfies the Markov property:
       p(q_{t+1} | q_t, q_{t-1}, ..., q_0) = p(q_{t+1} | q_t)
     • The transition from q_t to q_{t+1} is governed by the transition probability matrix (rows: from-state, columns: to-state); a sketch of sampling from it follows below.

       Transition probabilities:
             s1    s2    s3
       s1    0     0     1
       s2    1/2   1/2   0
       s3    1/3   2/3   0
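The slides stop at the matrix itself; as a concrete illustration (an addition, not part of the original deck), here is a minimal Python sketch of how one timestep of this chain can be simulated by sampling from the current state's row of T. The helper name next_state is hypothetical:

```python
import random

# Transition matrix from slide 14; rows and columns are ordered s1, s2, s3.
T = [
    [0.0, 0.0, 1.0],    # from s1
    [0.5, 0.5, 0.0],    # from s2
    [1/3, 2/3, 0.0],    # from s3
]

def next_state(current, T, rng=random):
    """Sample the index of the next state, given the current state index."""
    r = rng.random()
    cumulative = 0.0
    for j, p in enumerate(T[current]):
        cumulative += p
        if r < cumulative:
            return j
    return len(T[current]) - 1  # guard against floating-point round-off

# Simulate a few steps starting from q0 = s3 (index 2), as on slide 4.
state = 2
chain = [state]
for _ in range(5):
    state = next_state(state, T)
    chain.append(state)
print(["s%d" % (i + 1) for i in chain])  # e.g. ['s3', 's2', 's1', 's3', ...]
```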
  16. Markov Chain – Important property
     • In a Markov chain, the joint distribution is
       p(q_0, q_1, ..., q_m) = p(q_0) ∏_{j=1}^{m} p(q_j | q_{j-1})
  17. Markov Chain – Important property
     • In a Markov chain, the joint distribution is
       p(q_0, q_1, ..., q_m) = p(q_0) ∏_{j=1}^{m} p(q_j | q_{j-1})
     • Why?
       p(q_0, q_1, ..., q_m) = p(q_0) ∏_{j=1}^{m} p(q_j | q_{j-1}, previous states)
                             = p(q_0) ∏_{j=1}^{m} p(q_j | q_{j-1})
       where the second step holds due to the Markov property.
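This factorization turns the joint probability into a simple running product. A minimal sketch (not from the deck), assuming the list-of-rows matrix representation used above; note the initial distribution is an extra input, since the slides never specify one:

```python
def chain_probability(states, initial, T):
    """p(q0, ..., qm) = p(q0) * prod_{j>=1} p(q_j | q_{j-1}).

    states:  sequence of 0-based state indices (q0, q1, ..., qm)
    initial: initial distribution, initial[i] = p(q0 = state i)
    T:       transition matrix, T[i][j] = p(next = j | current = i)
    """
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= T[prev][cur]
    return p
```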
  18. Markov Chain: e.g.
     • The state-space of weather: rain, wind, cloud
  19. Markov Chain: e.g.
     • The state-space of weather: rain, wind, cloud

              Rain   Cloud   Wind
       Rain   1/2    0       1/2
       Cloud  1/3    0       2/3
       Wind   0      1       0
  20. Markov Chain: e.g.
     • The state-space of weather: rain, wind, cloud

              Rain   Cloud   Wind
       Rain   1/2    0       1/2
       Cloud  1/3    0       2/3
       Wind   0      1       0

     • Markov assumption: the weather on the (t+1)-th day depends only on the t-th day.
  21. Markov Chain: e.g.
     • The state-space of weather: rain, wind, cloud

              Rain   Cloud   Wind
       Rain   1/2    0       1/2
       Cloud  1/3    0       2/3
       Wind   0      1       0

     • Markov assumption: the weather on the (t+1)-th day depends only on the t-th day.
     • We have observed the weather in a week:
       Day:  0     1     2     3     4
             rain  wind  rain  rain  cloud
  22. Markov Chain: e.g.
     • The state-space of weather: rain, wind, cloud

              Rain   Cloud   Wind
       Rain   1/2    0       1/2
       Cloud  1/3    0       2/3
       Wind   0      1       0

     • Markov assumption: the weather on the (t+1)-th day depends only on the t-th day.
     • We have observed the weather in a week; this observed sequence is a realization of the Markov chain:
       Day:  0     1     2     3     4
             rain  wind  rain  rain  cloud
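As a usage sketch (not in the deck), the chain_probability helper above can score a weather sequence under this matrix. Note that the table gives p(rain | wind) = 0, so the exact week shown on the slide would receive probability zero; the example below therefore scores a different sequence, and the uniform initial distribution is an assumption, since the slides never state one:

```python
# States indexed as 0 = rain, 1 = cloud, 2 = wind (order of the slide's table).
T_weather = [
    [0.5, 0.0, 0.5],   # from rain
    [1/3, 0.0, 2/3],   # from cloud
    [0.0, 1.0, 0.0],   # from wind
]
pi = [1/3, 1/3, 1/3]   # hypothetical uniform initial distribution

rain, cloud, wind = 0, 1, 2
p = chain_probability([rain, wind, cloud, wind], pi, T_weather)
print(p)  # 1/3 * 1/2 * 1 * 2/3 = 1/9 ≈ 0.111
```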
  23. Contents
     • Introduction
     • Markov Chain
     • Hidden Markov Models
  24. Modeling pairs of sequences
     • In many applications, we have to model pairs of sequences
     • Examples:
       – POS tagging in Natural Language Processing (assign each word in a sentence to Noun, Adj, Verb, ...)
       – Speech recognition (map acoustic sequences to sequences of words)
       – Computational biology (recover gene boundaries in DNA sequences)
       – Video tracking (estimate the underlying model states from the observation sequences)
       – And many others...
  25. Probabilistic models for sequence pairs
     • We have two sequences of random variables: X1, X2, ..., Xm and S1, S2, ..., Sm
     • Intuitively, in a practical system, each Xi corresponds to an observation and each Si corresponds to the state that generated the observation.
     • Let each Si be in {1, 2, ..., k} and each Xi be in {1, 2, ..., o}
     • How do we model the joint distribution
       p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)?
  26. Hidden Markov Models (HMMs)
     • In HMMs, we assume that
       p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)
         = p(S1 = s1) ∏_{j=2}^{m} p(Sj = sj | S_{j-1} = s_{j-1}) ∏_{j=1}^{m} p(Xj = xj | Sj = sj)
     • This factorization follows from the independence assumptions in HMMs
     • We will derive it in the next slides
  27. Independence Assumptions in HMMs [1]
     • Recall the chain rule: p(ABC) = p(A | BC) p(BC) = p(A | BC) p(B | C) p(C)
     • By the chain rule, the following equality is exact:
       p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)
         = p(S1 = s1, ..., Sm = sm) × p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm)
     • Assumption 1: the state sequence forms a Markov chain
       p(S1 = s1, ..., Sm = sm) = p(S1 = s1) ∏_{j=2}^{m} p(Sj = sj | S_{j-1} = s_{j-1})
  28. Independence Assumptions in HMMs [2]
     • By the chain rule, the following equality is exact:
       p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm)
         = ∏_{j=1}^{m} p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., X_{j-1} = x_{j-1})
     • Assumption 2: each observation depends only on the underlying state
       p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., X_{j-1} = x_{j-1}) = p(Xj = xj | Sj = sj)
     • These two assumptions are often called the independence assumptions in HMMs
  29. The Model form for HMMs
     • The model takes the following form:
       p(x1, ..., xm, s1, ..., sm; θ) = π(s1) ∏_{j=2}^{m} t(sj | s_{j-1}) ∏_{j=1}^{m} e(xj | sj)
     • Parameters in the model (a sketch of evaluating this product follows below):
       – Initial probabilities π(s) for s ∈ {1, 2, ..., k}
       – Transition probabilities t(s | s′) for s, s′ ∈ {1, 2, ..., k}
       – Emission probabilities e(x | s) for s ∈ {1, 2, ..., k} and x ∈ {1, 2, ..., o}
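As an illustration (an addition, not from the deck), the model form translates directly into code. This minimal sketch assumes π, t, e are stored as a vector and two row-indexed matrices, as in the earlier snippets; the function name hmm_joint is hypothetical:

```python
def hmm_joint(xs, ss, pi, t, e):
    """p(x1..xm, s1..sm; theta) = pi(s1) * prod t(s_j | s_{j-1}) * prod e(x_j | s_j).

    xs, ss: equal-length sequences of 0-based observation / state indices
    pi:     initial state probabilities, pi[s]
    t:      transition matrix, t[prev][cur]
    e:      emission matrix, e[state][obs]
    """
    p = pi[ss[0]]
    for prev, cur in zip(ss, ss[1:]):    # transition factors, j = 2..m
        p *= t[prev][cur]
    for s, x in zip(ss, xs):             # emission factors, j = 1..m
        p *= e[s][x]
    return p
```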
  30. 6 components of HMMs
     • Discrete timesteps: 1, 2, ...
     • Finite state space: {si}
     • Events {xi}
     • Vector of initial probabilities {πi}
     • Matrix of transition probabilities T = {tij} = {p(sj | si)}
     • Matrix of emission probabilities E = {eij} = {p(xj | si)}
     [Diagram: a start node with initial probabilities π1, π2, π3 into states s1, s2, s3; transition arcs tij between states; emission arcs eij from states to events x1, x2, x3]
     The observations at successive timesteps form an observation sequence {o1, o2, ..., ot}, where oi ∈ {x1, x2, ..., xo}
  31. 6 components of HMMs
     • Given a specific HMM and an observation sequence, the corresponding sequence of states is generally not deterministic
     • Example: given the observation sequence {x1, x3, x3, x2}, the corresponding states can be any of the following sequences:
       {s1, s1, s2, s2}
       {s1, s2, s3, s2}
       {s1, s1, s1, s2}
       ...
     [Diagram as on slide 30]
  32. Here's an HMM
     [State diagram: s1, s2, s3 with the transition and emission probabilities tabulated below]

       T    s1   s2   s3        E    x1   x2   x3        π    s1   s2   s3
       s1   0.5  0.5  0         s1   0.3  0    0.7            0.3  0.3  0.4
       s2   0.4  0    0.6       s2   0    0.1  0.9
       s3   0.2  0.8  0         s3   0.2  0    0.8
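To make this concrete (an addition, not part of the deck), the tables of slide 32 can be plugged into the hmm_joint sketch above to score one (observation, state) pair:

```python
T = [[0.5, 0.5, 0.0],
     [0.4, 0.0, 0.6],
     [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7],
     [0.0, 0.1, 0.9],
     [0.2, 0.0, 0.8]]
pi = [0.3, 0.3, 0.4]

s1, s2, s3 = 0, 1, 2   # state indices
x1, x2, x3 = 0, 1, 2   # observation indices

# p = pi(s1) * t(s1|s1) * t(s2|s1) * e(x1|s1) * e(x3|s1) * e(x3|s2)
#   = 0.3 * 0.5 * 0.5 * 0.3 * 0.7 * 0.9 ≈ 0.0142
print(hmm_joint([x1, x3, x3], [s1, s1, s2], pi, T, E))
```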
  33. Three famous HMM tasks
     • Given an HMM Φ = (T, E, π), three famous HMM tasks are:
     • Probability of an observation sequence (state estimation)
       – Given: Φ, observation O = {o1, o2, ..., ot}
       – Goal: p(O | Φ), or equivalently p(st = Si | O)
     • Most likely explanation (inference)
       – Given: Φ, the observation O = {o1, o2, ..., ot}
       – Goal: Q* = argmaxQ p(Q | O)
     • Learning the HMM
       – Given: observation O = {o1, o2, ..., ot} and the corresponding state sequence
       – Goal: estimate the parameters of the HMM Φ = (T, E, π)
  34. Three famous HMM tasks
     • Given an HMM Φ = (T, E, π), three famous HMM tasks are:
     • Probability of an observation sequence (state estimation)
       – Given: Φ, observation O = {o1, o2, ..., ot}
       – Goal: p(O | Φ), or equivalently p(st = Si | O)
       – That is: calculating the probability of observing the sequence O, summed over all possible state sequences.
     • Most likely explanation (inference)
       – Given: Φ, the observation O = {o1, o2, ..., ot}
       – Goal: Q* = argmaxQ p(Q | O)
     • Learning the HMM
       – Given: observation O = {o1, o2, ..., ot} and the corresponding state sequence
       – Goal: estimate the parameters of the HMM Φ = (T, E, π)
  35. Three famous HMM tasks
     • Given an HMM Φ = (T, E, π), three famous HMM tasks are:
     • Probability of an observation sequence (state estimation)
       – Given: Φ, observation O = {o1, o2, ..., ot}
       – Goal: p(O | Φ), or equivalently p(st = Si | O)
     • Most likely explanation (inference)
       – Given: Φ, the observation O = {o1, o2, ..., ot}
       – Goal: Q* = argmaxQ p(Q | O)
       – That is: calculating the best corresponding state sequence, given an observation sequence.
     • Learning the HMM
       – Given: observation O = {o1, o2, ..., ot} and the corresponding state sequence
       – Goal: estimate the parameters of the HMM Φ = (T, E, π)
  36. Three famous HMM tasks
     • Given an HMM Φ = (T, E, π), three famous HMM tasks are:
     • Probability of an observation sequence (state estimation)
       – Given: Φ, observation O = {o1, o2, ..., ot}
       – Goal: p(O | Φ), or equivalently p(st = Si | O)
     • Most likely explanation (inference)
       – Given: Φ, the observation O = {o1, o2, ..., ot}
       – Goal: Q* = argmaxQ p(Q | O)
     • Learning the HMM
       – Given: observation O = {o1, o2, ..., ot} and the corresponding state sequence
       – Goal: estimate the parameters of the HMM Φ = (T, E, π)
       – That is: given one or more observation sequences with their corresponding state sequences, estimate the transition matrix, emission matrix, and initial probabilities of the HMM.
  37. Three famous HMM tasks

       Problem                                   Algorithm          Complexity
       State estimation: p(O | Φ)                Forward-Backward   O(TN²)
       Inference: Q* = argmaxQ p(Q | O)          Viterbi decoding   O(TN²)
       Learning: Φ* = argmaxΦ p(O | Φ)           Baum-Welch (EM)    O(TN²)

       T: number of timesteps; N: number of states
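To ground the table (an addition, not in the deck), here is a minimal, unoptimized Python sketch of the first two algorithms, using the matrix representation from the snippets above; Baum-Welch is omitted for space. Both loops visit every pair of states once per timestep, matching the O(TN²) entries in the table:

```python
def forward(obs, pi, T, E):
    """Forward algorithm: total probability p(O | model) in O(T * N^2)."""
    n = len(pi)
    alpha = [pi[s] * E[s][obs[0]] for s in range(n)]          # alpha at t = 1
    for x in obs[1:]:                                         # alpha_t -> alpha_{t+1}
        alpha = [sum(alpha[sp] * T[sp][s] for sp in range(n)) * E[s][x]
                 for s in range(n)]
    return sum(alpha)

def viterbi(obs, pi, T, E):
    """Viterbi decoding: a most likely state sequence for O, in O(T * N^2)."""
    n = len(pi)
    delta = [pi[s] * E[s][obs[0]] for s in range(n)]
    back = []                                                 # backpointers per step
    for x in obs[1:]:
        prev, delta, ptr = delta, [], []
        for s in range(n):
            best = max(range(n), key=lambda sp: prev[sp] * T[sp][s])
            delta.append(prev[best] * T[best][s] * E[s][x])
            ptr.append(best)
        back.append(ptr)
    last = max(range(n), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):                                # trace the path back
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Example on the slide-32 HMM (pi, T, E from the previous snippet),
# with the observation sequence {x1, x3, x3, x2} from slide 31:
print(forward([0, 2, 2, 1], pi, T, E))   # p(O | model)
print(viterbi([0, 2, 2, 1], pi, T, E))   # a best 0-based state path
```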
