MACHINE LEARNING

             Hidden Markov Models

                         VU H. Pham
                     phvu@fit.hcmus.edu.vn

                 Department of Computer Science

                      December 6th, 2010

08/12/2010             Hidden Markov Models       1
Contents
• Introduction

• Markov Chain

• Hidden Markov Models




Introduction
• Markov processes were first proposed by the
   Russian mathematician Andrei Markov
    – He used these processes to investigate
        the letter sequences in Pushkin's poetry.
• Nowadays, the Markov property and HMMs are
   widely used in many domains:
    – Natural Language Processing
    – Speech Recognition
    – Bioinformatics
    – Image/video processing
    – ...
Markov Chain
• Has N states, called s1, s2, ..., sN
• There are discrete timesteps, t = 0, t = 1, ...
• On the t'th timestep the system is in exactly one of the
  available states. Call it qt ∈ {s1, s2, ..., sN}
• Between each timestep, the next state is chosen randomly.
• The current state determines the probability distribution for
  the next state.
    – Often notated with arcs between states

[Diagram: three states s1, s2, s3 joined by transition arcs. In the
 running example N = 3, t = 1, and the current state is qt = q1 = s2.
 The arcs carry the transition probabilities:
    p(s1 | s1) = 0      p(s2 | s1) = 0      p(s3 | s1) = 1
    p(s1 | s2) = 1/2    p(s2 | s2) = 1/2    p(s3 | s2) = 0
    p(s1 | s3) = 1/3    p(s2 | s3) = 2/3    p(s3 | s3) = 0 ]
Markov Property
• qt+1 is conditionally independent of {qt-1, qt-2, ..., q0} given qt.
• In other words:

     p(qt+1 | qt, qt-1, ..., q0) = p(qt+1 | qt)

  The state at timestep t+1 depends only on the state at timestep t.
• How to represent the joint distribution of (q0, q1, q2, ...) using
  graphical models?

[Diagram: the same three-state transition diagram, unrolled as the
 chain q0 → q1 → q2 → q3]
Markov chain
• So, the chain {qt} is called a Markov chain

           q0 → q1 → q2 → q3

• Each qt takes a value from the finite state-space {s1, s2, s3}
• Each qt is observed at a discrete timestep t
• {qt} satisfies the Markov property: p(qt+1 | qt, qt-1, ..., q0) = p(qt+1 | qt)
• The transition from qt to qt+1 is drawn from the transition
  probability matrix:

  Transition probabilities
         s1      s2      s3
  s1     0       0       1
  s2     1/2     1/2     0
  s3     1/3     2/3     0
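The chain above can be simulated directly: at each timestep, draw the next state from the row of the transition matrix indexed by the current state. A minimal sketch (the function and variable names are illustrative, not from the slides):

```python
import random

# Transition matrix from the slides: rows are the current state,
# columns give the distribution over the next state.
STATES = ["s1", "s2", "s3"]
T = {
    "s1": {"s1": 0.0, "s2": 0.0, "s3": 1.0},
    "s2": {"s1": 0.5, "s2": 0.5, "s3": 0.0},
    "s3": {"s1": 1/3, "s2": 2/3, "s3": 0.0},
}

def sample_chain(q0, steps, rng=random):
    """Sample a trajectory q0, q1, ..., q_steps from the chain."""
    chain = [q0]
    for _ in range(steps):
        current = chain[-1]
        # The next state depends only on the current one (Markov property).
        weights = [T[current][s] for s in STATES]
        chain.append(rng.choices(STATES, weights=weights)[0])
    return chain

print(sample_chain("s3", 5))
```

Every sampled trajectory only ever uses transitions with nonzero probability, e.g. s1 is always followed by s3 under this matrix.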
Markov Chain – Important property
• In a Markov chain, the joint distribution is

     p(q0, q1, ..., qm) = p(q0) ∏_{j=1}^{m} p(qj | qj-1)

• Why?

     p(q0, q1, ..., qm) = p(q0) ∏_{j=1}^{m} p(qj | qj-1, previous states)
                        = p(q0) ∏_{j=1}^{m} p(qj | qj-1)

  due to the Markov property.
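The product formula makes the joint probability of any trajectory a one-line loop. A minimal sketch using the three-state matrix from the slides; the uniform initial distribution p0 is an assumption for illustration (the slides do not specify one):

```python
# Joint probability p(q0, ..., qm) = p(q0) * prod_j p(q_j | q_{j-1}).
T = {
    "s1": {"s1": 0.0, "s2": 0.0, "s3": 1.0},
    "s2": {"s1": 0.5, "s2": 0.5, "s3": 0.0},
    "s3": {"s1": 1/3, "s2": 2/3, "s3": 0.0},
}
p0 = {"s1": 1/3, "s2": 1/3, "s3": 1/3}  # assumed uniform start

def joint_probability(states):
    p = p0[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= T[prev][cur]  # each factor depends only on the previous state
    return p

# q0=s3, q1=s2, q2=s1: (1/3) * p(s2|s3) * p(s1|s2) = (1/3)*(2/3)*(1/2) = 1/9
print(joint_probability(["s3", "s2", "s1"]))
```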
Markov Chain: e.g.
• The state-space of weather: {rain, cloud, wind}, with transition
  matrix

           Rain    Cloud   Wind
  Rain     1/2     0       1/2
  Cloud    1/3     0       2/3
  Wind     0       1       0

• Markov assumption: the weather on the (t+1)'th day depends only
  on the t'th day.
• We have observed the weather in a week:

  Day:    0       1       2       3       4
          rain    wind    rain    rain    cloud

  This observed sequence forms a Markov chain.
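Under the Markov assumption, the probability of a run of weather (conditioned on the first day) is just the product of the matrix entries along it. A small sketch with the table above; the scored sequence is an arbitrary example, not the observed week:

```python
# Weather transition matrix from the slides.
WEATHER_T = {
    "rain":  {"rain": 0.5, "cloud": 0.0, "wind": 0.5},
    "cloud": {"rain": 1/3, "cloud": 0.0, "wind": 2/3},
    "wind":  {"rain": 0.0, "cloud": 1.0, "wind": 0.0},
}

def sequence_probability(days):
    """p(day1, ..., dayN | day0) under the Markov assumption."""
    p = 1.0
    for prev, cur in zip(days, days[1:]):
        p *= WEATHER_T[prev][cur]
    return p

# rain -> wind -> cloud -> wind: 0.5 * 1.0 * (2/3) = 1/3
print(sequence_probability(["rain", "wind", "cloud", "wind"]))
```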
Contents
• Introduction

• Markov Chain

• Hidden Markov Models




Modeling pairs of sequences
• In many applications, we have to model pairs of sequences
• Examples:
    – POS tagging in Natural Language Processing (assign each word in a
      sentence to Noun, Adj, Verb, ...)
    – Speech recognition (map acoustic sequences to sequences of words)
    – Computational biology (recover gene boundaries in DNA sequences)
    – Video tracking (estimate the underlying model states from the
      observation sequences)
    – And many others...
Probabilistic models for sequence pairs
• We have two sequences of random variables:
  X1, X2, ..., Xm and S1, S2, ..., Sm
• Intuitively, in a practical system, each Xi corresponds to an observation
  and each Si corresponds to a state that generated the observation.
• Let each Si be in {1, 2, ..., k} and each Xi be in {1, 2, ..., o}
• How do we model the joint distribution:

     p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)
Hidden Markov Models (HMMs)
• In HMMs, we assume that

     p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)
     = p(S1 = s1) ∏_{j=2}^{m} p(Sj = sj | Sj-1 = sj-1) ∏_{j=1}^{m} p(Xj = xj | Sj = sj)

• This is often called the independence assumption in HMMs
• We will prove it in the next slides
Independence Assumptions in HMMs [1]

     p(ABC) = p(A | BC) p(BC) = p(A | BC) p(B | C) p(C)

• By the chain rule, the following equality is exact:

     p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)
     = p(S1 = s1, ..., Sm = sm) ×
       p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm)

• Assumption 1: the state sequence forms a Markov chain

     p(S1 = s1, ..., Sm = sm) = p(S1 = s1) ∏_{j=2}^{m} p(Sj = sj | Sj-1 = sj-1)
Independence Assumptions in HMMs [2]
• By the chain rule, the following equality is exact:

     p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm)
     = ∏_{j=1}^{m} p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., Xj-1 = xj-1)

• Assumption 2: each observation depends only on the underlying state

     p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., Xj-1 = xj-1)
     = p(Xj = xj | Sj = sj)

• These two assumptions are often called the independence assumptions
  in HMMs
The Model form for HMMs
• The model takes the following form:

     p(x1, ..., xm, s1, ..., sm; θ) = π(s1) ∏_{j=2}^{m} t(sj | sj-1) ∏_{j=1}^{m} e(xj | sj)

• Parameters in the model:
   – Initial probabilities π(s) for s ∈ {1, 2, ..., k}
   – Transition probabilities t(s | s') for s, s' ∈ {1, 2, ..., k}
   – Emission probabilities e(x | s) for s ∈ {1, 2, ..., k}
     and x ∈ {1, 2, ..., o}
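The model form translates line by line into code. A minimal sketch with π, t, e stored as dictionaries; the toy two-state numbers below are assumptions for illustration, not parameters from the slides:

```python
# p(x_1..m, s_1..m; theta) = pi(s_1) * prod_{j=2}^m t(s_j | s_{j-1})
#                                    * prod_{j=1}^m e(x_j | s_j)
def hmm_joint(xs, ss, pi, t, e):
    p = pi[ss[0]]
    for s_prev, s_cur in zip(ss, ss[1:]):
        p *= t[(s_cur, s_prev)]        # t(s | s'), keyed as (s, s')
    for x, s in zip(xs, ss):
        p *= e[(x, s)]                 # e(x | s), keyed as (x, s)
    return p

# Assumed toy parameters: 2 states, 2 observable events.
pi = {1: 0.6, 2: 0.4}
t = {(1, 1): 0.7, (2, 1): 0.3, (1, 2): 0.4, (2, 2): 0.6}
e = {("a", 1): 0.9, ("b", 1): 0.1, ("a", 2): 0.2, ("b", 2): 0.8}

# pi(1) * t(2|1) * e(a|1) * e(b|2) = 0.6 * 0.3 * 0.9 * 0.8 = 0.1296
print(hmm_joint(["a", "b"], [1, 2], pi, t, e))
```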
6 components of HMMs
• Discrete timesteps: 1, 2, ...
• Finite state space: {si}
• Events {xi}
• Vector of initial probabilities π = {πi}, where πi = p(q0 = si)
• Matrix of transition probabilities T = {tij} = { p(qt+1 = sj | qt = si) }
• Matrix of emission probabilities E = {eij} = { p(ot = xj | qt = si) }

The observations at consecutive timesteps form an observation sequence
{o1, o2, ..., ot}, where oi ∈ {x1, x2, ..., xo}.

[Diagram: a start node with initial arcs π1, π2, π3 into states s1, s2, s3;
 transition arcs tij between the states; emission arcs eij from the states
 to the observations x1, x2, x3]
6 components of HMMs
• Given a specific HMM and an observation sequence, the corresponding
  sequence of states is generally not deterministic
• Example: given the observation sequence {x1, x3, x3, x2}, the
  corresponding states can be any of the following sequences:
     {s1, s1, s2, s2}
     {s1, s2, s3, s2}
     {s1, s1, s1, s2}
     ...
Here’s an HMM

[Diagram: states s1, s2, s3 and observations x1, x2, x3, with the
 transition and emission probabilities tabulated below]

  T     s1     s2     s3        E     x1     x2     x3        π    s1     s2     s3
  s1    0.5    0.5    0         s1    0.3    0      0.7            0.3    0.3    0.4
  s2    0.4    0      0.6       s2    0      0.1    0.9
  s3    0.2    0.8    0         s3    0.2    0      0.8
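With these concrete tables, the earlier claim that the state sequence behind an observation sequence is not deterministic can be checked by brute force: enumerate every state sequence and keep those with nonzero joint probability for {x1, x3, x3, x2}. A sketch (the helper name is illustrative; note the feasible set under this particular T differs from the generic list on the previous slide, since here t(s2 | s2) = 0):

```python
from itertools import product

# pi / T / E tables from the slide above.
PI = {"s1": 0.3, "s2": 0.3, "s3": 0.4}
T = {"s1": {"s1": 0.5, "s2": 0.5, "s3": 0.0},
     "s2": {"s1": 0.4, "s2": 0.0, "s3": 0.6},
     "s3": {"s1": 0.2, "s2": 0.8, "s3": 0.0}}
E = {"s1": {"x1": 0.3, "x2": 0.0, "x3": 0.7},
     "s2": {"x1": 0.0, "x2": 0.1, "x3": 0.9},
     "s3": {"x1": 0.2, "x2": 0.0, "x3": 0.8}}

def joint(obs, states):
    """p(observations, states) under the HMM model form."""
    p = PI[states[0]] * E[states[0]][obs[0]]
    for j in range(1, len(obs)):
        p *= T[states[j - 1]][states[j]] * E[states[j]][obs[j]]
    return p

obs = ["x1", "x3", "x3", "x2"]
feasible = [ss for ss in product(PI, repeat=len(obs)) if joint(obs, ss) > 0]
for ss in feasible:
    print(ss, joint(obs, ss))
```

Several distinct state sequences survive, which is exactly why decoding (finding the best one) is a nontrivial problem.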
Here’s a HMM
                                                  0.2
0.5                                                                  • Start randomly in state 1, 2
                    0.5                    0.6
       s1                       s2                      s3             or 3.
                    0.4                     0.8
                                                                     • Choose a output at each
     0.3            0.7                     0.9                        state in random.
              0.2                                       0.8
                                0.1                                  • Let’s generate a sequence
                                                                       of observations:
      x1                       x2                 x3
                                                                                 0.3 - 0.3 - 0.4
π      s1      s2         s3                                                   randomply choice
                                                                               between S1, S2, S3
       0.3     0.3        0.4

T      s1      s2         s3          E     x1     x2         x3
s1     0.5     0.5        0           s1    0.3    0          0.7         q1              o1
s2     0.4     0          0.6         s2    0      0.1        0.9         q2              o2
s3     0.2     0.8        0           s3    0.2    0          0.8         q3              o3

 08/12/2010                                       Hidden Markov Models                              33
Here’s a HMM
                                                  0.2
0.5                                                                  • Start randomly in state 1, 2
                    0.5                    0.6
       s1                       s2                      s3             or 3.
                    0.4                     0.8
                                                                     • Choose a output at each
     0.3            0.7                     0.9                        state in random.
              0.2                                       0.8
                                0.1                                  • Let’s generate a sequence
                                                                       of observations:
      x1                       x2                 x3
                                                                                    0.2 - 0.8
π      s1      s2         s3                                                   choice between X1
                                                                                     and X3
       0.3     0.3        0.4

T      s1      s2         s3          E     x1     x2         x3
s1     0.5     0.5        0           s1    0.3    0          0.7         q1      S3     o1
s2     0.4     0          0.6         s2    0      0.1        0.9         q2             o2
s3     0.2     0.8        0           s3    0.2    0          0.8         q3             o3

 08/12/2010                                       Hidden Markov Models                             34
Here’s a HMM
                                                  0.2
0.5                                                                  • Start randomly in state 1, 2
                    0.5                    0.6
       s1                       s2                      s3             or 3.
                    0.4                     0.8
                                                                     • Choose a output at each
     0.3            0.7                     0.9                        state in random.
              0.2                                       0.8
                                0.1                                  • Let’s generate a sequence
                                                                       of observations:
      x1                       x2                 x3
                                                                                 Go to S2 with
π      s1      s2         s3                                                   probability 0.8 or
                                                                               S1 with prob. 0.2
       0.3     0.3        0.4

T      s1      s2         s3          E     x1     x2         x3
s1     0.5     0.5        0           s1    0.3    0          0.7         q1      S3      o1        X3
s2     0.4     0          0.6         s2    0      0.1        0.9         q2              o2
s3     0.2     0.8        0           s3    0.2    0          0.8         q3              o3

 08/12/2010                                       Hidden Markov Models                                   35
Here’s a HMM
                                                  0.2
0.5                                                                  • Start randomly in state 1, 2
                    0.5                    0.6
       s1                       s2                      s3             or 3.
                    0.4                     0.8
                                                                     • Choose an output at each
     0.3            0.7                     0.9                        state at random.
              0.2                                       0.8
                                0.1                                  • Let’s generate a sequence
                                                                       of observations:
      x1                       x2                 x3
                                                                                    0.3 - 0.7
π      s1      s2         s3                                                   choice between X1
                                                                                     and X3
       0.3     0.3        0.4

T      s1      s2         s3          E     x1     x2         x3
s1     0.5     0.5        0           s1    0.3    0          0.7         q1      S3     o1        X3
s2     0.4     0          0.6         s2    0      0.1        0.9         q2      S1     o2
s3     0.2     0.8        0           s3    0.2    0          0.8         q3             o3

 08/12/2010                                       Hidden Markov Models                                  36
Here’s a HMM
                                                  0.2
0.5                                                                  • Start randomly in state 1, 2
                    0.5                    0.6
       s1                       s2                      s3             or 3.
                    0.4                     0.8
                                                                     • Choose an output at each
     0.3            0.7                     0.9                        state at random.
              0.2                                       0.8
                                0.1                                  • Let’s generate a sequence
                                                                       of observations:
      x1                       x2                 x3
                                                                                 Go to S2 with
π      s1      s2         s3                                                   probability 0.5 or
                                                                               S1 with prob. 0.5
       0.3     0.3        0.4

T      s1      s2         s3          E     x1     x2         x3
s1     0.5     0.5        0           s1    0.3    0          0.7         q1      S3      o1        X3
s2     0.4     0          0.6         s2    0      0.1        0.9         q2      S1      o2        X1
s3     0.2     0.8        0           s3    0.2    0          0.8         q3              o3

 08/12/2010                                       Hidden Markov Models                                   37
Here’s a HMM
                                                  0.2
0.5                                                                  • Start randomly in state 1, 2
                    0.5                    0.6
       s1                       s2                      s3             or 3.
                    0.4                     0.8
                                                                     • Choose an output at each
     0.3            0.7                     0.9                        state at random.
              0.2                                       0.8
                                0.1                                  • Let’s generate a sequence
                                                                       of observations:
      x1                       x2                 x3
                                                                                    0.3 - 0.7
π      s1      s2         s3                                                   choice between X1
                                                                                     and X3
       0.3     0.3        0.4

T      s1      s2         s3          E     x1     x2         x3
s1     0.5     0.5        0           s1    0.3    0          0.7         q1      S3     o1        X3
s2     0.4     0          0.6         s2    0      0.1        0.9         q2      S1     o2        X1
s3     0.2     0.8        0           s3    0.2    0          0.8         q3      S1     o3

 08/12/2010                                       Hidden Markov Models                                  38
Here’s a HMM
                                                  0.2
0.5                                                                  • Start randomly in state 1, 2
                    0.5                    0.6
       s1                       s2                      s3             or 3.
                    0.4                     0.8
                                                                     • Choose an output at each
     0.3            0.7                     0.9                        state at random.
              0.2                                       0.8
                                0.1                                  • Let’s generate a sequence
                                                                       of observations:
      x1                       x2                 x3
                                                                               We got a sequence
                                                                                 of states and
π      s1      s2         s3                                                    corresponding
       0.3     0.3        0.4                                                   observations!
T      s1      s2         s3          E     x1     x2         x3
s1     0.5     0.5        0           s1    0.3    0          0.7         q1      S3     o1    X3
s2     0.4     0          0.6         s2    0      0.1        0.9         q2      S1     o2    X1
s3     0.2     0.8        0           s3    0.2    0          0.8         q3      S1     o3    X3

 08/12/2010                                       Hidden Markov Models                              39
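The generation procedure the slides step through can be sketched in Python. This is illustrative code only (the function name `sample_sequence` and the 0-based index convention, 0 = S1/X1, are my assumptions); the π, T and E values are copied from the tables above:

```python
import random

# Parameters copied from the slides' tables (index 0 = S1/X1, 1 = S2/X2, 2 = S3/X3).
pi = [0.3, 0.3, 0.4]                  # initial state distribution
T = [[0.5, 0.5, 0.0],                 # T[i][j] = p(q_{t+1} = Sj | q_t = Si)
     [0.4, 0.0, 0.6],
     [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7],                 # E[i][k] = p(o_t = Xk | q_t = Si)
     [0.0, 0.1, 0.9],
     [0.2, 0.0, 0.8]]

def sample_sequence(length, rng=random):
    """Start randomly per pi, emit an output from E at each state,
    then move to the next state according to T."""
    states, observations = [], []
    state = rng.choices(range(3), weights=pi)[0]
    for _ in range(length):
        states.append(state)
        observations.append(rng.choices(range(3), weights=E[state])[0])
        state = rng.choices(range(3), weights=T[state])[0]
    return states, observations
```

One possible run of `sample_sequence(3)` is exactly the trace on the slides: states S3, S1, S1 with observations X3, X1, X3.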
Three famous HMM tasks
• Given a HMM Φ = (T, E, π), three famous HMM tasks are:
• Probability of an observation sequence (state estimation)
    – Given: Φ, observation O = {o1, o2,..., ot}
    – Goal: p(O|Φ), or the state posterior p(qt = Si|O)
• Most likely explanation (inference)
    – Given: Φ, the observation O = {o1, o2,..., ot}
    – Goal: Q* = argmaxQ p(Q|O)
• Learning the HMM
    – Given: observation O = {o1, o2,..., ot} and corresponding state sequence
    – Goal: estimate parameters of the HMM Φ = (T, E, π)


  08/12/2010                       Hidden Markov Models                          40
Three famous HMM tasks
• Given a HMM Φ = (T, E, π), three famous HMM tasks are:
• Probability of an observation sequence (state estimation)
    – Given: Φ, observation O = {o1, o2,..., ot}
    – Goal: p(O|Φ), or the state posterior p(qt = Si|O)     Calculating the probability of

• Most likely explanation (inference)                       observing the sequence O over
                                                            all possible state sequences.
    – Given: Φ, the observation O = {o1, o2,..., ot}
    – Goal: Q* = argmaxQ p(Q|O)
• Learning the HMM
    – Given: observation O = {o1, o2,..., ot} and corresponding state sequence
    – Goal: estimate parameters of the HMM Φ = (T, E, π)


  08/12/2010                       Hidden Markov Models                                41
Three famous HMM tasks
• Given a HMM Φ = (T, E, π), three famous HMM tasks are:
• Probability of an observation sequence (state estimation)
    – Given: Φ, observation O = {o1, o2,..., ot}
    – Goal: p(O|Φ), or the state posterior p(qt = Si|O)     Calculating the best

• Most likely explanation (inference)                       corresponding state sequence,
                                                          given an observation
    – Given: Φ, the observation O = {o1, o2,..., ot}
                                                          sequence.
    – Goal: Q* = argmaxQ p(Q|O)
• Learning the HMM
    – Given: observation O = {o1, o2,..., ot} and corresponding state sequence
    – Goal: estimate parameters of the HMM Φ = (T, E, π)


  08/12/2010                       Hidden Markov Models                             42
Three famous HMM tasks
• Given a HMM Φ = (T, E, π), three famous HMM tasks are:
• Probability of an observation sequence (state estimation)
    – Given: Φ, observation O = {o1, o2,..., ot}
                                                            Given an observation sequence
    – Goal: p(O|Φ), or the state posterior p(qt = Si|O)     (or a set of them) and the
• Most likely explanation (inference)                       corresponding state sequence,
    – Given: Φ, the observation O = {o1, o2,..., ot}      estimate the Transition matrix,

    – Goal: Q* = argmaxQ p(Q|O)                           Emission matrix and initial
                                                          probabilities of the HMM
• Learning the HMM
    – Given: observation O = {o1, o2,..., ot} and corresponding state sequence
    – Goal: estimate parameters of the HMM Φ = (T, E, π)


  08/12/2010                       Hidden Markov Models                                 43
Three famous HMM tasks
  Problem                             Algorithm           Complexity

  State estimation                    Forward-Backward    O(TN^2)
  Calculating: p(O|Φ)

  Inference                           Viterbi decoding    O(TN^2)
  Calculating: Q* = argmaxQ p(Q|O)

  Learning                            Baum-Welch (EM)     O(TN^2) per iteration
  Calculating: Φ* = argmaxΦ p(O|Φ)


   T: number of timesteps
   N: number of states

08/12/2010                         Hidden Markov Models                44
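The inference row of the table (Viterbi decoding) can be made concrete. The sketch below is my own illustrative code, not from the slides (the function name `viterbi` and the 0-based index convention 0 = S1 are assumptions); it uses the π, T, E tables of the running example and runs in O(TN^2):

```python
pi = [0.3, 0.3, 0.4]                  # tables from the running example
T = [[0.5, 0.5, 0.0], [0.4, 0.0, 0.6], [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7], [0.0, 0.1, 0.9], [0.2, 0.0, 0.8]]

def viterbi(obs):
    """Q* = argmax_Q p(Q|O) = argmax_Q p(O|Q)p(Q), in O(T N^2) time."""
    n = len(pi)
    delta = [pi[i] * E[i][obs[0]] for i in range(n)]   # best score of a path ending in i
    back = []                                          # back-pointers per timestep
    for o in obs[1:]:
        ptr = [max(range(n), key=lambda i: delta[i] * T[i][j]) for j in range(n)]
        delta = [delta[ptr[j]] * T[ptr[j]][j] * E[j][o] for j in range(n)]
        back.append(ptr)
    # trace the best final state back to the start
    q = max(range(n), key=lambda i: delta[i])
    path = [q]
    for ptr in reversed(back):
        q = ptr[q]
        path.append(q)
    return path[::-1]

print(viterbi((2, 0, 2)))             # O = X3 X1 X3 -> [1, 2, 1], i.e. S2 S3 S2
```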
The Forward-Backward Algorithm
• Given: Φ = (T, E, π), observation O = {o1, o2,..., ot}

• Goal: What is p(o1o2...ot)?

• We can do this in a slow, stupid way
   – As shown in the next slide...




 08/12/2010              Hidden Markov Models         45
Here’s a HMM
0.5                                     0.2
                    0.5          0.6                       • What is p(O) = p(o1o2o3)
       s1           0.4
                           s2     0.8
                                              s3             = p(o1=X3 ∧ o2=X1 ∧ o3=X3)?

  0.3               0.7           0.9                      • Slow, stupid way:
              0.2                             0.8
                           0.1
                                                                      p (O ) =          ∑              p (O ∧ Q )
      x1                  x2            x3                                      Q∈paths of length 3

                                                                           =           ∑              p (O | Q ) p (Q )
                                                                                Q∈paths of length 3



                                                           • How to compute p(Q) for an
                                                             arbitrary path Q?
                                                           • How to compute p(O|Q) for an
                                                             arbitrary path Q?



      08/12/2010                              Hidden Markov Models                                               46
Here’s a HMM
0.5                                         0.2
                    0.5              0.6                       • What is p(O) = p(o1o2o3)
       s1           0.4
                           s2         0.8
                                                  s3             = p(o1=X3 ∧ o2=X1 ∧ o3=X3)?

  0.3               0.7               0.9                      • Slow, stupid way:
              0.2                                 0.8
                               0.1
                                                                          p (O ) =          ∑              p (O ∧ Q )
      x1                  x2                x3                                      Q∈paths of length 3


  π         s1      s2    s3                                                   =           ∑              p (O | Q ) p (Q )
                                                                                    Q∈paths of length 3
            0.3     0.3   0.4

 p(Q) = p(q1q2q3)                                              • How to compute p(Q) for an
 = p(q1)p(q2|q1)p(q3|q2,q1) (chain)                              arbitrary path Q?
 = p(q1)p(q2|q1)p(q3|q2) (why?)                                • How to compute p(O|Q) for an
                                                                 arbitrary path Q?
 Example in the case Q=S3S1S1
 P(Q) = 0.4 * 0.2 * 0.5 = 0.04
      08/12/2010                                  Hidden Markov Models                                               47
Here’s a HMM
0.5                                         0.2
                    0.5              0.6                       • What is p(O) = p(o1o2o3)
       s1           0.4
                           s2         0.8
                                                  s3             = p(o1=X3 ∧ o2=X1 ∧ o3=X3)?

  0.3               0.7               0.9                      • Slow, stupid way:
              0.2                                 0.8
                               0.1
                                                                          p (O ) =          ∑              p (O ∧ Q )
      x1                  x2                x3                                      Q∈paths of length 3


  π         s1      s2    s3                                                   =           ∑              p (O | Q ) p (Q )
                                                                                    Q∈paths of length 3
            0.3     0.3   0.4

 p(O|Q) = p(o1o2o3|q1q2q3)                                     • How to compute p(Q) for an
 = p(o1|q1)p(o2|q2)p(o3|q3) (why?)                               arbitrary path Q?
                                                               • How to compute p(O|Q) for an
 Example in the case Q=S3S1S1                                    arbitrary path Q?
 P(O|Q) = p(X3|S3)p(X1|S1) p(X3|S1)
 =0.8 * 0.3 * 0.7 = 0.168
      08/12/2010                                  Hidden Markov Models                                               48
Here’s a HMM
0.5                                         0.2
                    0.5              0.6                       • What is p(O) = p(o1o2o3)
       s1           0.4
                           s2         0.8
                                                  s3             = p(o1=X3 ∧ o2=X1 ∧ o3=X3)?

  0.3               0.7               0.9                      • Slow, stupid way:
              0.2                                 0.8
                               0.1
                                                                          p (O ) =          ∑              p (O ∧ Q )
      x1                  x2                x3                                      Q∈paths of length 3


  π         s1      s2    s3                                                   =           ∑              p (O | Q ) p (Q )
                                                                                    Q∈paths of length 3
            0.3     0.3   0.4

 p(O|Q) = p(o1o2o3|q1q2q3)                                     • How to compute p(Q) for an
 = p(o1|q1)p(o2|q2)p(o3|q3) (why?)                               arbitrary path Q?
                                                               • How to compute p(O|Q) for an
 Example in the case Q=S3S1S1                                    arbitrary path Q?
 P(O|Q) = p(X3|S3)p(X1|S1)p(X3|S1)
 = 0.8 * 0.3 * 0.7 = 0.168

 p(O) needs 27 p(Q) computations and 27 p(O|Q) computations.
 What if the sequence has 20 observations?

                                                                 So let’s be smarter...
      08/12/2010                                  Hidden Markov Models                                               49
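The slow, stupid way can be written down directly: enumerate all 3^3 = 27 state paths and sum p(O|Q)p(Q). This is my own sketch (the function name `brute_force_p_obs` and the 0-based indices are assumptions), reusing the π, T, E tables:

```python
from itertools import product

# pi, T, E copied from the slides' tables (index 0 = S1/X1, etc.).
pi = [0.3, 0.3, 0.4]
T = [[0.5, 0.5, 0.0], [0.4, 0.0, 0.6], [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7], [0.0, 0.1, 0.9], [0.2, 0.0, 0.8]]

def brute_force_p_obs(obs):
    """p(O) = sum over all N^T paths Q of p(O|Q) p(Q)."""
    total = 0.0
    for path in product(range(3), repeat=len(obs)):
        p_q = pi[path[0]]                      # p(q1)
        for a, b in zip(path, path[1:]):
            p_q *= T[a][b]                     # p(q_{t+1} | q_t)
        p_o_given_q = 1.0
        for q, o in zip(path, obs):
            p_o_given_q *= E[q][o]             # p(o_t | q_t)
        total += p_q * p_o_given_q
    return total

print(brute_force_p_obs((2, 0, 2)))   # O = X3 X1 X3, prints ~0.094344
```

The Q=S3S1S1 term of this sum is 0.04 × 0.168 = 0.00672, matching the slide; with 20 observations the loop would visit 3^20 ≈ 3.5 billion paths, which is why a smarter algorithm is needed.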
The Forward algorithm
• Given observation o1o2...oT

• Define:

  αt(i) = p(o1o2...ot ∧ qt = Si | Φ)               where 1 ≤ t ≤ T

  αt(i) = probability that, in a random trial:
   – We’d have seen the first t observations

   – We’d have ended up in Si as the t’th state visited.

• In our example, what is α2(3) ?

 08/12/2010                 Hidden Markov Models                     50
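The definition of αt(i) gives the recursion α_{t+1}(j) = E(Sj, o_{t+1}) Σi αt(i) T(Si, Sj). A minimal sketch of it (my own code; the function name `forward` and 0-based indices are assumptions), which also answers the question about α2(3):

```python
# pi, T, E copied from the slides' tables (index 0 = S1/X1, etc.).
pi = [0.3, 0.3, 0.4]
T = [[0.5, 0.5, 0.0], [0.4, 0.0, 0.6], [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7], [0.0, 0.1, 0.9], [0.2, 0.0, 0.8]]

def forward(obs):
    """alpha[t-1][i-1] = alpha_t(i) = p(o1..ot and q_t = S_i); O(T N^2)."""
    alpha = [[pi[i] * E[i][obs[0]] for i in range(3)]]       # base case t = 1
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([E[j][o] * sum(prev[i] * T[i][j] for i in range(3))
                      for j in range(3)])                    # recursion step
    return alpha

alpha = forward((2, 0, 2))            # O = X3 X1 X3
print(alpha[1][2])                    # alpha_2(3), ~0.0324
print(sum(alpha[-1]))                 # p(O), ~0.094344, same as the 27-path sum
```

Note that summing the last column recovers p(O) with only O(TN^2) work instead of enumerating every path.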

How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 

Hidden Markov Models

  • 1. MACHINE LEARNING Hidden Markov Models VU H. Pham phvu@fit.hcmus.edu.vn Department of Computer Science December 6th, 2010 08/12/2010 Hidden Markov Models 1
  • 2. Contents • Introduction • Markov Chain • Hidden Markov Models
  • 3. Introduction • Markov processes were first proposed by the Russian mathematician Andrei Markov – He used these processes to analyze the letter sequences in Pushkin's poem Eugene Onegin • Nowadays, the Markov property and HMMs are widely used in many domains: – Natural Language Processing – Speech Recognition – Bioinformatics – Image/video processing – ...
  • 4. Markov Chain • Has N states, called s1, s2, ..., sN • There are discrete timesteps, t = 0, t = 1, ... • On the t'th timestep the system is in exactly one of the available states; call it qt ∈ {s1, s2, ..., sN} [Diagram: three states s1, s2, s3; the current state is marked] N = 3, t = 0, qt = q0 = s3
  • 5. Markov Chain • Has N states, called s1, s2, ..., sN • There are discrete timesteps, t = 0, t = 1, ... • On the t'th timestep the system is in exactly one of the available states; call it qt ∈ {s1, s2, ..., sN} • Between each timestep, the next state is chosen randomly. [Diagram: the current state is now s2] N = 3, t = 1, qt = q1 = s2
  • 6. Markov Chain • Has N states, called s1, s2, ..., sN • There are discrete timesteps, t = 0, t = 1, ... • On the t'th timestep the system is in exactly one of the available states; call it qt ∈ {s1, s2, ..., sN} • Between each timestep, the next state is chosen randomly. • The current state determines the probability distribution for the next state. Transition probabilities: p(qt+1 = s1 | qt = s1) = 0, p(s2 | s1) = 0, p(s3 | s1) = 1; p(s1 | s2) = 1/2, p(s2 | s2) = 1/2, p(s3 | s2) = 0; p(s1 | s3) = 1/3, p(s2 | s3) = 2/3, p(s3 | s3) = 0. N = 3, t = 1, qt = q1 = s2
  • 7. Markov Chain • Has N states, called s1, s2, ..., sN • There are discrete timesteps, t = 0, t = 1, ... • On the t'th timestep the system is in exactly one of the available states; call it qt ∈ {s1, s2, ..., sN} • Between each timestep, the next state is chosen randomly. • The current state determines the probability distribution for the next state. – Often notated with arcs between states [Diagram: arcs s1→s3 (1), s2→s1 (1/2), s2→s2 (1/2), s3→s1 (1/3), s3→s2 (2/3)] N = 3, t = 1, qt = q1 = s2
  • 8. Markov Property • qt+1 is conditionally independent of {qt−1, qt−2, ..., q0} given qt. • In other words: p(qt+1 | qt, qt−1, ..., q0) = p(qt+1 | qt) [Diagram and transition probabilities as on the previous slide] N = 3, t = 1, qt = q1 = s2
  • 9. Markov Property • qt+1 is conditionally independent of {qt−1, qt−2, ..., q0} given qt. • In other words: p(qt+1 | qt, qt−1, ..., q0) = p(qt+1 | qt). The state at timestep t+1 depends only on the state at timestep t. [Diagram and transition probabilities as before] N = 3, t = 1, qt = q1 = s2
  • 10. Markov Property • qt+1 is conditionally independent of {qt−1, qt−2, ..., q0} given qt. • In other words: p(qt+1 | qt, qt−1, ..., q0) = p(qt+1 | qt). The state at timestep t+1 depends only on the state at timestep t. • How to represent the joint distribution of (q0, q1, q2, ...) using graphical models? N = 3, t = 1, qt = q1 = s2
  • 11. Markov Property • qt+1 is conditionally independent of {qt−1, qt−2, ..., q0} given qt. • In other words: p(qt+1 | qt, qt−1, ..., q0) = p(qt+1 | qt). The state at timestep t+1 depends only on the state at timestep t. • How to represent the joint distribution of (q0, q1, q2, ...) using graphical models? [Graphical model: q0 → q1 → q2 → q3] N = 3, t = 1, qt = q1 = s2
  • 12. Markov chain • So, the chain {qt} is called a Markov chain: q0 → q1 → q2 → q3
  • 13. Markov chain • So, the chain {qt} is called a Markov chain: q0 → q1 → q2 → q3 • Each qt takes a value from the finite state-space {s1, s2, s3} • Each qt is observed at a discrete timestep t • {qt} satisfies the Markov property: p(qt+1 | qt, qt−1, ..., q0) = p(qt+1 | qt)
  • 14. Markov chain • So, the chain {qt} is called a Markov chain: q0 → q1 → q2 → q3 • Each qt takes a value from the finite state-space {s1, s2, s3} • Each qt is observed at a discrete timestep t • {qt} satisfies the Markov property: p(qt+1 | qt, qt−1, ..., q0) = p(qt+1 | qt) • The transition from qt to qt+1 is drawn from the transition probability matrix (rows = current state, columns = next state s1, s2, s3): s1: (0, 0, 1); s2: (1/2, 1/2, 0); s3: (1/3, 2/3, 0)
  • 16. Markov Chain – Important property • In a Markov chain, the joint distribution is p(q0, q1, ..., qm) = p(q0) ∏_{j=1..m} p(qj | qj−1)
  • 17. Markov Chain – Important property • In a Markov chain, the joint distribution is p(q0, q1, ..., qm) = p(q0) ∏_{j=1..m} p(qj | qj−1) • Why? p(q0, q1, ..., qm) = p(q0) ∏_{j=1..m} p(qj | qj−1, previous states) = p(q0) ∏_{j=1..m} p(qj | qj−1), due to the Markov property
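The factorization above is easy to sketch in code. A minimal sketch (the slides contain no code, so names here are illustrative); the slides do not give an initial distribution for the chain, so the helper optionally conditions on a known start state:

```python
# Joint probability of a state sequence under the Markov factorization
#   p(q0, ..., qm) = p(q0) * prod_j p(qj | q(j-1)).
# Transition matrix of the 3-state chain from the slides:
# rows = current state, columns = next state (s1, s2, s3).
T = [
    [0.0, 0.0, 1.0],   # from s1
    [0.5, 0.5, 0.0],   # from s2
    [1/3, 2/3, 0.0],   # from s3
]

def chain_probability(states, T, p0=None):
    """p(q0, q1, ..., qm) for a list of 0-based state indices.

    If no initial distribution p0 is given, the result is conditioned
    on the (known) start state q0.
    """
    p = 1.0 if p0 is None else p0[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= T[prev][cur]
    return p

# s3 -> s2 -> s1 -> s3: (2/3) * (1/2) * 1 = 1/3
print(chain_probability([2, 1, 0, 2], T))
```

Each factor is a single lookup in T, which is exactly what the Markov property buys: no factor ever needs more than the previous state.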
  • 18. Markov Chain: e.g. • The state-space of weather: rain, wind, cloud
  • 19. Markov Chain: e.g. • The state-space of weather: rain, wind, cloud • Transition matrix (rows Rain, Cloud, Wind; columns Rain, Cloud, Wind): Rain: (1/2, 0, 1/2); Cloud: (1/3, 0, 2/3); Wind: (0, 1, 0)
  • 20. Markov Chain: e.g. • The state-space of weather: rain, wind, cloud • Transition matrix (rows Rain, Cloud, Wind; columns Rain, Cloud, Wind): Rain: (1/2, 0, 1/2); Cloud: (1/3, 0, 2/3); Wind: (0, 1, 0) • Markov assumption: the weather on the (t+1)'th day depends only on the t'th day.
  • 21. Markov Chain: e.g. • The state-space of weather: rain, wind, cloud • Transition matrix (rows Rain, Cloud, Wind; columns Rain, Cloud, Wind): Rain: (1/2, 0, 1/2); Cloud: (1/3, 0, 2/3); Wind: (0, 1, 0) • Markov assumption: the weather on the (t+1)'th day depends only on the t'th day. • We have observed the weather over a week: Day 0: rain; Day 1: wind; Day 2: rain; Day 3: rain; Day 4: cloud
  • 22. Markov Chain: e.g. • The state-space of weather: rain, wind, cloud • Transition matrix (rows Rain, Cloud, Wind; columns Rain, Cloud, Wind): Rain: (1/2, 0, 1/2); Cloud: (1/3, 0, 2/3); Wind: (0, 1, 0) • Markov assumption: the weather on the (t+1)'th day depends only on the t'th day. • We have observed the weather over a week: Day 0: rain; Day 1: wind; Day 2: rain; Day 3: rain; Day 4: cloud — this observed sequence of days forms a Markov chain
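A weather chain like this can also be simulated directly. A minimal sketch, assuming the matrix rows index today's weather and the columns tomorrow's (that reading matches the arc labels on the slide's diagram):

```python
import random

# Simulate the slide's weather Markov chain for a week,
# assuming rows = today's weather, columns = tomorrow's.
STATES = ["rain", "cloud", "wind"]
T = [
    [0.5, 0.0, 0.5],   # rain  -> (rain, cloud, wind)
    [1/3, 0.0, 2/3],   # cloud -> (rain, cloud, wind)
    [0.0, 1.0, 0.0],   # wind  -> (rain, cloud, wind)
]

def simulate(start, n_days, rng=random):
    """Sample a sequence of n_days weather states, starting from `start`."""
    seq = [start]
    for _ in range(n_days - 1):
        row = T[seq[-1]]
        seq.append(rng.choices(range(3), weights=row)[0])
    return seq

week = simulate(start=0, n_days=7)   # start on a rainy day
print([STATES[s] for s in week])
```

Because each next day is drawn only from the current day's row, the sampler never needs any earlier history, which is the Markov assumption in action.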
  • 23. Contents • Introduction • Markov Chain • Hidden Markov Models
  • 24. Modeling pairs of sequences • In many applications, we have to model pairs of sequences • Examples: – POS tagging in Natural Language Processing (assign each word in a sentence to Noun, Adj, Verb, ...) – Speech recognition (map acoustic sequences to sequences of words) – Computational biology (recover gene boundaries in DNA sequences) – Video tracking (estimate the underlying model states from the observation sequences) – And many others...
  • 25. Probabilistic models for sequence pairs • We have two sequences of random variables: X1, X2, ..., Xm and S1, S2, ..., Sm • Intuitively, in a practical system, each Xi corresponds to an observation and each Si corresponds to a state that generated the observation. • Let each Si be in {1, 2, ..., k} and each Xi be in {1, 2, ..., o} • How do we model the joint distribution p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)?
  • 26. Hidden Markov Models (HMMs) • In HMMs, we assume that p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm) = p(S1 = s1) ∏_{j=2..m} p(Sj = sj | Sj−1 = sj−1) ∏_{j=1..m} p(Xj = xj | Sj = sj) • This factorization follows from what are often called the independence assumptions of HMMs • We will justify it in the next slides
  • 27. Independence Assumptions in HMMs [1] • Recall the chain rule: p(ABC) = p(A | BC) p(BC) = p(A | BC) p(B | C) p(C) • By the chain rule, the following equality is exact: p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm) = p(S1 = s1, ..., Sm = sm) × p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm) • Assumption 1: the state sequence forms a Markov chain: p(S1 = s1, ..., Sm = sm) = p(S1 = s1) ∏_{j=2..m} p(Sj = sj | Sj−1 = sj−1)
  • 28. Independence Assumptions in HMMs [2] • By the chain rule, the following equality is exact: p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm) = ∏_{j=1..m} p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., Xj−1 = xj−1) • Assumption 2: each observation depends only on the underlying state: p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., Xj−1 = xj−1) = p(Xj = xj | Sj = sj) • These two assumptions are often called the independence assumptions of HMMs
  • 29. The Model form for HMMs • The model takes the following form: p(x1, ..., xm, s1, ..., sm; θ) = π(s1) ∏_{j=2..m} t(sj | sj−1) ∏_{j=1..m} e(xj | sj) • Parameters in the model: – Initial probabilities π(s) for s ∈ {1, 2, ..., k} – Transition probabilities t(s | s′) for s, s′ ∈ {1, 2, ..., k} – Emission probabilities e(x | s) for s ∈ {1, 2, ..., k} and x ∈ {1, 2, ..., o}
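This model form maps directly to code. A minimal sketch of the joint probability p(x1..xm, s1..sm; θ), using the example HMM parameters that appear a few slides later (the function name is illustrative, not from the slides):

```python
# HMM joint probability under the factorization
#   p(x, s; theta) = pi(s1) * prod_{j=2..m} t(sj | s(j-1)) * prod_{j=1..m} e(xj | sj)
PI = [0.3, 0.3, 0.4]           # initial probabilities pi(s), s = s1, s2, s3
T = [[0.5, 0.5, 0.0],          # transition t(s' | s), rows = s, columns = s'
     [0.4, 0.0, 0.6],
     [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7],          # emission e(x | s), rows = s, columns = x1, x2, x3
     [0.0, 0.1, 0.9],
     [0.2, 0.0, 0.8]]

def joint_probability(states, obs):
    """p(x, s) for 0-based state and observation index sequences."""
    p = PI[states[0]] * E[states[0]][obs[0]]
    for j in range(1, len(states)):
        p *= T[states[j - 1]][states[j]] * E[states[j]][obs[j]]
    return p

# Q = (S3, S1, S1), O = (X3, X1, X3):
# (0.4 * 0.8) * (0.2 * 0.3) * (0.5 * 0.7) = 0.00672
print(joint_probability([2, 0, 0], [2, 0, 2]))
```

Note how the three parameter groups π, t, e are the only quantities the model ever consults: one initial lookup, one transition per step, one emission per step.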
  • 30. 6 components of HMMs • Discrete timesteps: 1, 2, ... • Finite state space: {si} • Events {xi} • Vector of initial probabilities {πi}, where πi = p(q0 = si) • Matrix of transition probabilities T = {tij} = {p(qt+1 = sj | qt = si)} • Matrix of emission probabilities E = {eij} = {p(ot = xj | qt = si)} • The observations at consecutive timesteps form an observation sequence {o1, o2, ..., ot}, where oi ∈ {x1, x2, ..., xo} [Diagram: states s1, s2, s3 with initial probabilities π1, π2, π3, transition arcs tij, and emission arcs eij to events x1, x2, x3]
  • 31. 6 components of HMMs • Given a specific HMM and an observation sequence, the corresponding sequence of states is generally not deterministic • Example: given the observation sequence {x1, x3, x3, x2}, the corresponding state sequence can be any of the following: {s1, s1, s2, s2}, {s1, s2, s3, s2}, {s1, s1, s1, s2}, ...
  • 32. Here's an HMM • Initial probabilities π over (s1, s2, s3): (0.3, 0.3, 0.4) • Transition matrix T (rows = from, columns = to s1, s2, s3): s1: (0.5, 0.5, 0); s2: (0.4, 0, 0.6); s3: (0.2, 0.8, 0) • Emission matrix E (rows = state, columns = x1, x2, x3): s1: (0.3, 0, 0.7); s2: (0, 0.1, 0.9); s3: (0.2, 0, 0.8)
  • 33. Here's an HMM • Start randomly in state 1, 2 or 3. • Choose an output at each state at random. • Let's generate a sequence of observations: first choose q1 among S1, S2, S3 with probabilities 0.3 – 0.3 – 0.4
  • 34. Here's an HMM • We got q1 = S3; now choose o1 between X1 (prob. 0.2) and X3 (prob. 0.8)
  • 35. Here's an HMM • We got o1 = X3; now go to S2 with probability 0.8 or to S1 with probability 0.2
  • 36. Here's an HMM • We got q2 = S1; now choose o2 between X1 (prob. 0.3) and X3 (prob. 0.7)
  • 37. Here's an HMM • We got o2 = X1; now go to S1 with probability 0.5 or to S2 with probability 0.5
  • 38. Here's an HMM • We got q3 = S1; now choose o3 between X1 (prob. 0.3) and X3 (prob. 0.7)
  • 39. Here's an HMM • We got a sequence of states and corresponding observations! q1 = S3, o1 = X3; q2 = S1, o2 = X1; q3 = S1, o3 = X3
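The generation walkthrough above can be sketched as a sampler. A minimal sketch with the slide's example HMM: pick q1 from π, emit o1 from the emission row of q1, move with a transition row, and repeat (the function name is illustrative):

```python
import random

# Generate (state, observation) pairs from the slides' example HMM.
PI = [0.3, 0.3, 0.4]
T = [[0.5, 0.5, 0.0],
     [0.4, 0.0, 0.6],
     [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7],
     [0.0, 0.1, 0.9],
     [0.2, 0.0, 0.8]]

def generate(n_steps, rng=random):
    """Sample n_steps (state, observation) pairs, 0-based indices."""
    states, obs = [], []
    q = rng.choices(range(3), weights=PI)[0]       # q1 ~ pi
    for _ in range(n_steps):
        states.append(q)
        obs.append(rng.choices(range(3), weights=E[q])[0])  # o_t ~ E[q_t]
        q = rng.choices(range(3), weights=T[q])[0]          # q_{t+1} ~ T[q_t]
    return states, obs

states, obs = generate(3)
print(states, obs)   # the slides' particular run was states [2, 0, 0], obs [2, 0, 2]
```

Every run gives a different pair of sequences, which is exactly the non-determinism noted on slide 31.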
  • 40. Three famous HMM tasks • Given an HMM Φ = (T, E, π), three famous HMM tasks are: • Probability of an observation sequence (state estimation) – Given: Φ, observation O = {o1, o2, ..., ot} – Goal: p(O | Φ), or equivalently p(st = Si | O) • Most likely explanation (inference) – Given: Φ, the observation O = {o1, o2, ..., ot} – Goal: Q* = argmaxQ p(Q | O) • Learning the HMM – Given: observation O = {o1, o2, ..., ot} and the corresponding state sequence – Goal: estimate the parameters of the HMM Φ = (T, E, π)
  • 41. Three famous HMM tasks • State estimation: calculating the probability of observing the sequence O over all possible state sequences – Given: Φ, observation O = {o1, o2, ..., ot} – Goal: p(O | Φ), or equivalently p(st = Si | O)
  • 42. Three famous HMM tasks • Most likely explanation (inference): calculating the best corresponding state sequence, given an observation sequence – Given: Φ, the observation O = {o1, o2, ..., ot} – Goal: Q* = argmaxQ p(Q | O)
  • 43. Three famous HMM tasks • Learning the HMM: given an observation sequence (or a set of them) and the corresponding state sequence(s), estimate the transition matrix, emission matrix and initial probabilities – Given: observation O = {o1, o2, ..., ot} and the corresponding state sequence – Goal: estimate the parameters of the HMM Φ = (T, E, π)
  • 44. Three famous HMM tasks • Problem / Algorithm / Complexity: – State estimation (calculating p(O | Φ)): Forward-Backward, O(TN²) – Inference (calculating Q* = argmaxQ p(Q | O)): Viterbi decoding, O(TN²) – Learning (calculating Φ* = argmaxΦ p(O | Φ)): Baum-Welch (EM), O(TN²) • T: number of timesteps; N: number of states
  • 45. The Forward-Backward Algorithm • Given: Φ = (T, E, π), observation O = {o1, o2, ..., ot} • Goal: what is p(o1o2...ot)? • We can do this in a slow, stupid way – As shown in the next slide...
  • 46. Here's an HMM • What is p(O) = p(o1o2o3) = p(o1 = X3 ∧ o2 = X1 ∧ o3 = X3)? • Slow, stupid way: p(O) = Σ_{Q ∈ paths of length 3} p(O ∧ Q) = Σ_{Q ∈ paths of length 3} p(O | Q) p(Q) • How to compute p(Q) for an arbitrary path Q? • How to compute p(O | Q) for an arbitrary path Q?
  • 47. Here's an HMM • How to compute p(Q) for an arbitrary path Q? p(Q) = p(q1q2q3) = p(q1) p(q2 | q1) p(q3 | q2, q1) (chain rule) = p(q1) p(q2 | q1) p(q3 | q2) (why?) • Example, in the case Q = S3S1S1: p(Q) = 0.4 × 0.2 × 0.5 = 0.04
  • 48. Here's an HMM • How to compute p(O | Q) for an arbitrary path Q? p(O | Q) = p(o1o2o3 | q1q2q3) = p(o1 | q1) p(o2 | q2) p(o3 | q3) (why?) • Example, in the case Q = S3S1S1: p(O | Q) = p(X3 | S3) p(X1 | S1) p(X3 | S1) = 0.8 × 0.3 × 0.7 = 0.168
  • 49. Here's an HMM • Computing p(O) this way needs 27 p(Q) computations and 27 p(O | Q) computations. What if the sequence has 20 observations? So let's be smarter...
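The slow, stupid way is still worth writing down once, since it defines what the forward algorithm must agree with. A minimal sketch enumerating all 3³ = 27 state paths for the slide's example HMM and O = (X3, X1, X3) (helper names are illustrative):

```python
from itertools import product

# Brute-force p(O) = sum over all paths Q of p(O | Q) * p(Q),
# with the slides' example HMM.
PI = [0.3, 0.3, 0.4]
T = [[0.5, 0.5, 0.0],
     [0.4, 0.0, 0.6],
     [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7],
     [0.0, 0.1, 0.9],
     [0.2, 0.0, 0.8]]

def p_Q(Q):
    """p(Q) = p(q1) p(q2|q1) p(q3|q2): the Markov chain factorization."""
    p = PI[Q[0]]
    for a, b in zip(Q, Q[1:]):
        p *= T[a][b]
    return p

def p_O_given_Q(O, Q):
    """p(O|Q) = p(o1|q1) p(o2|q2) p(o3|q3): one emission per timestep."""
    p = 1.0
    for o, q in zip(O, Q):
        p *= E[q][o]
    return p

O = [2, 0, 2]   # X3, X1, X3
p_O = sum(p_O_given_Q(O, Q) * p_Q(Q) for Q in product(range(3), repeat=3))
print(p_O)
```

With 20 observations this sum would have 3^20 ≈ 3.5 billion terms, which is exactly why the next slide introduces the forward algorithm.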
  • 50. The Forward algorithm • Given observation o1o2...oT • Define: αt(i) = p(o1o2...ot ∧ qt = Si | Φ), where 1 ≤ t ≤ T • αt(i) = probability that, in a random trial: – We'd have seen the first t observations – We'd have ended up in Si as the t'th state visited • In our example, what is α2(3)?
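The α's can be computed recursively: α1(i) = πi e_i(o1), and αt+1(j) = (Σi αt(i) tij) e_j(ot+1). A minimal sketch with the slide's example HMM and O = (X3, X1, X3):

```python
# The forward recursion: alpha_t(i) = p(o1..ot AND qt = S_i | model).
#   alpha_1(i)     = pi_i * e_i(o1)
#   alpha_{t+1}(j) = (sum_i alpha_t(i) * t_ij) * e_j(o_{t+1})
PI = [0.3, 0.3, 0.4]
T = [[0.5, 0.5, 0.0],
     [0.4, 0.0, 0.6],
     [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7],
     [0.0, 0.1, 0.9],
     [0.2, 0.0, 0.8]]

def forward(O):
    """Return the list of alpha vectors, one per timestep."""
    alpha = [[PI[i] * E[i][O[0]] for i in range(3)]]
    for o in O[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * T[i][j] for i in range(3)) * E[j][o]
                      for j in range(3)])
    return alpha

alpha = forward([2, 0, 2])   # O = X3, X1, X3
print(alpha[1][2])           # alpha_2(3): first 2 observations seen, ended in S3
print(sum(alpha[-1]))        # p(O) = sum_i alpha_T(i)
```

This answers the slide's question: α2(3) = (π3 e3(X3)) t32 e... no path enters S3 except from S2, so α2(3) = α1(2) t23 e3(X1) = 0.27 × 0.6 × 0.2 = 0.0324. Each timestep costs O(N²), giving the O(TN²) total from the complexity table instead of the 3^T terms of the brute-force sum.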