Hidden Markov Modelling
• Introduction
• Problem formulation
• Forward-Backward algorithm
• Viterbi search
• Baum-Welch parameter estimation
• Other considerations
– Multiple observation sequences
– Phone-based models for continuous speech recognition
– Continuous density HMMs
– Implementation issues
Lecture # 10, Session 2003
6.345 Automatic Speech Recognition HMM 1
Information Theoretic Approach to ASR
[Block diagram: Speech Generation (Text Generation followed by speech production) and Speech Recognition (Acoustic Processor followed by Linguistic Decoder); the speech production and acoustic processing stages act as a noisy communication channel between the word string W and the acoustic evidence A]
Recognition is achieved by maximizing the probability of the
linguistic string, W, given the acoustic evidence, A, i.e., choose the
linguistic sequence Ŵ such that

P(Ŵ|A) = max_W P(W|A)
6.345 Automatic Speech Recognition HMM 2
Information Theoretic Approach to ASR
• From Bayes rule:

  P(W|A) = P(A|W) P(W) / P(A)
• Hidden Markov modelling (HMM) deals with the quantity P(A|W)
• Change in notation:
A → O
W → λ
P(A|W) → P(O|λ)
6.345 Automatic Speech Recognition HMM 3
HMM: An Example
[Figure: three mugs containing mixtures of state stones labeled 1 and 2, and two urns containing mixtures of black and white balls]
• Consider 3 mugs, each containing a mixture of state stones labeled 1 and 2
• The fractions for the ith mug are ai1 and ai2, and ai1 + ai2 = 1
• Consider 2 urns, each with mixtures of black and white balls
• The fractions for the ith urn are bi(B) and bi(W); bi(B) + bi(W) = 1
•	 The parameter vector for this model is:
λ = {a01, a02, a11, a12, a21, a22, b1(B), b1(W), b2(B), b2(W)}
6.345 Automatic Speech Recognition HMM 4
HMM: An Example (cont’d)
[Figure: the same mugs and urns, with a sequence of state stones and balls drawn from them]
Observation Sequence: O = {B, W, B, W, W, B}
State Sequence: Q = {1, 1, 2, 1, 2, 1}
Goal: Given the model λ and the observation sequence O,
how can the underlying state sequence Q be determined?
6.345 Automatic Speech Recognition HMM 5
Elements of a Discrete Hidden Markov Model
• N: number of states in the model
– states, s = {s1, s2, . . . , sN}
– state at time t, qt ∈ s
•	 M: number of observation symbols (i.e., discrete observations)
– observation symbols, v = {v1, v2, . . . , vM}
– observation at time t, ot ∈ v
• A = {aij}: state transition probability distribution
– aij = P(qt+1 = sj|qt = si), 1 ≤ i, j ≤ N
• B = {bj(k)}: observation symbol probability distribution in state j
– bj(k) = P(vk at t|qt = sj), 1 ≤ j ≤ N, 1 ≤ k ≤ M
• π = {πi}: initial state distribution
– πi = P(q1 = si), 1 ≤ i ≤ N
Notationally, an HMM is typically written as: λ = {A, B, π}
6.345 Automatic Speech Recognition HMM 6
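The elements above map directly onto a small data structure. Below is a minimal sketch (not part of the lecture) of λ = {A, B, π} for a two-state, two-symbol model like the urn example, using NumPy arrays; the numeric values are illustrative assumptions, not the lecture's.

```python
import numpy as np

# Discrete HMM parameters lambda = {A, B, pi} for N = 2 states and
# M = 2 symbols (B -> index 0, W -> index 1). Values are assumed.
pi = np.array([0.6, 0.4])          # pi_i = P(q1 = s_i)
A  = np.array([[0.7, 0.3],         # a_ij = P(q_{t+1} = s_j | q_t = s_i)
               [0.4, 0.6]])
B  = np.array([[0.5, 0.5],         # b_j(k) = P(o_t = v_k | q_t = s_j)
               [0.2, 0.8]])

# Each distribution must sum to one.
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```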
HMM: An Example (cont’d)
For our simple example:

π = {a01, a02}

A = | a11  a12 |        B = | b1(B)  b1(W) |
    | a21  a22 |            | b2(B)  b2(W) |

State Diagram

[Figure: a 2-state diagram with transitions a11, a12, a21, a22 and emission distributions {b1(B), b1(W)} and {b2(B), b2(W)}, alongside a 3-state diagram]
6.345 Automatic Speech Recognition HMM 7
Generation of HMM Observations
1.	 Choose an initial state, q1 = si, based on the initial state
distribution, π
2. For t = 1 to T:
•	 Choose ot = vk according to the symbol probability distribution
in state si, bi(k)
•	 Transition to a new state qt+1 = sj according to the state
transition probability distribution for state si, aij
3. Increment t by 1, return to step 2 if t ≤ T; else, terminate
[Figure: a chain of states q1, q2, . . . , qT, entered with probability a0i, linked by transitions aij, each state emitting an observation o1, o2, . . . , oT according to bi(k)]
6.345 Automatic Speech Recognition HMM 8
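As a hedged illustration of this generation procedure, the sketch below samples a state sequence and an observation sequence from λ = {A, B, π}; it reuses the pi, A, B arrays assumed above and is not code from the lecture.

```python
import numpy as np

def generate(pi, A, B, T, seed=0):
    """Sample (states, observations) of length T from a discrete HMM."""
    rng = np.random.default_rng(seed)
    N, M = B.shape
    states, obs = [], []
    q = rng.choice(N, p=pi)                      # step 1: initial state from pi
    for _ in range(T):
        obs.append(int(rng.choice(M, p=B[q])))   # emit a symbol from b_q(.)
        states.append(int(q))
        q = rng.choice(N, p=A[q])                # transition according to a_q.
    return states, obs

states, obs = generate(pi, A, B, T=6)            # e.g. a 6-symbol sequence
```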
Representing State Diagram by Trellis
[Trellis: states s1, s2, s3 drawn against observation times 0 through 4, with the transitions of the 3-state diagram unrolled over time]
The dashed line represents a null transition, where no observation
symbol is generated
6.345 Automatic Speech Recognition HMM 9
Three Basic HMM Problems
1. Scoring: Given an observation sequence O = {o1, o2, ..., oT} and a
model λ = {A, B, π}, how do we compute P(O | λ), the probability
of the observation sequence?
==> The Forward-Backward Algorithm
2. Matching: Given an observation sequence O = {o1, o2, ..., oT}, how
do we choose a state sequence Q = {q1, q2, ..., qT} which is
optimum in some sense?
==> The Viterbi Algorithm
3. Training: How do we adjust the model parameters λ = {A, B, π} to
maximize P(O | λ)?
==> The Baum-Welch Re-estimation Procedures
6.345 Automatic Speech Recognition HMM 10
Computation of P(O|λ)

P(O|λ) = Σ_{all Q} P(O, Q|λ)

P(O, Q|λ) = P(O|Q, λ) P(Q|λ)

• Consider the fixed state sequence: Q = q1 q2 . . . qT

P(O|Q, λ) = bq1(o1) bq2(o2) . . . bqT(oT)

P(Q|λ) = πq1 aq1q2 aq2q3 . . . aqT−1qT

Therefore:

P(O|λ) = Σ_{q1,q2,...,qT} πq1 bq1(o1) aq1q2 bq2(o2) . . . aqT−1qT bqT(oT)

• Calculation required ≈ 2T · N^T (there are N^T such sequences)
For N = 5, T = 100 ⇒ 2 · 100 · 5^100 ≈ 10^72 computations!
6.345 Automatic Speech Recognition HMM 11
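For concreteness, a brute-force sketch of this sum over all N^T state sequences is shown below (an illustration with assumed pi, A, B arrays, practical only for tiny T); it is exactly the computation the forward algorithm on the next slide avoids.

```python
import itertools
import numpy as np

def brute_force_prob(pi, A, B, obs):
    """Sum pi_q1 b_q1(o1) a_q1q2 b_q2(o2) ... over all N**T state sequences."""
    T, N = len(obs), len(pi)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):
        p = pi[q[0]] * B[q[0], obs[0]]
        for t in range(1, T):
            p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
        total += p
    return total
```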
The Forward Algorithm
• Let us define the forward variable, αt(i), as the probability of the
partial observation sequence up to time t and state si at time t,
given the model, i.e.

αt(i) = P(o1 o2 . . . ot, qt = si | λ)

• It can easily be shown that:

α1(i) = πi bi(o1), 1 ≤ i ≤ N

P(O|λ) = Σ_{i=1}^{N} αT(i)

• By induction:

αt+1(j) = [ Σ_{i=1}^{N} αt(i) aij ] bj(ot+1), 1 ≤ t ≤ T − 1, 1 ≤ j ≤ N

• Calculation is on the order of N²T.
For N = 5, T = 100 ⇒ 100 · 5² computations, instead of 10^72
6.345 Automatic Speech Recognition HMM 12
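A minimal sketch of the forward recursion, assuming obs is a list of symbol indices and pi, A, B are the NumPy arrays used earlier; on a tiny example it agrees with brute_force_prob above while costing only on the order of N²T operations.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Return alpha (T x N) and P(O | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                   # alpha_1(i) = pi_i b_i(o_1)
    for t in range(T - 1):
        # alpha_{t+1}(j) = [ sum_i alpha_t(i) a_ij ] b_j(o_{t+1})
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha, alpha[-1].sum()                  # P(O|lambda) = sum_i alpha_T(i)
```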
Forward Algorithm Illustration
[Figure: trellis section showing αt(i) at the states si at time t feeding αt+1(j) at time t + 1 through the transitions aij]
6.345 Automatic Speech Recognition HMM 13
The Backward Algorithm
• Similarly, let us define the backward variable, βt(i), as the
probability of the partial observation sequence from time t + 1 to
the end, given state si at time t and the model, i.e.

βt(i) = P(ot+1 ot+2 . . . oT | qt = si, λ)

• It can easily be shown that:

βT(i) = 1, 1 ≤ i ≤ N

and:

P(O|λ) = Σ_{i=1}^{N} πi bi(o1) β1(i)

• By induction:

βt(i) = Σ_{j=1}^{N} aij bj(ot+1) βt+1(j), t = T − 1, T − 2, . . . , 1, 1 ≤ i ≤ N
6.345 Automatic Speech Recognition HMM 14
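A companion sketch of the backward recursion, under the same assumptions as the forward sketch; note that Σ_i πi bi(o1) β1(i) reproduces the same P(O|λ) as the forward pass.

```python
import numpy as np

def backward(pi, A, B, obs):
    """Return beta (T x N); sum_i pi_i b_i(o_1) beta_1(i) equals P(O | lambda)."""
    T, N = len(obs), len(pi)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                 # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```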
Backward Procedure Illustration
[Figure: trellis section showing βt(i) at state si at time t computed from βt+1(j) at time t + 1 through the transitions aij]
6.345 Automatic Speech Recognition HMM 15
Finding Optimal State Sequences
• One criterion chooses states, qt, which are individually most likely
– This maximizes the expected number of correct states
• Let us define γt(i) as the probability of being in state si at time t,
given the observation sequence and the model, i.e.

γt(i) = P(qt = si | O, λ),   Σ_{i=1}^{N} γt(i) = 1, ∀t

• Then the individually most likely state, qt, at time t is:

qt = argmax_{1≤i≤N} γt(i), 1 ≤ t ≤ T

• Note that it can be shown that:

γt(i) = αt(i) βt(i) / P(O|λ)
6.345 Automatic Speech Recognition HMM 16
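Combining the two sketches above gives γt(i) directly; the snippet below is an illustration that reuses the assumed forward and backward helpers and reads off the individually most likely state at each time.

```python
import numpy as np

alpha, prob = forward(pi, A, B, obs)
beta = backward(pi, A, B, obs)

gamma = alpha * beta / prob          # gamma[t, i] = P(q_t = s_i | O, lambda)
assert np.allclose(gamma.sum(axis=1), 1.0)
best_states = gamma.argmax(axis=1)   # individually most likely state at each t
```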
Finding Optimal State Sequences
• The individual optimality criterion has the problem that the
optimum state sequence may not obey state transition constraints
• Another optimality criterion is to choose the state sequence which
maximizes P(Q, O|λ); this can be found by the Viterbi algorithm
• Let us define δt(i) as the highest probability along a single path, at
time t, which accounts for the first t observations, i.e.

δt(i) = max_{q1,q2,...,qt−1} P(q1 q2 . . . qt−1, qt = si, o1 o2 . . . ot | λ)

• By induction: δt+1(j) = [ max_i δt(i) aij ] bj(ot+1)
• To retrieve the state sequence, we must keep track of the state
sequence which gave the best path, at time t, to state si
– We do this in a separate array ψt(i)
6.345 Automatic Speech Recognition HMM 17
The Viterbi Algorithm
1. Initialization:
   δ1(i) = πi bi(o1), 1 ≤ i ≤ N
   ψ1(i) = 0
2. Recursion:
   δt(j) = max_{1≤i≤N} [δt−1(i) aij] bj(ot), 2 ≤ t ≤ T, 1 ≤ j ≤ N
   ψt(j) = argmax_{1≤i≤N} [δt−1(i) aij], 2 ≤ t ≤ T, 1 ≤ j ≤ N
3. Termination:
   P* = max_{1≤i≤N} [δT(i)]
   qT* = argmax_{1≤i≤N} [δT(i)]
4. Path (state-sequence) backtracking:
   qt* = ψt+1(qt+1*), t = T − 1, T − 2, . . . , 1
Computation ≈ N²T
6.345 Automatic Speech Recognition HMM 18
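A minimal sketch of this recursion and backtracking (null transitions, as used in the example on the next slide, are not handled); the names and array layout are assumptions, not the lecture's code.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Return (best state sequence, P*) for a discrete HMM without null arcs."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                      # delta_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A            # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)                # psi_t(j)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]  # delta_t(j)
    path = [int(delta[-1].argmax())]                  # q_T* = argmax_i delta_T(i)
    for t in range(T - 1, 0, -1):                     # backtracking
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())         # (q_1* ... q_T*, P*)
```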
The Viterbi Algorithm: An Example
[Figure: a 3-state HMM over the symbols a and b, with its transition probabilities (including null transitions) and per-state symbol probabilities P(a) and P(b), and the trellis of partial path scores for the observation sequence O = {a a b b}]
6.345 Automatic Speech Recognition HMM 19
The Viterbi Algorithm: An Example (cont’d)
[Table: for each state s1–s3 and each prefix of O = {a a b b}, the candidate path extensions (predecessor state, emitted symbol or null transition, and score); the best score in each cell gives the trellis values below]

Best path scores δt(i):

        0      a       aa       aab       aabb
s1    1.0    0.4     0.16     0.016     0.0016
s2    0.2    0.21    0.084    0.0168    0.00336
s3    0.02   0.03    0.0315   0.0294    0.00588
6.345 Automatic Speech Recognition HMM 20
Matching Using Forward-Backward Algorithm
[Table: the same trellis computed with the forward algorithm, where each cell sums the incoming path scores instead of taking the maximum]

Forward probabilities αt(i):

        0      a       aa       aab       aabb
s1    1.0    0.4     0.16     0.016     0.0016
s2    0.2    0.33    0.182    0.0540    0.01256
s3    0.02   0.063   0.0677   0.0691    0.020156
6.345 Automatic Speech Recognition HMM 21
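As a usage sketch (with assumed model parameters, not the ones in this example), the comparison below runs the earlier viterbi and forward helpers on the same observation sequence; the best single-path score can never exceed the forward sum over all paths.

```python
obs_ab = [0, 0, 1, 1]                      # 'a a b b' with a -> 0, b -> 1 (assumed coding)
path, p_star = viterbi(pi, A, B, obs_ab)   # best single path and its score
_, p_total = forward(pi, A, B, obs_ab)     # sum over all paths
assert p_star <= p_total
```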
Baum-Welch Re-estimation
• Baum-Welch re-estimation uses EM to determine ML parameters
• Define ξt(i, j) as the probability of being in state si at time t and
state sj at time t + 1, given the model and observation sequence

ξt(i, j) = P(qt = si, qt+1 = sj | O, λ)

• Then:

ξt(i, j) = αt(i) aij bj(ot+1) βt+1(j) / P(O|λ)

γt(i) = Σ_{j=1}^{N} ξt(i, j)

• Summing γt(i) and ξt(i, j) over time, we get:

Σ_{t=1}^{T−1} γt(i) = expected number of transitions from si

Σ_{t=1}^{T−1} ξt(i, j) = expected number of transitions from si to sj
6.345 Automatic Speech Recognition HMM 22
Baum-Welch Re-estimation Procedures
[Figure: trellis section showing αt(i) up to state si at time t, the transition aij to state sj at time t + 1, and βt+1(j) from time t + 1 onward]
6.345 Automatic Speech Recognition HMM 23
Baum-Welch Re-estimation Formulas

π̄i = expected number of times in state si at t = 1
   = γ1(i)

āij = expected number of transitions from state si to sj
      / expected number of transitions from state si

    = Σ_{t=1}^{T−1} ξt(i, j) / Σ_{t=1}^{T−1} γt(i)

b̄j(k) = expected number of times in state sj with symbol vk
        / expected number of times in state sj

      = Σ_{t: ot = vk} γt(j) / Σ_{t=1}^{T} γt(j)
6.345 Automatic Speech Recognition HMM 24
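The sketch below implements one pass of these re-estimation formulas for a single observation sequence, reusing the assumed forward and backward helpers; it is an illustration, not the lecture's implementation.

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One re-estimation pass: returns (new_pi, new_A, new_B)."""
    T, N = len(obs), len(pi)
    M = B.shape[1]
    alpha, prob = forward(pi, A, B, obs)
    beta = backward(pi, A, B, obs)
    gamma = alpha * beta / prob                       # gamma_t(i)
    # xi[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * B[:, obs[1:]].T[:, None, :] * beta[1:, None, :]) / prob
    new_pi = gamma[0]                                 # pi_i = gamma_1(i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros((N, M))
    for k in range(M):                                # time steps where o_t = v_k
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B
```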
Baum-Welch Re-estimation Formulas
• If λ = (A, B, π) is the initial model, and λ̄ = (Ā, B̄, π̄) is the
re-estimated model, then it can be proved that either:
1. The initial model, λ, defines a critical point of the likelihood
function, in which case λ̄ = λ, or
2. Model λ̄ is more likely than λ in the sense that P(O|λ̄) > P(O|λ),
i.e., we have found a new model λ̄ from which the observation
sequence is more likely to have been produced.
• Thus we can improve the probability of O being observed from
the model if we iteratively use λ̄ in place of λ and repeat the
re-estimation until some limiting point is reached. The resulting
model is called the maximum likelihood HMM.
6.345 Automatic Speech Recognition HMM 25
Multiple Observation Sequences
• Speech recognition typically uses left-to-right HMMs. These HMMs
cannot be trained using a single observation sequence, because
only a small number of observations are available to train each
state. To obtain reliable estimates of model parameters, one must
use multiple observation sequences. In this case, the
re-estimation procedure needs to be modified.
• Let us denote the set of K observation sequences as

O = {O^(1), O^(2), . . . , O^(K)}

where O^(k) = {o1^(k), o2^(k), . . . , oTk^(k)} is the k-th observation sequence.

• Assuming that the observation sequences are mutually
independent, we want to estimate the parameters so as to
maximize

P(O | λ) = Π_{k=1}^{K} P(O^(k) | λ) = Π_{k=1}^{K} Pk
6.345 Automatic Speech Recognition HMM 26
Multiple Observation Sequences (cont’d)
• Since the re-estimation formulas are based on the frequency of
occurrence of various events, we can modify them by adding up
the individual frequencies of occurrence for each sequence:

āij = [ Σ_{k=1}^{K} Σ_{t=1}^{Tk−1} ξt^k(i, j) ] / [ Σ_{k=1}^{K} Σ_{t=1}^{Tk−1} γt^k(i) ]

    = [ Σ_{k=1}^{K} (1/Pk) Σ_{t=1}^{Tk−1} αt^k(i) aij bj(ot+1^(k)) βt+1^k(j) ]
      / [ Σ_{k=1}^{K} (1/Pk) Σ_{t=1}^{Tk−1} αt^k(i) βt^k(i) ]

b̄j(ℓ) = [ Σ_{k=1}^{K} Σ_{t: ot^(k) = vℓ} γt^k(j) ] / [ Σ_{k=1}^{K} Σ_{t=1}^{Tk} γt^k(j) ]

      = [ Σ_{k=1}^{K} (1/Pk) Σ_{t: ot^(k) = vℓ} αt^k(j) βt^k(j) ]
        / [ Σ_{k=1}^{K} (1/Pk) Σ_{t=1}^{Tk} αt^k(j) βt^k(j) ]
6.345 Automatic Speech Recognition HMM 27
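A sketch of pooling these statistics over K independent sequences, following the formulas above and reusing the assumed forward and backward helpers; each sequence's counts are implicitly weighted by 1/Pk because γ and ξ are already normalized by P(O^(k)|λ).

```python
import numpy as np

def baum_welch_multi(pi, A, B, sequences):
    """Re-estimate (pi, A, B) from a list of observation sequences."""
    N, M = B.shape
    num_pi = np.zeros(N)
    num_A = np.zeros((N, N)); den_A = np.zeros(N)
    num_B = np.zeros((N, M)); den_B = np.zeros(N)
    for obs in sequences:
        alpha, prob = forward(pi, A, B, obs)
        beta = backward(pi, A, B, obs)
        gamma = alpha * beta / prob                   # already divided by P_k
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * B[:, obs[1:]].T[:, None, :] * beta[1:, None, :]) / prob
        num_pi += gamma[0]
        num_A += xi.sum(axis=0); den_A += gamma[:-1].sum(axis=0)
        for k in range(M):
            num_B[:, k] += gamma[np.array(obs) == k].sum(axis=0)
        den_B += gamma.sum(axis=0)
    return num_pi / len(sequences), num_A / den_A[:, None], num_B / den_B[:, None]
```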
Phone-based HMMs
•	 Word-based HMMs are appropriate for small vocabulary speech
recognition. For large vocabulary ASR, sub-word-based (e.g.,
phone-based) models are more appropriate.
6.345 Automatic Speech Recognition HMM 28
Phone-based HMMs (cont’d)
•	 The phone models can have many states, and words are made up
from a concatenation of phone models.
6.345 Automatic Speech Recognition HMM 29
Continuous Density Hidden Markov Models
• A continuous density HMM replaces the discrete observation
probabilities, bj(k), by a continuous PDF bj(x)
• A common practice is to represent bj(x) as a mixture of Gaussians:

bj(x) = Σ_{k=1}^{M} cjk N[x, µjk, Σjk], 1 ≤ j ≤ N

where cjk is the mixture weight
(cjk ≥ 0, 1 ≤ j ≤ N, 1 ≤ k ≤ M, and Σ_{k=1}^{M} cjk = 1, 1 ≤ j ≤ N),
N is the normal density, and µjk and Σjk are the mean vector and
covariance matrix associated with state j and mixture k.
6.345 Automatic Speech Recognition HMM 30
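A hedged sketch of this mixture-of-Gaussians emission density, using SciPy's multivariate normal; the state's weights, means, and covariances are assumed inputs, not values from the lecture.

```python
import numpy as np
from scipy.stats import multivariate_normal

def emission_prob(x, c_j, mu_j, Sigma_j):
    """b_j(x) = sum_k c_jk N(x; mu_jk, Sigma_jk) for one state j.

    c_j:     (M,)      mixture weights, non-negative and summing to one
    mu_j:    (M, D)    mean vectors
    Sigma_j: (M, D, D) covariance matrices
    """
    return sum(c * multivariate_normal.pdf(x, mean=mu, cov=S)
               for c, mu, S in zip(c_j, mu_j, Sigma_j))
```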
Acoustic Modelling Variations
• Semi-continuous HMMs first compute a VQ codebook of size M
– The VQ codebook is then modelled as a family of Gaussian PDFs
–	 Each codeword is represented by a Gaussian PDF, and may be
used together with others to model the acoustic vectors
–	 From the CD-HMM viewpoint, this is equivalent to using the
same set of M mixtures to model all the states
– It is therefore often referred to as a Tied Mixture HMM
• All three methods (discrete, semi-continuous, and continuous
density HMMs) have been used in many speech recognition
tasks, with varying outcomes
• For large-vocabulary, continuous speech recognition with a
sufficient amount (i.e., tens of hours) of training data, CD-HMM
systems currently yield the best performance, but with a
considerable increase in computation
6.345 Automatic Speech Recognition HMM 31
Implementation Issues
• Scaling: to prevent underflow
•	 Segmental K-means Training: to train observation probabilities by
first performing Viterbi alignment
• Initial estimates of λ: to provide robust models
• Pruning: to reduce search computation
6.345 Automatic Speech Recognition HMM 32
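As an illustration of the scaling point above, the sketch below normalizes αt at every step and accumulates log P(O|λ) from the scale factors, so long observation sequences do not underflow; it assumes the same pi, A, B, obs conventions as the earlier sketches.

```python
import numpy as np

def forward_log_prob(pi, A, B, obs):
    """Scaled forward pass returning log P(O | lambda)."""
    alpha = pi * B[:, obs[0]]
    log_prob = 0.0
    for t in range(1, len(obs)):
        c = alpha.sum()                    # scale factor for the previous step
        log_prob += np.log(c)
        alpha = ((alpha / c) @ A) * B[:, obs[t]]
    return log_prob + np.log(alpha.sum())  # telescoping product of scale factors
```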
References
• X. Huang, A. Acero, and H. Hon, Spoken Language Processing, Prentice-Hall, 2001.
• F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
• L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.
6.345 Automatic Speech Recognition HMM 33
