Different techniques for speech recognition

From:
Yashi Saxena

Index
• Introduction
• Standard DTW
• Stochastic DTW
• Hidden Markov model
• Conclusion

INTRODUCTION
Non-linear sequence alignment has a wide range
of applications in DNA matching, string matching,
speech recognition, etc. It seeks an optimal
mapping from the test signal to the template signal
while allowing non-linear warping of the
test signal. Its strength is that it cuts the
complexity to O(nm) for signals of lengths n and m.
The algorithm (DTW) was proposed in 1978, and an
updated version followed in 1988 that improved the
recognition rate from 89.3% to 92.9% in a word
recognition experiment.

STANDARD DYNAMIC TIME
WARPING
Dynamic time warping (DTW) is an algorithm for
measuring similarity between two sequences which
may vary in time or speed. For instance, similarities
in walking patterns would be detected, even if in
one video the person was walking slowly and if in
another he or she were walking more quickly, or
even if there were accelerations and decelerations
during the course of one observation. DTW has
been applied to video, audio, and graphics —
indeed, any data which can be turned into a linear
representation can be analyzed with DTW. It was
introduced in 1978.

Algorithm
Given two time series X and Y, of lengths |X| and |Y|,
X = x_1, x_2, ..., x_i, ..., x_|X|
Y = y_1, y_2, ..., y_j, ..., y_|Y|
construct a warp path W:
W = w_1, w_2, ..., w_K    max(|X|,|Y|) <= K < |X|+|Y|
K is the length of the warp path, and the kth element of the warp
path is
w_k = (i, j)
where i is an index into X and j is an index into Y.
The optimal warp path is the minimum-distance warp path, where
Dist(W) = sum_{k=1}^{K} dist(w_ki, w_kj)
Dist(W) is the distance of the warp path W, and dist(w_ki, w_kj) is
the distance between the two data points (one from X, one from Y)
indexed by the kth element of the warp path.
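The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `dtw`, the absolute-difference local distance, and the table name `g` are assumptions chosen for the example. It fills the cumulative-distance table with the usual three-way minimum and then backtracks to recover the warp path as 1-based (i, j) pairs.

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Return (Dist(W), W) for a minimum-distance warp path between x and y.

    `dist` is an assumed local distance; absolute difference is used by default.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # g[i][j] = distance of the best warp path ending at (i, j), 1-based.
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = dist(x[i - 1], y[j - 1]) + min(
                g[i - 1][j - 1], g[i - 1][j], g[i][j - 1]
            )

    # Backtrack from (n, m) to (1, 1), always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i, j))
        candidates = [
            (g[i - 1][j - 1], i - 1, j - 1),
            (g[i - 1][j], i - 1, j),
            (g[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return g[n][m], path


cost, warp_path = dtw([1, 2, 3], [1, 2, 2, 3])
print(cost, warp_path)
```

Both nested loops visit every cell of the table once, which is where the O(nm) cost comes from.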

Problem
• Windowing: (Berndt & Clifford 1994) Allowable
elements of the matrix can be restricted to those that
fall into a warping window, |i-(n/(m/j))| < R, where R is
a positive integer window width. This effectively
means that the corners of the matrix are pruned from
consideration.
• Slope Weighting: (Kruskal & Liberman 1983, Sakoe &
Chiba 1978) If the recurrence g(i,j) = d(i,j) +
min{ g(i-1,j-1) , g(i-1,j) , g(i,j-1) }, with local distance d
and cumulative distance g, is replaced with g(i,j) = d(i,j) +
min{ g(i-1,j-1) , X g(i-1,j) , X g(i,j-1) } where X is a
positive real number, we can constrain the warping by
changing the value of X. As X gets larger, the warping
path is increasingly biased toward the diagonal.

• Step Patterns (Slope constraints): (Itakura 1975,
Myers et al. 1980) We can visualize the recurrence as a
diagram of admissible step patterns. The arrows
illustrate the permissible steps the warping path may
take at each stage. We could replace the recurrence with
g(i,j) = d(i,j) + min{ g(i-1,j-1) , g(i-1,j-2) , g(i-2,j-1) },
which corresponds to the step pattern shown in
Figure 4.B. Using this equation, the warping path is
forced to move one diagonal step for each step parallel
to an axis.
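The three constraints above can be combined in one small sketch. The names `constrained_dtw`, `window`, `slope_weight`, `steps`, and the two step-pattern constants are hypothetical, chosen only for this example; the absolute-difference local distance is also an assumption. The weight X is applied to every non-diagonal step, which matches the slope-weighting formula for the default pattern.

```python
INF = float("inf")

# Each step (di, dj) means "come from cell (i - di, j - dj)".
SYMMETRIC_STEPS = ((1, 1), (1, 0), (0, 1))      # g(i-1,j-1), g(i-1,j), g(i,j-1)
SLOPE_LIMITED_STEPS = ((1, 1), (1, 2), (2, 1))  # g(i-1,j-1), g(i-1,j-2), g(i-2,j-1)


def constrained_dtw(x, y, window=None, slope_weight=1.0,
                    steps=SYMMETRIC_STEPS, dist=lambda a, b: abs(a - b)):
    """Cumulative DTW distance under optional window, slope weight and step pattern."""
    n, m = len(x), len(y)
    g = [[INF] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Windowing: keep only cells with |i - (n/(m/j))| < R.
            if window is not None and abs(i - n * j / m) >= window:
                continue
            best = INF
            for di, dj in steps:
                prev_i, prev_j = i - di, j - dj
                if prev_i < 0 or prev_j < 0:
                    continue  # step would leave the matrix
                # Slope weighting: X multiplies the non-diagonal predecessors.
                weight = 1.0 if (di, dj) == (1, 1) else slope_weight
                best = min(best, weight * g[prev_i][prev_j])
            g[i][j] = dist(x[i - 1], y[j - 1]) + best
    return g[n][m]


x, y = [0, 0, 0, 1], [0, 1, 1, 1]
print(constrained_dtw(x, y))            # unconstrained
print(constrained_dtw(x, y, window=1))  # only the diagonal survives
```

With `window=1` and equal-length inputs, only diagonal cells remain, so the path can no longer absorb the timing difference and the distance grows. A result of infinity means no admissible path exists under the chosen constraints.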

STOCHASTIC DTW
Many real signals, such as speech and video
signals, are stochastic processes. Therefore, in
1988 a new algorithm called stochastic DTW was
proposed. In this method, conditional probabilities
are used instead of the local distances of standard DTW,
and transition probabilities instead of path costs.
The stochastic DTW method was proposed to cope with
spectral variations from speaker to speaker.

Algorithm
1. Replace the deterministic costs with
probabilities.
2. Replace the right-hand side with the maximum
probability and take the logarithm of P.
3. The result is the general equation of stochastic DTW.
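The idea of swapping distances for log-probabilities can be illustrated with a rough sketch. This is not the 1988 formulation: the Gaussian conditional probability, the parameter `sigma`, the fixed transition probabilities in `trans`, and the function name `stochastic_dtw` are all assumptions made for the example. The recursion keeps the DTW structure but maximizes a sum of log conditional probabilities and log transition probabilities instead of minimizing a distance.

```python
import math


def stochastic_dtw(x, y, sigma=1.0, trans=None):
    """Best-path log-likelihood of x against template y (illustrative only)."""
    if trans is None:
        # Assumed transition probabilities for the three DTW moves (di, dj).
        trans = {(1, 1): 0.5, (1, 0): 0.25, (0, 1): 0.25}

    def log_cond(a, b):
        # Assumed conditional probability: Gaussian centred on the template value.
        return (-0.5 * math.log(2 * math.pi * sigma ** 2)
                - (a - b) ** 2 / (2 * sigma ** 2))

    n, m = len(x), len(y)
    neg_inf = float("-inf")
    # score[i][j] = best log-probability of any path ending at (i, j).
    score = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(
                math.log(p) + score[i - di][j - dj]
                for (di, dj), p in trans.items()
            )
            score[i][j] = log_cond(x[i - 1], y[j - 1]) + best
    return score[n][m]


print(stochastic_dtw([1, 2, 3], [1, 2, 3]))
print(stochastic_dtw([1, 2, 3], [3, 2, 1]))
```

A larger (less negative) value indicates a better match, so the comparison is reversed relative to standard DTW, where a smaller distance is better.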

HIDDEN MARKOV MODEL
• An HMM is defined by a set of N states, K observation
symbols and three probabilistic matrices:
M = {π, A, B}
where
π = {π_i} initial state probabilities
A = {a_ij} state transition probabilities
B = {b_ijk} symbol emission probabilities

The observation symbol generation procedure for this
topology is as follows:
1. Start in state i with probability π_i.
2. Set t = 1.
3. Move from state i to state j with probability a_ij and emit
observation symbol o_t = k with probability b_ijk.
4. Set t = t + 1 and let state j become the current state i.
5. Go to 3.
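The generation procedure above can be sketched as a short sampler. The function name `generate`, the `length` stopping parameter (the procedure as stated loops forever), and the nested-list layout of `pi`, `a`, and `b` are assumptions for illustration. Emissions are attached to transitions, so `b[i][j][k]` is the probability of emitting symbol k while moving from state i to state j.

```python
import random


def generate(pi, a, b, length, rng=None):
    """Sample a state path and `length` observation symbols from M = {pi, A, B}."""
    rng = rng or random.Random()
    states = range(len(pi))
    # Step 1: start in state i with probability pi[i].
    i = rng.choices(states, weights=pi)[0]
    path, observations = [i], []
    for _ in range(length):
        # Step 3: move from i to j with probability a[i][j] ...
        j = rng.choices(states, weights=a[i])[0]
        # ... and emit symbol k with probability b[i][j][k].
        k = rng.choices(range(len(b[i][j])), weights=b[i][j])[0]
        observations.append(k)
        path.append(j)
        i = j  # step 4: the destination becomes the current state
    return path, observations


# A toy two-state model that alternates states deterministically.
pi = [1.0, 0.0]
a = [[0.0, 1.0], [1.0, 0.0]]
b = [[[0.5, 0.5], [1.0, 0.0]],
     [[0.0, 1.0], [0.5, 0.5]]]
print(generate(pi, a, b, length=4, rng=random.Random(0)))
```

Because each symbol is emitted on a transition, a run of T observations visits T + 1 states.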

VITERBI ALGORITHM FOR HMM
The Viterbi algorithm was conceived by Andrew
Viterbi in 1967 as an error-correction scheme for noisy
digital communication links. The Viterbi algorithm is a
dynamic programming algorithm for finding the most
likely sequence of hidden states – called the Viterbi
path – that results in a sequence of observed events,
especially in the context of Markov information
sources, and more generally, hidden Markov models.
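A minimal sketch of the Viterbi recursion for the HMM notation used in this deck, M = {π, A, B} with emissions on transitions, might look as follows. The function name `viterbi` and the nested-list layout of `pi`, `a`, and `b` are assumptions for the example; the computation is done in log space to avoid underflow on long observation sequences.

```python
import math


def viterbi(pi, a, b, observations):
    """Return (most likely state path, its log-probability)."""
    n = len(pi)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # delta[i] = best log-probability of any path currently ending in state i.
    delta = [log(p) for p in pi]
    backpointers = []
    for k in observations:
        new_delta = [float("-inf")] * n
        back = [0] * n
        for j in range(n):
            for i in range(n):
                # Move i -> j and emit symbol k on that transition.
                score = delta[i] + log(a[i][j]) + log(b[i][j][k])
                if score > new_delta[j]:
                    new_delta[j], back[j] = score, i
        delta = new_delta
        backpointers.append(back)

    # Trace the best final state back to the start.
    last = max(range(n), key=lambda s: delta[s])
    path = [last]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    path.reverse()
    return path, delta[last]


pi = [1.0, 0.0]
a = [[0.0, 1.0], [1.0, 0.0]]
b = [[[0.5, 0.5], [1.0, 0.0]],
     [[0.0, 1.0], [0.5, 0.5]]]
print(viterbi(pi, a, b, [0, 1, 0]))
```

The structure mirrors DTW: both fill a table one column at a time and keep only the best way of reaching each cell, then backtrack to read off the optimal path.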
