Different techniques for
speech recognition
From:
Yashi Saxena
Index
• Introduction
• Standard DTW
• Stochastic DTW
• Hidden Markov model
• Conclusion
INTRODUCTION
Non-linear sequence alignment has a wide range
of applications, including DNA matching, string matching,
and speech recognition. It seeks an optimal
mapping from the test signal to the template signal
while allowing non-linear warping of the
test signal. Using dynamic programming, it keeps the
complexity at O(nm), where n and m are the lengths of
the two signals. The algorithm (DTW) was proposed
in 1978, and an updated version followed in
1988 that improved the recognition rate
from 89.3% to 92.9% in a word recognition
experiment.
STANDARD DYNAMIC TIME
WARPING
Dynamic time warping (DTW) is an algorithm for
measuring similarity between two sequences which
may vary in time or speed. For instance, similarities
in walking patterns would be detected even if in
one video the person was walking slowly and in
another he or she was walking more quickly, or
even if there were accelerations and decelerations
during the course of one observation. DTW has
been applied to video, audio, and graphics;
indeed, any data which can be turned into a linear
representation can be analyzed with DTW. It was
introduced in 1978.
Algorithm
Given two time series X and Y, of lengths |X| and |Y|,
X = x_1, x_2, ..., x_i, ..., x_|X|
Y = y_1, y_2, ..., y_j, ..., y_|Y|
construct a warp path W:
W = w_1, w_2, ..., w_K,   max(|X|,|Y|) <= K < |X|+|Y|
K is the length of the warp path, and the kth element of the warp
path is
w_k = (i, j)
where i is an index into X and j is an index into Y.
The optimal warp path is the minimum-distance warp path, where
Dist(W) = sum over k = 1..K of dist(w_ki, w_kj)
Dist(W) is the distance of the warp path W, and dist(w_ki, w_kj) is
the distance between the two data points (one from X, one from Y)
indexed by the kth element of the warp path.
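The definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `dtw`, the absolute-difference local distance, and the cumulative-cost table `g` are assumptions made for the example. It fills the table with the usual recurrence g(i,j) = d(i,j) + min{ g(i-1,j-1), g(i-1,j), g(i,j-1) } and backtracks to recover W.

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Return (Dist(W), W) for a minimum-distance warp path between x and y.

    The local distance defaults to |a - b|, which is an assumption of this sketch.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # g[i][j] = cost of the best warp path aligning x[:i] with y[:j]
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = dist(x[i - 1], y[j - 1]) + min(
                g[i - 1][j - 1], g[i - 1][j], g[i][j - 1]
            )
    # Backtrack from (n, m) to (1, 1); indices are 1-based as in the slides.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i, j))
        candidates = [
            (g[i - 1][j - 1], i - 1, j - 1),
            (g[i - 1][j], i - 1, j),
            (g[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return g[n][m], path


total, path = dtw([1, 2, 3], [1, 2, 2, 3])
print(total, path)  # 0.0 [(1, 1), (2, 2), (2, 3), (3, 4)]
```

In this example K = 4, which satisfies max(|X|,|Y|) <= K < |X|+|Y|. The two nested loops are where the O(nm) cost comes from.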
Problem
• Windowing: (Berndt & Clifford 1994) Allowable
elements of the matrix can be restricted to those that
fall into a warping window, |i-(n/(m/j))| < R, where R is
a positive integer window width. This effectively
means that the corners of the matrix are pruned from
consideration.
• Slope Weighting: (Kruskal & Liberman 1983; Sakoe &
Chiba 1978) If the basic recurrence g(i,j) = d(i,j) +
min{ g(i-1,j-1) , g(i-1,j) , g(i,j-1) } is replaced with
g(i,j) = d(i,j) + min{ g(i-1,j-1) , X g(i-1,j) , X g(i,j-1) },
where X is a positive real number, we can constrain the
warping by changing the value of X. As X gets larger, the
warping path is increasingly biased toward the diagonal.
• Step Patterns (Slope Constraints): (Itakura 1975;
Myers et al. 1980) We can visualize the recurrence as a
diagram of admissible step patterns. The arrows in such
a diagram illustrate the permissible steps the warping
path may take at each stage. We could replace the
recurrence with
g(i,j) = d(i,j) + min{ g(i-1,j-1) , g(i-1,j-2) , g(i-2,j-1) },
which corresponds to the step pattern shown in
Figure 4.B. Using this equation, the warping path is
forced to move one diagonal step for each step parallel
to an axis.
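As a rough illustration of the first two constraints, the sketch below adds an optional warping window and a slope weight to the basic cumulative-cost recurrence. The function name `dtw_constrained`, its parameter names, and the absolute-difference local distance are assumptions of this example; the window test uses |i - n*j/m| < R, which is the same expression as |i-(n/(m/j))| < R.

```python
def dtw_constrained(x, y, window=None, slope_weight=1.0):
    """Cumulative DTW cost with an optional warping window and slope weight.

    window       -- R in |i - n*j/m| < R; None disables windowing
    slope_weight -- X in g = d + min{ g_diag, X * g_up, X * g_left }
    Both parameter names are hypothetical, chosen for this sketch.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Windowing: skip cells too far from the length-scaled diagonal.
            if window is not None and abs(i - n * j / m) >= window:
                continue
            d = abs(x[i - 1] - y[j - 1])
            g[i][j] = d + min(
                g[i - 1][j - 1],
                slope_weight * g[i - 1][j],
                slope_weight * g[i][j - 1],
            )
    return g[n][m]


x, y = [1, 1, 2, 3], [1, 2, 3, 3]
print(dtw_constrained(x, y))            # 0.0
print(dtw_constrained(x, y, window=1))  # 2.0
```

With `window=1` and equal-length inputs only the diagonal cells survive, so the result reduces to a point-by-point comparison. If the window is too narrow for the given lengths, no admissible path may exist and the function returns infinity.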
STOCHASTIC DTW
Many real signals, such as speech and video signals,
are stochastic processes. Therefore, in
1988 a new algorithm called stochastic DTW was
proposed. In this method, conditional probabilities
are used instead of the local distances of standard DTW,
and transition probabilities instead of path costs.
The stochastic DTW method was proposed to cope with
spectral variations from speaker to speaker.
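The general idea of swapping distances for probabilities can be sketched as a log-domain dynamic program that maximizes a path probability instead of minimizing a path cost. This is only one plausible reading and not the 1988 formulation itself: the function name `stochastic_dtw`, the table `log_obs` of log conditional probabilities, the dictionary `log_step` of log transition probabilities, and the three-step pattern are all assumptions made for illustration.

```python
import math


def stochastic_dtw(log_obs, log_step):
    """Best path log-probability through a grid of log conditional probabilities.

    log_obs[i][j] -- log P(test frame i | template frame j)   (assumed input)
    log_step      -- log transition probabilities for the hypothetical
                     steps "diag", "vert" and "horiz"
    """
    n, m = len(log_obs), len(log_obs[0])
    neg_inf = -math.inf
    score = [[neg_inf] * m for _ in range(n)]
    score[0][0] = log_obs[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0 and j > 0:
                best = max(best, score[i - 1][j - 1] + log_step["diag"])
            if i > 0:
                best = max(best, score[i - 1][j] + log_step["vert"])
            if j > 0:
                best = max(best, score[i][j - 1] + log_step["horiz"])
            # Products of probabilities become sums of logarithms.
            score[i][j] = best + log_obs[i][j]
    return score[n - 1][m - 1]
```

Working with logarithms turns the product of probabilities along a path into a sum, so the structure matches standard DTW with `max` in place of `min`.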
Algorithm
1. Replace the deterministic costs with
probabilities:
2. Then replace the right-hand side with the maximum
probability and take the logarithm of P:
The general equation of stochastic DTW is:
HIDDEN MARKOV MODEL
• An HMM is defined by a set of N states, K observation
symbols, and three probabilistic matrices:
M = {π, A, B}
where
π = π_i       initial state probabilities
A = a_i,j     state transition probabilities
B = b_i,j,k   symbol emission probabilities
The observation symbol generation procedure for
a given topology is as follows:
1. Start in state i with probability π_i.
2. t = 1
3. Move from state i to j with probability a_i,j and emit
observation symbol o_t = k with probability b_i,j,k.
4. t = t + 1
5. Go to 3.
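The generation procedure above can be sketched as a short sampling routine. The function name `generate_observations` and the `length` argument are assumptions of this example: the procedure on the slide loops indefinitely, so a stopping point has been added here. Emissions are attached to transitions, matching the three-index b_i,j,k.

```python
import random


def generate_observations(pi, a, b, length, rng=random):
    """Sample `length` observation symbols from an HMM M = {pi, A, B}.

    pi[i]      -- initial state probabilities
    a[i][j]    -- state transition probabilities
    b[i][j][k] -- probability of emitting symbol k on the move i -> j
    `length` is an assumption of this sketch, used to stop the loop.
    """
    states = range(len(pi))
    i = rng.choices(states, weights=pi)[0]           # step 1
    observations = []
    for _ in range(length):                          # steps 2, 4, 5
        j = rng.choices(states, weights=a[i])[0]     # step 3: move i -> j
        symbols = range(len(b[i][j]))
        k = rng.choices(symbols, weights=b[i][j])[0]  # step 3: emit o_t = k
        observations.append(k)
        i = j
    return observations
```

Passing a seeded `random.Random` instance as `rng` should make the sampled sequence reproducible.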
VITERBI ALGORITHM FOR HMM
The Viterbi algorithm was proposed by Andrew
Viterbi in 1967 as a decoding algorithm for
convolutional codes over noisy digital communication
links. It is a dynamic programming algorithm for
finding the most likely sequence of hidden states,
called the Viterbi path, that results in a sequence of
observed events, especially in the context of Markov
information sources and, more generally, hidden
Markov models.
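A minimal sketch of the Viterbi recursion follows, written for an HMM with initial probabilities pi, transition probabilities a[i][j], and transition-attached emission probabilities b[i][j][k]. The function name `viterbi` and the use of plain probabilities are assumptions of this example; a practical version would normally work with logarithms to avoid underflow on long sequences.

```python
def viterbi(pi, a, b, observations):
    """Return (probability, state path) of the most likely hidden state sequence.

    Assumes emissions on transitions: b[i][j][k] is the probability of
    emitting symbol k while moving from state i to state j.
    """
    n = len(pi)
    delta = list(pi)  # delta[j] = probability of the best path ending in j
    backptr = []      # backptr[t][j] = best predecessor of j at step t
    for k in observations:
        new_delta, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * a[i][j] * b[i][j][k])
            ptr.append(best_i)
            new_delta.append(delta[best_i] * a[best_i][j] * b[best_i][j][k])
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the most probable final state.
    state = max(range(n), key=lambda j: delta[j])
    best_prob = delta[state]
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return best_prob, path


pi = [1.0, 0.0]
a = [[0.0, 1.0], [1.0, 0.0]]
b = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]
print(viterbi(pi, a, b, [0, 1, 0]))  # (1.0, [0, 1, 0, 1])
```

The returned path has one more state than there are observations, because it includes the starting state as well as the state reached after each emission.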
Thank you
