Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Mid-term Project Presentation

Joint speech and speaker recognition, using a Hidden Markov Model with Vector Quantization (HMM/VQ) for speaker-independent speech recognition and a Gaussian Mixture Model (GMM) for text-independent speaker recognition. MFCC (Mel-Frequency Cepstral Coefficients) were used for feature extraction, together with energy, delta and delta-delta terms, giving 39 coefficients per frame.
Developed in Java with a client/server architecture; the web interface was developed in Adobe Flex.
This project was done at TU, IOE - Pulchowk Campus, Nepal.
For more details visit http://ganeshtiwaridotcomdotnp.blogspot.com

ABSTRACT OF PROJECT

A biometric is a physical characteristic unique to each individual. Biometrics are very useful in authentication and access control.
The designed system is a text-prompted version of a voice biometric. It combines a text-independent speaker verification system and a speaker-independent speech verification system, each implemented independently. The foundation for this joint system is that the speech signal conveys both the speech content and the speaker identity. Such systems are more secure against playback attacks, since the word to speak during authentication is not set in advance.
During the course of the project, various digital signal processing and pattern classification algorithms were studied. Short-time spectral analysis was performed to obtain MFCC, energy and their deltas as features. The feature extraction module is the same for both systems. Speaker modeling was done with a GMM, and a left-to-right discrete HMM with VQ was used for isolated word modeling. The results of both systems were combined to authenticate the user.
The speech model for each word was pre-trained using utterances of 45 English words. The speaker model was trained on utterances of about 2 minutes each from 15 speakers. When individual words are uttered, the recognition rate of the speech recognition system is 92% and that of the speaker recognition system is 66%. For longer utterances (> 5 s), the recognition rate of the speaker recognition system improves to 78%.

Published in: Technology
  1. Major Project Mid-Term Presentation: Speaker Verification for Remote Authentication
     • Members: Ganesh Tiwari (063BCT510), Madhav Pandey (063BCT514), Manoj Shrestha (063BCT518)
     • Supervisor: Dr. Subarna Shakya, Associate Professor
  2. Introduction
     • Voice biometric system for user login
     • Text-prompted system: the claimant is asked to speak a prompted text
     • Speech and speaker recognition/verification
     • More secure against playback attacks
     • Web application
       • Client (Adobe Flex): voice capture, preprocessing and feature extraction
       • Server (Java): training / classification
       • BlazeDS RPC for Java-Flex connectivity
  3. Block Diagram of Speaker / Speech Recognition System
  4. Signal Capture and Pre-Processing
  5. Capture and Preprocessing
     • Get the audio signal, i.e., analog-to-digital conversion (ADC)
     • Make it suitable for feature extraction
  6. Capture and Preprocessing: Capture
     • 22050 Hz
     • 16 bits, signed
     • Little endian
     • Mono
     • Uncompressed PCM
  7. Capture and Preprocessing: PCM Extract
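The capture parameters on slide 6 and the PCM extraction step on slide 7 can be illustrated with a small Java sketch. In the project itself the capture ran in the Adobe Flex client, so this is only an assumption about how the same format would look on the Java side; the class and method names (`PcmCapture`, `captureFormat`, `toSamples`) are hypothetical.

```java
import javax.sound.sampled.AudioFormat;

final class PcmCapture {
    /** Format from the slide: 22050 Hz, 16-bit, mono, signed, little endian. */
    static AudioFormat captureFormat() {
        return new AudioFormat(22050f, 16, 1, true, false);
    }

    /** Converts 16-bit signed little-endian PCM bytes to samples in [-1, 1). */
    static double[] toSamples(byte[] pcm) {
        double[] samples = new double[pcm.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int lo = pcm[2 * i] & 0xFF;   // low byte comes first, read as unsigned
            int hi = pcm[2 * i + 1];      // high byte keeps its sign
            samples[i] = ((hi << 8) | lo) / 32768.0;
        }
        return samples;
    }
}
```

Scaling by 32768 is a common normalization choice, not something the slides specify.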
  8. Capture and Preprocessing: Silence Removal
     • Algorithm described in the paper ‘a new method for silence removal and endpoint detection’ †
     • † G. Saha, Sandipan Chakroborty, Suman Senapati, Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, India
  9. Capture and Preprocessing: Pre-Emphasis
     • Boosts the high-frequency energy
     • In the time domain: $y[n] = x[n] - \alpha\, x[n-1]$, with $0.9 \le \alpha \le 1.0$
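The pre-emphasis filter on slide 9 is a one-line difference equation. A minimal sketch, assuming the first output sample is passed through unchanged because it has no predecessor (the slide does not say how the boundary is handled); the class name is hypothetical.

```java
final class PreEmphasis {
    /** Applies y[n] = x[n] - alpha * x[n-1]; y[0] is copied from x[0]. */
    static double[] apply(double[] x, double alpha) {
        double[] y = new double[x.length];
        if (x.length == 0) {
            return y;
        }
        y[0] = x[0]; // assumption: no previous sample, so keep the first one as is
        for (int n = 1; n < x.length; n++) {
            y[n] = x[n] - alpha * x[n - 1];
        }
        return y;
    }
}
```

A constant (low-frequency) input is almost cancelled for `alpha` close to 1, which is the high-frequency boost the slide refers to.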
  10. Capture and Preprocessing: Framing
      • The speech signal is stationary (in its statistical properties) over 10-30 ms
      • 50% overlapped frames of 23 ms each are used
  11. Capture and Preprocessing: Windowing
      • Windowing is applied to the frame-blocked signal
      • Hamming window
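Framing and windowing (slides 10 and 11) could be sketched as follows. At 22050 Hz a 23 ms frame is roughly 507 samples; the usage note assumes 512 samples (about 23.2 ms) with a hop of 256 for 50% overlap, which is a convenience for a power-of-two FFT and not a figure from the slides. The Hamming coefficients 0.54 and 0.46 are the standard textbook values.

```java
final class Framing {
    /** Splits a signal into overlapping frames; a trailing partial frame is dropped. */
    static double[][] split(double[] signal, int frameLen, int hop) {
        if (signal.length < frameLen) {
            return new double[0][frameLen];
        }
        int count = 1 + (signal.length - frameLen) / hop;
        double[][] frames = new double[count][frameLen];
        for (int f = 0; f < count; f++) {
            System.arraycopy(signal, f * hop, frames[f], 0, frameLen);
        }
        return frames;
    }

    /** Hamming window w[i] = 0.54 - 0.46 cos(2*pi*i / (n - 1)). */
    static double[] hamming(int n) {
        double[] w = new double[n];
        if (n == 1) {
            w[0] = 1.0;
            return w;
        }
        for (int i = 0; i < n; i++) {
            w[i] = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * i / (n - 1));
        }
        return w;
    }

    /** Multiplies every frame by the window, in place. */
    static void applyWindow(double[][] frames, double[] window) {
        for (double[] frame : frames) {
            for (int i = 0; i < frame.length; i++) {
                frame[i] *= window[i];
            }
        }
    }
}
```

With these assumptions, `Framing.split(signal, 512, 256)` followed by `Framing.applyWindow(frames, Framing.hamming(512))` produces the windowed frames that feed the feature extraction stage.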
  12. Feature Extraction
  13. Feature Extraction
      • Transform the input audio signal into a sequence of acoustic feature vectors
      • MFCC: Mel Filter Cepstral Coefficients as features
      • Perceptual approach: the human ear processes audio signals on the Mel scale
      • Mel scale: linear up to 1 kHz and logarithmic above 1 kHz
      • MFCC gives the distribution of energy in Mel frequency bands
      • Calculated for each frame
  14. Feature Extraction: Fourier Transform
      • Gives information about the amount of energy in each frequency band
      • FFT used
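The slide only states that an FFT was used. As an illustration, here is a textbook iterative radix-2 FFT with a power-spectrum helper; it assumes the frame length is a power of two, and the class and method names are hypothetical, not taken from the project.

```java
final class Fft {
    /** In-place iterative radix-2 FFT; re.length must be a power of two. */
    static void transform(double[] re, double[] im) {
        int n = re.length;
        if (Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly stages.
        for (int len = 2; len <= n; len <<= 1) {
            double angle = -2.0 * Math.PI / len;
            double stepRe = Math.cos(angle), stepIm = Math.sin(angle);
            for (int i = 0; i < n; i += len) {
                double wRe = 1.0, wIm = 0.0;
                for (int k = 0; k < len / 2; k++) {
                    int a = i + k, b = i + k + len / 2;
                    double vRe = re[b] * wRe - im[b] * wIm;
                    double vIm = re[b] * wIm + im[b] * wRe;
                    re[b] = re[a] - vRe; im[b] = im[a] - vIm;
                    re[a] += vRe;        im[a] += vIm;
                    double nextRe = wRe * stepRe - wIm * stepIm;
                    wIm = wRe * stepIm + wIm * stepRe;
                    wRe = nextRe;
                }
            }
        }
    }

    /** Power |X[k]|^2 for bins 0..N/2 of a real-valued frame. */
    static double[] powerSpectrum(double[] frame) {
        double[] re = frame.clone();
        double[] im = new double[frame.length];
        transform(re, im);
        double[] power = new double[frame.length / 2 + 1];
        for (int k = 0; k < power.length; k++) {
            power[k] = re[k] * re[k] + im[k] * im[k];
        }
        return power;
    }
}
```

Only bins 0 to N/2 are kept because the spectrum of a real signal is symmetric.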
  15. Feature Extraction: Mel Filter
      • We used a filter bank of triangular filters spaced on the Mel scale
  16. Feature Extraction: Mel Filter (contd.)
      • Mel filter definition (equation not preserved in this transcript)
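Since the filter equations did not survive in the transcript, the following is only a sketch of one common construction, not the authors' exact formula. It assumes the widely used mapping mel = 2595 log10(1 + f/700), filters spanning 0 Hz to the Nyquist frequency, and unit-height triangles; the filter count passed in the usage note (26) is likewise an assumption.

```java
final class MelFilterBank {
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /** weights[m][k]: response of triangular filter m at FFT bin k (k = 0..fftSize/2). */
    static double[][] build(int numFilters, int fftSize, double sampleRate) {
        double maxMel = hzToMel(sampleRate / 2.0);
        // numFilters + 2 edge frequencies, equally spaced on the Mel scale.
        double[] edges = new double[numFilters + 2];
        for (int i = 0; i < edges.length; i++) {
            edges[i] = melToHz(maxMel * i / (numFilters + 1));
        }
        int bins = fftSize / 2 + 1;
        double[][] weights = new double[numFilters][bins];
        for (int m = 0; m < numFilters; m++) {
            double left = edges[m], center = edges[m + 1], right = edges[m + 2];
            for (int k = 0; k < bins; k++) {
                double f = k * sampleRate / fftSize;
                if (f > left && f <= center) {
                    weights[m][k] = (f - left) / (center - left);   // rising slope
                } else if (f > center && f < right) {
                    weights[m][k] = (right - f) / (right - center); // falling slope
                }
            }
        }
        return weights;
    }

    /** Filter-bank energies: weighted sum of the power spectrum for each filter. */
    static double[] apply(double[][] weights, double[] power) {
        double[] energies = new double[weights.length];
        for (int m = 0; m < weights.length; m++) {
            for (int k = 0; k < power.length; k++) {
                energies[m] += weights[m][k] * power[k];
            }
        }
        return energies;
    }
}
```

For example, `MelFilterBank.build(26, 512, 22050.0)` yields 26 filters over 257 FFT bins, narrow at low frequencies and wider at high frequencies.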
  17. Feature Extraction: Log, IFT (DCT)
      • Log
      • DCT
      • Result: MFCC
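The log and DCT steps can be sketched as below. The DCT-II form used here, c[n] = Σ log(E[m]) cos(π n (m + 0.5) / M), is the usual choice for MFCC and is an assumption, since the slide's equations are not preserved. The sketch skips c[0] and returns coefficients 1 to `numCoeffs`, and it floors the energies at a small constant to avoid log(0); both are illustrative choices.

```java
final class Mfcc {
    /** Returns cepstral coefficients c[1]..c[numCoeffs] from Mel filter-bank energies. */
    static double[] fromFilterEnergies(double[] energies, int numCoeffs) {
        int numFilters = energies.length;
        double[] logEnergy = new double[numFilters];
        for (int m = 0; m < numFilters; m++) {
            logEnergy[m] = Math.log(Math.max(energies[m], 1e-12)); // floor avoids log(0)
        }
        double[] cepstrum = new double[numCoeffs];
        for (int n = 1; n <= numCoeffs; n++) {
            double sum = 0.0;
            for (int m = 0; m < numFilters; m++) {
                sum += logEnergy[m] * Math.cos(Math.PI * n * (m + 0.5) / numFilters);
            }
            cepstrum[n - 1] = sum;
        }
        return cepstrum;
    }
}
```

A flat set of filter energies maps to all-zero coefficients, which reflects that these coefficients describe the shape of the spectrum and not its overall level.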
  18. Feature Extraction: Cepstral Mean Subtraction
      • CMS: for minimizing channel effects
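Cepstral mean subtraction removes the per-coefficient average over an utterance, which cancels a stationary channel because such a channel adds a constant offset in the cepstral domain. A minimal sketch, assuming the mean is taken over all frames of one utterance (the slide does not state the averaging span):

```java
final class CepstralMeanSubtraction {
    /** Subtracts the per-coefficient mean over all frames, in place. */
    static void apply(double[][] frames) {
        if (frames.length == 0) {
            return;
        }
        int dim = frames[0].length;
        for (int d = 0; d < dim; d++) {
            double mean = 0.0;
            for (double[] frame : frames) {
                mean += frame[d];
            }
            mean /= frames.length;
            for (double[] frame : frames) {
                frame[d] -= mean;
            }
        }
    }
}
```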
  19. Feature Extraction: Energy and Deltas
      • For completeness of the feature vector and for achieving a high recognition rate
      • An energy feature
      • A delta (velocity) feature and a double-delta (acceleration) feature
      • Calculated by linear regression over a regression window M
  20. Composition of Feature Vector
      • 12 MFCC features
      • 12 delta MFCC
      • 12 delta-delta MFCC
      • 1 energy feature
      • 1 delta energy feature
      • 1 delta-delta energy feature
      • Total: 39 features from each frame
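The delta computation and the 39-dimensional layout can be sketched as follows. The regression formula d[t] = Σ m (c[t+m] − c[t−m]) / (2 Σ m²), summed over m = 1..M, is the standard one and is assumed here because the slide's equation is not preserved; repeating the first and last frames at the edges is also an assumption.

```java
final class DeltaFeatures {
    /** Regression-based deltas over a window of +/- window frames. */
    static double[][] delta(double[][] feats, int window) {
        int numFrames = feats.length;
        if (numFrames == 0) {
            return new double[0][0];
        }
        int dim = feats[0].length;
        double denom = 0.0;
        for (int m = 1; m <= window; m++) {
            denom += 2.0 * m * m;
        }
        double[][] out = new double[numFrames][dim];
        for (int t = 0; t < numFrames; t++) {
            for (int d = 0; d < dim; d++) {
                double num = 0.0;
                for (int m = 1; m <= window; m++) {
                    double next = feats[Math.min(numFrames - 1, t + m)][d]; // clamp at the end
                    double prev = feats[Math.max(0, t - m)][d];             // clamp at the start
                    num += m * (next - prev);
                }
                out[t][d] = num / denom;
            }
        }
        return out;
    }

    /** Concatenates static, delta and delta-delta features: 13 static values give 39. */
    static double[][] stack(double[][] statics, int window) {
        double[][] d1 = delta(statics, window);
        double[][] d2 = delta(d1, window);
        double[][] out = new double[statics.length][];
        for (int t = 0; t < statics.length; t++) {
            int dim = statics[t].length;
            out[t] = new double[3 * dim];
            System.arraycopy(statics[t], 0, out[t], 0, dim);
            System.arraycopy(d1[t], 0, out[t], dim, dim);
            System.arraycopy(d2[t], 0, out[t], 2 * dim, dim);
        }
        return out;
    }
}
```

Passing 13 static values per frame (12 MFCC plus energy) to `stack` yields the 39 features listed on slide 20, though in a different order than the slide lists them.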
  21. Speaker Recognition/Verification by GMM
  22. Gaussian Mixture Model
      • Parametric probability density function
      • Based on a clustering technique
      • M Gaussian components: $p(x \mid \lambda) = \sum_{m=1}^{M} w_m \, g_m(x \mid \mu_m, C_m)$
      • $x$: a K-dimensional random vector
      • $w_m$: mixture weight of the m-th component
      • $g_m$: K-dimensional Gaussian function (pdf), $g_m(x \mid \mu_m, C_m) = \dfrac{1}{\sqrt{(2\pi)^K \, |C_m|}} \exp\left\{ -\tfrac{1}{2} (x - \mu_m)^{T} C_m^{-1} (x - \mu_m) \right\}$
      • Model parameters: $\lambda = (w_m, \mu_m, C_m)$, $m = 1, \ldots, M$
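The mixture density can be evaluated in the log domain to avoid underflow on 39-dimensional vectors. This sketch assumes diagonal covariance matrices, a common simplification in speaker recognition; the slide's formula allows a full matrix $C_m$, so this is a special case and not necessarily what the project used.

```java
final class GmmDensity {
    /** log g(x | mean, diag(variance)) for a diagonal-covariance Gaussian. */
    static double logGaussian(double[] x, double[] mean, double[] variance) {
        double sum = 0.0;
        for (int d = 0; d < x.length; d++) {
            double diff = x[d] - mean[d];
            sum += Math.log(2.0 * Math.PI * variance[d]) + diff * diff / variance[d];
        }
        return -0.5 * sum;
    }

    /** log p(x | lambda) = log sum_m w_m g_m(x), computed with the log-sum-exp trick. */
    static double logMixture(double[] x, double[] weights,
                             double[][] means, double[][] variances) {
        double[] terms = new double[weights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int m = 0; m < weights.length; m++) {
            terms[m] = Math.log(weights[m]) + logGaussian(x, means[m], variances[m]);
            max = Math.max(max, terms[m]);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // every component has zero weight or zero density
        }
        double sum = 0.0;
        for (double term : terms) {
            sum += Math.exp(term - max);
        }
        return max + Math.log(sum);
    }

    /** log P(X | lambda) = sum_t log p(x_t | lambda), treating frames as independent. */
    static double logLikelihood(double[][] frames, double[] weights,
                                double[][] means, double[][] variances) {
        double total = 0.0;
        for (double[] x : frames) {
            total += logMixture(x, weights, means, variances);
        }
        return total;
    }
}
```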
  23. GMM Training
      • Goal: estimate the parameters $\lambda$
      • Method: maximum likelihood estimation
      • Input: $X = \{x_1, x_2, \ldots, x_T\}$
      • $P(X \mid \lambda) = \prod_{t=1}^{T} p(x_t \mid \lambda)$
      • Maximized with the Expectation-Maximization (EM) algorithm
      • Iterative process: from an initial model $\lambda_i$, a new model $\lambda_{i+1}$ is estimated such that $P(X \mid \lambda_{i+1}) \ge P(X \mid \lambda_i)$
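One EM iteration for the training step could look like the sketch below. It again assumes diagonal covariances and adds a small variance floor (1e-6), both of which are assumptions not stated in the slides. It works with plain probabilities for brevity, so it is suited to low-dimensional illustration, not production use on 39-dimensional features.

```java
final class GmmEm {
    static double logGaussian(double[] x, double[] mean, double[] variance) {
        double sum = 0.0;
        for (int d = 0; d < x.length; d++) {
            double diff = x[d] - mean[d];
            sum += Math.log(2.0 * Math.PI * variance[d]) + diff * diff / variance[d];
        }
        return -0.5 * sum;
    }

    /** log P(X | lambda) = sum over frames of log p(x_t | lambda). */
    static double logLikelihood(double[][] frames, double[] w,
                                double[][] mu, double[][] variance) {
        double total = 0.0;
        for (double[] x : frames) {
            double p = 0.0;
            for (int m = 0; m < w.length; m++) {
                p += w[m] * Math.exp(logGaussian(x, mu[m], variance[m]));
            }
            total += Math.log(p);
        }
        return total;
    }

    /** One EM iteration; w, mu and variance are updated in place. */
    static void emStep(double[][] frames, double[] w, double[][] mu, double[][] variance) {
        int numFrames = frames.length, numComp = w.length, dim = frames[0].length;
        // E-step: responsibility of each component for each frame.
        double[][] gamma = new double[numFrames][numComp];
        for (int t = 0; t < numFrames; t++) {
            double sum = 0.0;
            for (int m = 0; m < numComp; m++) {
                gamma[t][m] = w[m] * Math.exp(logGaussian(frames[t], mu[m], variance[m]));
                sum += gamma[t][m];
            }
            for (int m = 0; m < numComp; m++) {
                gamma[t][m] /= sum;
            }
        }
        // M-step: re-estimate weight, mean and variance of each component.
        for (int m = 0; m < numComp; m++) {
            double nm = 0.0;
            for (int t = 0; t < numFrames; t++) {
                nm += gamma[t][m];
            }
            for (int d = 0; d < dim; d++) {
                double mean = 0.0;
                for (int t = 0; t < numFrames; t++) {
                    mean += gamma[t][m] * frames[t][d];
                }
                mean /= nm;
                double v = 0.0;
                for (int t = 0; t < numFrames; t++) {
                    double diff = frames[t][d] - mean;
                    v += gamma[t][m] * diff * diff;
                }
                mu[m][d] = mean;
                variance[m][d] = Math.max(v / nm, 1e-6); // floor: an assumption
            }
            w[m] = nm / numFrames;
        }
    }
}
```

Calling `emStep` repeatedly until `logLikelihood` stops improving corresponds to the iterative process described on the slide.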
  24. Verification
      • Decision: hypothesis test
        • $H_0$: the speaker is the claimed speaker
        • $H_1$: the speaker is an imposter
      • Based on the likelihood ratio $\Lambda = \dfrac{P(X \mid \lambda_{H_0})}{P(X \mid \lambda_{H_1})}$
      • Decision by threshold $\theta_T$: $\Lambda < \theta_T$ rejects the identity claim, $\Lambda > \theta_T$ accepts it
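In practice the likelihood ratio is usually compared in the log domain, since the likelihoods themselves are extremely small. A minimal sketch, assuming the two log-likelihoods have already been computed (for example from the claimed speaker's GMM and from an impostor or background model); how the impostor model is built is not described in the slides.

```java
final class LikelihoodRatioTest {
    /**
     * Accepts the identity claim when Lambda > threshold, evaluated as
     * log P(X | claimed) - log P(X | impostor) > log(threshold).
     */
    static boolean accept(double logLikClaimed, double logLikImpostor, double threshold) {
        double logRatio = logLikClaimed - logLikImpostor;
        return logRatio > Math.log(threshold);
    }
}
```

For example, `LikelihoodRatioTest.accept(-100.0, -110.0, 1.0)` accepts, because the claimed-speaker model explains the utterance better than the impostor model.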
  25. Speech Recognition by HMM/VQ
  26. Hidden Markov Model: Definition
      • A Hidden Markov Model (HMM) is a statistical model
      • An HMM is an extension of a Markov process
      • An HMM has hidden states and observable symbols per state
      • HMM model:
        • Observed data: feature vectors
        • Hidden states: phonemes
  27. Codebook Generation
      • K-means clustering
      • Clustering the whole database and generating the codebook
      • VQ: Vector Quantization is used for mapping each input feature vector to a discrete quantized symbol
      • Once the codebook is built, each incoming feature vector is processed as follows:
        • Compare it to each of the prototype vectors in the codebook
        • Select the one which is closest (by some distance metric)
        • Replace the input vector by the index of this prototype vector; the indices form the observation sequence
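The quantization steps on slide 27 map directly to a nearest-neighbor search, and one K-means (Lloyd) update shows how the codebook is refined. Squared Euclidean distance is assumed here, since the slide only says "some distance metric"; keeping the old centroid for an empty cluster is also an assumption.

```java
final class VectorQuantizer {
    /** Index of the codeword closest to v (squared Euclidean distance). */
    static int nearest(double[][] codebook, double[] v) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < codebook.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < v.length; d++) {
                double diff = v[d] - codebook[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Maps feature vectors to codeword indices, i.e. the HMM observation sequence. */
    static int[] quantize(double[][] codebook, double[][] vectors) {
        int[] symbols = new int[vectors.length];
        for (int t = 0; t < vectors.length; t++) {
            symbols[t] = nearest(codebook, vectors[t]);
        }
        return symbols;
    }

    /** One K-means update: assign data to codewords, then move codewords to cluster means. */
    static double[][] kMeansStep(double[][] codebook, double[][] data) {
        int k = codebook.length, dim = codebook[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] v : data) {
            int c = nearest(codebook, v);
            counts[c]++;
            for (int d = 0; d < dim; d++) {
                sums[c][d] += v[d];
            }
        }
        double[][] updated = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                updated[c] = codebook[c].clone(); // empty cluster keeps its old centroid
            } else {
                for (int d = 0; d < dim; d++) {
                    sums[c][d] /= counts[c];
                }
                updated[c] = sums[c];
            }
        }
        return updated;
    }
}
```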
  28. Speech Recognition System by HMM/VQ
  29. Hidden Markov Model: Training
      • Training by the forward-backward (Baum-Welch) algorithm
      • The forward-backward algorithm iteratively re-estimates the parameters and improves the probability that the given observations are generated by the new parameters
      • Three parameters need to be re-estimated:
        • Initial state distribution: $\pi_i$
        • Transition probabilities: $a_{i,j}$
        • Emission probabilities: $b_i(o_t)$
      • Input is the observation sequence given by VQ
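Full Baum-Welch re-estimation is too long to show here, but the forward pass it builds on is compact. The sketch below computes log P(O | model) for a discrete HMM with per-step scaling; it is not the re-estimation itself, and the names are hypothetical. `pi`, `a` and `b` correspond to $\pi_i$, $a_{i,j}$ and $b_i(o_t)$ on the slide, with `obs` holding the VQ symbol indices.

```java
final class HmmForward {
    /** log P(obs | pi, a, b) via the scaled forward algorithm. */
    static double logLikelihood(double[] pi, double[][] a, double[][] b, int[] obs) {
        int n = pi.length;
        double[] alpha = new double[n];
        for (int i = 0; i < n; i++) {
            alpha[i] = pi[i] * b[i][obs[0]];
        }
        double logProb = normalize(alpha);
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    sum += alpha[i] * a[i][j];
                }
                next[j] = sum * b[j][obs[t]];
            }
            alpha = next;
            logProb += normalize(alpha);
        }
        return logProb;
    }

    /** Scales v to sum to 1 and returns the log of the scale factor. */
    static double normalize(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        if (sum == 0.0) {
            return Double.NEGATIVE_INFINITY; // the sequence is impossible under this model
        }
        for (int i = 0; i < v.length; i++) {
            v[i] /= sum;
        }
        return Math.log(sum);
    }
}
```

Baum-Welch pairs this forward pass with a matching backward pass to compute the expected state occupancies and transitions used for re-estimation.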
  30. Hidden Markov Model: Verification/Matching
      • The Viterbi algorithm is used
      • Inputs:
        • Observation sequence, given by VQ
        • HMM model of the word
      • The best-matched word is returned
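Matching with Viterbi can be sketched as scoring the observation sequence against each word's HMM and returning the best-scoring word. The log domain is used so that zero transition probabilities in a left-to-right model become negative infinity without numerical trouble. The `WordModel` holder class is hypothetical.

```java
import java.util.List;

final class ViterbiMatcher {
    /** Discrete HMM for one word; field names follow the slide's pi, a and b. */
    static final class WordModel {
        final String word;
        final double[] pi;
        final double[][] a;
        final double[][] b;

        WordModel(String word, double[] pi, double[][] a, double[][] b) {
            this.word = word;
            this.pi = pi;
            this.a = a;
            this.b = b;
        }
    }

    /** Log probability of the single best state path for obs. */
    static double bestPathLogProb(double[] pi, double[][] a, double[][] b, int[] obs) {
        int n = pi.length;
        double[] delta = new double[n];
        for (int i = 0; i < n; i++) {
            delta[i] = Math.log(pi[i]) + Math.log(b[i][obs[0]]);
        }
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < n; i++) {
                    best = Math.max(best, delta[i] + Math.log(a[i][j]));
                }
                next[j] = best + Math.log(b[j][obs[t]]);
            }
            delta = next;
        }
        double best = Double.NEGATIVE_INFINITY;
        for (double d : delta) {
            best = Math.max(best, d);
        }
        return best;
    }

    /** Returns the word whose model gives the highest Viterbi score, or null if none. */
    static String bestWord(List<WordModel> models, int[] obs) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (WordModel m : models) {
            double score = bestPathLogProb(m.pi, m.a, m.b, obs);
            if (best == null || score > bestScore) {
                best = m.word;
                bestScore = score;
            }
        }
        return best;
    }
}
```

In the text-prompted setting, the returned word would then be compared with the prompted word before the speaker verification result is taken into account.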
  31. Problems Faced
      • Learning curve
        • Complex mathematics
        • Flex and Java
      • Connectivity (initially)
      • Data conversion
  32. Remaining Tasks
      • Speech training data collection
      • Model training (HMM, GMM)
      • Module integration
      • Testing
  33. Thanks