
Text-Independent Speaker Verification


Presentation slides covering the theory and empirical results of a text-independent speaker verification system I developed, based on classification of MFCCs. Both minimum-distance classification and log-likelihood ratio classification using Gaussian Mixture Models are discussed.

Published in: Technology


  1. Speaker Recognition
       • Cody A. Ray
       • ECES 435 Final Project
       • March 11, 2010
  2. Speaker Recognition
       • Speaker Identification: Text Dependent, Text Independent
       • Speaker Verification: Text Dependent, Text Independent
  3. Speaker Recognition System
       • [Block diagram] Training: Training speech (Target & Background), Feature Extraction, Feature Vector, Training, Speaker Model
       • [Block diagram] Testing: Test speech, Feature Extraction, Matching, Score, Verification
       • Cepstrum, LPCC, MFCC, Glottal Flow Derivative
       • Deterministic Models: Min Distance, DTW
       • Stochastic Models: GMM, HMM
       • Minimum Distance, Maximum-Likelihood, Maximum a posteriori, Minimum-Mean-Squared Error
  4. Feature Extraction
       • Big surprise here: MFCCs!
       • [Block diagram] Speech signal → Window, x[m] w[n−m] → DFT, X(n, ω) → | · | → Mel-Scale Filter Bank, E_mel(n, l) → Log → DCT → MFCCs
       • MFCC: 12 coefficients (skip 0’th order coefficient)
       • 256 sample frames, 128 sample increment, Hamming window
       • Triangular filters in mel domain (absolute magnitude)
  5. Mel Frequency Bank
  6. System 1: Minimum-Distance
       • Average of mel-cepstral features for test and training data
  7. Minimum-Distance Classifier
       • Mean-squared difference between average testing and training feature vectors
  8. System 2: Gaussian Mixture Model
       • Multivariate Normal Distribution
  9. Gaussian Mixture Model
  10. GMM Speaker Recognition System
       • [Diagram] Feature Vectors, Target Model, Imposter 1, Imposter 2
  11. Log-Likelihood Ratio
  12. Experiments
       • 8 Speakers (4 Male, 4 Female)
       • 2 Sentences Each
       • “Don’t ask me to carry an oily rag like that”
       • “She had your dark suit in greasy wash water all year”
       • “Rag” used for training, “suit” for testing
  13. Results
  14. Results
  15. Results
       • Threshold = 0.12
       • Accuracy = 91%
  16. Results
       • Threshold = 0.11
       • Accuracy = 91%
  17. Conclusions
       • Accuracy isn’t terrible, but room to improve
       • Threshold tradeoff: false-negatives vs. false-positives
       • DON’T use Minimum-Distance classifier for text-independent authentication systems
  18. Future Work
       • Implement LLR Classifier using GMM library
       • Repeat experiment with GMM-based system
       • Compare Min-Distance and GMM results
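
The Feature Extraction slide lists the MFCC steps: Hamming window, DFT magnitude, triangular mel filter bank, log, DCT, with 256-sample frames, a 128-sample increment, and 12 coefficients that skip the 0th. A minimal pure-Python sketch of that chain is given here. The 16 kHz sample rate, the 24 filters, the mel-scale constants, and every function name are assumptions not stated in the slides, and the naive DFT is only for illustration; a practical system would use an FFT library.

```python
import cmath
import math


def hz_to_mel(f):
    # A common mel-scale formula (assumed; the slides do not give one).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, frame_len, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    n_bins = frame_len // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, expressed as fractional DFT bin positions.
    edges = [mel_to_hz(top * i / (n_filters + 1)) * frame_len / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def magnitude_spectrum(frame):
    """Naive DFT magnitude for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def mfcc(signal, sample_rate=16000, frame_len=256, hop=128,
         n_filters=24, n_coeffs=12):
    """Return one list of n_coeffs cepstral coefficients per frame."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]            # Hamming window
    bank = mel_filter_bank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mag = magnitude_spectrum(frame)
        # Log filter-bank energies on the absolute magnitude, floored to avoid log(0).
        log_e = [math.log(max(sum(w * m for w, m in zip(row, mag)), 1e-10))
                 for row in bank]
        # DCT-II; c starts at 1 so the 0th-order coefficient is skipped.
        features.append([
            sum(e * math.cos(math.pi * c * (l + 0.5) / n_filters)
                for l, e in enumerate(log_e))
            for c in range(1, n_coeffs + 1)
        ])
    return features
```

Calling `mfcc(samples)` on a list of floats yields one 12-element vector per 256-sample frame, which is the per-frame feature vector the two classifiers in the deck would consume.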
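
The System 1 and Minimum-Distance Classifier slides describe averaging the mel-cepstral features of the training and test utterances and taking the mean-squared difference between the two averages. A small sketch of that comparison is given here. The function names, the rule "accept when the distance is below the threshold", and the example threshold are assumptions for illustration, not details confirmed by the slides.

```python
def mean_vector(frames):
    """Average a list of per-frame feature vectors into a single vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def mean_squared_difference(a, b):
    """Mean of the squared element-wise differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def verify(train_frames, test_frames, threshold):
    """Return (accepted, distance) for a claimed identity.

    Assumption: a smaller distance means a better match, so the claim is
    accepted when the distance falls below the threshold.
    """
    distance = mean_squared_difference(mean_vector(train_frames),
                                       mean_vector(test_frames))
    return distance < threshold, distance
```

For example, `verify(train_mfccs, test_mfccs, threshold=0.1)` would compare one enrolled utterance against one test utterance; the value 0.1 is a placeholder that would have to be tuned on real data.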
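
The System 2 slides name a multivariate normal distribution, a Gaussian mixture model, and a log-likelihood ratio between a target model and impostor models, but their equations did not survive in this transcript. The sketch here shows one common way to score such a system, under stated assumptions: diagonal covariances, a single background model standing in for the impostors, and a per-frame average of the ratio. The model parameters are supplied by hand; training (for example with EM) is not shown, and all names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a multivariate normal with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """log p(x | gmm), where gmm is a list of (weight, mean, var) components."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def log_likelihood_ratio(frames, target_gmm, background_gmm):
    """Average over frames of log p(x | target) - log p(x | background)."""
    total = sum(gmm_log_likelihood(x, target_gmm)
                - gmm_log_likelihood(x, background_gmm)
                for x in frames)
    return total / len(frames)
```

A positive ratio favours the target speaker and a negative one favours the background model; the decision threshold applied to that ratio is a separate tuning choice.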
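
The Conclusions slide points to a threshold tradeoff between false negatives and false positives. The sketch here makes that tradeoff concrete by counting both error types at a given threshold. It assumes the scores are distances (accept when the score is below the threshold); the function names are hypothetical, and any scores passed in for experimentation are the caller's own data, not results from the deck.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Count errors when a trial is accepted if its score is below threshold.

    target_scores:   distances for trials where the claimed speaker is genuine
    impostor_scores: distances for trials where the claim is false
    """
    false_neg = sum(s >= threshold for s in target_scores)    # genuine rejected
    false_pos = sum(s < threshold for s in impostor_scores)   # impostor accepted
    total = len(target_scores) + len(impostor_scores)
    return {
        "false_negative_rate": false_neg / len(target_scores),
        "false_positive_rate": false_pos / len(impostor_scores),
        "accuracy": (total - false_neg - false_pos) / total,
    }


def sweep(target_scores, impostor_scores, thresholds):
    """Evaluate several thresholds so the tradeoff can be inspected."""
    return {t: error_rates(target_scores, impostor_scores, t)
            for t in thresholds}
```

Raising the threshold accepts more trials, which lowers the false-negative rate and raises the false-positive rate; lowering it does the opposite, so two nearby thresholds can give the same overall accuracy with a different mix of errors.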
