Presentation slides discussing the theory and empirical results of a text-independent speaker verification system I developed, based on classification of MFCCs. Both minimum-distance classification and log-likelihood ratio (LLR) classification using Gaussian Mixture Models are discussed.
Analysis window w[n]. Discrete STFT. Weight the magnitudes of X(n, ω) with a mel-scale filter bank.
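The front end described in this note (window, discrete STFT, mel-scale weighting of the magnitudes) can be sketched roughly as follows. This is a minimal NumPy sketch, not the code used in the project: the Hamming window, the frame and hop sizes, the 20 triangular filters, and the final log and DCT steps are textbook MFCC choices assumed here, and the names `mel_filter_bank` and `mfcc` are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - center)
    return bank

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512, n_filters=20, n_ceps=12):
    """Return one row of cepstral coefficients per analysis frame (all sizes are assumptions)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)                           # analysis window w[n]
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))  # |X(n, w)| of the discrete STFT
    energies = magnitude @ mel_filter_bank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)                  # small offset avoids log(0)
    # DCT-II over the filter outputs, keeping coefficients 1..n_ceps (0th dropped by assumption)
    basis = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), np.arange(n_filters) + 0.5) / n_filters)
    return log_energies @ basis.T
```

Each utterance then becomes a matrix of feature vectors, one row per frame, which is the input both classifiers work on.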
GMM of the target speaker. GMM for a collection of impostors, which we call the background model. Ratio of the probability that the collection of feature vectors X is from the claimed speaker to the probability that X is not from the claimed speaker (i.e., from the background).
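As a rough illustration of this ratio, the sketch below scores a set of feature vectors against two diagonal-covariance GMMs and takes the difference of the log-likelihoods. The diagonal covariances, the per-frame averaging, and the function names are assumptions made for the example. The model parameters are supplied by hand, since the slides list the GMM classifier as future work.

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Average per-frame log-likelihood of X (n_frames, dim) under a diagonal-covariance GMM."""
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)       # (n_components,)
    means = np.asarray(means, dtype=float)           # (n_components, dim)
    variances = np.asarray(variances, dtype=float)   # (n_components, dim)
    diff = X[:, None, :] - means[None, :, :]
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)
    weighted = log_comp + np.log(weights)            # (n_frames, n_components)
    # log-sum-exp over components, computed stably
    peak = weighted.max(axis=1, keepdims=True)
    log_frame = peak[:, 0] + np.log(np.exp(weighted - peak).sum(axis=1))
    return float(log_frame.mean())

def log_likelihood_ratio(X, target, background):
    """log p(X | target model) minus log p(X | background model), averaged per frame."""
    return gmm_log_likelihood(X, *target) - gmm_log_likelihood(X, *background)
```

A claim would be accepted when the ratio exceeds a chosen threshold. Here `target` and `background` are `(weights, means, variances)` tuples.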
Mean-squared error (the distance metric) is symmetric, so it doesn’t matter which sentence was used for training and which for testing. Swapping them just transposes the output matrix.
Mean-squared difference between average testing and training feature vectors for each speaker
Minimize the distances on the diagonal (matches)
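The distance matrix described in these notes can be sketched as below. This is an illustrative NumPy version, assuming each speaker's utterance has already been turned into a matrix of feature vectors (one row per frame). The function names are hypothetical.

```python
import numpy as np

def distance_matrix(train_features, test_features):
    """Entry [i, j] is the mean-squared difference between the average
    training vector of speaker i and the average testing vector of speaker j."""
    train_means = np.stack([np.mean(f, axis=0) for f in train_features])
    test_means = np.stack([np.mean(f, axis=0) for f in test_features])
    diff = train_means[:, None, :] - test_means[None, :, :]
    return np.mean(diff ** 2, axis=2)

def closest_enrolled(distances):
    """For each test utterance (column), index of the nearest enrolled speaker (row)."""
    return np.argmin(distances, axis=0)
```

Calling `distance_matrix(test_features, train_features)` returns the transpose of `distance_matrix(train_features, test_features)`, which is the symmetry noted above. A well-behaved system has its smallest values on the diagonal.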
0 false negatives, 6 false positives
1 false negative, 5 false positives
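Counts like these can be obtained by thresholding the distance matrix. The sketch below assumes a claim is accepted when the distance falls below the threshold and that matches sit on the diagonal. The decision rule and the function name are assumptions, not taken from the slides.

```python
import numpy as np

def count_errors(distances, threshold):
    """Return (false_negatives, false_positives) for a square distance matrix."""
    distances = np.asarray(distances, dtype=float)
    accepted = distances < threshold
    is_match = np.eye(distances.shape[0], dtype=bool)
    false_negatives = int(np.sum(~accepted & is_match))   # true speaker rejected
    false_positives = int(np.sum(accepted & ~is_match))   # impostor accepted
    return false_negatives, false_positives
```

Raising the threshold trades false negatives for false positives, and lowering it does the opposite.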
Text-Independent Speaker Verification
Speaker Recognition<br />Cody A. Ray<br />ECES 435 Final Project<br />March 11, 2010<br />
Experiments<br />8 Speakers (4 Male, 4 Female)<br />2 Sentences Each<br />Don’t ask me to carry an oily rag like that<br />She had your dark suit in greasy wash water all year<br />“Rag” used for training, “suit” for testing<br />
Conclusions<br />Accuracy isn’t terrible, but room to improve<br />Threshold tradeoff<br />false-negatives vs. false-positives<br />DON’T use Minimum-Distance classifier for text-independent authentication systems<br />
Future Work<br />Implement LLR Classifier using GMM library<br />Repeat experiment with GMM-based system<br />Compare Min-Distance and GMM results<br />