Tribhuvan University                 Institute of Engineering                    Pulchowk Campus   Department of Electroni...
INTRODUCTION   Voice biometric system       User login   Text-Prompted system       Claimant is asked to speak a promp...
OUR SYSTEM   Feature : MFCC   Modeling and Classifications : both statistical       GMM - Speaker Modeling :       HMM...
PROPERTIES OF SPEECH SIGNAL   Carries both Speech Content and Speaker identity   What makes Speech Signal Unique ?     ...
UNIQUENESS IN PHONEME                                                                              Phoneme /ah/           ...
Pre-Processing and Feature Extraction
PREPROCESSING : STEPS           1)Silence Removal  1 0.5  0-0.5  -1       0      1    2              3          4         ...
PREPROCESSING :STEPS (CONTD..)1)Silence Removal                           2)Pre-Emphasis                   0.05          ...
PREPROCESSING :STEPS (CONTD..)                                3)Framing1)Silence Removal2)Pre-Emphasis      50% overlap...
PREPROCESSING :STEPS (CONTD..)1)Silence Removal2)Pre-Emphasis3)Framing                                             4)Wi...
FEATURE EXTRACTION   MFCC : Mel Filter Cepstral Coefficients       Perceptual approach           Human Ear processes au...
MFCC EXTRACTION: (CONTD..)   Steps :    FFT         Mel Filter              Log       DCT              CMS           ...
EXTRA FEATURES :ENERGY AND DELTAS   For achieving high recognition rate   A Energy Feature   Delta and Delta-Delta    ...
COMPOSITION OF FEATURE VECTOR  12 MFCC Features  12 Δ MFCC  12 Δ Δ MFCC   1 Energy Feature   1 Δ Energy   1 Δ Δ Energy 39...
Speech Recognition/Verification by           HMM/VQ
HIDDEN MARKOV MODEL (HMM)   HMM is the extension of Markov Process   Markov Process consist of observable states   HMM ...
HMM (CONTD…)   Parameters    1) The initial state distribution (π)    2) State transition probability distribution (A)   ...
EXAMPLE:PRONUNCIATION MODEL OF WORD TOMATO                  (A,B, )
HMM IMPLEMENTATION   Feature Vector  observation symbols , 256   Phonemes hidden states, 6   Left to right HMM   Dis...
SPEECH RECOGNITION SYSTEM
VECTOR QUANTIZATION
Speaker Recognition/Verification by              GMM
SPEAKER VERIFICATION SYSTEM
SPEAKER MODELING (GMM) Gaussian         Mixture Model     Parametric probability density function     Based on soft clu...
SPEAKER MODEL TRAINING Estimate the model parameters Expectation Maximization algorithm
SPEAKER VERIFICATION   Based on likelihood ratio             ℎ    ℎ  ′          =                 ℎ     ′
TOOLS USED Languages:     Adobe Flex     Java     Blaze DS for RPC Servers:     Apache Tomcat     MySQL Versioning...
OUTPUT : SNAPSHOT (GUI)
APPLICATION AREAS   Telephone transaction     Telephone credit card purchase,     Telephone stock trading   Access con...
LIMITATION AND FUTURE ENHANCEMENT   Noise reduction   Training on more data   Combine with     other features     oth...
Thanks   Any queries ?
Upcoming SlideShare
Loading in …5
×

Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide

2,127 views

Published on

Joint Speech and Speaker Recognition using Hidden Markov Model/Vector Quantization for speaker independent Speech Recognition and Gaussian Mixture Model for speech independent speaker recognition- used MFCC (Mel-Frequency Cepstral Coefficient) for Feature Extraction (delta,delta delta and energy - 39 coefficients).
Developed in JAVA with client/server Architecture, web interface developed in Adobe Flex.
This project was done at TU, IOE - Pulchowk Campus, Nepal.
For more details visit http://ganeshtiwaridotcomdotnp.blogspot.com

ABSTRACT OF PROJECT>>>

Biometric is physical characteristic unique to each individual. It has a very useful application in authentication and access control.
The designed system is a text-prompted version of voice biometric which incorporates text-independent speaker verification and speaker-independent speech verification system implemented independently. The foundation for this joint system is that the speech signal conveys both the speech content and speaker identity. Such systems are more-secure from playback attack, since the word to speak during authentication is not previously set.
During the course of the project various digital signal processing and pattern classification algorithms were studied. Short time spectral analysis was performed to obtain MFCC, energy and their deltas as feature. Feature extraction module is same for both systems. Speaker modeling was done by GMM and Left to Right Discrete HMM with VQ was used for isolated word modeling. And results of both systems were combined to authenticate the user.
The speech model for each word was pre-trained by using utterance of 45 English words. The speaker model was trained by utterance of about 2 minutes each by 15 speakers. While uttering the individual words, the recognition rate of the speech recognition system is 92 % and speaker recognition system is 66%. For longer duration of utterance (>5sec) the recognition rate of speaker recognition system improves to 78%.

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,127
On SlideShare
0
From Embeds
0
Number of Embeds
12
Actions
Shares
0
Downloads
122
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide

  1. 1. Tribhuvan University Institute of Engineering Pulchowk Campus Department of Electronics and Computer Engineering MAJOR PROJECT FINAL PRESENTATION : TEXT PROMPTED REMOTE SPEAKER AUTHENTICATIONProject Supervisor : Project Members:Dr. Subarna Shakya Ganesh Tiwari (75010)Associate Professor Madhav Pandey(75014) Manoj Shrestha(75018)Internal Examiner: External ExaminerEr. Manoj Ghimire Er. Bimal Acharya
  2. 2. INTRODUCTION Voice biometric system  User login Text-Prompted system  Claimant is asked to speak a prompted(random) text  Speech and Speaker Recognition Why Text prompted ?  Playback attack
  3. 3. OUR SYSTEM Feature : MFCC Modeling and Classifications : both statistical  GMM - Speaker Modeling :  HMM/VQ - Speech Modeling :
  4. 4. PROPERTIES OF SPEECH SIGNAL Carries both Speech Content and Speaker identity What makes Speech Signal Unique ?  Each phoneme resonates at its own fundamental frequency and harmonics of it  Studied over short period : short time spectral analysis What is Speaker Dependent information  Fundamental frequency, primarily  function of the dimensions and tension of the vocal chords  size and shape of the mouth, throat, nose, and teeth  Studied over long period : all the variations from that speaker
  5. 5. UNIQUENESS IN PHONEME Phoneme /ah/ 0.15 0.1 0.05 0Amplitude -0.05 -0.1 -0.15 Phoneme /i:/ -0.2 0 500 1000 1500 2000 2500 Samples
  6. 6. Pre-Processing and Feature Extraction
  7. 7. PREPROCESSING : STEPS 1)Silence Removal 1 0.5 0-0.5 -1 0 1 2 3 4 5 6 7 8 9 4 x 10Silence Signal 1 0.5 0 -0.5Silence Removed -1 0 0.5 1 1.5 2 2.5 3 3.5 4 4
  8. 8. PREPROCESSING :STEPS (CONTD..)1)Silence Removal 2)Pre-Emphasis 0.05 0.04 Suppressed high Frequencies 0.03 |Y(f)| 0.02 0.01 0 0 2000 4000 6000 8000 10000 12000 Frequency (Hz) -3 x 10 5 4 Boosted high Frequencies 3 |Y(f)| 2 1 0 0 2000 4000 6000 8000 10000 12000 Frequency (Hz)
  9. 9. PREPROCESSING :STEPS (CONTD..) 3)Framing1)Silence Removal2)Pre-Emphasis  50% overlapped, 23ms
  10. 10. PREPROCESSING :STEPS (CONTD..)1)Silence Removal2)Pre-Emphasis3)Framing 4)Windowing 0.05 0.04 0.03 0.02 0.01 0 -0.01 0.04 -0.02 0.03 -0.03 -0.04 0.02 -0.05 0 200 400 600 800 1000 1200 0.01 0 -0.01 1 Hamming Window -0.02 0.9 0.8 -0.03 0.7 -0.04 0.6 0 200 400 600 800 1000 1200 0.5 0.4 0.3 0.2 0.1 Windowed Signal 0 10 20 30 40 50 60 Hamming Window
  11. 11. FEATURE EXTRACTION MFCC : Mel Filter Cepstral Coefficients  Perceptual approach  Human Ear processes audio signal in Mel scale  Mel scale : linear up to 1KHz and logarithmic after 1KHz
  12. 12. MFCC EXTRACTION: (CONTD..) Steps : FFT  Mel Filter  Log  DCT  CMS Mel Filter Bank Mel Filter : 12  Filtering of absolute fft coefficients using triangular filter bank in Mel scale MFCC gives distribution of energy acc. to filters in Mel frequency band
  13. 13. EXTRA FEATURES :ENERGY AND DELTAS For achieving high recognition rate A Energy Feature Delta and Delta-Delta  delta velocity feature Co-articulation  double delta acceleration feature
  14. 14. COMPOSITION OF FEATURE VECTOR 12 MFCC Features 12 Δ MFCC 12 Δ Δ MFCC 1 Energy Feature 1 Δ Energy 1 Δ Δ Energy 39 Features from each frame
  15. 15. Speech Recognition/Verification by HMM/VQ
  16. 16. HIDDEN MARKOV MODEL (HMM) HMM is the extension of Markov Process Markov Process consist of observable states HMM has hidden states and observable symbols per states HMM is the stochastic model
  17. 17. HMM (CONTD…) Parameters 1) The initial state distribution (π) 2) State transition probability distribution (A) 3) Observation symbol probability distribution (B) The HMM Model   (A,B, ) 
  18. 18. EXAMPLE:PRONUNCIATION MODEL OF WORD TOMATO   (A,B, )
  19. 19. HMM IMPLEMENTATION Feature Vector  observation symbols , 256 Phonemes hidden states, 6 Left to right HMM Discrete Hidden Markov Model (DHMM) with Vector Quantization (VQ) technique
  20. 20. SPEECH RECOGNITION SYSTEM
  21. 21. VECTOR QUANTIZATION
  22. 22. Speaker Recognition/Verification by GMM
  23. 23. SPEAKER VERIFICATION SYSTEM
  24. 24. SPEAKER MODELING (GMM) Gaussian Mixture Model  Parametric probability density function  Based on soft clustering technique  Mixture of Gaussian components   = ( , , )
  25. 25. SPEAKER MODEL TRAINING Estimate the model parameters Expectation Maximization algorithm
  26. 26. SPEAKER VERIFICATION Based on likelihood ratio ℎ ℎ ′ = ℎ ′
  27. 27. TOOLS USED Languages:  Adobe Flex  Java  Blaze DS for RPC Servers:  Apache Tomcat  MySQL Versioning  Tortoise SVN
  28. 28. OUTPUT : SNAPSHOT (GUI)
  29. 29. APPLICATION AREAS Telephone transaction  Telephone credit card purchase,  Telephone stock trading Access control  Physical facilities  Computer networks  Information retrieval  Customers information Forensics  Voice sample matching
  30. 30. LIMITATION AND FUTURE ENHANCEMENT Noise reduction Training on more data Combine with  other features  other classification methods
  31. 31. Thanks Any queries ?

×