Speaker Identification and Verification
Under guidance of
Dr. G. Pradhan
NIT PATNA (ECE dept.)
Presented by:
Kamlesh Kalvaniya (1104080)
Niranjan Kumar (1104087)
Piyush Kumar (1104091)
B.Tech. 4th year (ECE Dept.)
6/24/2015 N.I.T. PATNA ECE, DEPTT. 1
1. Introduction
2. Baseline speaker verification system
3. Future Plan
Speaker recognition is the computational task of recognizing a person from his or her voice; speaker verification validates a person's claimed identity from his or her voice.
Applications:
• Authentication
• Forensic testing
• Security systems
• ATM security key
• Personalized user interfaces
• Multi-speaker tracking
• Surveillance
Identification vs. verification
Phases of Speaker Verification
• Enrollment Session or Training Phase
• Operating Session or Testing Phase
Training & Testing Phase
Training: speech → pre-processing → feature extraction → model building → reference model
Testing: speech + identity claim → pre-processing → feature extraction → comparison with the reference model → decision logic → accept/reject
Preprocessing
Preprocessing is an important step in a speaker verification system. It is also called voice activity detection (VAD).
VAD separates speech regions from non-speech regions [2-3].
It is very difficult to implement a VAD algorithm that works consistently for different types of data.
VAD algorithms can be classified into two groups:
• Feature-based approach
• Statistical-model-based approach
Each VAD method has its own merits and demerits in terms of accuracy, complexity, etc.
Owing to its simplicity, most speaker verification systems use signal energy for VAD.
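A minimal sketch of energy-based VAD, assuming the signal has already been split into frames and using the 0.6 × average-energy threshold listed in the baseline parameters. The function name `energy_vad` and the choice of comparing each frame's energy with the utterance-average frame energy are illustrative assumptions, not the authors' code.

```python
def energy_vad(frames, ratio=0.6):
    """Flag each frame as speech (True) or non-speech (False).

    Assumption: a frame counts as speech when its short-time energy exceeds
    `ratio` times the average frame energy of the whole utterance.
    """
    energies = [sum(s * s for s in frame) for frame in frames]
    if not energies:
        return []
    threshold = ratio * sum(energies) / len(energies)
    return [e > threshold for e in energies]


frames = [[0.0] * 4, [1.0] * 4, [0.1] * 4, [2.0] * 4]
speech_flags = energy_vad(frames)
speech_frames = [f for f, keep in zip(frames, speech_flags) if keep]
```

Only the frames flagged as speech are passed on to feature extraction.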
Feature Extraction
Along with speaker information, the speech signal contains much other, redundant information, e.g., about the recording sensor, channel and environment.
The speaker-specific information in the speech signal [2] comes from:
• the unique speech production system
• physiological aspects
• behavioral aspects
The feature extraction module transforms speech into a set of feature vectors of reduced dimension in order to:
• enhance speaker-specific information
• suppress redundant information.
Selection of Features
Ideally, features should:
• be robust against noise and distortion
• occur frequently and naturally in speech
• be easy to measure from the speech signal
• be difficult to impersonate/mimic
• not be affected by the speaker's health or long-term variations in voice
Types Of Features
Feature Extraction Techniques
A wide range of approaches can be used to represent the speech signal parametrically for speaker recognition:
• Linear Predictive Coding (LPC)
• Linear Predictive Cepstral Coefficients (LPCC)
• Mel-Frequency Cepstral Coefficients (MFCC)
• Perceptual Linear Prediction (PLP)
• Neural Predictive Coding
Most state-of-the-art speaker verification systems use MFCCs appended with their first- and second-order derivatives as feature vectors:
• MFCCs are easy to extract.
• They provide the best performance compared with other features.
• They mostly capture information about the resonance structure of the vocal tract system.
1. Analog-to-digital conversion
2. Pre-emphasis
3. Framing and windowing
4. Fast Fourier transform (FFT)
5. Mel-scale warping (mel filterbank)
6. Discrete cosine transform (DCT), giving the MFCCs
MFCC
Step 1: Analog-to-digital conversion: the analog speech signal is converted to digital form by sampling it at a given frequency.
MFCC
Step 2: Pre-emphasis: the energy in the high frequencies (important for speech) is boosted.
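Pre-emphasis is commonly implemented as a first-order high-pass filter y[n] = x[n] - α·x[n-1]. The slides do not state the coefficient; α = 0.97 below is an assumed, commonly used value, and `pre_emphasis` is an illustrative name.

```python
def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed default; the first sample passes through unchanged.
    """
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


emphasized = pre_emphasis([1.0, 1.0, 1.0], alpha=0.5)
```

A constant (DC) input is attenuated after the first sample, which is the relative boost of high frequencies described above.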
MFCC
Step 3: Framing: the signal is divided into short, overlapping frames of a given size.
MFCC FRAMING
Frame length: 25 ms; frame shift: 10 ms
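A sketch of the framing step with the 25 ms frame length and 10 ms frame shift given above. The 8 kHz sampling rate in the usage line is an assumption (typical of telephone speech), and trailing samples that do not fill a complete frame are simply dropped in this version.

```python
def frame_signal(signal, sample_rate, frame_ms=25.0, shift_ms=10.0):
    """Split a signal into overlapping frames of frame_ms, advanced by shift_ms."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * shift_ms / 1000.0))
    if len(signal) < frame_len:
        return []
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Trailing samples that do not fill a whole frame are discarded.
    return [signal[i * hop:i * hop + frame_len] for i in range(num_frames)]


# One second of silent audio at an assumed 8 kHz sampling rate:
# 200-sample frames advanced by 80 samples.
frames = frame_signal([0.0] * 8000, sample_rate=8000)
```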
MFCC WINDOWING
• The next step is to window each individual frame to minimize the signal discontinuities at the beginning and end of the frame.
• The idea is to minimize spectral distortion by using the window to taper the signal toward zero at the beginning and end of each frame.
• We use the Hamming window.
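The Hamming window is w[n] = 0.54 - 0.46·cos(2πn/(N-1)) for n = 0, …, N-1; it tapers each frame toward (but not exactly to) zero, reaching 0.08 at both ends. A minimal sketch, with illustrative function names:

```python
import math


def hamming(N):
    """Hamming window of length N: w[n] = 0.54 - 0.46 cos(2*pi*n / (N - 1))."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def apply_window(frame):
    """Multiply a frame sample by sample with a Hamming window of the same length."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]


w = hamming(5)
windowed = apply_window([1.0] * 5)
```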
MFCC
MFCC
MEL FILTERBANK
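The FFT and mel filterbank steps can be sketched as follows. Assumptions not taken from the slides: the widely used mapping mel(f) = 2595·log10(1 + f/700), triangular filters equally spaced on the mel scale from 0 Hz to half the sampling rate, and the power spectrum |X[k]|². To stay self-contained, the spectrum is computed with a direct O(N²) DFT, which gives the same values as an FFT, only more slowly. The 26 filters, 256-point FFT and 8 kHz rate in the usage lines are illustrative choices.

```python
import cmath
import math


def power_spectrum(frame, n_fft):
    """|X[k]|^2 for k = 0..n_fft//2 via a direct DFT (same result as an FFT)."""
    x = (list(frame) + [0.0] * n_fft)[:n_fft]  # zero-pad or truncate to n_fft
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))) ** 2
            for k in range(n_fft // 2 + 1)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale (assumed layout)."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [i * top / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):   # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge, peak 1.0 at the center bin
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def filterbank_energies(spectrum, bank):
    """Weighted sum of the power spectrum under each triangular filter."""
    return [sum(w * p for w, p in zip(filt, spectrum)) for filt in bank]


bank = mel_filterbank(26, 256, 8000)
energies = filterbank_energies(power_spectrum([1.0] * 200, 256), bank)
```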
MFCC
DCT
MFCC
DCT
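The final step takes the logarithm of the filterbank energies and applies a discrete cosine transform (DCT-II) to decorrelate them; the first coefficients form the MFCC vector. A sketch assuming 13 retained coefficients (as in the baseline system below), an unnormalized DCT-II, and a small energy floor to avoid log(0); whether c[0] is kept or replaced by log energy is not stated in the slides.

```python
import math


def mfcc_from_fbank(fbank_energies, num_ceps=13, floor=1e-10):
    """c[n] = sum_m log(E[m]) * cos(pi * n * (m + 0.5) / M), n = 0..num_ceps-1."""
    log_e = [math.log(max(e, floor)) for e in fbank_energies]  # floor is an assumed guard
    M = len(log_e)
    return [sum(log_e[m] * math.cos(math.pi * n * (m + 0.5) / M) for m in range(M))
            for n in range(num_ceps)]


# A flat log spectrum puts all its energy in c[0]; the higher cepstra are ~0.
ceps = mfcc_from_fbank([math.e] * 4, num_ceps=4)
```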
Speaker Modelling
• Vector Quantization
• Gaussian Mixture Model
• Gaussian Mixture Model-UBM
• Hidden Markov Model
• Artificial Neural Networks
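• Support Vector Machines
• i-vector
• The Gaussian model assumes that the feature vectors follow a Gaussian distribution characterized by mean vectors, covariance matrices and weights.
• Data unseen during training that appear in the test data will yield a low score.
The speaker model captures the statistical information present in the feature vectors; it enhances the speaker information and suppresses the redundant information.
• A Gaussian mixture density is defined as
$$p(\mathbf{x}\mid\lambda)=\sum_{i=1}^{M} w_i\, g_i(\mathbf{x}),\qquad \sum_{i=1}^{M} w_i=1$$
where each unimodal Gaussian component of dimension D is
$$g_i(\mathbf{x})=\frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}_i|^{1/2}}\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{\top}\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right)$$
and λ = {w_i, µ_i, Σ_i}, i = 1, …, M, with
w_i = weight; µ_i = mean vector; Σ_i = covariance matrix;
M = number of mixture components (8, 16, 32, 64 in this work);
D = feature dimension (39 here: 13 MFCCs plus delta and delta-delta).
A separate model λ is trained for each target speaker (356 training files in the NIST 2003 set used here).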
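Gaussian Mixture Model
• For a sequence of T training vectors X = {x_1, x_2, …, x_T}, assuming the vectors are independent, the GMM likelihood is
$$p(X\mid\lambda)=\prod_{t=1}^{T} p(\mathbf{x}_t\mid\lambda),\qquad \log p(X\mid\lambda)=\sum_{t=1}^{T}\log p(\mathbf{x}_t\mid\lambda)$$
• The speaker-specific GMM is estimated with the expectation-maximization (EM) algorithm.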
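A minimal sketch of the GMM log-likelihood and EM re-estimation. Assumptions not stated in the slides: diagonal covariance matrices (a common choice for MFCC features), an initialization supplied by the caller, and a variance floor that keeps components from collapsing. Function names are illustrative.

```python
import math


def log_gauss_diag(x, mean, var):
    """log N(x; mean, diag(var)) for one D-dimensional vector (diagonal covariance assumed)."""
    return -0.5 * (len(x) * math.log(2 * math.pi)
                   + sum(math.log(v) for v in var)
                   + sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mean, var)))


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def component_logs(x, weights, means, variances):
    """log(w_i) + log g_i(x) for every mixture component i."""
    return [math.log(w) + log_gauss_diag(x, mu, var)
            for w, mu, var in zip(weights, means, variances)]


def gmm_log_likelihood(X, weights, means, variances):
    """log p(X | lambda) = sum_t log sum_i w_i g_i(x_t), frames treated as independent."""
    return sum(logsumexp(component_logs(x, weights, means, variances)) for x in X)


def em_step(X, weights, means, variances, var_floor=1e-3):
    """One EM update of (weights, means, variances); var_floor is an assumed safeguard."""
    T, D = len(X), len(X[0])
    # E-step: posterior probability (responsibility) of each component for each frame.
    gammas = []
    for x in X:
        logs = component_logs(x, weights, means, variances)
        norm = logsumexp(logs)
        gammas.append([math.exp(log_val - norm) for log_val in logs])
    # M-step: re-estimate parameters from responsibility-weighted statistics.
    new_w, new_mu, new_var = [], [], []
    for i in range(len(weights)):
        n_i = sum(g[i] for g in gammas) + 1e-10
        mu = [sum(g[i] * x[d] for g, x in zip(gammas, X)) / n_i for d in range(D)]
        var = [max(sum(g[i] * (x[d] - mu[d]) ** 2 for g, x in zip(gammas, X)) / n_i, var_floor)
               for d in range(D)]
        new_w.append(n_i / T)
        new_mu.append(mu)
        new_var.append(var)
    total = sum(new_w)
    return [w / total for w in new_w], new_mu, new_var


X = [[0.0], [0.1], [-0.1], [5.0], [5.1], [4.9]]
params = ([0.5, 0.5], [[1.0], [4.0]], [[1.0], [1.0]])
ll_before = gmm_log_likelihood(X, *params)
for _ in range(5):
    params = em_step(X, *params)
ll_after = gmm_log_likelihood(X, *params)
```

Each EM iteration does not decrease the training log-likelihood, which is why the loop runs until the likelihood stops improving in practice (a fixed five iterations here for brevity).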
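• λ_target (H0): X, the MFCCs of the test data, is from the hypothesized speaker S.
• λ_UBM (H1): X is not from the hypothesized speaker S.
• The likelihood ratio test is
$$LR(X)=\frac{p(X\mid\lambda_{\text{target}})}{p(X\mid\lambda_{\text{UBM}})}\ \begin{cases}\ge\theta, & \text{accept } H_0\\ <\theta, & \text{reject } H_0\end{cases}$$
where θ is the decision threshold.
• The likelihood of the alternative hypothesis is
$$p(X\mid\lambda_{\text{UBM}})=F\big(p(X\mid\lambda_1),\,p(X\mid\lambda_2),\,\dots,\,p(X\mid\lambda_B)\big)$$
where F(·) is a function such as the average or the maximum of the likelihood values p(X|λ_i) of the background speaker set, and B is the number of background speakers.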
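A sketch of the scoring step in the log domain, where log LR(X) = log p(X|λ_target) - log p(X|λ_UBM). Assumptions: the score is divided by the number of frames (a common convention that makes scores comparable across utterance lengths; the slides do not say whether it is used), the background likelihoods are combined with F = average or maximum as described above, and the threshold θ is a free parameter. All names are illustrative.

```python
import math


def background_log_likelihood(background_logliks, mode="average"):
    """log F(p(X|lambda_1), ..., p(X|lambda_B)) for F = average or maximum."""
    if mode == "max":
        return max(background_logliks)
    # Average of the likelihoods (not of the log-likelihoods), computed stably in the log domain.
    m = max(background_logliks)
    return m + math.log(sum(math.exp(v - m) for v in background_logliks) / len(background_logliks))


def llr_score(loglik_target, loglik_alternative, num_frames):
    """Frame-averaged log-likelihood ratio: log LR(X) / T (normalization is an assumption)."""
    return (loglik_target - loglik_alternative) / num_frames


def accept(score, threshold):
    """Accept the identity claim when the score reaches the threshold theta."""
    return score >= threshold


alt = background_log_likelihood([math.log(1.0), math.log(3.0)])  # log of mean(1, 3) = log 2
score = llr_score(-100.0, -110.0, num_frames=10)                   # 10 / 10 = 1.0
```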
• Score normalisation
$$\tilde{s}=\frac{s-\mu_I}{\sigma_I}$$
where
s = original score = log LR(X);
µ_I = estimated mean of s;
σ_I = estimated standard deviation of s.
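Assuming, as in zero normalization (Z-norm), that µ_I and σ_I are estimated from the scores a target model gives to a set of impostor utterances (the slides only say "estimated mean and standard deviation of s"), the normalization can be sketched as below. The function names are illustrative, and the population standard deviation is an assumed choice.

```python
import statistics


def znorm_stats(impostor_scores):
    """Mean and population standard deviation of a model's impostor scores."""
    return statistics.mean(impostor_scores), statistics.pstdev(impostor_scores)


def normalize_score(score, mu_i, sigma_i):
    """Normalized score: (s - mu_I) / sigma_I."""
    return (score - mu_i) / sigma_i


mu_i, sigma_i = znorm_stats([1.0, 3.0])
normalized = normalize_score(4.0, mu_i, sigma_i)
```

Because µ_I and σ_I are computed once per target model, they can be estimated offline and a single threshold can then be shared across speakers.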
PERFORMANCE EVALUATION
• NIST has conducted speaker recognition benchmarking evaluations on an annual basis since 1997.
• NIST provides speech files as development data.
NIST 2003 data:
• Testing speech data: 2559
• Training speech data: 356
• UBM female speech data: 251
• UBM male speech data: 251
For the baseline speaker verification system, the following parameters are used:
• VAD: energy-based VAD (threshold 0.6 × average energy)
• Feature vector: 13-dimensional MFCC appended with delta and delta-delta coefficients
• Modeling: GMM
• GMM size: 8, 16, 32, 64
• Comparison: log-likelihood ratio score
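Delta and delta-delta coefficients are usually computed with the regression formula d_t = Σ_{n=1}^{N} n·(c_{t+n} - c_{t-n}) / (2·Σ_{n=1}^{N} n²). A sketch assuming N = 2 and boundary frames repeated at the utterance edges (neither is specified in the slides); stacking 13 MFCCs with their deltas and delta-deltas gives 39-dimensional vectors.

```python
def deltas(features, N=2):
    """Regression deltas over +/-N frames (N = 2 assumed); boundary frames are repeated."""
    T = len(features)
    denom = 2 * sum(n * n for n in range(1, N + 1))

    def frame(t):
        return features[min(max(t, 0), T - 1)]  # clamp index to [0, T-1]

    return [[sum(n * (frame(t + n)[d] - frame(t - n)[d]) for n in range(1, N + 1)) / denom
             for d in range(len(features[t]))]
            for t in range(T)]


def append_deltas(features):
    """Concatenate static, delta and delta-delta coefficients for every frame."""
    d1 = deltas(features)
    d2 = deltas(d1)
    return [static + delta + accel for static, delta, accel in zip(features, d1, d2)]


ramp = [[float(t)] for t in range(6)]  # one coefficient rising by 1 per frame
ramp_deltas = deltas(ramp)
vectors = append_deltas([[0.0] * 13 for _ in range(5)])
```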
DET plot for test 15 s and train 15 s
DET plot for test full and train 15 s
DET plot for test 15 s and train full
DET plot for test full and train full
Comparison of training and testing conditions by equal error rate (EER, %)
GMM size | Test 15 s, train 15 s | Test full, train 15 s | Test 15 s, train full | Test full, train full
8        | 34.90                 | 34.24                 | 33.18                 | 27.70
16       | 33.05                 | 32.28                 | 30.50                 | 25.67
32       | 32.46                 | 32.94                 | 28.78                 | 23.67
64       | 32.82                 | 33.06                 | 27.42                 | 22.05
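The equal error rate is the operating point where the false-acceptance rate (impostor trials accepted) equals the false-rejection rate (target trials rejected). A simple sketch that sweeps the threshold over every observed score and reports the average of the two rates where they are closest; more careful implementations interpolate between operating points on the DET curve, so small differences from such tools are expected. The function name is illustrative.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping the threshold over every observed score."""
    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)      # misses
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)  # false alarms
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


separated = equal_error_rate([2.0, 3.0, 4.0], [0.0, 1.0])
overlapping = equal_error_rate([1.0, 3.0], [0.0, 2.0])
```

Multiplying the result by 100 gives the EER in percent, as reported in the table.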
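Conclusion
• Performance is more sensitive to the amount of training data than to the amount of test data. For example, with 64 mixtures and 15 s test data, using the full training data instead of 15 s lowers the EER from 32.82% to 27.42%, whereas using the full test data with 15 s training data gives 33.06%.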
Future Plan
• Synthetically generating training and testing speech from limited speech data.
• Validating the results on a state-of-the-art i-vector-based speaker verification system.
Thank you
Technical Drawings introduction to drawing of prisms
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 

Speaker Identification and Verification

  • 1. Under the guidance of Dr. G. Pradhan, NIT Patna (ECE Dept.). Presented by: Kamlesh Kalvaniya (1104080), Niranjan Kumar (1104087), Piyush Kumar (1104091), B.Tech 4th year (ECE Dept.). 6/24/2015 N.I.T. PATNA ECE, DEPTT. 1
  • 2. 1. Introduction 2. Baseline speaker verification system 3. Future Plan
  • 3. Speaker recognition is the computational task of validating a person's identity claim from his/her voice. Applications: authentication, forensic testing, security systems, ATM security keys, personalized user interfaces, multi-speaker tracking, surveillance.
  • 4. Identification vs. verification
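The difference between the two tasks can be sketched in a few lines of Python. This is an illustration only: the speaker names and score values are hypothetical, and each score stands for some similarity (for example an average log-likelihood) of one test utterance against each enrolled speaker model. Identification picks the best-matching enrolled speaker, while verification only checks the claimed speaker's score against a threshold.

```python
# Hypothetical example: one score per enrolled speaker model for a single test utterance.
def identify(scores):
    """Closed-set identification: return the enrolled speaker with the highest score."""
    return max(scores, key=scores.get)


def verify(scores, claimed, threshold):
    """Verification: accept the identity claim only if the claimed model's score reaches the threshold."""
    return scores[claimed] >= threshold


scores = {"spk_a": -48.2, "spk_b": -45.9, "spk_c": -51.0}
print(identify(scores))                # spk_b
print(verify(scores, "spk_a", -47.0))  # False
print(verify(scores, "spk_b", -47.0))  # True
```

In the baseline system of these slides, the raw score used in `verify` is replaced by a log-likelihood ratio against a background model.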
  • 5. Phases of speaker verification: • Enrollment session (training phase) • Operating session (testing phase)
  • 6. Training and testing phases (block diagram): Training: speech → pre-processing → feature extraction → model building → reference model. Testing: speech → pre-processing → feature extraction → comparison with the reference model of the claimed identity → decision logic → accept/reject.
  • 7. Preprocessing: Preprocessing is an important step in a speaker verification system. It is also called voice activity detection (VAD). VAD separates speech regions from non-speech regions [2-3]. It is very difficult to implement a VAD algorithm that works consistently across different types of data. VAD algorithms can be classified into two groups: • Feature-based approaches • Statistical-model-based approaches. Each VAD method has its own merits and demerits in terms of accuracy, complexity, etc. Due to its simplicity, most speaker verification systems use signal energy for VAD.
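A minimal sketch of energy-based VAD, assuming 8 kHz audio split into 25 ms frames with a 10 ms shift (200 and 80 samples) and the 0.6 × average-energy threshold listed among the baseline parameters. The function name and the choice to return one boolean flag per frame are illustrative, not taken from the original system.

```python
def energy_vad(signal, frame_len=200, hop=80, ratio=0.6):
    """Label each frame as speech (True) or non-speech (False) by its energy.

    A frame counts as speech when its energy exceeds `ratio` times the average
    frame energy of the utterance. frame_len=200 and hop=80 samples correspond
    to 25 ms frames with a 10 ms shift at an assumed 8 kHz sampling rate.
    """
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))  # short-time energy
    if not energies:
        return []
    threshold = ratio * sum(energies) / len(energies)
    return [e > threshold for e in energies]


# Toy input: 400 samples of silence followed by 400 samples of constant "speech".
flags = energy_vad([0.0] * 400 + [1.0] * 400)
print(flags)  # [False, False, False, False, True, True, True, True]
```

Frames flagged `False` would be discarded before feature extraction.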
  • 8. Feature Extraction: Along with speaker information, the speech signal contains much other redundant information, such as the recording sensor, channel, and environment. The speaker-specific information in the speech signal [2] reflects: • the unique speech production system • physiological aspects • behavioral aspects. The feature extraction module transforms speech into a set of feature vectors of reduced dimension in order to: • enhance speaker-specific information • suppress redundant information.
  • 9. Selection of Features: Features should: • be robust against noise and distortion • occur frequently and naturally in speech • be easy to measure from the speech signal • be difficult to impersonate/mimic • not be affected by the speaker's health or long-term variations in voice
  • 10. Types of Features
  • 11. Feature Extraction Techniques: A wide range of approaches may be used to represent the speech signal parametrically for speaker recognition: • Linear Predictive Coding • Linear Predictive Cepstral Coefficients • Mel-Frequency Cepstral Coefficients • Perceptual Linear Prediction • Neural Predictive Coding. Most state-of-the-art speaker verification systems use Mel-Frequency Cepstral Coefficients (MFCCs) appended with their first- and second-order derivatives as feature vectors, because MFCCs: • are easy to extract • provide the best performance compared to the other features. MFCCs mostly contain information about the resonance structure of the vocal tract system.
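Appending first- and second-order derivatives is usually done with a regression formula over neighbouring frames. The sketch below assumes the common regression window N = 2 and repeats the edge frames at the boundaries; both are assumptions, since the slides do not give the formula. With 13 static MFCCs this yields a 39-dimensional feature vector.

```python
def deltas(features, N=2):
    """Regression-based delta coefficients for a list of feature vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first/last frame repeated at the edges.
    """
    T = len(features)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(features[t])
        for n in range(1, N + 1):
            nxt = features[min(t + n, T - 1)]  # clamp index at the last frame
            prv = features[max(t - n, 0)]      # clamp index at the first frame
            for k in range(len(d)):
                d[k] += n * (nxt[k] - prv[k])
        out.append([v / denom for v in d])
    return out


def append_deltas(mfcc, N=2):
    """Concatenate static, delta and delta-delta coefficients per frame (13 -> 39 dims)."""
    d1 = deltas(mfcc, N)
    d2 = deltas(d1, N)
    return [s + a + b for s, a, b in zip(mfcc, d1, d2)]


ramp = [[float(t)] for t in range(10)]
print(deltas(ramp)[5])  # [1.0]
```

A linearly rising coefficient has a constant delta of 1 per frame away from the edges, which is a quick sanity check of the formula.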
  • 12. MFCC computation steps: 1. Analog-to-digital conversion 2. Pre-emphasis 3. Framing and windowing 4. Fast Fourier transform (FFT) 5. Mel-scale warping (mel filterbank) 6. Discrete cosine transform (DCT) to obtain the MFCCs
  • 13. MFCC, Step 1: Analog-to-digital conversion. The analog speech signal is transformed to digital form by sampling it at a given frequency.
  • 14. MFCC, Step 2: Pre-emphasis. The energy in the high frequencies (important for speech) is boosted.
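Pre-emphasis is commonly implemented as a first-order high-pass filter, y[n] = x[n] − α·x[n−1]. The coefficient α = 0.97 below is an assumed, widely used value; the slides do not state the one used.

```python
def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1]; y[0] = x[0]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


print(pre_emphasis([1.0, 1.0, 1.0], alpha=0.5))  # [1.0, 0.5, 0.5]
```

A constant (DC) input is attenuated, while rapid sample-to-sample changes pass through, which is what boosts the high frequencies.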
  • 15. MFCC, Step 3: Framing. The signal is divided into frames of a given size.
  • 16. MFCC: framing
  • 17. MFCC: framing
  • 18. MFCC: framing
  • 19. MFCC: framing (frame length 25 ms, frame shift 10 ms)
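A sketch of the framing step using the 25 ms frame length and 10 ms shift shown on the slide. The sampling rate is a parameter; at an assumed 8 kHz this gives 200-sample frames every 80 samples. Dropping the incomplete tail is one simple choice among several (zero-padding the last frame is another).

```python
def frame_signal(signal, sample_rate, frame_ms=25.0, shift_ms=10.0):
    """Split a sampled signal into overlapping frames.

    Frames are frame_ms long and start every shift_ms; trailing samples that do
    not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * shift_ms / 1000.0))
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


frames = frame_signal([0.0] * 8000, sample_rate=8000)  # one second at 8 kHz
print(len(frames), len(frames[0]))  # 98 200
```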
  • 20. MFCC: windowing • The next step is to window each individual frame to minimize signal discontinuities at the beginning and end of the frame. • The idea is to minimize spectral distortion by using the window to taper the signal towards zero at the beginning and end of each frame. • We have used the Hamming window.
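The Hamming window is w[n] = 0.54 − 0.46·cos(2πn/(N−1)) for n = 0 to N−1. It tapers each frame towards zero rather than exactly to zero: its end values are 0.08. A minimal sketch:

```python
import math


def hamming(N):
    """Hamming window w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)) for n = 0..N-1."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def apply_window(frame):
    """Multiply a frame sample-by-sample by a Hamming window of the same length."""
    return [x * w for x, w in zip(frame, hamming(len(frame)))]
```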
  • 21. MFCC
  • 22. MFCC
  • 23. Mel filterbank
  • 24. MFCC: DCT
  • 25. MFCC: DCT
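Steps 4 to 6 (power spectrum, mel filterbank, logarithm and DCT) can be sketched for a single frame as follows. This is an illustration under stated assumptions, not the authors' implementation: 8 kHz audio, a 256-point transform, 24 triangular mel filters, 13 cepstral coefficients and the mel formula 2595·log10(1 + f/700) are all assumed values. A direct DFT keeps the sketch dependency-free; an FFT computes the same spectrum faster.

```python
import math


def hz_to_mel(f):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    """|X[k]|^2 / n_fft for k = 0..n_fft//2, using a direct DFT."""
    if len(frame) > n_fft:
        raise ValueError("frame is longer than n_fft")
    x = list(frame) + [0.0] * (n_fft - len(frame))  # zero-pad to n_fft samples
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale from 0 Hz to sample_rate/2."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    fbank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):   # rising edge of the triangle
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge of the triangle
            filt[k] = (right - k) / (right - centre)
        fbank.append(filt)
    return fbank


def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II; keep the first n_coeffs coefficients."""
    M = len(values)
    return [sum(v * math.cos(math.pi * c * (m + 0.5) / M) for m, v in enumerate(values))
            for c in range(n_coeffs)]


def mfcc_frame(frame, sample_rate=8000, n_fft=256, n_filters=24, n_ceps=13):
    """MFCCs of one (already pre-emphasised and windowed) frame."""
    spec = power_spectrum(frame, n_fft)
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    energies = [sum(w * p for w, p in zip(filt, spec)) for filt in fbank]
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct_ii(log_e, n_ceps)
```

Applying `mfcc_frame` to every voiced frame produces the 13-dimensional static vectors that are later extended with delta and delta-delta coefficients.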
  • 26. Speaker Modelling: The speaker model captures the statistical information present in the feature vectors; it enhances speaker information and suppresses redundant information. Modelling techniques: • Vector Quantization • Gaussian Mixture Model (GMM) • GMM with Universal Background Model (GMM-UBM) • Hidden Markov Model • Artificial Neural Networks • Support Vector Machines • i-vector. A Gaussian model assumes that the feature vectors follow a Gaussian distribution characterized by mean vectors, covariance matrices, and weights. Data that were unseen in training but appear in the test data will produce a low score.
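  • 27. Gaussian Mixture Model: A Gaussian mixture density is defined as
    $$p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, g_i(\mathbf{x}), \qquad \sum_{i=1}^{M} w_i = 1,$$
    where each component is a unimodal Gaussian function of the D-dimensional feature vector $\mathbf{x}$:
    $$g_i(\mathbf{x}) = \frac{1}{(2\pi)^{D/2} \lvert \Sigma_i \rvert^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\top} \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right).$$
    The model is $\lambda = \{ w_i, \boldsymbol{\mu}_i, \Sigma_i \}$, $i = 1, \dots, M$, with weights $w_i$, means $\boldsymbol{\mu}_i$, and covariances $\Sigma_i$. Mixture sizes of 8, 16, 32, and 64 components are used, and 356 speaker models are built.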
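  • 28. For a sequence of T training vectors $X = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T\}$, assuming the vectors are independent, the GMM likelihood is
    $$p(X \mid \lambda) = \prod_{t=1}^{T} p(\mathbf{x}_t \mid \lambda), \qquad \log p(X \mid \lambda) = \sum_{t=1}^{T} \log p(\mathbf{x}_t \mid \lambda).$$
    The speaker-specific GMM parameters are estimated with the expectation-maximization (EM) algorithm.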
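The GMM log-likelihood of a feature sequence can be evaluated as in the following sketch. It assumes diagonal covariance matrices (a common simplification that the slides do not specify) and uses the log-sum-exp trick to avoid numerical underflow when summing the mixture components. EM training of the parameters is not shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """log N(x; mean, diag(var)) for one D-dimensional vector."""
    D = len(x)
    log_det = sum(math.log(v) for v in var)
    mahal = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (D * math.log(2 * math.pi) + log_det + mahal)


def gmm_log_likelihood(X, weights, means, variances):
    """log p(X | lambda) = sum_t log sum_i w_i g_i(x_t), via log-sum-exp."""
    total = 0.0
    for x in X:
        comps = [math.log(w) + log_gaussian_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        peak = max(comps)
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total
```

Dividing the result by T gives the average per-frame log-likelihood, which is often used as the score so that utterances of different lengths are comparable.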
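  • 30. Hypotheses: $\lambda_{\text{target}}$: X (the MFCCs of the test data) is from the hypothesized speaker S; $\lambda_{\text{UBM}}$: X is not from the hypothesized speaker S. The likelihood ratio test is
    $$LR(X) = \frac{p(X \mid \lambda_{\text{target}})}{p(X \mid \lambda_{\text{UBM}})} \; \begin{cases} \ge \theta & \text{accept the claim} \\ < \theta & \text{reject the claim} \end{cases}$$
    where $\theta$ is the decision threshold. The likelihood of the alternative hypothesis is
    $$p(X \mid \lambda_{\text{UBM}}) = \mathcal{F}\big( p(X \mid \lambda_1), p(X \mid \lambda_2), \dots, p(X \mid \lambda_B) \big),$$
    where $\mathcal{F}(\cdot)$ is a function, such as the average or maximum, of the likelihood values $p(X \mid \lambda_b)$ of the B background speakers.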
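The decision rule and the two choices of F can be sketched in the log domain. The inputs are log-likelihoods such as those a GMM produces; the threshold value, the function names and the example numbers are illustrative assumptions. The "average" option averages likelihoods (not log-likelihoods), as the slide describes, using a numerically stable log-mean-exp.

```python
import math


def log_mean_exp(values):
    """log of the average of exp(v) over values, computed stably."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))


def background_log_likelihood(background_log_liks, mode="average"):
    """log p(X | alternative) from the log-likelihoods of B background speaker models.

    mode="average" averages the likelihoods; mode="maximum" takes the largest one.
    """
    if mode == "average":
        return log_mean_exp(background_log_liks)
    if mode == "maximum":
        return max(background_log_liks)
    raise ValueError("mode must be 'average' or 'maximum'")


def llr_decision(target_log_lik, alt_log_lik, log_threshold=0.0):
    """Return (log LR(X), accept?) where log LR(X) = log p(X|target) - log p(X|alternative)."""
    score = target_log_lik - alt_log_lik
    return score, score >= log_threshold


alt = background_log_likelihood([-43.0, -42.0, -45.0], mode="maximum")
print(llr_decision(-40.0, alt))  # (2.0, True)
```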
  • 31. Score normalisation:
    $$\tilde{s} = \frac{s - \mu_I}{\sigma_I},$$
    where $s = \log LR(X)$ is the original score, and $\mu_I$ and $\sigma_I$ are the estimated mean and standard deviation of s.
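A sketch of the normalisation, assuming (in the style of Z-norm) that μ_I and σ_I are estimated from scores of impostor utterances against the target model; the slide itself only says they are the estimated mean and standard deviation of s. Using the population standard deviation is also an assumed choice, and the score values are hypothetical.

```python
import statistics


def norm_params(impostor_scores):
    """Estimate mu_I and sigma_I from impostor scores (population standard deviation)."""
    return statistics.mean(impostor_scores), statistics.pstdev(impostor_scores)


def normalise_score(s, mu_i, sigma_i):
    """Normalised score (s - mu_I) / sigma_I."""
    return (s - mu_i) / sigma_i


mu_i, sigma_i = norm_params([1.0, 3.0])     # hypothetical impostor log-LR scores
print(normalise_score(4.0, mu_i, sigma_i))  # 2.0
```

After normalisation, scores from different target models share a common scale, so a single decision threshold can be applied across speakers.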
  • 32. Performance Evaluation: NIST has conducted speaker recognition benchmarking on an annual basis since 1997 and has provided speech files as development data. NIST 2003 data: • test speech files: 2559 • training speech files: 356 • UBM female speech files: 251 • UBM male speech files: 251
  • 33. The following parameters are used for the baseline speaker verification system: • VAD: energy-based VAD (threshold 0.6 × average energy) • Feature vector: 13-dimensional MFCC appended with delta and delta-delta coefficients (39 dimensions) • Modelling: GMM • GMM size: 8, 16, 32, 64 • Comparison: log-likelihood score
  • 39. Equal error rate (EER, %) for different amounts of training and test data:

    Gaussian size | Test 15 s, Train 15 s | Test full, Train 15 s | Test 15 s, Train full | Test full, Train full
    8             | 34.90                 | 34.24                 | 33.18                 | 27.70
    16            | 33.05                 | 32.28                 | 30.50                 | 25.67
    32            | 32.46                 | 32.94                 | 28.78                 | 23.67
    64            | 32.82                 | 33.06                 | 27.42                 | 22.05
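The EER is the operating point where the false acceptance rate equals the false rejection rate. A simple way to approximate it from lists of target and impostor scores is to sweep the threshold over the observed scores, as below. This is an illustrative sketch; interpolated or ROC-convex-hull estimates can give slightly different values.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER (%) by sweeping the threshold over all observed scores.

    At threshold t: FRR = fraction of target scores below t,
    FAR = fraction of impostor scores at or above t. Returns the mean of FAR
    and FRR at the threshold where |FAR - FRR| is smallest.
    """
    best = None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return 100.0 * best[1]


print(equal_error_rate([3.0, 4.0, 5.0], [0.0, 1.0, 2.0]))  # 0.0
```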
  • 40. Conclusion: Performance is more sensitive to the amount of training data than to the amount of test data.
  • 41. Future Plan: • Synthetically generating training and testing speech from limited speech data. • Validating the results on a state-of-the-art i-vector-based speaker verification system.
  • 42. Thank you