SPEAKER RECOGNITION USING MFCC
• Hira Shaukat 2010131
• Attiya Rehman 2010079
• DSP Lab Project
• MATLAB-based programming
HUMAN SPEECH
• Human speech contains numerous discriminative features that can be used to identify speakers.
• Speech contains significant energy from zero frequency up to around 5 kHz.
• The objective of automatic speaker recognition is to extract, characterize and recognize the information about speaker identity.
• The properties of the speech signal change markedly as a function of time.
SPEECH DISCERNMENT
• Speaker recognition systems contain two main modules:
  • feature extraction
  • feature matching
• Feature extraction: extract a small amount of data from the voice signal that can be used to represent each speaker
• Feature matching: procedure to identify the unknown speaker by comparing features extracted from his/her voice input with those from a set of known speakers
Phase 1

SPEECH FEATURE EXTRACTION
INTRODUCTION
• Speech signal: a slowly time-varying signal (called quasi-stationary, since its characteristics stay nearly constant over short intervals)
• Signal-processing front end: conversion of the speech waveform, using digital signal processing (DSP) tools, to a set of features (at a considerably lower information rate) for further analysis
• Short-time spectral analysis: characterization of the speech signal
CHOICE OF METHODOLOGY
• Linear Predictive Coding (LPC)
• Mel-Frequency Cepstrum Coefficients (MFCC)
• Perceptual Linear Predictive Analysis (PLP)
BASIC METHODOLOGY

MFCC – MEL FREQUENCY CEPSTRUM COEFFICIENT
MEL FREQUENCY CEPSTRUM CO-EFFICIENT (MFCC)
• The main purpose of the MFCC processor is to mimic the behavior of the human ear
• MFCCs are less susceptible to variations in the speech signal
• Speech input is typically recorded at a sampling rate above 10000 Hz
• This sampling frequency is chosen to minimize the effects of aliasing in the analog-to-digital conversion
• Signals sampled at this rate can capture all frequencies up to 5 kHz (half the sampling rate), which covers most of the energy of sounds generated by humans
FRAME BLOCKING
• In this step the continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N)
• The first frame consists of the first N samples
• The second frame begins M samples after the first frame and overlaps it by N - M samples
• The process continues until all the speech is accounted for within one or more frames
• The values chosen for N and M are N = 256 and M = 100
• After each frame is transformed to the frequency domain, the result is referred to as the spectrum or periodogram.
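The frame blocking step can be sketched in a few lines. This is a minimal illustration in Python rather than the project's MATLAB code, which is not shown in the slides. The function name `frame_blocking` is hypothetical, and zero-padding the final partial frame is an assumption made here so that every sample falls inside some frame.

```python
def frame_blocking(signal, frame_len=256, hop=100):
    """Split a sequence of samples into overlapping frames.

    frame_len is N and hop is M from the slide (N = 256, M = 100).
    Assumption: a trailing partial frame is zero-padded so that every
    sample lands in at least one frame.
    """
    if not 0 < hop < frame_len:
        raise ValueError("expected 0 < hop < frame_len")
    frames = []
    start = 0
    while start < len(signal):
        frame = list(signal[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))  # pad the last frame
        frames.append(frame)
        if start + frame_len >= len(signal):
            break  # all samples are now covered
        start += hop
    return frames


frames = frame_blocking(list(range(600)))
# Consecutive frames share N - M = 156 samples.
shared = frames[0][100:] == frames[1][:156]
```

With N = 256 and M = 100, each frame repeats the last 156 samples of the previous one, which is the overlap of N - M samples described above.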
MEL-FREQUENCY WRAPPING
• For each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the 'mel' scale.
• The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz
• One approach to simulating the subjective spectrum is to use a filter bank, spaced uniformly on the mel scale
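The slides do not state which formula maps Hz to mels. A widely used approximation, assumed here, is mel(f) = 2595 log10(1 + f / 700). It is close to linear below 1000 Hz and logarithmic above. The function names below are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels (assumed formula, see lead-in)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Under this formula a 1000 Hz tone sits at roughly 1000 mels.
mel_at_1khz = hz_to_mel(1000.0)
```

Doubling the frequency from 2000 Hz to 4000 Hz adds far fewer mels than doubling from 250 Hz to 500 Hz would suggest on a linear scale, which is the compression the filter bank relies on.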
MEL-FREQUENCY WRAPPING
• That filter bank has a triangular bandpass frequency response, and the spacing as well as the bandwidth is determined by a constant mel-frequency interval
• The number of mel spectrum coefficients, K, is typically chosen as 20.
• The filter bank is applied in the frequency domain
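A triangular filter bank of this kind might be built as follows. This is a sketch under stated assumptions, not the project's implementation: the Hz-to-mel formula is the common 2595 log10(1 + f / 700) approximation, the sampling rate is taken as 10000 Hz, the FFT length equals the frame length N = 256, and all function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(num_filters=20, n_fft=256, sample_rate=10000):
    """Return num_filters triangular filters over the n_fft // 2 + 1 FFT bins."""
    num_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge points, evenly spaced in mels, mapped back to Hz.
    edges_hz = [mel_to_hz(top_mel * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(num_bins)]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))  # rising edge
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))  # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def apply_filter_bank(power_spectrum, bank):
    """Weighted sum of spectrum bins under each triangle: the mel spectrum."""
    return [sum(w * p for w, p in zip(weights, power_spectrum))
            for weights in bank]


bank = mel_filter_bank()
mel_spectrum = apply_filter_bank([1.0] * 129, bank)  # K = 20 values
```

Because the edges are equally spaced in mels, the triangles are narrow at low frequencies and widen toward 5 kHz.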
CEPSTRUM
• In this final step, we convert the log mel spectrum back to the time domain
• The result is called the mel frequency cepstrum coefficients (MFCC)
• The cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame analysis
• Because the mel spectrum coefficients (and so their logarithm) are real numbers, they can be converted to the time domain using the Discrete Cosine Transform (DCT)
• We exclude the first component from the DCT since it represents the mean value of the input signal, which carries little speaker-specific information
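The cepstrum step can be illustrated with a direct DCT of the log mel spectrum. This is a hedged sketch: the unnormalized DCT-II form, the natural logarithm, the default of 12 coefficients, and the function name are all assumptions, since the slides give no formula or coefficient count.

```python
import math


def mfcc_from_mel_spectrum(mel_spectrum, num_coeffs=12):
    """DCT-II of the log mel spectrum, skipping the n = 0 component."""
    k_total = len(mel_spectrum)
    if not 0 < num_coeffs < k_total:
        raise ValueError("expected 0 < num_coeffs < len(mel_spectrum)")
    log_s = [math.log(s) for s in mel_spectrum]  # values must be positive
    coeffs = []
    # n starts at 1: the n = 0 term is just the sum of the log values.
    for n in range(1, num_coeffs + 1):
        coeffs.append(sum(
            log_s[k] * math.cos(n * (k + 0.5) * math.pi / k_total)
            for k in range(k_total)
        ))
    return coeffs


spectrum = [1.0 + 0.5 * k for k in range(20)]
quiet = mfcc_from_mel_spectrum(spectrum)
loud = mfcc_from_mel_spectrum([3.0 * s for s in spectrum])
```

Scaling the whole spectrum by a constant only shifts the log values by the same amount, which lands entirely in the excluded n = 0 term, so `quiet` and `loud` agree up to rounding.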
Phase 2

SPEECH FEATURE MATCHING
FEATURE MATCHING
• Comes under pattern recognition (the objects of interest are generically called patterns)
• Patterns: sequences of acoustic vectors that are extracted from input speech by the feature extraction phase
• Test set: patterns used to test the classification algorithm
• Feature matching techniques used in speaker recognition: Dynamic Time Warping (DTW), Hidden Markov Modeling (HMM), and Vector Quantization (VQ)
• VQ approach used due to:
  • ease of implementation
  • high accuracy
• VQ is a process of mapping vectors from a large vector space to a finite number of regions in that space.
• Each region is called a cluster and can be represented by its center, called a codeword.
• The collection of all codewords is called a codebook.
VECTOR QUANTIZATION CODE-BOOK FORMATION
• Two speakers and two dimensions of the acoustic space
• Circles: acoustic vectors from speaker 1
• Triangles: acoustic vectors from speaker 2
• Training phase: using a clustering algorithm, a speaker-specific VQ codebook is generated for each known speaker by clustering his/her training acoustic vectors
• Resulting codewords (centroids): black circles for speaker 1 and black triangles for speaker 2
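The training phase can be illustrated with a small clustering routine. The slides do not spell out the clustering algorithm, so plain k-means (repeated assign-and-average passes) is swapped in here purely as a stand-in. The function names, the seeding from the first few vectors, and the fixed iteration count are all assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, codebook_size, iterations=20):
    """Cluster training vectors into codebook_size codewords (centroids)."""
    if not 0 < codebook_size <= len(vectors):
        raise ValueError("codebook_size must be between 1 and len(vectors)")
    # Assumption: seed the codebook with the first codebook_size vectors.
    codebook = [list(v) for v in vectors[:codebook_size]]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)),
                          key=lambda i: squared_distance(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[i] = [sum(col) / len(members)
                               for col in zip(*members)]
    return codebook


# Two well-separated groups of 2-D vectors yield one codeword per group.
training = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
codebook = train_codebook(training, codebook_size=2)
```

In a full system one such codebook would be trained per known speaker from that speaker's MFCC vectors.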
VECTOR QUANTIZATION CODE-BOOK FORMATION
• The distance from a vector to the closest codeword of a codebook is called the VQ distortion
• The input utterance of an unknown voice is "vector-quantized" using each trained codebook and the total VQ distortion is computed
• The speaker corresponding to the VQ codebook with the smallest total distortion is identified as the speaker of the input utterance.
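The matching rule above translates almost directly into code. The sketch below assumes Euclidean distance, which the slides do not specify, and uses hypothetical function names and toy two-dimensional codebooks in place of real MFCC vectors.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vq_distortion(vector, codebook):
    """Distance from one vector to its closest codeword."""
    return min(euclidean(vector, codeword) for codeword in codebook)


def total_distortion(vectors, codebook):
    """Sum of VQ distortions over every vector of an utterance."""
    return sum(vq_distortion(v, codebook) for v in vectors)


def identify_speaker(vectors, codebooks):
    """codebooks maps a speaker label to that speaker's list of codewords."""
    return min(codebooks,
               key=lambda name: total_distortion(vectors, codebooks[name]))


codebooks = {
    "speaker1": [[0.0, 0.0], [1.0, 1.0]],
    "speaker2": [[10.0, 10.0], [11.0, 11.0]],
}
unknown = [[0.2, 0.1], [0.9, 1.2]]
best = identify_speaker(unknown, codebooks)
```

Here `best` is `"speaker1"`, because both vectors of the unknown utterance lie close to that speaker's codewords.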
CLUSTERING THE TRAINING VECTORS
Phase 3

MATLAB CODING

Speaker recognition using MFCC

  • 1. SPEAKER RECOGNITION USING MFCC • Hira Shaukat 2010131  DSP Lab Project  Matlab-based programming • Attiya Rehman 2010079
  • 2. HUMAN SPEECH • The human speech contains numerous discriminative features that can be used to identify speakers. • Speech contains significant energy from zero frequency up to around 5 kHz. • Objective of automatic speaker recognition is to extract, characterize and recognize the information about speaker identity. • The property of speech signal changes markedly as a function of time.
  • 3. SPEECH DISCERNMENT • Speaker recognition systems contain two main modules: • • • feature extraction feature matching Feature extraction: Extract a small amount of data from the voice signal that can be used to represent each speaker • Feature matching: Procedure to identify the unknown speaker by comparing extracted features from his/her voice input with the ones from a set of known speakers
  • 5. INTRODUCTION • Speech signal - slowly timed varying signal (it is called quasistationary) • Signal-processing front end Conversion of speech waveform, using digital signal processing (DSP) tools, to a set of features (at a considerably lower information rate) for further analysis • Short-time spectral analysis Characterization of the speech signal
  • 6. CHOICE OF METHODOLOGY • Linear Predictive Coding (LPC) • Mel-Frequency Cepstrum Coefficients (MFCC) • Perceptual Linear Predictive Analysis (PLP)
  • 7. BASIC METHODOLOGY MFCC – MEL FREQUENCY CEPSTRUM COEFFICIENT
  • 8. MEL FREQUENCY CEPSTRUM CO-EFFICIENT (MFCC) Main purpose of the MFCC processor is to mimic the behavior of the human ears MFFC‟s - less susceptible to variations Speech input typically recorded at a sampling rate above 10000 Hz This sampling frequency chosen to minimize the effects of aliasing in the analog-to-digital conversion • Sampled signals can capture all frequencies up to 5 kHz, which cover most energy of sounds that are generated by humans • • • •
  • 9. FRAME BLOCKING • In this step the continuous speech signal is blocked into frames of N samples, with adjacent frames being separated by M (M < N) • The first frame consists of the first N samples • The second frame begins M samples after the first frame, and overlaps it by N - M samples • Process continues until all the speech is accounted for within one or more frames • Taken values for N and M are N = 256 and M = 100 • The result after this step is referred to as spectrum or periodogram.
  • 10. MEL-FREQUENCY WRAPPING • For each tone with an actual frequency, f, a subjective pitch is measured on a scale called the „mel‟ scale. • Mel-frequency scale is a linear frequency spacing below 1000 Hz and a logarithmic spacing above 1000 Hz • One approach to simulating the subjective spectrum is to use a filter bank, spaced uniformly on the mel-scale
  • 11. MEL-FREQUENCY WRAPPING • That filter bank has a triangular bandpass frequency response, and the spacing as well as the bandwidth is determined by a constant mel frequency interval • The number of mel spectrum coefficients, K, is typically chosen as 20. • Filter bank is applied in the frequency domain
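A triangular mel filter bank of this kind might be built as follows. This is a sketch under stated assumptions: the slides give no formula for the mel scale, so the widely used approximation mel = 2595 log10(1 + f/700) is assumed here; the function names are hypothetical; and the 10000 Hz sampling rate is only an illustrative value consistent with the slides. K = 20 filters and a 256-point frame come from the slides.

```python
import math


def hz_to_mel(f_hz):
    # Common approximation of the mel scale (an assumption, not from the slides):
    # close to linear below about 1000 Hz, logarithmic above.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(num_filters=20, fft_size=256, sample_rate=10000.0):
    """Return num_filters triangular filters over FFT bins 0..fft_size // 2."""
    num_bins = fft_size // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(top_mel * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bin_hz = sample_rate / fft_size
    bank = []
    for k in range(1, num_filters + 1):
        lo, mid, hi = edges_hz[k - 1], edges_hz[k], edges_hz[k + 1]
        weights = []
        for b in range(num_bins):
            f = b * bin_hz
            if lo < f <= mid:
                w = (f - lo) / (mid - lo)   # rising edge of the triangle
            elif mid < f < hi:
                w = (hi - f) / (hi - mid)   # falling edge of the triangle
            else:
                w = 0.0
            weights.append(w)
        bank.append(weights)
    return bank


def apply_filter_bank(power_spectrum, bank):
    """Weight the spectrum with each filter and sum: one value per filter."""
    return [sum(w * p for w, p in zip(weights, power_spectrum))
            for weights in bank]
```

Each filter acts like an overlapping histogram bin in the frequency domain, so `apply_filter_bank` reduces a 129-bin spectrum to K = 20 mel spectrum coefficients.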
  • 12. CEPSTRUM • In this final step, we convert the log mel spectrum back to the time domain • Result - mel frequency cepstrum coefficients (MFCC) • Cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame analysis • Because the mel spectrum coefficients (and so their logarithms) are real numbers, they can be converted to the time domain using the Discrete Cosine Transform (DCT) • We exclude the first component of the DCT, since it represents the mean of the log mel spectrum (the overall signal level), which carries little speaker-specific information
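The cepstrum step can be sketched as a DCT over the log mel spectrum, c_n = sum over k = 1..K of log(S_k) cos(n (k - 1/2) pi / K), keeping n >= 1 so the first component is dropped. The function name, the default of 12 coefficients, and the small floor applied before the logarithm are assumptions for illustration; the slides do not state them.

```python
import math


def mfcc_from_mel_spectrum(mel_spectrum, num_coeffs=12):
    """DCT of the log mel spectrum, skipping the first (n = 0) component.

    mel_spectrum holds the K filter-bank outputs for one frame.
    num_coeffs = 12 is an illustrative choice and should not exceed K - 1.
    """
    K = len(mel_spectrum)
    # A small floor avoids log(0); the floor value itself is an assumption.
    log_spec = [math.log(max(s, 1e-12)) for s in mel_spectrum]
    coeffs = []
    for n in range(1, num_coeffs + 1):  # n = 0 is excluded
        c = sum(log_spec[k] * math.cos(n * (k + 0.5) * math.pi / K)
                for k in range(K))
        coeffs.append(c)
    return coeffs
```

A flat mel spectrum yields coefficients that are all numerically zero, which shows that the retained components describe the shape of the spectrum rather than its overall level.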
  • 14. FEATURE MATCHING • Comes under pattern recognition (the objects of interest are generically called patterns) • Patterns - sequences of acoustic vectors that are extracted from input speech using the feature extraction described earlier • Test set - patterns used to test the classification algorithm • Feature matching techniques used in speaker recognition - Dynamic Time Warping (DTW), Hidden Markov Modeling (HMM), and Vector Quantization (VQ) • VQ approach used due to: ease of implementation and high accuracy • VQ is a process of mapping vectors from a large vector space to a finite number of regions in that space. • Each region is called a cluster and can be represented by its center, called a codeword. • The collection of all codewords is called a codebook.
  • 15. VECTOR QUANTIZATION CODE-BOOK FORMATION • Two speakers and two dimensions of the acoustic space • Circles - acoustic vectors from speaker 1 • Triangles - acoustic vectors from speaker 2 • Training phase: using a clustering algorithm, a speaker-specific VQ codebook is generated for each known speaker by clustering his/her training acoustic vectors • Resulting codewords (centroids) - black circles and black triangles for speakers 1 and 2
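The slide does not name the clustering algorithm. The sketch below assumes an LBG-style binary-splitting procedure (in the spirit of the Linde, Buzo and Gray paper listed in the references), with squared Euclidean distance as the distortion measure. The function names and the parameters `eps` and `iterations` are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(vector, codebook):
    """Index of the codeword closest to vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def train_codebook(vectors, codebook_size, eps=0.01, iterations=20):
    """Grow a codebook by repeated splitting and centroid refinement.

    Starts from the centroid of all training vectors and doubles the
    codebook until it holds at least codebook_size codewords, so the
    size is expected to be a power of two. Multiplicative splitting
    assumes the centroid is not the zero vector.
    """
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    codebook = [centroid]
    while len(codebook) < codebook_size:
        # Split every codeword into two slightly perturbed copies.
        codebook = ([[c * (1 + eps) for c in cw] for cw in codebook] +
                    [[c * (1 - eps) for c in cw] for cw in codebook])
        for _ in range(iterations):
            # Assign each training vector to its nearest codeword.
            clusters = [[] for _ in codebook]
            for v in vectors:
                clusters[nearest_codeword(v, codebook)].append(v)
            # Move each codeword to the centroid of its cluster.
            for i, members in enumerate(clusters):
                if members:  # an empty cluster keeps its old codeword
                    codebook[i] = [sum(m[d] for m in members) / len(members)
                                   for d in range(dim)]
    return codebook
```

In a speaker recognition setup, `train_codebook` would be called once per known speaker on that speaker's training MFCC vectors, giving one codebook per speaker.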
  • 16. VECTOR QUANTIZATION CODE-BOOK FORMATION • Distance from a vector to the closest codeword of a codebook is called VQ-distortion • Input utterance of an unknown voice is "vector-quantized" using each trained codebook and the total VQ distortion is computed • The speaker corresponding to the VQ codebook with the smallest total distortion is identified as the speaker of the input utterance.
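The matching rule above amounts to a few lines of code. In this minimal sketch the squared Euclidean distance is assumed as the distortion measure (the slides do not specify one), and the function names and the dictionary layout of `codebooks` are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_distortion(vectors, codebook):
    """Sum, over all input vectors, of the distance to the closest codeword."""
    return sum(min(squared_distance(v, cw) for cw in codebook)
               for v in vectors)


def identify_speaker(vectors, codebooks):
    """Return the speaker whose codebook gives the smallest total distortion.

    codebooks maps a speaker name to that speaker's list of codewords.
    """
    return min(codebooks,
               key=lambda name: total_distortion(vectors, codebooks[name]))


if __name__ == "__main__":
    codebooks = {
        "speaker1": [[0.5, 0.5]],
        "speaker2": [[10.5, 10.5]],
    }
    print(identify_speaker([[0, 1], [1, 1]], codebooks))
```

This always returns one of the known speakers. Rejecting an unknown voice would need an additional distortion threshold, which the slides do not cover.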
  • 19. REFERENCES • L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, N.J., 1993. • L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, N.J., 1978. • S.B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences", IEEE Transactions on Acoustics, Speech, Signal Processing, Vol. ASSP-28, No. 4, August 1980. • Y. Linde, A. Buzo and R. Gray, "An algorithm for vector quantizer design", IEEE Transactions on Communications, Vol. 28, pp. 84-95, 1980. • S. Furui, "Speaker independent isolated word recognition using dynamic features of speech spectrum", IEEE Transactions on Acoustics, Speech, Signal Processing, Vol. ASSP-34, No. 1, pp. 52-59, February 1986. • S. Furui, "An overview of speaker recognition technology", ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, pp. 1-9, 1994. • F.K. Soong, A.E. Rosenberg and B.H. Juang, "A vector quantisation approach to speaker recognition", AT&T Technical Journal, Vol. 66-2, pp. 14-26, March 1987. • comp.speech Frequently Asked Questions WWW site, http://svr-www.eng.cam.ac.uk/comp.speech/

Editor's Notes

  1. An example of a speech signal is shown in Figure 2. When examined over a sufficiently short period of time (between 5 and 100 msec), its characteristics are fairly stationary. However, over longer periods of time (on the order of 1/5 second or more) the signal characteristics change to reflect the different speech sounds being spoken. Therefore, short-time spectral analysis is the most common way to characterize the speech signal.
  3. MFCCs are based on the known variation of the human ear's critical bandwidths with frequency: filters spaced linearly at low frequencies and logarithmically at high frequencies are used to capture the phonetically important characteristics of speech. This is expressed in the mel-frequency scale, which has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. The process of computing MFCCs is described in more detail next.
  4. As mentioned above, psychophysical studies have shown that human perception of the frequency contents of sounds for speech signals does not follow a linear scale. A useful way of thinking about this mel-wrapping filter bank is to view each filter as a histogram bin (where bins have overlap) in the frequency domain.