Mel Frequency Cepstral
Coefficient (MFCC)
Prior to the introduction of MFCCs, Linear Prediction Coefficients (LPCs)
and Linear Prediction Cepstral Coefficients (LPCCs) were the main
feature type for automatic speech recognition (ASR), especially
with HMM classifiers.

Mel-Frequency Cepstral Coefficients (MFCCs) were very popular features
for a long time, but more recently filter banks have become
increasingly popular.

MFCCs were very useful with Gaussian Mixture Model - Hidden Markov
Model (GMM-HMM) systems; MFCCs and GMM-HMMs co-evolved to be the
standard way of doing ASR.

It turns out that filter bank coefficients are highly correlated,
which can be problematic for some machine learning algorithms.
The job of MFCCs is to accurately represent the envelope of the
short-time power spectrum. The steps are:
1. Frame the signal into short frames.
2. For each frame, calculate the periodogram estimate of the power
   spectrum.
3. Apply the mel filterbank to the power spectra and sum the
   energy in each filter.
4. Take the logarithm of all filterbank energies.
5. Take the DCT of the log filterbank energies.
6. Keep DCT coefficients 2-13 and discard the rest.
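Mel scale
The formula for converting from frequency f (in Hz) to the Mel scale
is commonly given as:
$$M(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
To go from Mels back to frequency:
$$f = 700 \left(10^{M/2595} - 1\right)$$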
The Mel scale relates perceived frequency, or pitch, of
a pure tone to its actual measured frequency. Humans
are much better at discerning small changes in pitch at
low frequencies than they are at high frequencies.
Incorporating this scale makes our features match
more closely what humans hear.
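
As a rough illustration of the Mel scale, the sketch below converts between Hz and mels. It assumes the widely used 2595 * log10(1 + f/700) form of the scale; other constants appear in the literature, and the function names `hz_to_mel` and `mel_to_hz` are illustrative only.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Six points equally spaced in mels between 0 Hz and 8000 Hz.
low, high = hz_to_mel(0.0), hz_to_mel(8000.0)
edges_mel = [low + i * (high - low) / 5 for i in range(6)]
edges_hz = [mel_to_hz(m) for m in edges_mel]

# The same step in mels covers a wider and wider band in Hz.
widths_hz = [b - a for a, b in zip(edges_hz, edges_hz[1:])]
```

Equal steps in mels map to increasingly wide steps in Hz, which mirrors the coarser pitch resolution of human hearing at high frequencies.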
An audio signal is constantly changing, so to simplify things we
assume that on short time scales it does not change much, i.e. it is
statistically stationary even though the individual samples keep
changing. This is why the signal is divided into frames. If the frame
is much shorter, we don't have enough samples to get a reliable
spectral estimate; if it is longer, the signal changes too much
throughout the frame.
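
The framing step can be sketched as follows. The 25 ms frame length and 10 ms step are assumed, commonly used values (the slides do not give specific numbers), and `frame_signal` is a hypothetical helper name.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0, step_ms=10.0):
    """Split a list of samples into overlapping fixed-length frames.

    frame_ms and step_ms are assumed defaults, not values from the slides.
    A trailing partial frame is dropped rather than zero-padded.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    step = int(round(sample_rate * step_ms / 1000.0))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += step
    return frames

# One second of dummy audio at 16 kHz.
signal = [float(n % 100) for n in range(16000)]
frames = frame_signal(signal, 16000)
```

With these assumed settings each frame holds 400 samples and consecutive frames overlap by 240 samples.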
The next step is to calculate the power spectrum of each
frame. This is motivated by the human cochlea (an organ in
the ear) which vibrates at different spots depending on the
frequency of the incoming sounds. Depending on the
location in the cochlea that vibrates (which wobbles small
hairs), different nerves fire informing the brain that certain
frequencies are present. Our periodogram estimate performs
a similar job for us, identifying which frequencies are present
in the frame.
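
A minimal sketch of the periodogram estimate for one frame is shown below. It uses a direct DFT written in plain Python for readability; a real implementation would normally use an FFT and usually apply a window (such as Hamming) to the frame first, both of which are omitted here. The function name `periodogram` is illustrative.

```python
import cmath
import math

def periodogram(frame):
    """Periodogram power estimate |X[k]|^2 / N for bins 0 .. N/2."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        power.append(abs(acc) ** 2 / n)
    return power

# A pure tone with exactly 8 cycles in a 64-sample frame.
n = 64
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
spectrum = periodogram(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

For this test tone nearly all the power lands in bin 8, which is the sense in which the periodogram identifies which frequencies are present in the frame.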
The periodogram spectral estimate still contains a lot of information not required for Automatic
Speech Recognition (ASR). The cochlea cannot discern the difference between two closely spaced
frequencies. This effect becomes more pronounced as the frequencies increase. For this reason
we take clumps of periodogram bins and sum them up to get an idea of how much energy exists
in various frequency regions. This is performed by our Mel filter bank: the first filter is very
narrow and gives an indication of how much energy exists near 0 Hertz. As the frequencies get
higher our filters get wider as we become less concerned about variations. We are only interested
in roughly how much energy occurs at each spot. The Mel scale tells us exactly how to space our
filter banks and how wide to make them.
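
One way to build such a filter bank is sketched below, using triangular filters whose edges are equally spaced in mels. The 26 filters, 512-point FFT, 16 kHz sample rate, the 2595 * log10 mel formula, and the bin-rounding rule are all assumptions chosen for illustration; the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Return n_filters triangular filters over n_fft // 2 + 1 spectrum bins."""
    low_mel, high_mel = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, equally spaced on the mel scale.
    step = (high_mel - low_mel) / (n_filters + 1)
    mel_points = [low_mel + i * step for i in range(n_filters + 2)]
    # Map each edge to the nearest lower spectrum bin (one common convention).
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * n_bins
        for k in range(left, centre):       # rising slope
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling slope, peak of 1 at centre
            weights[k] = (right - k) / (right - centre)
        bank.append(weights)
    return bank

def filterbank_energies(power_spectrum, bank):
    """Weighted sum of periodogram bins under each filter."""
    return [sum(w * p for w, p in zip(weights, power_spectrum))
            for weights in bank]

bank = mel_filterbank(26, 512, 16000)
energies = filterbank_energies([1.0] * 257, bank)  # flat dummy spectrum
```

With these settings the first filter spans only a few bins near 0 Hz while the last spans dozens, matching the narrow-to-wide progression described above.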
Once we have the filter bank energies, we take the logarithm of
them. This is also motivated by human hearing: we don't hear
loudness on a linear scale. Generally, to double the perceived
volume of a sound we need to put 8 times as much energy into it.
This means that large variations in energy may not sound all that
different if the sound is loud to begin with. This compression
operation makes our features match more closely what humans
hear.
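
The compression effect of the logarithm can be illustrated with a small sketch. The natural logarithm and the small floor used to avoid log(0) are assumptions, and `log_energies` is an illustrative name.

```python
import math

def log_energies(energies, floor=1e-10):
    """Natural log of each filter bank energy, floored to avoid log(0)."""
    return [math.log(max(e, floor)) for e in energies]

# The same absolute change of 7 energy units, at a quiet and a loud level.
quiet = log_energies([1.0, 8.0])
loud = log_energies([1000.0, 1008.0])
quiet_change = quiet[1] - quiet[0]
loud_change = loud[1] - loud[0]
```

The change at the quiet level is large in the log domain while the same absolute change at the loud level is tiny, which is the compression the paragraph above describes.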
The final step is to compute the DCT of the log filter bank
energies. There are two main reasons this is performed. First,
because our filter banks all overlap, the filter bank energies are
quite correlated with each other. The DCT decorrelates the
energies, which means diagonal covariance matrices can be
used to model the features in, e.g., an HMM classifier. Second, the
higher DCT coefficients represent fast changes in the filter
bank energies, and it turns out that these fast changes degrade
ASR performance, so we get a small improvement by dropping
them.
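
A minimal sketch of this last step follows, using an unnormalised DCT-II written out directly. Scaling conventions differ between libraries, and the helper names are illustrative. Keeping coefficients 2-13 in 1-based numbering corresponds to the slice `[1:13]` in Python.

```python
import math

def dct_ii(values):
    """Unnormalised DCT-II of a list of numbers."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n)]

def mfcc_from_log_energies(log_fbank, first=1, last=13):
    """Keep DCT coefficients 2-13 (1-based) and drop the rest."""
    return dct_ii(log_fbank)[first:last]

# 26 identical log energies: a perfectly flat spectral envelope.
flat = [1.0] * 26
coeffs = mfcc_from_log_energies(flat)
```

For a flat envelope all kept coefficients are essentially zero, because only the discarded first coefficient carries the overall level.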


Editor's Notes

  1. All steps needed to compute filter banks were motivated by the nature of the speech signal and the human perception of such signals. In contrast, the extra steps needed to compute MFCCs were motivated by the limitations of some machine learning algorithms.