SlideShare a Scribd company logo
1 of 9
Mel Frequency Cepstral
Coefficient (MFCC)
Linear Prediction Coefficients (LPCs)
and Linear Prediction Cepstral
Coefficients (LPCCs) were the main
feature type for automatic speech
recognition (ASR), especially
with HMM classifiers.
Prior to the
introduction of
MFCCs
Mel-Frequency Cepstral Coefficients
(MFCCs) were very popular features
for a long time; but more recently,
filter banks are becoming
increasingly popular
MFCCs were very useful with Gaussian
Mixture Models - Hidden Markov
Models (GMMs-HMMs), MFCCs and
GMMs-HMMs co-evolved to be the
standard way of doing Automatic
Speech Recognition (ASR).
It turns out that filter bank
coefficients are highly correlated,
which could be problematic in some
machine learning algorithms.
The job of MFCCs is to
accurately represent this
envelope of the short
time power spectrum.
Frame the signal into short frames
For each frame calculate the periodogram estimate of the power
spectrum
Apply the mel filterbank to the power spectra and sum the
energy in each filter
Take the logarithm of all filterbank energies
Take the DCT of the log filterbank energies.
Keep DCT coefficients 2-13, discard the rest
Mel scale
The formula for converting
from frequency to Mel scale
is:
To go from Mels back to
frequency:
The Mel scale relates perceived frequency, or pitch, of
a pure tone to its actual measured frequency. Humans
are much better at discerning small changes in pitch at
low frequencies than they are at high frequencies.
Incorporating this scale makes our features match
more closely what humans hear.
An audio signal is constantly changing, so to simplify things
assume it on short time scales i.e. statistically stationary
but the samples are constantly changing. Due to this the
signal is divided into frames. If the frame is much shorter,
we don't have enough samples to get a reliable spectral
estimate, if it is longer the signal changes too much
throughout the frame.
The next step is to calculate the power spectrum of each
frame. This is motivated by the human cochlea (an organ in
the ear) which vibrates at different spots depending on the
frequency of the incoming sounds. Depending on the
location in the cochlea that vibrates (which wobbles small
hairs), different nerves fire informing the brain that certain
frequencies are present. Our periodogram estimate performs
a similar job for us, identifying which frequencies are present
in the frame.
The periodogram spectral estimate still contains a lot of information not required for Automatic
Speech Recognition (ASR). The cochlea can not discern the difference between two closely spaced
frequencies. This effect becomes more pronounced as the frequencies increase. For this reason
we take clumps of periodogram bins and sum them up to get an idea of how much energy exists
in various frequency regions. This is performed by our Mel filter bank: the first filter is very
narrow and gives an indication of how much energy exists near 0 Hertz. As the frequencies get
higher our filters get wider as we become less concerned about variations. We are only interested
in roughly how much energy occurs at each spot. The Mel scale tells us exactly how to space our
filter banks and how wide to make them.
Once we have the filter bank energies, we take the logarithm of
them. This is also motivated by human hearing: we don't hear
loudness on a linear scale. Generally to double the perceived
volume of a sound we need to put 8 times as much energy into it.
This means that large variations in energy may not sound all that
different if the sound is loud to begin with. This compression
operation makes our features match more closely what humans
hear.
The final step is to compute the DCT of the log filter bank
energies. There are 2 main reasons this is performed. Because
our filter banks are all overlapping, the filter bank energies are
quite correlated with each other. The DCT decorrelates the
energies which means diagonal covariance matrices can be
used to model the features in e.g. a HMM classifier. But the
higher DCT coefficients represent fast changes in the filter
bank energies and it turns out that these fast changes degrade
ASR performance, so we get a small improvement by dropping
them.

More Related Content

What's hot

Frequency hopping spread spectrum
Frequency hopping spread spectrumFrequency hopping spread spectrum
Frequency hopping spread spectrumHarshit Gupta
 
Adaptive Beamforming Algorithms
Adaptive Beamforming Algorithms Adaptive Beamforming Algorithms
Adaptive Beamforming Algorithms Mohammed Abuibaid
 
Frequency Hopping Spread Spectrum (FHSS) System
Frequency Hopping Spread Spectrum (FHSS) System Frequency Hopping Spread Spectrum (FHSS) System
Frequency Hopping Spread Spectrum (FHSS) System Ahmed Alfadhel
 
Diversity Techniques in Wireless Communication
Diversity Techniques in Wireless CommunicationDiversity Techniques in Wireless Communication
Diversity Techniques in Wireless CommunicationSahar Foroughi
 
Power delay profile,delay spread and doppler spread
Power delay profile,delay spread and doppler spreadPower delay profile,delay spread and doppler spread
Power delay profile,delay spread and doppler spreadManish Srivastava
 
Machine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsMachine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsAndrew Ferlitsch
 
Sensor Networks Introduction and Architecture
Sensor Networks Introduction and ArchitectureSensor Networks Introduction and Architecture
Sensor Networks Introduction and ArchitecturePeriyanayagiS
 
Discrete Signal Processing
Discrete Signal ProcessingDiscrete Signal Processing
Discrete Signal Processingmargretrosy
 
parametric method of power spectrum Estimation
parametric method of power spectrum Estimationparametric method of power spectrum Estimation
parametric method of power spectrum Estimationjunjer
 
Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive CodingSrishti Kakade
 

What's hot (20)

Frequency hopping spread spectrum
Frequency hopping spread spectrumFrequency hopping spread spectrum
Frequency hopping spread spectrum
 
Adaptive Beamforming Algorithms
Adaptive Beamforming Algorithms Adaptive Beamforming Algorithms
Adaptive Beamforming Algorithms
 
Frequency Hopping Spread Spectrum (FHSS) System
Frequency Hopping Spread Spectrum (FHSS) System Frequency Hopping Spread Spectrum (FHSS) System
Frequency Hopping Spread Spectrum (FHSS) System
 
Diversity Techniques in Wireless Communication
Diversity Techniques in Wireless CommunicationDiversity Techniques in Wireless Communication
Diversity Techniques in Wireless Communication
 
Power delay profile,delay spread and doppler spread
Power delay profile,delay spread and doppler spreadPower delay profile,delay spread and doppler spread
Power delay profile,delay spread and doppler spread
 
Channel Estimation
Channel EstimationChannel Estimation
Channel Estimation
 
Machine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsMachine Learning - Ensemble Methods
Machine Learning - Ensemble Methods
 
Rf basics
Rf basicsRf basics
Rf basics
 
Basics of RF
Basics of RFBasics of RF
Basics of RF
 
SPEAKER VERIFICATION
SPEAKER VERIFICATIONSPEAKER VERIFICATION
SPEAKER VERIFICATION
 
Mimo
MimoMimo
Mimo
 
Sensor Networks Introduction and Architecture
Sensor Networks Introduction and ArchitectureSensor Networks Introduction and Architecture
Sensor Networks Introduction and Architecture
 
Discrete Signal Processing
Discrete Signal ProcessingDiscrete Signal Processing
Discrete Signal Processing
 
Diplexer duplexer
Diplexer duplexerDiplexer duplexer
Diplexer duplexer
 
NOMA in 5G Networks
NOMA in 5G NetworksNOMA in 5G Networks
NOMA in 5G Networks
 
Vblast
VblastVblast
Vblast
 
parametric method of power spectrum Estimation
parametric method of power spectrum Estimationparametric method of power spectrum Estimation
parametric method of power spectrum Estimation
 
Pn sequence
Pn sequencePn sequence
Pn sequence
 
Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive Coding
 
Interference and system capacity
Interference and system capacityInterference and system capacity
Interference and system capacity
 

Similar to Mel frequency cepstral coefficient (mfcc)

Audio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for DepressionAudio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for Depressionijsrd.com
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
Radiographic exposure and image quality
Radiographic exposure and image qualityRadiographic exposure and image quality
Radiographic exposure and image qualityRad Tech
 
P ERFORMANCE A NALYSIS O F A DAPTIVE N OISE C ANCELLER E MPLOYING N LMS A LG...
P ERFORMANCE A NALYSIS  O F A DAPTIVE N OISE C ANCELLER E MPLOYING N LMS A LG...P ERFORMANCE A NALYSIS  O F A DAPTIVE N OISE C ANCELLER E MPLOYING N LMS A LG...
P ERFORMANCE A NALYSIS O F A DAPTIVE N OISE C ANCELLER E MPLOYING N LMS A LG...ijwmn
 
Investigation and Analysis of SNR Estimation in OFDM system
Investigation and Analysis of SNR Estimation in OFDM systemInvestigation and Analysis of SNR Estimation in OFDM system
Investigation and Analysis of SNR Estimation in OFDM systemIOSR Journals
 
Dynamic Audio-Visual Client Recognition modelling
Dynamic Audio-Visual Client Recognition modellingDynamic Audio-Visual Client Recognition modelling
Dynamic Audio-Visual Client Recognition modellingCSCJournals
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognitionphyuhsan
 
Evaluation of frequency domain features for myopathic emg signals in mat lab
Evaluation of frequency domain features for myopathic emg signals in mat labEvaluation of frequency domain features for myopathic emg signals in mat lab
Evaluation of frequency domain features for myopathic emg signals in mat labSikkim Manipal Institute Of Technology
 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based featuresijsc
 
Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechIOSR Journals
 
Multi transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiacMulti transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiacRaja Ram
 
Research: Applying Various DSP-Related Techniques for Robust Recognition of A...
Research: Applying Various DSP-Related Techniques for Robust Recognition of A...Research: Applying Various DSP-Related Techniques for Robust Recognition of A...
Research: Applying Various DSP-Related Techniques for Robust Recognition of A...Roman Atachiants
 

Similar to Mel frequency cepstral coefficient (mfcc) (20)

Audio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for DepressionAudio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for Depression
 
C4_S2_G8 (1).pdf
C4_S2_G8  (1).pdfC4_S2_G8  (1).pdf
C4_S2_G8 (1).pdf
 
C4_S2_G8 .pdf
C4_S2_G8 .pdfC4_S2_G8 .pdf
C4_S2_G8 .pdf
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
Hl3413921395
Hl3413921395Hl3413921395
Hl3413921395
 
Radiographic exposure and image quality
Radiographic exposure and image qualityRadiographic exposure and image quality
Radiographic exposure and image quality
 
P ERFORMANCE A NALYSIS O F A DAPTIVE N OISE C ANCELLER E MPLOYING N LMS A LG...
P ERFORMANCE A NALYSIS  O F A DAPTIVE N OISE C ANCELLER E MPLOYING N LMS A LG...P ERFORMANCE A NALYSIS  O F A DAPTIVE N OISE C ANCELLER E MPLOYING N LMS A LG...
P ERFORMANCE A NALYSIS O F A DAPTIVE N OISE C ANCELLER E MPLOYING N LMS A LG...
 
Investigation and Analysis of SNR Estimation in OFDM system
Investigation and Analysis of SNR Estimation in OFDM systemInvestigation and Analysis of SNR Estimation in OFDM system
Investigation and Analysis of SNR Estimation in OFDM system
 
Dynamic Audio-Visual Client Recognition modelling
Dynamic Audio-Visual Client Recognition modellingDynamic Audio-Visual Client Recognition modelling
Dynamic Audio-Visual Client Recognition modelling
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
 
Dl35622627
Dl35622627Dl35622627
Dl35622627
 
Evaluation of frequency domain features for myopathic emg signals in mat lab
Evaluation of frequency domain features for myopathic emg signals in mat labEvaluation of frequency domain features for myopathic emg signals in mat lab
Evaluation of frequency domain features for myopathic emg signals in mat lab
 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based features
 
Animal Voice Morphing System
Animal Voice Morphing SystemAnimal Voice Morphing System
Animal Voice Morphing System
 
NON PARAMETRIC METHOD
NON PARAMETRIC METHODNON PARAMETRIC METHOD
NON PARAMETRIC METHOD
 
Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio Speech
 
Multi transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiacMulti transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiac
 
Research: Applying Various DSP-Related Techniques for Robust Recognition of A...
Research: Applying Various DSP-Related Techniques for Robust Recognition of A...Research: Applying Various DSP-Related Techniques for Robust Recognition of A...
Research: Applying Various DSP-Related Techniques for Robust Recognition of A...
 

More from BushraShaikh44

More from BushraShaikh44 (6)

Robort
RobortRobort
Robort
 
Prefix suffix
Prefix suffixPrefix suffix
Prefix suffix
 
Additive softmax
Additive softmaxAdditive softmax
Additive softmax
 
Hidden markov model
Hidden markov modelHidden markov model
Hidden markov model
 
Bionomial nomanclature
Bionomial nomanclatureBionomial nomanclature
Bionomial nomanclature
 
Phase shift keying
Phase shift keyingPhase shift keying
Phase shift keying
 

Recently uploaded

GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 

Recently uploaded (20)

GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 

Mel frequency cepstral coefficient (mfcc)

  • 2. Linear Prediction Coefficients (LPCs) and Linear Prediction Cepstral Coefficients (LPCCs) were the main feature type for automatic speech recognition (ASR), especially with HMM classifiers. Prior to the introduction of MFCCs Mel-Frequency Cepstral Coefficients (MFCCs) were very popular features for a long time; but more recently, filter banks are becoming increasingly popular MFCCs were very useful with Gaussian Mixture Models - Hidden Markov Models (GMMs-HMMs), MFCCs and GMMs-HMMs co-evolved to be the standard way of doing Automatic Speech Recognition (ASR). It turns out that filter bank coefficients are highly correlated, which could be problematic in some machine learning algorithms.
  • 3. The job of MFCCs is to accurately represent this envelope of the short time power spectrum. Frame the signal into short frames For each frame calculate the periodogram estimate of the power spectrum Apply the mel filterbank to the power spectra and sum the energy in each filter Take the logarithm of all filterbank energies Take the DCT of the log filterbank energies. Keep DCT coefficients 2-13, discard the rest
  • 4. Mel scale The formula for converting from frequency to Mel scale is: To go from Mels back to frequency: The Mel scale relates perceived frequency, or pitch, of a pure tone to its actual measured frequency. Humans are much better at discerning small changes in pitch at low frequencies than they are at high frequencies. Incorporating this scale makes our features match more closely what humans hear.
  • 5. An audio signal is constantly changing, so to simplify things assume it on short time scales i.e. statistically stationary but the samples are constantly changing. Due to this the signal is divided into frames. If the frame is much shorter, we don't have enough samples to get a reliable spectral estimate, if it is longer the signal changes too much throughout the frame.
  • 6. The next step is to calculate the power spectrum of each frame. This is motivated by the human cochlea (an organ in the ear) which vibrates at different spots depending on the frequency of the incoming sounds. Depending on the location in the cochlea that vibrates (which wobbles small hairs), different nerves fire informing the brain that certain frequencies are present. Our periodogram estimate performs a similar job for us, identifying which frequencies are present in the frame.
  • 7. The periodogram spectral estimate still contains a lot of information not required for Automatic Speech Recognition (ASR). The cochlea can not discern the difference between two closely spaced frequencies. This effect becomes more pronounced as the frequencies increase. For this reason we take clumps of periodogram bins and sum them up to get an idea of how much energy exists in various frequency regions. This is performed by our Mel filter bank: the first filter is very narrow and gives an indication of how much energy exists near 0 Hertz. As the frequencies get higher our filters get wider as we become less concerned about variations. We are only interested in roughly how much energy occurs at each spot. The Mel scale tells us exactly how to space our filter banks and how wide to make them.
  • 8. Once we have the filter bank energies, we take the logarithm of them. This is also motivated by human hearing: we don't hear loudness on a linear scale. Generally to double the perceived volume of a sound we need to put 8 times as much energy into it. This means that large variations in energy may not sound all that different if the sound is loud to begin with. This compression operation makes our features match more closely what humans hear.
  • 9. The final step is to compute the DCT of the log filter bank energies. There are 2 main reasons this is performed. Because our filter banks are all overlapping, the filter bank energies are quite correlated with each other. The DCT decorrelates the energies which means diagonal covariance matrices can be used to model the features in e.g. a HMM classifier. But the higher DCT coefficients represent fast changes in the filter bank energies and it turns out that these fast changes degrade ASR performance, so we get a small improvement by dropping them.

Editor's Notes

  1. All steps needed to compute filter banks were motivated by the nature of the speech signal and the human perception of such signals. On the contrary, the extra steps needed to compute MFCCs were motivated by the limitation of some machine learning algorithms.