5-7 June, Dhaka, Bangladesh
Investigation of the Effect of MFCC Variation on the
Convolutional Neural Network-based
Speech Classification
Md. Rakibul Hasan, Md. Mahbub Hasan
Department of Electrical and Electronic Engineering
Khulna University of Engineering & Technology
Khulna-9203, Bangladesh
Outline of the Talk
MD. RAKIBUL HASAN, INVESTIGATION OF THE EFFECT OF MFCC VARIATION ON THE CNN-BASED SPEECH CLASSIFICATION
Relevant Theories
Materials and Methods
Results & Discussion
Contributions and Conclusion
Question & Answer Session
MULTIMEDIA AND SIGNAL PROCESSING (II)
Relevant Theories
What is Speech?
Speech consists of sounds filtered by the shape of the vocal tract, including the tongue, teeth, etc.
This shape determines what sound comes out.
Vocal tract dynamics behave differently when speaking vowels than when speaking words.
Materials and Methods
Chosen Isolated Vowels: /অ/ [/ɔ/], /আ/ [/a/], /ই/ [/i/], /উ/ [/u/], /ঋ/ [/ri/], /এ/ [/e/], /ঐ/ [/oi/]
Chosen Isolated Words: ব োতল, ন, কপি, ব োকোন, বেষ, সঠিক, উিরে
Creation of Dataset
• Recorded by smartphone
• Converted stereo track to mono
• Clipped into distinct speech tokens
• 40 tokens in each speech class
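The dataset steps above can be sketched in plain Python. This is only an illustration under assumptions: the helper names `stereo_to_mono` and `clip_tokens`, the channel-averaging rule, and the toy sample values are hypothetical and are not taken from the authors' recording pipeline. The class count of 7 follows from the seven vowels and seven words listed on the slide.

```python
# Hypothetical helpers illustrating the dataset steps; not the authors' code.

def stereo_to_mono(stereo_samples):
    """Average the left and right channel of each (left, right) sample pair."""
    return [(left + right) / 2 for left, right in stereo_samples]


def clip_tokens(mono_samples, boundaries):
    """Cut one recording into tokens given (start, end) sample indices."""
    return [mono_samples[start:end] for start, end in boundaries]


# Toy stereo recording with made-up values: 8 (left, right) sample pairs.
recording = [(0.0, 0.2), (0.4, 0.6), (1.0, 1.0), (0.0, 0.0),
             (-0.5, -0.3), (-1.0, -0.8), (0.2, 0.2), (0.0, 0.0)]

mono = stereo_to_mono(recording)
# Assumed token boundaries; in practice these would come from manual clipping.
tokens = clip_tokens(mono, [(0, 3), (4, 7)])

# Dataset size implied by the slide: 7 classes per task, 40 tokens per class.
NUM_CLASSES = 7
TOKENS_PER_CLASS = 40
dataset_size = NUM_CLASSES * TOKENS_PER_CLASS

print(len(tokens), dataset_size)
```

Averaging the two channels is one common way to downmix; the tool actually used for the conversion is not stated in the slides.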
Materials and Methods (Contd.)
Feature Extraction
Mel-Frequency Cepstral Coefficients (MFCC), extracted with the librosa Python package
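MFCCs are computed on the mel frequency scale, which spaces filter bands more densely at low frequencies. The sketch below shows that mapping using the widely used HTK-style formula. It is an illustration only: the function names are hypothetical, the 0 to 8000 Hz range and 10 bands are arbitrary choices, and librosa's default mel variant may differ from this formula.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mels with the HTK-style formula 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 frequencies (Hz) evenly spaced on the mel scale."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_bands + 1)
    return [mel_to_hz(mel_min + i * step) for i in range(n_bands + 2)]


# Assumed range and band count, chosen only for illustration.
edges = mel_band_edges(0.0, 8000.0, 10)
for low, high in zip(edges, edges[1:]):
    print(f"{low:8.1f} Hz to {high:8.1f} Hz")
```

The printed band widths grow with frequency, which is why MFCCs emphasize the low-frequency detail that carries most vocal tract information.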
Convolutional Neural Network Model
Coding Environment: ‘Python 3.5’ on ‘Jupyter’ Notebook
Library: Keras, an open source Neural Network Library
Recognition Performance
Loss
Accuracy
Confusion Matrix
Cross-validation Score
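The first three metrics listed above can be sketched in plain Python. This is a minimal illustration with made-up toy predictions for 3 classes; the function names are hypothetical, and the authors would presumably have relied on Keras and related tooling rather than hand-written code.

```python
import math


def cross_entropy_loss(probabilities, true_labels):
    """Mean categorical cross-entropy; probabilities[i][k] is P(class k) for sample i."""
    eps = 1e-12  # guards against log(0)
    total = -sum(math.log(max(p[t], eps)) for p, t in zip(probabilities, true_labels))
    return total / len(true_labels)


def accuracy(predicted, true_labels):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for p, t in zip(predicted, true_labels) if p == t)
    return correct / len(true_labels)


def confusion_matrix(predicted, true_labels, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(predicted, true_labels):
        matrix[t][p] += 1
    return matrix


# Toy model outputs (made-up numbers) for 4 samples and 3 classes.
probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]]
y_true = [0, 1, 2, 2]
# Predicted class = index of the largest probability.
y_pred = [max(range(3), key=lambda k: row[k]) for row in probs]

loss = cross_entropy_loss(probs, y_true)
acc = accuracy(y_pred, y_true)
cm = confusion_matrix(y_pred, y_true, 3)
print(loss, acc, cm)
```

Off-diagonal entries of the confusion matrix show which classes are mistaken for each other, which a single accuracy figure hides.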
Materials and Methods (Contd.)
Recognition Model Architecture
Fig. 1: Architecture of the model.
Results & Discussion
Performance Comparison between Vowel and Word Recognition
Fig. 2: Loss and Accuracy Comparison (panels: Loss, Accuracy).
Results & Discussion (Contd.)
Fig. 3: Confusion Matrix Comparison (panels: Vowel, Word).
Results & Discussion (Contd.)
Table I: 4-Fold Cross-validation Result.
Recognition   Fold-1   Fold-2   Fold-3   Fold-4   Overall Accuracy
Vowel         95.71%   94.29%   92.86%   92.86%   93.93%
Word          90.00%   87.14%   90.00%   92.86%   90.00%
So, Vowel Recognition outperforms Word Recognition in Folds 1 to 3 and in overall accuracy; the two tie at 92.86% in Fold-4.
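The overall accuracies in Table I are the means of the four fold accuracies, which the sketch below checks. The fold size is an assumption: the slides do not state it, but 7 classes × 40 tokens = 280 samples split into 4 folds would give 70 test samples per fold, and the reported percentages are consistent with whole numbers of correct predictions out of 70. The helper `k_fold_indices` is hypothetical and uses contiguous folds for simplicity.

```python
def k_fold_indices(num_samples, k):
    """Split range(num_samples) into k contiguous, equally sized test folds.

    Real pipelines usually shuffle or stratify first; this sketch does not.
    """
    fold_size = num_samples // k
    return [list(range(i * fold_size, (i + 1) * fold_size)) for i in range(k)]


# Fold accuracies (%) copied from Table I.
fold_accuracy = {
    "Vowel": [95.71, 94.29, 92.86, 92.86],
    "Word": [90.00, 87.14, 90.00, 92.86],
}

# Overall accuracy = mean of the fold accuracies.
overall = {task: round(sum(accs) / len(accs), 2) for task, accs in fold_accuracy.items()}

# Assumption: 280 samples per task, so each of the 4 folds tests 70 samples.
folds = k_fold_indices(280, 4)
test_size = len(folds[0])

# Number of correct predictions per fold implied by each percentage.
correct_counts = {
    task: [round(acc / 100 * test_size) for acc in accs]
    for task, accs in fold_accuracy.items()
}

print(overall)
print(correct_counts)
```

Under this assumption a single misclassified token shifts a fold's accuracy by about 1.43 percentage points, so fold-to-fold differences of that size reflect just one sample.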
Results & Discussion (Contd.)
What’s the reason behind the Performance Deviation?
Fig. 4: MFCC variation (panels: Vowel, Word).
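One way to put a number on "MFCC variation" is sketched below. This is an assumption, not the measure used in the slides, which do not define it: here variation is the standard deviation of each coefficient across time frames, averaged over coefficients. The function name and the toy MFCC values are made up for illustration.

```python
from statistics import pstdev


def mfcc_variation(mfcc_frames):
    """Mean over coefficients of the std. dev. of each coefficient across frames.

    mfcc_frames is a list of frames; each frame is a list of MFCC values.
    """
    num_coeffs = len(mfcc_frames[0])
    per_coeff = [
        pstdev([frame[c] for frame in mfcc_frames]) for c in range(num_coeffs)
    ]
    return sum(per_coeff) / num_coeffs


# Made-up toy frames: a steady token (vowel-like) and a changing one (word-like).
steady_token = [[1.0, 2.0], [1.1, 2.0], [0.9, 2.0]]
changing_token = [[1.0, 2.0], [3.0, -1.0], [-2.0, 4.0]]

print(mfcc_variation(steady_token))
print(mfcc_variation(changing_token))
```

A token whose coefficients stay nearly constant over time scores low, while a token whose spectrum changes from frame to frame scores high.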
Results & Discussion (Contd.)
What’s the reason behind the Performance Deviation?
Variation of Vocal Tract Dynamics is the answer!
Fig. 5: Waveshape of the speech token (panels: ‘অ’ vowel, ‘ব োতল’ word).
Contributions & Conclusion
Performance ∝ 1 / Variation
Creation of Dataset
• Raw Speech Data Collection
• Make them Isolated
Feature Engineering
• MFCC Extraction
• Variation among MFCC features
Neural Network Model
• Optimum selection of the hyperparameters
Performance Evaluation
• Performance Metrics
• Comparison between Vowel and Word Recognition
Question & Answer Session
THANK YOU
Any questions are warmly welcome.

