SlideShare a Scribd company logo
Isolated Bangla Word Recognition and Speaker Detection by
Semantic Modular Time Delay Neural Network (MTDNN)
Authors:
Md. Yasin Ali Khan, S. M. Mostaq Hossain and Mohammed Moshiul Hoque
Department of Computer Science and Engineering, Chittagong University of Engineering
and Technology, Chittagong-4349, Bangladesh
Presented by:
Md. Yasin Ali Khan
Introduction
• Speech is the most common forms of human
communication.
3 May 2016 2
• Also used for human to machine communication.
Human to human Human to machine
Word recognition: Recognizing an uttered word that
matched with the actual word.
Speaker recognition: Recognizing a particular person based
on his/her voice signal.
Isolated words: Adjacent words having sufficient silence.
3 May 2016 3
Introduction (cont.)
3 May 2016 4
Sp-1
Sp-2
Sp-3
আকাশ, পানি, িদী
পানি MTDNN Module
Modular Time Delay Neural Network (MTDNN): Development
of a time-delay neural network which is scaled by modularity of
large nets and based on smaller subcomponent nets.
Figure 1: Abstract representation of proposed framework.
Introduction (cont.)
3 May 2016 5
• No previous work on speaker detection in Bangla
language
• Experimenting with large vocabulary of Bangla
words and different syllables
Motivation
• Bangla speech based data documentation & information retrieval
3 May 2016 6
Control
module
• Bangla speech based security control
Control
module
Motivation (cont.)
• Designed a theoretical framework for recognizing Bangla
isolated words and detecting corresponding speakers.
• Implemented the framework using Modular Time Delay
Neural Network (MTDNN).
• Evaluated the system by using different Bangla words and
speakers.
3 May 2016 7
Contributions
• Speech Recognition using Artificial Neural Network (Roy et al. , 2002)
- Restricted on 5 words
- ANN Technology
- FFT Feature
• Speech Recognition using 3 Layer Back Propagation Neural Network (Islam
et al. , 2005)
- Restricted on 30 words
- Develops Multi-Layer Back-propagation NN
- FFT Feature
Limitation: High training time
3 May 2016 8
Bangla Language
Related Work
3 May 2016 9
Methodology
• Problem Formulation
𝑣 𝑡 =
𝐸ₒ𝑒ᵅᵗ sin 𝜔𝑡 0 ≤ 𝑡 < 𝑇𝑒
𝐸₁ 𝑒¯ᵝ⁽ᵗ¯ᵀᵉ⁾ − 𝑒¯ᵝ⁽ᵀᵉ¯ᵀ ͨ⁾ 𝑇𝑒 ≤ 𝑡 < 𝑇𝑐
0 𝑇𝑐 ≤ 𝑡 ≤ 𝑇𝑜
Glottal Pulse Model for Voiced Signals as The Liljencrants-
Fant (LF) model [5]
3 May 2016 10
Methodology (cont.)
• Mathematical Description
[ŵ₁, ŵ₂,….ŵɴ] = 𝒎𝒂𝒙
𝒘 𝟏, 𝒘 𝟐 ,…….𝒘 𝑵
𝒇( 𝒘 𝟏, 𝒘 𝟐 , … … . 𝒘 𝑵 𝑿, 𝜦)
[P₁, P₂,….Pɴ] = 𝒎𝒂𝒙
𝒑 𝟏, 𝒑 𝟐 ,…….𝒑 𝑵
𝒇( 𝒑 𝟏, 𝒑 𝟐 , … … . 𝒑 𝑵 𝑿, 𝜦)
- For Speech Recognition
- For Speaker Detection
where f is the conditional probability of a sequence of words W given
a sequence of speech features X and 𝚲 is pre-trained speech model.
where f is the conditional probability of a sequence of speakers P
given a sequence of speaker features X and 𝚲 is pre-trained speaker
model.
3 May 2016 12Figure 2: System Architecture.
Methodology (cont.)
3 May 2016 13
• MTDNN
Framework
Figure 3: MTDNN framework.
Methodology (cont.)
3 May 2016 13
• Environmental Setup
Operating System Lab in CUET
• System Requirements
Operating System : Windows 7 (Core i3, 3.66 GHz, 8GB RAM)
Software : Creative WaveStudio, Matlab
Tools : Close talk microphone
Experiments
3 May 2016 14
• Data Collection
Experiments (cont.)
No. of Participants : 5
No. of Words : 46
Age Difference : 21-23
Uttering times : 3 (For each word)
(5 [participants] ˟ 46 [no. of words] ˟ 3 [no. of times uttered each word])
= 690 samples
• Sample Size
3 May 2016 15
Average file size 81.67 KB
Total no files 690
Average duration
of files (.wav)
0.82 sec
Age average 22
• File Information
for Samples
Experiments (cont.)
3 May 2016 16
Experiments (cont.)
Mono Syllable:
দেশ, বই, দবশ, দশোক, দ োট, কোজ, তোর, রোত, গোছ, দকউ, আজ
Di Syllable:
তততি, পড়েি, হড়ব, টোকো, থোড়কি, আমরো, পোতো, পুতিশ, কথো, উৎসব, যুদ্ধ, তবজয়
Tri Syllable:
ধরড়ের, জোিোড়িোর, আমোড়ের, বতত মোি, ফিোফি, সুতবধো, কততত পড়ের, দসৌন্দযত, পেতযোগ, পততথবীর
Poly Syllable:
স্বোধীিতোর, সহড়যোতগতো
Poly Syllable:
Word, file, open, print, exit, edit, cut, copy, paste, doc1, doc2
• Sample Words List
3 May 2016 17
Epoch 3
Avg. training time 1 minute and 5 seconds
Gradient error 4.69e-13 [Range (1.00 to 1.00e-10)]
Mu error 1.00e-06 [Range (0.00100 to 1.00e+10)]
Validation
checks
2
Performance 0.165 [Range (0.200 to 0.00)]
Accuracy 82.66%
• Result Analysis of
Word Recognition
&
Speaker Detection
Experimental Results
3 May 2016 18
Method No of words Feature Recognition
rate
Roy et al. [1] 5 FFT 82%
Islam et al. [2] 13 FFT 78.85%
Proposed 35 MFCC 82.66%
• Comparison Table with
Previous Work
Experimental Results (cont.)
• Technology: Modular Time Delay Neural Network
(MTDNN)
• 82.66% overall accuracy ( in avg.)
3 May 2016 21
Conclusion
3 May 2016 20
• Future Recommendations
• Experimenting with More Bangla Words
• Variable Age of Participants
Conclusion (cont.)
[1] K. Roy, D. Das, and M. G. Ali, “Development of the speech recognition system using artificial neural
network,” in Proc.5th International Conference on Computer and Information Technology (ICCIT02),
Dhaka, Bangladesh, 2002.
[2] M. R. Islam, A. Sayeed, M. Sohail, M. W. H. Sadid, M. A. Mottalib, “Bangla Speech Recognition
using three layer Back-Propagation Neural Network”, Proc. of NCCPB, Dhaka, 2005.
[3] M. F. Khan and R. C. Debnath, “Comparative Study of Feature ExtractionMethods for Bangla
Phoneme Recognition”, 5th ICCIT-2002, Dhaka, Bangladesh, pp. 257-261.
[4] J. C. Bezdek, R. Ehrlich, W. Full, “FCM: The Fuzzy C-Means Clustering Algorithm”, Computers &
Geosciences Vol. 10, No. 2-3, pp. 191-203, 1984.
[5] L. R. Rabiner and B. H. Juang, “Fundamentals of Speech Recognition”, Prentice-Hall, Englewood
Cliffs, NJ, 1993.
3 May 2016 23
References
Thank You !!!

More Related Content

Viewers also liked

5 Tips for Starting Your Own Business
5 Tips for Starting Your Own Business5 Tips for Starting Your Own Business
5 Tips for Starting Your Own Business
Rosemary Nonny Knight | Entrepreneur Mentor Coach LION
 
Tracy Fernandes_Resume
Tracy Fernandes_ResumeTracy Fernandes_Resume
Tracy Fernandes_Resume
Tracy Fernandes
 
Supernatural.extravaganza
Supernatural.extravaganzaSupernatural.extravaganza
Supernatural.extravaganza
Truth Within
 
Network marketing and passive income streams
Network marketing and passive income streamsNetwork marketing and passive income streams
Network marketing and passive income streams
Rosemary Nonny Knight | Entrepreneur Mentor Coach LION
 
Using EDMODO with Young Learners
Using EDMODO with Young LearnersUsing EDMODO with Young Learners
Using EDMODO with Young Learners
Sophia Mavridi
 
Agile Lean Conference 2016 - Cagliesi - Agile like the queen
Agile Lean Conference 2016 - Cagliesi - Agile like the queenAgile Lean Conference 2016 - Cagliesi - Agile like the queen
Agile Lean Conference 2016 - Cagliesi - Agile like the queen
Agile Lean Conference
 
Future tense b1
Future tense b1Future tense b1
Future tense b1
angeliconuccio
 
Save Our Story
Save Our StorySave Our Story
Save Our Story
Eric Langhorst
 
Bitcoin
BitcoinBitcoin
Paperless Office Project
Paperless Office ProjectPaperless Office Project
Paperless Office Project
Siddharth Shah
 
Presentación diazinon ata 2016 1
Presentación diazinon ata 2016 1Presentación diazinon ata 2016 1
Presentación diazinon ata 2016 1
Asociación Toxicológica Argentina
 
Sistema monetário brasileiro
Sistema monetário brasileiroSistema monetário brasileiro
Sistema monetário brasileiro
Daniela Menezes
 
ñ Q-v-w-y-beste batzuk
ñ Q-v-w-y-beste batzukñ Q-v-w-y-beste batzuk
ñ Q-v-w-y-beste batzuk
ihintza6
 
X x
X xX x
Document Recognition Technologies
Document Recognition TechnologiesDocument Recognition Technologies
Document Recognition Technologies
Chris Riley ☁
 

Viewers also liked (17)

5 Tips for Starting Your Own Business
5 Tips for Starting Your Own Business5 Tips for Starting Your Own Business
5 Tips for Starting Your Own Business
 
8 archives nationales c
8 archives nationales c8 archives nationales c
8 archives nationales c
 
Tracy Fernandes_Resume
Tracy Fernandes_ResumeTracy Fernandes_Resume
Tracy Fernandes_Resume
 
Supernatural.extravaganza
Supernatural.extravaganzaSupernatural.extravaganza
Supernatural.extravaganza
 
Do
DoDo
Do
 
Network marketing and passive income streams
Network marketing and passive income streamsNetwork marketing and passive income streams
Network marketing and passive income streams
 
Using EDMODO with Young Learners
Using EDMODO with Young LearnersUsing EDMODO with Young Learners
Using EDMODO with Young Learners
 
Agile Lean Conference 2016 - Cagliesi - Agile like the queen
Agile Lean Conference 2016 - Cagliesi - Agile like the queenAgile Lean Conference 2016 - Cagliesi - Agile like the queen
Agile Lean Conference 2016 - Cagliesi - Agile like the queen
 
Future tense b1
Future tense b1Future tense b1
Future tense b1
 
Save Our Story
Save Our StorySave Our Story
Save Our Story
 
Bitcoin
BitcoinBitcoin
Bitcoin
 
Paperless Office Project
Paperless Office ProjectPaperless Office Project
Paperless Office Project
 
Presentación diazinon ata 2016 1
Presentación diazinon ata 2016 1Presentación diazinon ata 2016 1
Presentación diazinon ata 2016 1
 
Sistema monetário brasileiro
Sistema monetário brasileiroSistema monetário brasileiro
Sistema monetário brasileiro
 
ñ Q-v-w-y-beste batzuk
ñ Q-v-w-y-beste batzukñ Q-v-w-y-beste batzuk
ñ Q-v-w-y-beste batzuk
 
X x
X xX x
X x
 
Document Recognition Technologies
Document Recognition TechnologiesDocument Recognition Technologies
Document Recognition Technologies
 

Similar to ICCIT_presentation

10
1010
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
IJERA Editor
 
A Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language SpecificationA Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language Specification
ijtsrd
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
ijcsit
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
AIRCC Publishing Corporation
 
19 ijcse-01227
19 ijcse-0122719 ijcse-01227
19 ijcse-01227
Shivlal Mewada
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
ijcseit
 
5215ijcseit01
5215ijcseit015215ijcseit01
5215ijcseit01
ijcsit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
ijcseit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
ijcseit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
ijcseit
 
Text independent speaker identification system using average pitch and forman...
Text independent speaker identification system using average pitch and forman...Text independent speaker identification system using average pitch and forman...
Text independent speaker identification system using average pitch and forman...
ijitjournal
 
E0502 01 2327
E0502 01 2327E0502 01 2327
E0502 01 2327
IJMER
 
Ijetcas14 390
Ijetcas14 390Ijetcas14 390
Ijetcas14 390
Iasir Journals
 
De4201715719
De4201715719De4201715719
De4201715719
IJERA Editor
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
eSAT Publishing House
 
Isolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantizationIsolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantization
eSAT Journals
 
Kc3517481754
Kc3517481754Kc3517481754
Kc3517481754
IJERA Editor
 
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Waqas Tariq
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi language
iosrjce
 

Similar to ICCIT_presentation (20)

10
1010
10
 
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
 
A Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language SpecificationA Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language Specification
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
19 ijcse-01227
19 ijcse-0122719 ijcse-01227
19 ijcse-01227
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
5215ijcseit01
5215ijcseit015215ijcseit01
5215ijcseit01
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
Text independent speaker identification system using average pitch and forman...
Text independent speaker identification system using average pitch and forman...Text independent speaker identification system using average pitch and forman...
Text independent speaker identification system using average pitch and forman...
 
E0502 01 2327
E0502 01 2327E0502 01 2327
E0502 01 2327
 
Ijetcas14 390
Ijetcas14 390Ijetcas14 390
Ijetcas14 390
 
De4201715719
De4201715719De4201715719
De4201715719
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Isolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantizationIsolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantization
 
Kc3517481754
Kc3517481754Kc3517481754
Kc3517481754
 
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi language
 

ICCIT_presentation

  • 1. Isolated Bangla Word Recognition and Speaker Detection by Semantic Modular Time Delay Neural Network (MTDNN) Authors: Md. Yasin Ali Khan, S. M. Mostaq Hossain and Mohammed Moshiul Hoque Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong-4349, Bangladesh Presented by: Md. Yasin Ali Khan
  • 2. Introduction • Speech is the most common forms of human communication. 3 May 2016 2 • Also used for human to machine communication. Human to human Human to machine
  • 3. Word recognition: Recognizing an uttered word that matched with the actual word. Speaker recognition: Recognizing a particular person based on his/her voice signal. Isolated words: Adjacent words having sufficient silence. 3 May 2016 3 Introduction (cont.)
  • 4. 3 May 2016 4 Sp-1 Sp-2 Sp-3 আকাশ, পানি, িদী পানি MTDNN Module Modular Time Delay Neural Network (MTDNN): Development of a time-delay neural network which is scaled by modularity of large nets and based on smaller subcomponent nets. Figure 1: Abstract representation of proposed framework. Introduction (cont.)
  • 5. 3 May 2016 5 • No previous work on speaker detection in Bangla language • Experimenting with large vocabulary of Bangla words and different syllables Motivation
  • 6. • Bangla speech based data documentation & information retrieval 3 May 2016 6 Control module • Bangla speech based security control Control module Motivation (cont.)
  • 7. • Designed a theoretical framework for recognizing Bangla isolated words and detecting corresponding speakers. • Implemented the framework using Modular Time Delay Neural Network (MTDNN). • Evaluated the system by using different Bangla words and speakers. 3 May 2016 7 Contributions
  • 8. • Speech Recognition using Artificial Neural Network (Roy et al. , 2002) - Restricted on 5 words - ANN Technology - FFT Feature • Speech Recognition using 3 Layer Back Propagation Neural Network (Islam et al. , 2005) - Restricted on 30 words - Develops Multi-Layer Back-propagation NN - FFT Feature Limitation: High training time 3 May 2016 8 Bangla Language Related Work
  • 9. 3 May 2016 9 Methodology • Problem Formulation 𝑣 𝑡 = 𝐸ₒ𝑒ᵅᵗ sin 𝜔𝑡 0 ≤ 𝑡 < 𝑇𝑒 𝐸₁ 𝑒¯ᵝ⁽ᵗ¯ᵀᵉ⁾ − 𝑒¯ᵝ⁽ᵀᵉ¯ᵀ ͨ⁾ 𝑇𝑒 ≤ 𝑡 < 𝑇𝑐 0 𝑇𝑐 ≤ 𝑡 ≤ 𝑇𝑜 Glottal Pulse Model for Voiced Signals as The Liljencrants- Fant (LF) model [5]
  • 10. 3 May 2016 10 Methodology (cont.) • Mathematical Description [ŵ₁, ŵ₂,….ŵɴ] = 𝒎𝒂𝒙 𝒘 𝟏, 𝒘 𝟐 ,…….𝒘 𝑵 𝒇( 𝒘 𝟏, 𝒘 𝟐 , … … . 𝒘 𝑵 𝑿, 𝜦) [P₁, P₂,….Pɴ] = 𝒎𝒂𝒙 𝒑 𝟏, 𝒑 𝟐 ,…….𝒑 𝑵 𝒇( 𝒑 𝟏, 𝒑 𝟐 , … … . 𝒑 𝑵 𝑿, 𝜦) - For Speech Recognition - For Speaker Detection where f is the conditional probability of a sequence of words W given a sequence of speech features X and 𝚲 is pre-trained speech model. where f is the conditional probability of a sequence of speakers P given a sequence of speaker features X and 𝚲 is pre-trained speaker model.
  • 11. 3 May 2016 12Figure 2: System Architecture. Methodology (cont.)
  • 12. 3 May 2016 13 • MTDNN Framework Figure 3: MTDNN framework. Methodology (cont.)
  • 13. 3 May 2016 13 • Environmental Setup Operating System Lab in CUET • System Requirements Operating System : Windows 7 (Core i3, 3.66 GHz, 8GB RAM) Software : Creative WaveStudio, Matlab Tools : Close talk microphone Experiments
  • 14. 3 May 2016 14 • Data Collection Experiments (cont.) No. of Participants : 5 No. of Words : 46 Age Difference : 21-23 Uttering times : 3 (For each word) (5 [participants] ˟ 46 [no. of words] ˟ 3 [no. of times uttered each word]) = 690 samples • Sample Size
  • 15. 3 May 2016 15 Average file size 81.67 KB Total no files 690 Average duration of files (.wav) 0.82 sec Age average 22 • File Information for Samples Experiments (cont.)
  • 16. 3 May 2016 16 Experiments (cont.) Mono Syllable: দেশ, বই, দবশ, দশোক, দ োট, কোজ, তোর, রোত, গোছ, দকউ, আজ Di Syllable: তততি, পড়েি, হড়ব, টোকো, থোড়কি, আমরো, পোতো, পুতিশ, কথো, উৎসব, যুদ্ধ, তবজয় Tri Syllable: ধরড়ের, জোিোড়িোর, আমোড়ের, বতত মোি, ফিোফি, সুতবধো, কততত পড়ের, দসৌন্দযত, পেতযোগ, পততথবীর Poly Syllable: স্বোধীিতোর, সহড়যোতগতো Poly Syllable: Word, file, open, print, exit, edit, cut, copy, paste, doc1, doc2 • Sample Words List
  • 17. 3 May 2016 17 Epoch 3 Avg. training time 1 minute and 5 seconds Gradient error 4.69e-13 [Range (1.00 to 1.00e-10)] Mu error 1.00e-06 [Range (0.00100 to 1.00e+10)] Validation checks 2 Performance 0.165 [Range (0.200 to 0.00)] Accuracy 82.66% • Result Analysis of Word Recognition & Speaker Detection Experimental Results
  • 18. 3 May 2016 18 Method No of words Feature Recognition rate Roy et al. [1] 5 FFT 82% Islam et al. [2] 13 FFT 78.85% Proposed 35 MFCC 82.66% • Comparison Table with Previous Work Experimental Results (cont.)
  • 19. • Technology: Modular Time Delay Neural Network (MTDNN) • 82.66% overall accuracy ( in avg.) 3 May 2016 21 Conclusion
  • 20. 3 May 2016 20 • Future Recommendations • Experimenting with More Bangla Words • Variable Age of Participants Conclusion (cont.)
  • 21. [1] K. Roy, D. Das, and M. G. Ali, “Development of the speech recognition system using artificial neural network,” in Proc.5th International Conference on Computer and Information Technology (ICCIT02), Dhaka, Bangladesh, 2002. [2] M. R. Islam, A. Sayeed, M. Sohail, M. W. H. Sadid, M. A. Mottalib, “Bangla Speech Recognition using three layer Back-Propagation Neural Network”, Proc. of NCCPB, Dhaka, 2005. [3] M. F. Khan and R. C. Debnath, “Comparative Study of Feature ExtractionMethods for Bangla Phoneme Recognition”, 5th ICCIT-2002, Dhaka, Bangladesh, pp. 257-261. [4] J. C. Bezdek, R. Ehrlich, W. Full, “FCM: The Fuzzy C-Means Clustering Algorithm”, Computers & Geosciences Vol. 10, No. 2-3, pp. 191-203, 1984. [5] L. R. Rabiner and B. H. Juang, “Fundamentals of Speech Recognition”, Prentice-Hall, Englewood Cliffs, NJ, 1993. 3 May 2016 23 References