Poster sections: 1. Overview · 2. MSRP Project · 3. Methodology · 4. Results · 5. Evaluation · 6. Conclusion · 7. Further Work
Lexical Modeling of Egyptian Arabic for Automatic Speech Recognition
Taha Merghani¹, Tuka Al Hanai², and James Glass²
¹ Department of Electrical & Computer Engineering, Jackson State University
² MIT Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Developing an Automatic Speech Recognition (ASR) system for Arabic is challenging:
● Ambiguous orthography
○ No diacritics, so short-vowel information is missing.
○ Ambiguous word-to-phoneme mapping.
● Rich morphology
○ Large number of stem+affix combinations.
● Diverse dialects
○ Arabic is widely spoken, with 250+ million speakers.
○ No standardized orthography.
○ Limited data.
● Speech-enabled applications are growing in everyday life
(Apple Siri, car navigation, medical transcription).
● 6,000+ languages are spoken in the world.
At the core of these systems is a recognizer.
2. MSRP Project
● Lexical Modelling
○ Diacritized lexicon built with the aid of MADAMIRA and Kaldi.
● Setup
○ Corpus: Al Jazeera Egyptian-dialect speech.
○ 81K tokens, 18K vocabulary.
○ Training set: 10 hrs; development set: 1 hr; evaluation set: 1 hr.
Example — MADAMIRA diacritization:
Input (undiacritized): وكتب الكاتب في كتابه (wktb AlkAtb fy ktAbh)
Output (diacritized): وَكَتَبَ الكَاتِبُ فِي كِتَابِهِ (waktaba AlkAtibu fiy kitAbihi)
Gloss: andwrote thewriter in hisbook ("and the writer wrote in his book")
● Language Modelling
○ Trigram model with Kneser-Ney discounting (SRILM).
○ Built over the training-data transcripts.
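The raw statistics such a trigram model is estimated from can be sketched in a few lines (an illustrative Python sketch, not the SRILM implementation; `<s>`/`</s>` are the usual sentence-boundary markers, and SRILM's `ngram-count` tool performs the counting plus Kneser-Ney smoothing itself):

```python
from collections import Counter

def trigram_counts(sentences):
    """Collect trigram counts over sentence-padded word sequences:
    the raw statistics an n-gram LM is estimated from."""
    counts = Counter()
    for sent in sentences:
        # pad with two start markers and one end marker
        words = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for i in range(len(words) - 2):
            counts[tuple(words[i:i + 3])] += 1
    return counts
```

In practice the smoothed model would then be built by SRILM rather than by hand, e.g. `ngram-count -order 3 -kndiscount -text train.txt -lm lm.arpa`.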
References
● A. Pasha et al., “MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic,” LREC, 2014.
● F. Biadsy, N. Habash, and J. Hirschberg, “Improving the Arabic pronunciation dictionary for phone and word recognition with linguistically-based pronunciation rules,” NAACL-HLT, 2009.
Research Question: Does diacritization of words improve ASR performance on Egyptian Arabic?
[Figure: ASR pipeline — an input utterance (e.g. "مين أنت؟", "Who are you?") is passed to a decoder combining a language model, a lexicon (either the grapheme lexicon or the diacritized lexicon), and an acoustic model; the output transcript is evaluated by WER.]
7. Further Work
● Train new acoustic models
○ Long Short-Term Memory Recurrent Neural Networks
(LSTM-RNNs) using Stacked Bottleneck (SBN) features.
● Morpheme-based language modeling
○ Reduces the Out-of-Vocabulary (OOV) rate of the Dev/Eval sets.
● Explore larger lexicon sizes using tweets.
6. Conclusion
● A graphemic lexicon (CODA) outperformed the
diacritized lexicon.
● Reducing OOV improves WER.
● Using DNN and SDNN acoustic models improves
WER.
● Acoustic Modeling
○ Features: MFCCs + CMVN + LDA + MLLT + SAT.
○ Context-dependent triphone GMM-HMM.
○ Deep Neural Networks (DNN).
○ Sequence Deep Neural Networks (SDNN).
5. Evaluation
Comparison between reference (REF) and hypothesis
(HYP) generated by ASR.
Word Error Rate (WER) = (Subs + Dels + Ins) / N
● Subs: Number of substitutions between REF and HYP.
● Dels: Number of deletions in HYP compared to REF.
● Ins: Number of insertions in HYP compared to REF.
● N: Total number of words in REF.
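The WER computation above reduces to a standard Levenshtein alignment between REF and HYP (a minimal sketch; a full scoring tool would also report the individual Subs/Dels/Ins counts):

```python
def wer(ref, hyp):
    """Word Error Rate: (Subs + Dels + Ins) / N via edit-distance DP."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = min edits turning the first i REF words into the first j HYP words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)    # normalized by N = len(REF)
```

For example, `wer("a b c d", "a x c")` gives 0.5: one substitution plus one deletion over four reference words.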
Grapheme lexicon:
wktb : w k t b
AlkAtb : A l k A t b
fy : f y
ktAbh : k t A b h

Diacritized lexicon:
wktb : w a k a t a b a
AlkAtb : A l k A t i b u
fy : f i y
ktAbh : k i t A b i h i
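Generating the graphemic entries amounts to splitting each Buckwalter-transliterated word into its letters (a sketch; the diacritized entries instead come from MADAMIRA's diacritization):

```python
def grapheme_lexicon(words):
    """Graphemic lexicon: each word's 'pronunciation' is simply its
    letter (grapheme) sequence, read off the undiacritized spelling."""
    return {w: " ".join(w) for w in words}

lex = grapheme_lexicon(["wktb", "AlkAtb", "fy", "ktAbh"])
# lex["wktb"] is "w k t b", matching the grapheme lexicon listing above
```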
○ Toolkits: Kaldi (speech recognition), MADAMIRA (text processing).
○ Original transcription parsed and normalized
to generate the CODA transcription.
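The normalization step can be pictured as character-level rewrite rules over the transcript. The sketch below shows one common Arabic-NLP normalization purely for illustration — it is a hypothetical example, not CODA's actual convention set, which is considerably richer:

```python
def normalize(word):
    """Illustrative orthography normalization (hypothetical rule,
    NOT the real CODA specification): collapse hamzated Alef
    variants to bare Alef so spelling variants map to one form."""
    for alef in ("أ", "إ", "آ"):
        word = word.replace(alef, "ا")
    return word
```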
4. Results

Acoustic Model | Diacritized Lexicon | Grapheme Lexicon (CODA) | Grapheme Lexicon (Original)
               | Dev (%) | Eval (%)  | Dev (%) | Eval (%)      | Dev (%) | Eval (%)
GMM-HMM        | 56.8    | 59.1      | 57.4    | 58.6          | 60.0    | 61.5
DNN 1024 x 4   | 54.8    | 57.4      | 55.0    | 58.0          | 58.8    | 61.2
SDNN 1024 x 4  | 53.1    | 56.1      | 53.0    | 55.6          | 57.0    | 59.8
Lexicon size   | 17.8K               | 17.5K                   | 18.7K
OOV (%)        | 15.3                | 15.3                    | 16.6
#Lexical units | 43                  | 36                      | 35
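The OOV rates in the table are the fraction of Dev/Eval tokens absent from the lexicon's vocabulary, computable as (a minimal sketch):

```python
def oov_rate(vocab, tokens):
    """Percentage of tokens not covered by the lexicon's vocabulary."""
    misses = sum(1 for t in tokens if t not in vocab)
    return 100.0 * misses / len(tokens)
```

With half the tokens uncovered, e.g. `oov_rate({"a", "b"}, ["a", "c", "b", "d"])`, the function returns 50.0.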
Merghani-SACNAS Poster