Poster sections: 1. Overview · 2. MSRP Project · 3. Methodology · 4. Results · 5. Evaluation · 6. Conclusion · 7. Further Work
Lexical Modeling of Egyptian Arabic for Automatic Speech Recognition
Taha Merghani¹, Tuka Al Hanai², and James Glass²
¹ Department of Electrical & Computer Engineering, Jackson State University
² MIT Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Developing an Automatic Speech Recognition (ASR) system for Arabic is challenging:
● Ambiguous orthography
○ No diacritics, so short-vowel information is missing.
○ Ambiguous word-to-phoneme mapping.
● Rich morphology
○ Large number of stem+affix combinations.
● Diverse dialects
○ Arabic is widely spoken, with 250+ million speakers.
○ No standardized orthography.
○ Limited data.
● Speech-enabled applications are growing in everyday life
(Apple Siri, car navigation, medical transcription).
● 6,000+ languages are spoken in the world.
At the core of these systems is a recognizer.
2. MSRP Project
● Lexical Modelling
○ Diacritized lexicon built with the aid of MADAMIRA and Kaldi.
● Setup
○ Corpus: Al Jazeera Egyptian-dialect speech.
○ 81K tokens, 18K vocabulary.
○ Training set: 10 hrs; development set: 1 hr; evaluation set: 1 hr.
Example — MADAMIRA diacritization:
Input (undiacritized): وكتب الكاتب في كتابه (wktb AlkAtb fy ktAbh)
Output (diacritized): وَكَتَبَ الكَاتِبُ فِي كِتَابِهِ (waktaba AlkAtibu fiy kitAbihi)
Gloss: andwrote thewriter in hisbook ("and the writer wrote in his book")
● Language Modelling
○ Trigram model with Kneser-Ney discounting (SRILM).
○ Built over the training-data transcripts.
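The raw statistics such a trigram model is estimated from can be sketched in a few lines (an illustrative Python sketch, not the SRILM implementation; `<s>`/`</s>` are the usual sentence-boundary markers, and SRILM's `ngram-count` tool performs the counting plus Kneser-Ney smoothing itself):

```python
from collections import Counter

def trigram_counts(sentences):
    """Collect trigram counts over sentence-padded word sequences:
    the raw statistics an n-gram LM is estimated from."""
    counts = Counter()
    for sent in sentences:
        # pad with two start markers and one end marker
        words = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for i in range(len(words) - 2):
            counts[tuple(words[i:i + 3])] += 1
    return counts
```

In practice the smoothed model would then be built by SRILM rather than by hand, e.g. `ngram-count -order 3 -kndiscount -text train.txt -lm lm.arpa`.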
References
● A. Pasha et al., “MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic,” LREC, 2014.
● F. Biadsy, N. Habash, and J. Hirschberg, “Improving the Arabic pronunciation dictionary for phone and word recognition with linguistically-based pronunciation rules,” NAACL-HLT, 2009.
Research Question: Does diacritization of words improve ASR performance on Egyptian Arabic?
[Figure: ASR pipeline — an input utterance (e.g. "مين أنت؟", "Who are you?") is passed to a decoder combining a language model, a lexicon (either the grapheme lexicon or the diacritized lexicon), and an acoustic model; the output transcript is evaluated by WER.]
7. Further Work
● Train new acoustic models
○ Long Short-Term Memory Recurrent Neural Networks
(LSTM-RNNs) using Stacked Bottleneck (SBN) features.
● Morpheme-based language modeling
○ Reduces the Out-of-Vocabulary (OOV) rate of the Dev/Eval sets.
● Explore larger lexicon sizes using tweets.
6. Conclusion
● A graphemic lexicon (CODA) outperformed the
diacritized lexicon.
● Reducing OOV improves WER.
● Using DNN and SDNN acoustic models improves
WER.
● Acoustic Modeling
○ Features: MFCCs + CMVN + LDA + MLLT + SAT.
○ Context-dependent triphone GMM-HMM.
○ Deep Neural Networks (DNN).
○ Sequence Deep Neural Networks (SDNN).
5. Evaluation
Comparison between reference (REF) and hypothesis
(HYP) generated by ASR.
Word Error Rate (WER) = (Subs + Dels + Ins) / N
● Subs: Number of substitutions between REF and HYP.
● Dels: Number of deletions in HYP compared to REF.
● Ins: Number of insertions in HYP compared to REF.
● N: Total number of words in REF.
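The WER computation above reduces to a standard Levenshtein alignment between REF and HYP (a minimal sketch; a full scoring tool would also report the individual Subs/Dels/Ins counts):

```python
def wer(ref, hyp):
    """Word Error Rate: (Subs + Dels + Ins) / N via edit-distance DP."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = min edits turning the first i REF words into the first j HYP words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)    # normalized by N = len(REF)
```

For example, `wer("a b c d", "a x c")` gives 0.5: one substitution plus one deletion over four reference words.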
Grapheme lexicon:
wktb : w k t b
AlkAtb : A l k A t b
fy : f y
ktAbh : k t A b h

Diacritized lexicon:
wktb : w a k a t a b a
AlkAtb : A l k A t i b u
fy : f i y
ktAbh : k i t A b i h i
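Generating the graphemic entries amounts to splitting each Buckwalter-transliterated word into its letters (a sketch; the diacritized entries instead come from MADAMIRA's diacritization):

```python
def grapheme_lexicon(words):
    """Graphemic lexicon: each word's 'pronunciation' is simply its
    letter (grapheme) sequence, read off the undiacritized spelling."""
    return {w: " ".join(w) for w in words}

lex = grapheme_lexicon(["wktb", "AlkAtb", "fy", "ktAbh"])
# lex["wktb"] is "w k t b", matching the grapheme lexicon listing above
```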
○ Toolkits: Kaldi (speech recognition), MADAMIRA (text processing).
○ Original transcription parsed and normalized
to generate the CODA transcription.
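The normalization step can be pictured as character-level rewrite rules over the transcript. The sketch below shows one common Arabic-NLP normalization purely for illustration — it is a hypothetical example, not CODA's actual convention set, which is considerably richer:

```python
def normalize(word):
    """Illustrative orthography normalization (hypothetical rule,
    NOT the real CODA specification): collapse hamzated Alef
    variants to bare Alef so spelling variants map to one form."""
    for alef in ("أ", "إ", "آ"):
        word = word.replace(alef, "ا")
    return word
```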
4. Results

Acoustic Model | Diacritized Lexicon | Grapheme Lexicon (CODA) | Grapheme Lexicon (Original)
               | Dev (%) | Eval (%)  | Dev (%) | Eval (%)      | Dev (%) | Eval (%)
GMM-HMM        | 56.8    | 59.1      | 57.4    | 58.6          | 60.0    | 61.5
DNN 1024 x 4   | 54.8    | 57.4      | 55.0    | 58.0          | 58.8    | 61.2
SDNN 1024 x 4  | 53.1    | 56.1      | 53.0    | 55.6          | 57.0    | 59.8
Lexicon size   | 17.8K               | 17.5K                   | 18.7K
OOV (%)        | 15.3                | 15.3                    | 16.6
#Lexical units | 43                  | 36                      | 35
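The OOV rates in the table are the fraction of Dev/Eval tokens absent from the lexicon's vocabulary, computable as (a minimal sketch):

```python
def oov_rate(vocab, tokens):
    """Percentage of tokens not covered by the lexicon's vocabulary."""
    misses = sum(1 for t in tokens if t not in vocab)
    return 100.0 * misses / len(tokens)
```

With half the tokens uncovered, e.g. `oov_rate({"a", "b"}, ["a", "c", "b", "d"])`, the function returns 50.0.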
Merghani-SACNAS Poster