Automatic Speech Recognition (ASR) is a thriving technology that enables an application to detect spoken words or identify the speaker using a computer. This work focuses on speech recognition for children with Asperger's Syndrome in the age group of five to ten years. Asperger's Syndrome (AS) is one of the spectrum disorders within Autism Spectrum Disorder (ASD); most affected individuals have a high IQ and sound language skills, but the disorder restricts communication and social interaction. The work is implemented in two stages. In the first stage, an Automatic Speech Recognition system is developed to identify articulation disorder in children with AS on the basis of 14 Malayalam vowels. A deep autoencoder is pre-trained in an unsupervised manner with 27,300 features, which include Mel-Frequency Cepstral Coefficients (MFCC), zero crossing rate, and Formants 1, 2 and 3. An additional layer is then added and fine-tuned to predict the target vowel sounds, which enhances the performance of the autoencoder. In the second stage, the Learning Assistive System for Autistic Child (LASAC) is envisioned to provide speech training as well as impaired-speech analysis for Malayalam vowels. A feature vector is used to create the acoustic model, and the Euclidean distance formula is used to estimate the percentage of impairment in an AS student's speech by comparing the student's speech features against the acoustic model. LASAC provides an interactive interface that offers speech training, speech analysis and an articulation tutorial. Most autistic students have difficulty memorizing words or letters but are able to recollect pictures, so the system also serves as a learning aid for Malayalam letters with their corresponding images and words. The DNN with autoencoder achieved an average accuracy of 65% on the impaired (autistic) dataset.
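The second-stage comparison, a student's feature vector scored against the reference acoustic model with the Euclidean distance formula, can be sketched in a few lines. The function name and the percentage normalisation below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def impairment_percentage(student_features, model_features):
    """Hypothetical sketch: deviation of a student's speech feature
    vector from the reference acoustic model, measured with the
    Euclidean distance and normalised by the model's magnitude."""
    student = np.asarray(student_features, dtype=float)
    model = np.asarray(model_features, dtype=float)
    distance = np.linalg.norm(student - model)  # Euclidean distance
    return 100.0 * distance / np.linalg.norm(model)
```

In practice the feature vectors would hold the MFCC, zero-crossing and formant values mentioned in the abstract; here they are plain numeric arrays.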
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK (ijitcs)
Speech technology is an emerging field, and automatic speech recognition has made notable advances in recent years. Much research has been performed for many foreign and regional languages, and multilingual speech processing is now attracting research attention. This paper proposes a methodology for developing a bilingual speech identification system for the Assamese and English languages based on an artificial neural network.
SPEECH EVALUATION WITH SPECIAL FOCUS ON CHILDREN SUFFERING FROM AP... (sipij)
Speech disorders are very complicated in individuals suffering from Apraxia of Speech (AOS). In this paper, the pathological cases of speech-disabled children affected with AOS are analyzed. Speech signal samples of children between three and eight years of age are considered for the present study. These speech signals are digitized, enhanced, and analyzed using the Speech Pause Index, jitter, skew, and kurtosis. The analysis is conducted on speech data samples concerned with both the place and the manner of articulation, and the speech disability of the pathological subjects is estimated from the results of this analysis.
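The measures named above are standard signal statistics. A minimal sketch of jitter, skew and kurtosis follows; the Speech Pause Index is omitted, and these plain-moment formulas are assumptions rather than the paper's exact definitions:

```python
import numpy as np

def skewness(x):
    """Third standardized moment of a signal (asymmetry)."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return ((x - m) ** 3).mean() / s ** 3

def kurtosis(x):
    """Fourth standardized moment of a signal (peakedness)."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return ((x - m) ** 4).mean() / s ** 4

def jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive
    pitch periods, relative to the mean period."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.abs(np.diff(p)).mean() / p.mean()
```

A perfectly steady voice (identical pitch periods) gives zero jitter; irregular periods push it upward.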
Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficient (ijait)
Development of Malayalam speech recognition systems is still in its infancy, although much work has been done in other Indian languages. In this paper we present the first work on a speaker-independent Malayalam isolated-speech recognizer based on Perceptual Linear Predictive (PLP) cepstral coefficients and a Hidden Markov Model (HMM). The performance of the developed system has been evaluated with different numbers of HMM states. The system is trained with 21 male and female speakers in the age group of 19 to 41 years, and obtained an accuracy of 99.5% on unseen data.
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models (IRJET Journal)
The document describes a proposed system to translate Tamil speech to Indian Sign Language (ISL) using speech recognition and natural language processing algorithms. It aims to help hearing-impaired people communicate independently. The system would use the CMU Sphinx speech recognition tool to convert spoken Tamil to text, then apply grammar rules and machine learning to translate the text to ISL displayed through video or animated avatars. The document reviews similar existing systems and research on speech recognition and sign language translation to inform the design and implementation of the proposed Tamil-ISL system.
Myanmar named entity corpus and its use in syllable-based neural named entity... (IJECEIAES)
This document describes the development of the first manually annotated named entity corpus for the Myanmar language. It contains approximately 170,000 named entities tagged with types like person, location, organization, race, time and number. The document also discusses experiments using various deep neural network architectures for named entity recognition on Myanmar text, without additional feature engineering. Results showed that syllable-based neural models outperformed the baseline conditional random field model. This research aims to apply neural networks to Myanmar natural language processing and promote future work on this under-resourced language.
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES (csandit)
This document summarizes and reviews various grammar checkers for natural languages. It begins by defining key concepts in natural language processing like computational linguistics and grammar checking. It then describes the general working of grammar checkers, which involves preprocessing text, analyzing morphology and syntax, and identifying grammatical errors. The document surveys grammar checking approaches for several languages like rule-based, statistical, and hybrid methods. Specific grammar checkers are discussed for languages like Afan Oromo, Amharic, Swedish, Icelandic, Nepali, and Portuguese. The review concludes by analyzing the features and limitations of existing grammar checking systems.
Emotional telugu speech signals classification based on k nn classifier (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Comparative study of Text-to-Speech Synthesis for Indian Languages by using S... (ravi sharma)
This paper explores a syllable-based approach to building language-independent text-to-speech systems for Indian languages. The use of a common phone set, a common question set and borrowed context-independent monophone models, along with the syllable approach across languages, makes the procedure easier and less time-consuming without compromising synthesized speech quality. Systems can be built without even knowing the language, which is especially beneficial in the Indian scenario.
Language engineering applies knowledge of human language to develop computer systems that can understand, interpret, and generate human language. It uses techniques implemented in software and language resources stored in repositories. Key areas of language engineering include speech recognition, natural language processing, language translation, and more. It relies on resources like lexicons, grammars, and corpora to analyze different aspects of language.
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS (ijnlc)
The fundamental techniques used for man-machine communication include speech synthesis, speech recognition, and speech transformation. Feature extraction techniques provide a compressed representation of the speech signal, and HNM analysis and synthesis provides high-quality speech with a small number of parameters. Dynamic time warping is a well-known technique for aligning two multidimensional sequences: it locates an optimal match between the given sequences, and the improvement in alignment is estimated from the corresponding distances. The objective of this research is to investigate the effect of dynamic time warping on phrase-, word-, and phoneme-based alignments. Speech signals in the form of twenty-five phrases were recorded; the recorded material was segmented manually and aligned at sentence, word, and phoneme level, and the Mahalanobis distance (MD) was computed between the aligned frames. The investigation has shown better alignment in the HNM parametric domain, and effective speech alignment can be carried out even at phrase level.
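The alignment procedure described above rests on dynamic time warping. A minimal sketch follows, with a Euclidean local cost standing in for the Mahalanobis distance used in the paper:

```python
import numpy as np

def dtw(seq_a, seq_b):
    """Classic dynamic-time-warping cost between two feature
    sequences (frames x dims). The local frame distance here is
    Euclidean, a simplification of the Mahalanobis distance."""
    a, b = np.atleast_2d(seq_a), np.atleast_2d(seq_b)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # accumulated cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # optimal step: insertion, deletion, or match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Identical sequences align at zero cost; the further apart the feature trajectories, the larger the accumulated distance.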
Artificially Generated Concatenative Syllable based Text to Speech Synthesi... (iosrjce)
This document describes a Marathi text-to-speech (TTS) synthesis system based on a concatenative approach using syllables as the basic speech units. The system analyzes input text, performs syllabification based on linguistic rules, retrieves corresponding speech files from a corpus, concatenates the files while minimizing discontinuities at boundaries, and outputs synthesized speech. A Marathi speech corpus was created containing over 1000 sentences from various domains. Subjective quality tests found the synthesized speech to have naturalness and intelligibility comparable to natural speech. The system demonstrates an effective approach for Marathi TTS using a syllable-based concatenative method.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor (Waqas Tariq)
In recent decades speech-interactive systems have gained increasing importance. The performance of an ASR system depends mainly on the availability of a large speech corpus. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach, which requires a large speech corpus with sentence- or phoneme-level transcription of the utterances; the transcriptions must also cover varied speech so that the recognizer can build models for all the sounds present. For Telugu, however, because of its complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or even millions of word forms. A significant part of the grammar that is handled by syntax in English (and similar languages) is handled within morphology in Telugu: phrases comprising several words (tokens) in English map onto a single word in Telugu. Telugu is phonetic in nature in addition to being morphologically rich, which is why speech technology developed for English cannot be applied directly to it. This paper highlights work carried out in an attempt to build a voice-enabled text editor with automatic term suggestion. The main claim of the paper is a recognition enhancement process developed by the authors to suit highly inflecting, morphologically rich languages. This method increases speech recognition accuracy with a substantial reduction in corpus size, and it adds Telugu words to the database dynamically, resulting in growth of the corpus.
The document provides an overview of psycholinguistics, which is an interdisciplinary field that brings together linguistics and psychology to understand how humans acquire, process, and use language. It discusses several topics within psycholinguistics including language acquisition, comprehension, production, the relationship between language and the brain, aphasia, individual differences that influence language learning, information processing approaches, theories of connectionism, complexity theory, processability theory, age differences in second language acquisition, the critical period hypothesis, the role of motivation and sex in language learning, and aptitude for language learning.
IRJET- Spoken Language Identification System using MFCC Features and Gaus... (IRJET Journal)
This document presents a spoken language identification system that distinguishes between Tamil and Telugu languages using Mel-Frequency Cepstral Coefficient (MFCC) features extracted from speech signals and Gaussian Mixture Models (GMM) for language modeling. The system is trained on a dataset of speech samples in Tamil and Telugu from the Microsoft Speech Corpus for Indian Languages. MFCC features represent the acoustic properties of speech sounds and have been shown to provide good performance for language identification. GMM is used to model the probability distributions of MFCC features for each language. The proposed system aims to identify the language of unknown speech samples as either Tamil or Telugu.
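The GMM-based language scoring described above can be illustrated with a deliberately simplified stand-in: one diagonal Gaussian per language instead of a full mixture, with the language decided by log-likelihood. All names and data here are illustrative:

```python
import numpy as np

def fit_diag_gaussian(frames):
    """Per-language model: mean and variance of MFCC-like frames.
    A single diagonal Gaussian, simplified from the paper's GMM."""
    f = np.asarray(frames, dtype=float)
    return f.mean(axis=0), f.var(axis=0) + 1e-6  # variance floor

def log_likelihood(frames, model):
    """Total Gaussian log-likelihood of the frames under a model."""
    mean, var = model
    f = np.asarray(frames, dtype=float)
    ll = -0.5 * (np.log(2 * np.pi * var) + (f - mean) ** 2 / var)
    return ll.sum()

def identify(frames, models):
    """Pick the language whose model scores the frames highest."""
    return max(models, key=lambda lang: log_likelihood(frames, models[lang]))
```

A real system would fit a multi-component GMM (e.g. via EM) on MFCCs from training speech; the decision rule, however, is the same maximum-likelihood comparison shown here.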
33 9765 development paper id 0034 (IAESIJEECS)
Speech disfluency such as the filled pause (FP) is a hindrance in automated speech recognition, as it degrades accuracy. Previous work on FP detection and classification has fused a number of acoustical features, since fusion classification is known to improve results. This paper presents a new decision fusion of two well-established acoustical features, zero crossing rate (ZCR) and speech envelope (ENV), with eight popular acoustical features for the classification of Malay-language filled pauses (FP) and elongations (ELO). Five hundred ELO and 500 FP samples are selected from spontaneous speeches of a parliamentary session, and a Naïve Bayes classifier is used for the decision-fusion classification. The proposed feature fusion produced better classification performance than single-feature classification, with the highest F-measure of 82% for both classes.
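The two fused features named above, ZCR and the speech envelope, are simple to compute per frame. A hedged sketch, with the windowing choices as illustrative assumptions:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(np.asarray(frame, dtype=float))
    return np.mean(signs[:-1] != signs[1:])

def speech_envelope(frame, win=4):
    """Crude amplitude envelope: moving average of |signal| over
    a short window (win samples)."""
    x = np.abs(np.asarray(frame, dtype=float))
    kernel = np.ones(win) / win
    return np.convolve(x, kernel, mode="same")
```

In the paper's setup, statistics of these features over each candidate segment would be fed, together with the other eight features, into the Naïve Bayes classifier.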
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC... (kevig)
In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). With a one-pass recognition system, the spoken sounds are converted into phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity among our target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LMs) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) is used to learn to recognize the phonetic information of mixed-language speech based on recognized phone sequences. As the back-end decision is taken by the SVM, the likelihood scores of segments with monolingual phone occurrence are used to classify language identity. The speech corpus was tested on Sepedi and English, languages that are often mixed. Our system is evaluated by measuring the ASR performance and the LID performance separately. The systems obtained a promising ASR accuracy with a data-driven phone-merging approach modelled using 16 Gaussian mixtures per state, and achieved acceptable ASR and LID accuracy in both code-switched and monolingual speech segments.
Marathi Isolated Word Recognition System using MFCC and DTW Features (IDES Editor)
This paper presents a Marathi database and an isolated word recognition system based on Mel-frequency cepstral coefficients (MFCC) and Dynamic Time Warping (DTW) as features. For feature extraction, a Marathi speech database has been designed using the Computerized Speech Lab. The database consists of the Marathi vowels, isolated words starting with each vowel, and simple Marathi sentences. Each word has been repeated three times by each of the 35 speakers. The paper presents the comparative recognition accuracy of DTW and MFCC.
On Developing an Automatic Speech Recognition System for Commonly used Englis... (rahulmonikasharma)
Speech is one of the easiest and fastest ways to communicate. Recognition of speech by computer for various languages is a challenging task, and the accuracy of automatic speech recognition (ASR) remains one of the key challenges even after years of research. Accuracy varies with speaker and language variability, vocabulary size and noise, as well as with design issues such as the speech database, feature extraction techniques and performance evaluation. This paper describes the development of a speaker-independent isolated automatic speech recognition system for Indian English. The acoustic model is built using Carnegie Mellon University (CMU) Sphinx tools. The corpus is based on the most commonly used English words in everyday life, and the speech database includes recordings of 76 Punjabi speakers (north-west Indian English accent). After testing, the system obtained an accuracy of 85.20% when trained using 128 Gaussian Mixture Models (GMMs).
Development of text to speech system for yoruba language (Alexander Decker)
This document describes the development of a text-to-speech (TTS) system for the Yoruba language. It begins with background on TTS systems and an overview of previous work developing TTS for other languages but not extensively for Yoruba. The authors then describe the architecture and design of the Yoruba TTS system they developed using a concatenative synthesis method. This includes analyzing the phonology and syllable structure of Yoruba, and developing components for syllable identification, prosody assignment, and speech signal processing. An evaluation of the system found 70% of respondents found it usable.
Anaphora resolution is the process of finding referents in discourse. In computational linguistics, anaphora resolution is a complex and challenging task. This paper focuses on pronominal anaphora resolution, a subpart of anaphora resolution in which pronouns are resolved to noun referents. Incorporating anaphora resolution into applications such as automatic summarization, opinion mining, machine translation and question answering systems can increase their accuracy by 10%. Related work in this field has been done in many languages; this paper focuses on resolving anaphora for the Punjabi language. A model is proposed for resolving anaphora and an experiment is conducted to measure the accuracy of the system. The model uses two factors: recency and animistic knowledge. The recency factor works on the concept of the Lappin-Leass approach, and a gazetteer method is used to introduce animistic knowledge. The experiment is conducted on a Punjabi story containing more than 1000 words, and the result is presented along with future directions.
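The two factors named above, recency and animacy via a gazetteer, can be combined in a toy resolver. Every name below is illustrative; this is not the paper's actual scoring scheme:

```python
def resolve_pronoun(pronoun_index, candidates, animate_gazetteer):
    """Toy pronominal resolver: among nouns occurring before the
    pronoun, prefer animate ones (gazetteer lookup), breaking
    ties by recency (the closest preceding noun wins).
    candidates: list of (word_position, noun) pairs."""
    scored = []
    for pos, noun in candidates:
        if pos >= pronoun_index:
            continue  # only antecedents before the pronoun qualify
        animacy = 1 if noun in animate_gazetteer else 0
        recency = pronoun_index - pos  # smaller means more recent
        scored.append((-animacy, recency, noun))
    return min(scored)[2] if scored else None
```

Sorting by (negated) animacy first mirrors the intuition that a pronoun like "he" should skip inanimate nouns even when they are closer.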
The primary goal of this paper is to provide an overview of existing Text-To-Speech (TTS) techniques by highlighting their usage and advantages. First-generation techniques include formant synthesis and articulatory synthesis. Formant synthesis works by using individually controllable formant filters, which can be set to produce accurate estimations of the vocal-tract transfer function; articulatory synthesis produces speech by directly modeling human articulator behavior. Second-generation techniques comprise concatenative synthesis and sinusoidal synthesis. Concatenative synthesis generates speech output by concatenating segments of recorded speech, and generally produces natural-sounding synthesized speech. Sinusoidal synthesis uses a harmonic model and decomposes each frame into a set of harmonics of an estimated fundamental frequency; the model parameters are the amplitudes and periods of the harmonics, so the value of the fundamental can be changed while keeping the same basic spectral shape. Finally, third-generation techniques include Hidden Markov Model (HMM) synthesis and unit selection synthesis: HMM synthesis trains a parameter module and produces high-quality speech, while unit selection operates by selecting the best sequence of units from a large speech database that matches the specification.
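The harmonic model behind sinusoidal synthesis is easy to demonstrate: a frame is a sum of sinusoids at integer multiples of the fundamental, each with its own amplitude. A minimal sketch (phases fixed at zero, parameter names assumed):

```python
import numpy as np

def harmonic_frame(f0, amplitudes, sr=16000, n_samples=320):
    """Sinusoidal synthesis of one frame: sum of harmonics k*f0,
    k = 1..len(amplitudes), each scaled by its amplitude."""
    t = np.arange(n_samples) / sr  # time axis in seconds
    frame = np.zeros(n_samples)
    for k, a in enumerate(amplitudes, start=1):
        frame += a * np.sin(2 * np.pi * k * f0 * t)
    return frame
```

Changing `f0` while keeping the same amplitude list shifts the pitch but preserves the basic spectral shape, which is exactly the property the abstract describes.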
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ... (kevig)
This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime or game characters. Conventional morphological analyzers, such as MeCab, segment words with high performance, but they are unable to segment broken expressions or utterance endings that are not listed in the dictionary, which often appear in the lines of anime or game characters. To overcome this challenge, we propose segmenting the lines of Japanese anime or game characters using subword units, which were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF-IDF according to gender, age, and individual anime character, and show that they are linguistic speech patterns specific to each feature. Additionally, a classification experiment shows that a model with subword units outperformed one using the conventional method.
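The TF-IDF weighting described above treats each character's lines as one document of subword tokens, so tokens frequent for one character but common to all characters score low. A minimal sketch, with input format and names assumed:

```python
import math
from collections import Counter

def tfidf(character_lines):
    """TF-IDF over subword tokens per character.
    character_lines: {character: [subword tokens from their lines]}
    Returns {character: {token: score}}."""
    doc_tokens = {c: Counter(toks) for c, toks in character_lines.items()}
    n_docs = len(doc_tokens)
    df = Counter()  # document frequency: characters using each token
    for counts in doc_tokens.values():
        df.update(counts.keys())
    scores = {}
    for c, counts in doc_tokens.items():
        total = sum(counts.values())
        scores[c] = {t: (n / total) * math.log(n_docs / df[t])
                     for t, n in counts.items()}
    return scores
```

A token used by every character gets idf log(1) = 0, while a character-specific utterance ending gets a positive score, matching the paper's goal of surfacing characteristic speech patterns.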
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Natural Language Processing Theory, Applications and Difficulties (ijtsrd)
The promise of a powerful computing device to help people in productivity as well as in recreation can only be realized with proper human-machine communication. Automatic recognition and understanding of spoken language is the first step toward natural human-machine interaction, and research in this field, known as Natural Language Processing, has produced remarkable results, leading to many exciting expectations and new challenges. In this paper, natural language generation and natural language understanding are discussed, along with difficulties in NLU, applications, and a comparison with structured programming languages. Mrs. Anjali Gharat | Mrs. Helina Tandel | Mr. Ketan Bagade, "Natural Language Processing Theory, Applications and Difficulties", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-6, October 2019. URL: https://www.ijtsrd.com/papers/ijtsrd28092.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/28092/natural-language-processing-theory-applications-and-difficulties/mrs-anjali-gharat
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT... (kevig)
This study proposes a method to develop neural morphological analyzers for Japanese Hiragana sentences using the Bi-LSTM CRF model. Morphological analysis is a technique that divides text data into words and assigns information such as parts of speech. In Japanese natural language processing systems, this technique plays an essential role in downstream applications because the Japanese language has no word delimiters between words. Hiragana is a type of Japanese phonogramic character used in texts for children or for people who cannot read Chinese characters. Morphological analysis of Hiragana sentences is more difficult than that of ordinary Japanese sentences because there is less information available for word division. For morphological analysis of Hiragana sentences, we demonstrate the effectiveness of fine-tuning a model based on ordinary Japanese text and examine the influence of training data drawn from texts of various genres.
Louise Stringer and Paul Iverson from UCL investigated how accent influences word recognition and electrophysiological measures of speech processing for native English and Spanish listeners. They found that a regional Scottish accent and non-native Spanish accent showed some influence on early phonological and lexical processing even in quiet conditions. More intelligible accents in noise elicited larger brain responses, suggesting processing difficulties with accented speech occur even without noise. Accents may affect listeners' expectations about upcoming words.
Speech is an important mode of communication and a current research topic, with work concentrated mostly on the synthesis and analysis parts. As part of the synthesis work, a text-to-speech system has been developed. Speech synthesis is the artificial production of human speech, and a text-to-speech (TTS) system converts arbitrary text into speech. In India, many different languages are spoken, each being the mother tongue of tens of millions of people. In this paper, a text-to-speech system is developed primarily for Telugu, a Dravidian language predominantly spoken in the Indian state of Andhra Pradesh. The important qualities expected from this system are naturalness and intelligibility. A Telugu TTS can be developed using other synthesis methods such as articulatory synthesis, formant synthesis and concatenative synthesis; this paper describes the development of a Telugu text-to-speech system using the concatenative synthesis method on the mobile-based system OMAP 3530 (ARM Cortex A-8 core) under Linux.
The document summarizes a study that tested a novel intervention called Auditory-Motor Mapping Training (AMMT) to facilitate speech output in non-verbal children with autism. Six non-verbal children between ages 5-9 participated in 40 individual AMMT sessions over 8 weeks. AMMT combines intonation and drum tapping to train associations between sounds and articulations. After therapy, all children showed significant improvements in articulating words and phrases, including untrained items, representing an important step in expressive language development.
American Standard Sign Language Representation Using Speech Recognition (paperpublications3)
Abstract: For many deaf people, sign language is the principal means of communication, which increases their isolation from the hearing community. This paper presents a system prototype that automatically recognizes speech to help hearing people communicate more effectively with the hearing- or speech-impaired. Recognized spoken words are represented in American standard sign language both via a robotic arm and on a computer using Visual Basic. The project provides a software package that converts the speech signal, which carries no meaning for a deaf listener, into sign language. Its main purpose is to bridge the communication and expression gap between people who cannot understand sign language and deaf or mute people who cannot understand normal speech.
Language engineering applies knowledge of human language to develop computer systems that can understand, interpret, and generate human language. It uses techniques implemented in software and language resources stored in repositories. Key areas of language engineering include speech recognition, natural language processing, language translation, and more. It relies on resources like lexicons, grammars, and corpora to analyze different aspects of language.
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSijnlc
The fundamental techniques used for man-machine communication include speech synthesis, speech
recognition, and speech transformation. Feature extraction techniques provide a compressed
representation of the speech signal. Harmonic plus Noise Model (HNM) analysis and synthesis
provides high-quality speech with a small number of parameters. Dynamic time warping is a
well-known technique for aligning two multidimensional sequences: it locates an optimal match
between them, and the improvement in alignment is estimated from the corresponding distances.
The objective of this research is to investigate the effect of dynamic time warping on phrase-,
word-, and phoneme-based alignments. Speech signals in the form of twenty-five phrases were
recorded; the recorded material was segmented manually and aligned at sentence, word, and
phoneme level, and the Mahalanobis distance (MD) was computed between the aligned frames. The
investigation showed better alignment in the HNM parametric domain, and it was seen that
effective speech alignment can be carried out even at phrase level.
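The DTW alignment this abstract relies on can be sketched as a standard dynamic-programming recurrence. The minimal version below uses Euclidean distances between frames; the Mahalanobis distance reported in the paper would substitute a covariance-weighted frame distance.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping cost between two feature sequences.

    a: (n, d) array, b: (m, d) array of frame-level feature vectors.
    Returns the accumulated cost of the optimal alignment path.
    """
    n, m = len(a), len(b)
    # Pairwise Euclidean distances between all frame pairs.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )
    return acc[n, m]
```

Identical sequences align with zero cost, and a repeated frame is absorbed by the warping path rather than penalized, which is exactly why DTW suits sequences of unequal length.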
Artificially Generated of Concatenative Syllable based Text to Speech Synthesi...iosrjce
This document describes a Marathi text-to-speech (TTS) synthesis system based on a concatenative approach using syllables as the basic speech units. The system analyzes input text, performs syllabification based on linguistic rules, retrieves corresponding speech files from a corpus, concatenates the files while minimizing discontinuities at boundaries, and outputs synthesized speech. A Marathi speech corpus was created containing over 1000 sentences from various domains. Subjective quality tests found the synthesized speech to have naturalness and intelligibility comparable to natural speech. The system demonstrates an effective approach for Marathi TTS using a syllable-based concatenative method.
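Minimizing discontinuities at unit boundaries, as this concatenative system does, is commonly implemented with a short crossfade at each joint. The sketch below is an illustrative minimal version, not the paper's actual algorithm; the 32-sample overlap is an assumption.

```python
import numpy as np

def concatenate_units(units, overlap=32):
    """Join recorded speech units with a linear crossfade at each boundary
    to reduce audible discontinuities (a common concatenative-TTS trick).

    units: list of 1-D float arrays (waveform segments).
    overlap: number of samples blended across each joint (assumed value).
    """
    out = units[0].astype(float)
    fade_out = np.linspace(1.0, 0.0, overlap)
    fade_in = 1.0 - fade_out
    for u in units[1:]:
        u = u.astype(float)
        # Blend the tail of the output with the head of the next unit.
        blended = out[-overlap:] * fade_out + u[:overlap] * fade_in
        out = np.concatenate([out[:-overlap], blended, u[overlap:]])
    return out
```

Because each joint consumes `overlap` samples, the result is shorter than the plain concatenation by `overlap` samples per boundary; a smoother raised-cosine fade is a common variant.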
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
In recent decades speech interactive systems have gained increasing importance. The performance of an ASR system depends mainly on the availability of a large speech corpus. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach, which requires a large speech corpus with sentence- or phoneme-level transcription of the utterances. The transcriptions must also cover the sounds in varied orders so that the recognizer can build models for all the sounds present. For Telugu, because of its complex nature, a very large, well-annotated speech database is very difficult to build: it is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or even millions of word forms. A significant part of the grammar that is handled by syntax in English (and similar languages) is handled within morphology in Telugu, and phrases comprising several words (that is, tokens) in English map onto a single word in Telugu. Telugu is phonetic in nature as well as morphologically rich, which is why speech technology developed for English cannot be applied directly to Telugu. This paper highlights work carried out in an attempt to build a voice-enabled text editor with automatic term suggestion. The main claim of the paper is a recognition enhancement process developed for highly inflecting, morphologically rich languages. This method increases speech recognition accuracy with a substantial reduction in corpus size, and it adds Telugu words to the database dynamically, resulting in growth of the corpus.
The document provides an overview of psycholinguistics, which is an interdisciplinary field that brings together linguistics and psychology to understand how humans acquire, process, and use language. It discusses several topics within psycholinguistics including language acquisition, comprehension, production, the relationship between language and the brain, aphasia, individual differences that influence language learning, information processing approaches, theories of connectionism, complexity theory, processability theory, age differences in second language acquisition, the critical period hypothesis, the role of motivation and sex in language learning, and aptitude for language learning.
IRJET- Spoken Language Identification System using MFCC Features and Gaus...IRJET Journal
This document presents a spoken language identification system that distinguishes between Tamil and Telugu languages using Mel-Frequency Cepstral Coefficient (MFCC) features extracted from speech signals and Gaussian Mixture Models (GMM) for language modeling. The system is trained on a dataset of speech samples in Tamil and Telugu from the Microsoft Speech Corpus for Indian Languages. MFCC features represent the acoustic properties of speech sounds and have been shown to provide good performance for language identification. GMM is used to model the probability distributions of MFCC features for each language. The proposed system aims to identify the language of unknown speech samples as either Tamil or Telugu.
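The modeling step of such a system can be sketched by summarizing each language's MFCC frames with a Gaussian density and assigning an unknown utterance to the best-scoring language. In this simplified sketch a single diagonal Gaussian per language stands in for the paper's full GMM:

```python
import numpy as np

def fit_diag_gaussian(frames):
    """Per-language model: mean and diagonal variance of MFCC frames.
    A single Gaussian stands in here for the paper's full GMM."""
    mu = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6   # variance floor avoids divide-by-zero
    return mu, var

def log_likelihood(frames, model):
    """Total log density of all frames under a diagonal Gaussian."""
    mu, var = model
    z = (frames - mu) ** 2 / var
    return -0.5 * np.sum(z + np.log(2 * np.pi * var))

def identify(frames, models):
    """Return the language whose model scores the utterance highest."""
    return max(models, key=lambda lang: log_likelihood(frames, models[lang]))
```

A real system would replace `fit_diag_gaussian` with an EM-trained mixture of Gaussians per language, but the decision rule, maximum summed log-likelihood over the utterance's frames, is the same.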
33 9765 development paper id 0034 (edit a) (1)IAESIJEECS
Speech disfluency such as the filled pause (FP) is a hindrance in automatic speech recognition, as it degrades accuracy. Previous work on FP detection and classification has fused a number of acoustical features, since fusion classification is known to improve results. This paper presents a new decision fusion of two well-established acoustical features, zero crossing rate (ZCR) and speech envelope (ENV), with eight popular acoustical features for classification of Malay-language filled pauses (FP) and elongations (ELO). Five hundred ELO and 500 FP were selected from spontaneous speeches of a parliamentary session, and a Naïve Bayes classifier is used for the decision fusion classification. The proposed feature fusion produced better classification performance than single-feature classification, with a highest F-measure of 82% for both classes.
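The two fused features named here, zero crossing rate and speech envelope, are simple to compute per frame. A minimal numpy sketch (the window length is an assumption, not the paper's setting):

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def speech_envelope(signal, win=128):
    """Amplitude envelope: moving average of the rectified signal.
    win is an assumed smoothing-window length in samples."""
    kernel = np.ones(win) / win
    return np.convolve(np.abs(signal), kernel, mode="same")
```

A rapidly oscillating frame yields a ZCR near 1, sustained voiced sound a low ZCR; the envelope tracks loudness contours, which is why elongations (slowly varying, voiced) separate from other disfluencies on these two features.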
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...kevig
In this paper, phoneme sequences are used as language information to perform code-switched language
identification (LID). With the one-pass recognition system, the spoken sounds are converted into
phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple
languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity
among our target languages, we reported two methods of phoneme mapping. Statistical phoneme-based
bigram language models (LM) are integrated into speech decoding to eliminate possible phone
mismatches. The supervised support vector machine (SVM) is used to learn to recognize the phonetic
information of mixed-language speech based on recognized phone sequences. As the back-end decision is
taken by an SVM, the likelihood scores of segments with monolingual phone occurrence are used to
classify language identity. The speech corpus was tested on Sepedi and English languages that are often
mixed. Our system is evaluated by measuring both the ASR performance and the LID performance
separately. The systems have obtained a promising ASR accuracy with data-driven phone merging
approach modelled using 16 Gaussian mixtures per state. In code-switched speech and monolingual
speech segments respectively, the proposed systems achieved an acceptable ASR and LID accuracy.
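A statistical phoneme-based bigram LM of the kind integrated here can be sketched in a few lines: train one bigram model per language over recognized phone sequences, then pick the language whose model scores a new sequence highest. The add-alpha smoothing and the toy phone labels below are illustrative assumptions, not the paper's setup.

```python
from collections import Counter
from math import log

def train_bigram_lm(phone_sequences, alpha=1.0):
    """Phone-bigram LM with add-alpha smoothing over the observed phone set.
    Returns a function mapping a phone sequence to its log probability."""
    bigrams, unigrams, phones = Counter(), Counter(), set()
    for seq in phone_sequences:
        phones.update(seq)
        unigrams.update(seq[:-1])          # bigram contexts
        bigrams.update(zip(seq, seq[1:]))
    V = len(phones)

    def logprob(seq):
        return sum(
            log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * V))
            for a, b in zip(seq, seq[1:])
        )
    return logprob

def identify_language(seq, lms):
    """Pick the language whose bigram LM scores the phone sequence highest."""
    return max(lms, key=lambda lang: lms[lang](seq))
```

In the paper this per-segment likelihood feeds an SVM back-end rather than a direct argmax, but the bigram scores themselves are built this way.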
Marathi Isolated Word Recognition System using MFCC and DTW FeaturesIDES Editor
This paper presents a Marathi database and an isolated word recognition system based on
Mel-frequency cepstral coefficients (MFCC) and Dynamic Time Warping (DTW) as features.
For feature extraction, a Marathi speech database was designed using the Computerized
Speech Lab. The database consists of the Marathi vowels, isolated words starting with each
vowel, and simple Marathi sentences; each word was repeated three times by 35 speakers.
The paper presents the comparative recognition accuracy of DTW and MFCC.
On Developing an Automatic Speech Recognition System for Commonly used Englis...rahulmonikasharma
Speech is one of the easiest and fastest ways to communicate. Recognition of speech by computer for various languages is a challenging task, and the accuracy of automatic speech recognition (ASR) remains one of the key challenges even after years of research. Accuracy varies with speaker and language variability, vocabulary size, and noise, as well as with design issues such as the speech database, feature extraction techniques, and performance evaluation. This paper describes the development of a speaker-independent isolated-word automatic speech recognition system for Indian English. The acoustic model is built using Carnegie Mellon University (CMU) Sphinx tools. The corpus is based on the most commonly used English words in everyday life, and the speech database includes recordings of 76 Punjabi speakers (north-west Indian English accent). After testing, the system obtained an accuracy of 85.20% when trained using 128 GMMs (Gaussian Mixture Models).
Development of text to speech system for yoruba languageAlexander Decker
This document describes the development of a text-to-speech (TTS) system for the Yoruba language. It begins with background on TTS systems and an overview of previous work developing TTS for other languages but not extensively for Yoruba. The authors then describe the architecture and design of the Yoruba TTS system they developed using a concatenative synthesis method. This includes analyzing the phonology and syllable structure of Yoruba, and developing components for syllable identification, prosody assignment, and speech signal processing. An evaluation of the system found 70% of respondents found it usable.
Anaphora resolution is the process of finding referents in discourse. In computational
linguistics, anaphora resolution is a complex and challenging task. This paper focuses on
pronominal anaphora resolution, a subpart of anaphora resolution in which pronouns are
resolved to noun referents. Incorporating anaphora resolution into applications such as
automatic summarization, opinion mining, machine translation, and question answering
systems can increase their accuracy by about 10%. Related work in this field has been done
in many languages; this paper focuses on resolving anaphora for the Punjabi language. A
model is proposed for resolving anaphora, and an experiment is conducted to measure the
accuracy of the system. The model uses two factors: recency and animistic knowledge. The
recency factor works on the concept of the Lappin-Leass approach, and animistic knowledge
is introduced using a gazetteer method. The experiment is conducted on a Punjabi story
containing more than 1000 words, and results are reported along with future directions.
The primary goal of this paper is to provide an overview of existing Text-To-Speech (TTS) techniques by highlighting their usage and advantages. First-generation techniques include formant synthesis and articulatory synthesis. Formant synthesis uses individually controllable formant filters, which can be set to produce accurate estimates of the vocal-tract transfer function. Articulatory synthesis produces speech by directly modeling human articulator behavior. Second-generation techniques comprise concatenative synthesis and sinusoidal synthesis. Concatenative synthesis generates speech output by concatenating segments of recorded speech, and generally produces natural-sounding synthesized speech. Sinusoidal synthesis uses a harmonic model and decomposes each frame into a set of harmonics of an estimated fundamental frequency; the model parameters are the amplitudes and periods of the harmonics, so the fundamental can be changed while keeping the same basic spectral shape. The third generation includes Hidden Markov Model (HMM) synthesis and unit selection synthesis. HMM synthesis trains a parametric model and produces high-quality speech. Finally, unit selection operates by selecting, from a large speech database, the best sequence of units matching the target specification.
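The sinusoidal (harmonic) model mentioned above can be illustrated directly: a frame is a sum of harmonics of the fundamental, so changing f0 while keeping the harmonic amplitudes preserves the basic spectral shape. A minimal sketch (sample rate and amplitude values are illustrative):

```python
import numpy as np

def harmonic_synthesis(f0, amplitudes, duration, sr=16000):
    """Sinusoidal synthesis: sum harmonics of the fundamental f0.

    amplitudes[k] is the amplitude of harmonic k+1; raising or lowering
    f0 with the same amplitudes shifts pitch but keeps the spectral shape.
    """
    t = np.arange(int(duration * sr)) / sr
    frame = np.zeros_like(t)
    for k, a in enumerate(amplitudes, start=1):
        frame += a * np.sin(2 * np.pi * k * f0 * t)
    return frame
```

Real systems estimate f0 and per-harmonic amplitudes frame by frame and interpolate them across frame boundaries; this static version shows only the reconstruction step.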
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...kevig
This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime or game characters. Conventional morphological analyzers, such as MeCab, segment words with high performance, but they are unable to segment broken expressions or utterance endings that are not listed in the dictionary, which often appears in lines of anime or game characters. To overcome this challenge, we propose segmenting lines of Japanese anime or game characters using subword units that were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF/IDF according to gender, age, and each anime character and show that they are linguistic speech patterns that are specific for each feature. Additionally, a classification experiment shows that the model with subword units outperformed that with the conventional method.
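The TF/IDF weighting used here to surface characteristic subword units can be sketched directly; in this illustrative version each "document" is the pooled subword tokens for one character or demographic group (the grouping and the lack of smoothing are assumptions, not the paper's exact setup):

```python
from collections import Counter
from math import log

def tfidf(docs):
    """TF/IDF weights for subword units across groups.

    docs: dict mapping a group label (e.g. a character name) to its
    list of subword tokens. Returns per-group token weights.
    """
    df = Counter()                      # document frequency per token
    for toks in docs.values():
        df.update(set(toks))
    n = len(docs)
    weights = {}
    for label, toks in docs.items():
        tf = Counter(toks)
        total = len(toks)
        weights[label] = {t: (c / total) * log(n / df[t]) for t, c in tf.items()}
    return weights
```

Tokens shared by every group receive zero weight, so the surviving high-weight subwords are exactly the group-specific speech patterns the study extracts.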
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Natural Language Processing Theory, Applications and Difficultiesijtsrd
The promise of a powerful computing device to help people in productivity as well as in recreation can only be realized with proper human-machine communication. Automatic recognition and understanding of spoken language is the first step toward natural human-machine interaction, and research in this field, known as Natural Language Processing, has produced remarkable results, leading to many exciting expectations and new challenges. In this paper natural language generation and natural language understanding are discussed, along with difficulties in NLU, applications, and a comparison with structured programming languages. Mrs. Anjali Gharat | Mrs. Helina Tandel | Mr. Ketan Bagade "Natural Language Processing Theory, Applications and Difficulties" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-6 , October 2019, URL: https://www.ijtsrd.com/papers/ijtsrd28092.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/28092/natural-language-processing-theory-applications-and-difficulties/mrs-anjali-gharat
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...kevig
This study proposes a method to develop neural models of the morphological analyzer for Japanese Hiragana sentences using the Bi-LSTM CRF model. Morphological analysis is a technique that divides text data into words and assigns information such as parts of speech. In Japanese natural language processing systems, this technique plays an essential role in downstream applications because the Japanese language does not have word delimiters between words. Hiragana is a type of Japanese phonogramic characters, which is used for texts for children or people who cannot read Chinese characters. Morphological analysis of Hiragana sentences is more difficult than that of ordinary Japanese sentences because there is less information for dividing. For morphological analysis of Hiragana sentences, we demonstrated the effectiveness of fine-tuning using a model based on ordinary Japanese text and examined the influence of training data on texts of various genres.
Vol. 2, No. 3, Ahwaz Journal of Linguistics Studies
Ahwaz Journal of Linguistics Studies
(Peer-Reviewed International Quarterly Journal)
(ISSN: 2717-2716)
For more information, please visit our website: WWW.AJLS.IR
The journal welcomes all researchers working in its fields of scientific and research interest, in any of the areas listed below, in Arabic, English, or Persian:
A) Languages and dialects (current issues in the linguistics of languages)
B) Linguistics (current issues in linguistics)
C) Literature (current issues in Arabic, English, and other literatures)
D) Translation (current issues in language translation)
E) Current issues in the linguistics of the Holy Quran
F) Current issues in teaching languages to non-native speakers
G) Teaching, programming, and evaluation of language teaching and learning programs
H) Strategies, opportunities, and challenges of marketing and entrepreneurship across languages
I) Current issues in the linguistics of religious, economic, social, legal, and other texts and discourse
Ahwaz / P.O. Box: 61335-4619
Tel: (+98) 61-32931199
Fax: (+98) 61-32931198
Mobile (WhatsApp): (+98) 916-5088772
Email: info@pahi.ir
Vol. 2, No. 3 , Ahwaz Journal of Linguistics Studies
Ahwaz Journal of Linguistics Studies
(Peer-Reviewed International Quarterly Journal)
(ISSN: 2717-2643)
For more information, please visit the journal website:
WWW.AJLS.IR
The journal welcomes submissions in English, Arabic or Persian in any of the relevant fields:
A) Linguistics (Any issue related to either theoretical or applied linguistics)
B) Languages and dialects (Any linguistic issue related languages and dialects)
C) Translation (Any translation and interpreting issue related to languages and dialects)
D) Religious linguistics (Any linguistic study related to religious texts and speeches)
Please feel free to write if there is any query.
The AJLS Secretariat,
Ahwaz 61335-4619 Iran
Tel: (+98) 61-32931199
Fax: (+98) 61-32931198
Mobile: (+98) 916-5088772 (WhatsApp Number)
Email: info@pahi.ir
Automatic Speech Recognition of Malayalam Language Nasal Class PhonemesEditor IJCATR
Speech recognition applications are becoming common and useful in this day and age, as many modern devices are designed to be user-friendly for the general public. Speaking directly to a machine to achieve a desired objective makes the use of modern devices easier and more convenient. Although many interactive software applications are available, their use is limited by language barriers; hence the development of speech recognition systems in local languages will help anyone make use of this technological advancement. In this paper we discuss the results of class-wise speech recognition performance for the nasal phonetic class of the Malayalam language.
This is an overview of ELSA's proprietary speech recognition technology. Find out how voice technology can help with language learning, and what differentiates ELSA's model from the others.
Deaf Speech Assessment Using Digital Processing Techniquessipij
The document analyzes deaf speech to increase speech recognition rates. It discusses analyzing pitch (fundamental frequency) and resonant frequencies (formants) of deaf speech, which exhibit unusual characteristics compared to normal speech. The document presents a pitch detection algorithm using subharmonic-to-harmonic ratio to estimate pitch in deaf speech. Formants are also extracted from deaf speech using linear predictive coding. Results show greater variation in pitch and formants for deaf speakers compared to normal speakers, making these features unsuitable for deaf speech recognition but possible for speaker classification.
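Formant extraction via linear predictive coding, as used here, fits an all-pole model to a speech frame and reads resonance frequencies off the pole angles. The following is a minimal autocorrelation-method sketch (the model order and the 50 Hz cutoff are assumptions):

```python
import numpy as np

def lpc_formants(frame, sr, order=10):
    """Estimate resonance frequencies from an LPC fit (autocorrelation method).

    Solves the Yule-Walker normal equations for the predictor coefficients,
    then converts the complex pole angles of A(z) to frequencies in Hz.
    """
    # Autocorrelation sequence of the frame.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Toeplitz system R a = r[1..p] for predictor coefficients a.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    # Poles of A(z) = 1 - sum_k a_k z^{-k}.
    poles = np.roots(np.concatenate(([1.0], -a)))
    freqs = sorted(
        np.angle(p) * sr / (2 * np.pi)
        for p in poles if p.imag > 0          # one of each conjugate pair
    )
    return [f for f in freqs if f > 50.0]     # drop near-DC roots
```

In practice the frame is pre-emphasized and windowed first, and only poles with sufficiently small bandwidth are reported as formants; the sketch keeps every resonance above 50 Hz.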
Estimation of Severity of Speech Disability Through Speech Envelopesipij
In this paper, envelope detection of speech is discussed as a way to distinguish pathological cases among speech-disabled children. Speech signal samples of children aged between five and eight years are considered for the present study. These speech signals are digitized and used to determine the speech envelope, which is subjected to ratio-mean analysis to estimate the disability. The analysis is conducted on ten speech signal samples relating to both place and manner of articulation, and the overall speech disability of a pathological subject is estimated from the results of this analysis.
Difference Between Alphabet And International Phonetic TheorySandy Harwell
The document discusses the differences between phonetics and phonology. Phonetics is the study of human sounds and how they are physically produced, transmitted, and perceived. Phonology studies how sounds function and relate to each other in the sound system of a language. It focuses on abstract concepts like phonemes and allophones. While phonetics looks at actual sound production, phonology examines the underlying patterns of a language. The document provides examples of phonological concepts like distinctive sounds, phonemes that distinguish word meanings, and tone in tonal languages to illustrate the differences between the two fields of study.
Paper Design and Simulation of Communication Aid For Disabled based on region...Syambabu Choda
This document describes the design and simulation of a communication aid for disabled people based on the Telugu regional language. It discusses how sign language is an important form of communication for disabled individuals like the deaf, dumb, and blind. The proposed system uses image processing and artificial neural networks to capture Telugu sign language gestures and convert them into text that can be displayed or spoken aloud. This allows disabled individuals to communicate both with each other using regional sign language and with non-signers by translating gestures into a readable or audible format in real-time. The system aims to help overcome communication barriers faced by the disabled.
The document discusses several apps to help with social skills, sentence structure, speech, and narrative skills. Conversation Builder and Social Adventures help children learn conversation skills and social behaviors. Social Skill Builder uses videos to teach social skills to children with autism or language deficits. Rainbow Sentences and Tense Builder use visual cues and videos to teach sentence structure and verb tenses. Propositions Journey, Auditory Memory Ride, and Language Builder address speech skills like prepositions, auditory memory, and vocabulary. The Surprise, Speech Journal, and Story Pals can be used to develop narrative skills through storytelling, writing, and comprehension activities.
The purpose of this study was to explore the impact of text-to-speech technology on reading comprehension for typical individuals. Four conditions were tested: reading without text-to-speech, with text-to-speech, with text-to-speech and highlighting, and with text-to-speech at a reduced rate. The results showed that the conditions with the highest average performance were text-to-speech and text-to-speech with highlighting. Text-to-speech at a reduced rate had the lowest average performance. Future research could explore the impact of these conditions on reading comprehension for people with aphasia.
This document discusses several apps that target social skills, sentence structure, speech, and narrative skills. Conversation Builder and Social Adventures help children learn conversation skills and social behaviors. Social Skill Builder uses videos to promote social learning for those with autism or other deficits. Rainbow Sentences, Tense Builder, and Sentence Ninja target sentence structure skills using visual cues and games. Propositions Journey, Auditory Memory Ride, and Language Builder address speech skills like prepositions and auditory memory. The Surprise, Speech Journal, and Story Pals provide opportunities for narrative development through storytelling, writing, and comprehension activities.
This document discusses diagnosis and assessment of apraxia of speech. It describes two quantitative approaches for diagnosing the severity of apraxia of speech that were examined in a study. The study evaluated 59 participants, 39 with aphasia, using a motor speech evaluation and audio recordings. Scores from clinicians showed high reliability and validity based on correlations with independent clinical diagnosis and judgements of severity. The Apraxia of Speech Rating Scale was also developed as a descriptive rather than diagnostic tool but was found to also be useful for differential diagnosis.
Protocolo de tratamento motor para o desenvolvimento motor da falaPaula Lins
This study examined the effects of a Motor Speech Treatment Protocol (MSTP) on five children ages 3;2 to 3;5 with severe to profound speech sound disorders and motor speech difficulties. The MSTP integrated concepts from PROMPT and DTTC motor speech therapies and principles of motor learning. Children received 45-minute individual therapy sessions twice per week for 10 weeks. Outcome measures included visual and auditory ratings of speech accuracy on probe words, as well as standardized speech intelligibility assessments. Results showed significant improvements in speech accuracy and motor control for all children except one, indicating the MSTP may positively impact speech production for children with developmental motor speech disorders.
Augmentative Communication and Literacy chapter 14Sharon Jones
This chapter examines augmentative and alternative communication (AAC) interventions for individuals with autism from the 1990s. It summarizes that while 1988 studies found no firm conclusions on AAC efficacy for autism, research since has demonstrated potential benefits. Studies showed symbols can help communication if they are aided, visual-graphic, iconic, and intelligible to partners. AAC does not inhibit speech development and may support literacy skills, though more research is still needed. The chapter outlines considerations for clinical practice and directions for future research.
English speaking proficiency assessment using speech and electroencephalograp...IJECEIAES
In this paper, the English speaking proficiency level of non-native English speakers was automatically estimated as high, medium, or low. For this purpose, the speech of 142 non-native English speakers was recorded, and electroencephalography (EEG) signals of 58 of them were recorded while they spoke in English. Two systems were proposed for estimating the speaker's English proficiency level: one used 72 audio features extracted from speech signals, and the other used 112 features extracted from EEG signals. Multi-class support vector machines (SVM) were used for training and testing both systems with a cross-validation strategy. The speech-based system outperformed the EEG system, with 68% accuracy on 60 test audio recordings compared with 56% accuracy on 30 test EEG recordings.
Upgrading the Performance of Speech Emotion Recognition at the Segmental Level IOSR Journals
This document presents research on improving the accuracy of automatic speech emotion recognition using minimal inputs and features. The researchers used only the vowel formants from English speech recordings of 10 female speakers producing neutral and 6 basic emotions. They analyzed the vowel formants using statistical analysis and 3 classifiers to identify the best performing formants. An artificial neural network using the selected formant values achieved 95.6% accuracy in classifying emotions, higher than previous studies. The approach requires fewer features and less complex processing while achieving good recognition rates.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Azure API Management to expose backend services securely
A DEEP LEARNING BASED EVALUATION OF ARTICULATION DISORDER AND LEARNING ASSISTIVE SYSTEM FOR AUTISTIC CHILDREN
International Journal on Natural Language Computing (IJNLC) Vol. 6, No. 5, October 2017
DOI: 10.5121/ijnlc.2017.6502
Leena G Pillai and Elizabeth Sherly
Research Associate, IIITM-K, Trivandrum, India
Professor, IIITM-K, Trivandrum, India
ABSTRACT
Automatic Speech Recognition (ASR) is a thriving technology that permits an application to detect spoken words or identify the speaker using a computer. This work is focused on speech recognition for children with Asperger's Syndrome in the age group of five to ten. Asperger's Syndrome (AS) is one of the spectrum disorders within Autism Spectrum Disorder (ASD) in which most affected individuals have a high IQ and sound language skills, but the disorder restricts communication and social interaction. This work is accomplished in two stages. In the first stage, an Automatic Speech Recognition system is developed to identify articulation disorder in children with Asperger's Syndrome on the basis of 14 Malayalam vowels. A deep auto-encoder is used for pre-training in an unsupervised manner with 27,300 features that include Mel-Frequency Cepstral Coefficients (MFCC), zero crossing and Formants 1, 2 and 3. An additional layer is added and fine-tuned to predict the target vowel sounds, which enhances the performance of the auto-encoder. In the second stage, a Learning Assistive System for Autistic Child (LASAC) is envisioned to provide speech training as well as impaired speech analysis for Malayalam vowels. A feature vector is used to create the acoustic model, and the Euclidean distance formula is used to estimate the percentage of impairment in an AS student's speech by comparing the speech features against the acoustic model. LASAC provides an interactive interface that offers speech training, speech analysis and an articulation tutorial. Most autistic students have difficulty memorising words or letters but are able to recollect pictures, so LASAC also serves as a learning assistive system for Malayalam letters with their corresponding images and words. The DNN with auto-encoder achieved an average accuracy of 65% on the impaired (autistic) dataset.
KEYWORDS
Automatic Speech Recognition, MFCC, Zero crossing, Formants analysis, Autism Spectrum Disorder,
Deep Auto Encoder, Euclidean Distance.
1. INTRODUCTION
Speech communication is a chain that consists of the speech production mechanism in the speaker, transmission through a medium, and the speech perception process in the ear and brain of the listener. Automatic Speech Recognition (ASR) allows users to communicate with a computer interface by voice, in a way that resembles normal human conversation [1]. In ASR, the algorithm plays the listener's role by decoding speech, while speech synthesis performs the speaker's role. This paper is focused on ASR that identifies the impaired vowel sounds articulated by children with Asperger's Syndrome. Sounds are the outcome of the articulation process, in which the air coming from the vocal tract is adjusted by the lips, tongue, jaw, teeth and palate. An articulation problem leads a person to produce sounds incorrectly, and most autistic children deliver impaired speech. Children with articulation disorder may delete, substitute, add or distort sounds. It is observed that students trained with sounds at an early age can reduce their articulation problem, language and communication
disorder [2]. The parts of the brain associated with speech are the Frontal Lobe, Parietal Lobe, Occipital Lobe and Temporal Lobe. While a human being communicates, these four lobes work together. In autistic children, a lack of information sharing occurs between the brain lobes, and this is one of the conditions that lead to communication disorder among autistic children.
Asperger's Syndrome (AS) is one of the Autism Spectrum Disorders (ASD), characterized as a high-functioning form of autism with serious deficiencies in social and communication skills [6]. The IQ levels of these students vary from typically normal to the very superior range [7]. Individuals affected by Asperger's Syndrome usually have sound language skills, but they may struggle to realize the expectations of others within conversations, or exhibit echolalia (repeating what the other person has just said). AS cannot be cured, but the syndrome can be mitigated through rehabilitative services, one of which is language and speech therapy. In the present work, LASAC is envisioned to provide speech training in the Malayalam language as well as impaired speech and articulation analysis for AS-affected children in the age group of five to ten.
This work initially concentrated on developing speech recognition for such impaired ASD children. A study was performed on articulation and voice disorder to identify suitable feature extraction methods and to prepare a training dataset for impaired speech recognition. The feature vector selected for network training is an accelerating factor of classification accuracy. Frequency describes the fundamental characteristics of speech signals; therefore the frequency-based feature extraction methods Mel-Frequency Cepstral Coefficients (MFCC) and spectrogram analysis are used [11, 12, 13]. The spectrogram analysis identifies formant values that correlate with different articulation positions. As most vowels are articulated with unrestricted airflow, a time-based feature extraction method, zero crossing, is used to measure the zero-crossing rate in the signal. The feature vector consists of 27,300 features extracted using the frequency-based methods (MFCC and spectrogram analysis) and the time-based method (zero crossing).
To evaluate articulation and voice disorder in children with AS, an articulatory phonetics study was conducted by applying an unsupervised Deep Neural Network with an auto-encoder. Hinton et al. (2006) initially proposed a greedy, unsupervised, layer-wise pre-training algorithm for the Deep Belief Network (DBN), where each layer is modeled by a Restricted Boltzmann Machine (RBM) [2]. Later works proved that architectures such as auto-encoders [3] or conventional neural networks [4] trained with a similar scheme are suitable for building deep networks. A DNN is a type of ANN that contains multiple hidden layers between the input and output layers [5], and it can model complex non-linear relationships. In this work, an auto-encoder is used for pre-training in an unsupervised manner: rather than producing a classification at the output, it reproduces the input at the output layer. The auto-encoder helps to identify relevant features by pre-training with the target values set equal to the input.
In the second part, an assistive learning system called LASAC is introduced that serves as an interactive Malayalam language learning assistive system for children with AS. Several comparative studies between traditional and digital teaching methods have found that computer-assisted programs were more effective in improving language as well as cognitive outcome measures of autistic children [8, 9]. These studies also reveal that, with the help of computer-assisted and game-like technologies, such children show greater improvements in motivation and attentiveness [10]. Following these findings, a Malayalam language Learning Assistive System for Autistic Child (LASAC) is developed for a limited set of words. It offers an interactive interface with words and corresponding images for each vowel, an articulation tutorial, a voice recorder and a voice analysis tool. The Euclidean distance formula is used to analyse the impairment percentage of the vowel articulated by the user.
1.1 Malayalam Vowels
The modern Malayalam alphabet consists of 15 vowel letters. These vowel sounds are classified into monophthongs, which show one vowel quality, for example അ (a), and diphthongs, which show two vowel qualities, for example ഔ (au). Malayalam is one of the richest languages in terms of the number of alphabets and one of the toughest when considering speech. Vowel sounds are articulated with unrestricted airflow; therefore these phonemes have the highest intensity, and their duration lies between 60 and 400 ms in normal speech. The vowel sound signals are quasi-periodic due to the repeated excitation of the vocal tract.
The Malayalam vowels are classified based on tongue height, tongue backness, lip rounding and tenseness of the articulators [3].
1.1.1 Tongue Height
Vowels are classified into high, low and mid based on the space between the tongue and the roof of the mouth while producing the sound. The high vowels ഇ (i), ഈ (i:), ഉ (u) and ഊ (u:) require a relatively narrow space between the tongue and the roof of the mouth. The low vowels അ (a) and ആ (a:) require a relatively wide space. The mid vowels എ (e), ഏ (e:), ഒ (o) and ഓ (o:) require a space between the low and the high.
Figure 1. Tongue height position
1.1.2 Tongue Backness
Vowels are classified into three categories, front, back and central, based on how far the tongue is positioned from the back of the mouth. The front vowels ഇ (i), ഈ (i:), എ (e) and ഏ (e:) are articulated by placing the tongue forward in the mouth. The back vowels ഉ (u), ഊ (u:), ഒ (o) and ഓ (o:) are articulated by placing the tongue far back in the mouth. The central vowels അ (a) and ആ (a:) are articulated by placing the tongue roughly between the front and the back of the mouth.
Figure 2. Tongue backness position
1.1.3 Lip Rounding
Another aspect of vowel classification is the presence or absence of lip rounding while producing a sound. The vowels ഉ (u), ഊ (u:), ഒ (o), ഓ (o:) and ഔ (au) are produced with a high degree of lip rounding.
               Front      Central    Back
High   Short   ഇ (i)                 ഉ (u)
       Long    ഈ (i:)                ഊ (u:)
Mid    Short   എ (e)                 ഒ (o)
       Long    ഏ (e:)                ഓ (o:)
Low    Short              അ (a)
       Long               ആ (a:)
Table 1. Malayalam Vowel Classification.
2. LITERATURE REVIEW
Manasi Dixit and Shaila Apte (2013) conducted a study on speech evaluation in children suffering from Apraxia of Speech (AOS). The work is based on children in the age group of three to eight with Marathi as their mother tongue [15]. The speech disability of AOS students was estimated from results retrieved from Speech Pause Index, jitter, skew and kurtosis analysis. The Speech Pause Index (SPI), which indicates segmental and supra-segmental analysis, was found to be high in pathological subjects. The skew factor was below the threshold level of 3.4 and the kurtosis factor below 12 in almost all pathological subjects. The work observed that formants are widely spread in pathological subjects.
Nidhyananthan et al. (2014) conducted an experiment on contemporary speech and speaker recognition with speech from an impaired vocal apparatus [16]. The work performed a comparative study between the MFCC, ZCPA and LPC feature extraction algorithms and the modelling methods GMM, HMM, GFM and ANN, applied to speech recognition for normal speakers and persons with speech disorders. The feature extraction techniques were compared and analysed; similarly, GMM, HMM, GFM, ANN and their fusions were evaluated on identification accuracy. The work concluded that the ASR system gives high performance with normal speakers but low performance for persons with speech disorders, and suggested that multi-net ANNs could be employed to improve accuracy. MFCC feature extraction acquired the highest accuracy rate of 85%, whereas ZCPA acquired 38% and LPC achieved 82%. Ahmed et al. (2012) conducted an ASR experiment in the Malay language with MFCC features and obtained an average accuracy of 95% using a Multilayer Perceptron [17].
Kyung-Im Han, Hye-Jung Park and Kun-Min Lee (2016) conducted a study on hearing-impaired people, who mostly depend on visual aids [18]. An SVM technique was used to extract phonetic features
of the sound, and the lip shape was examined for each English vowel. The work allows language learners to see a visual aid that provides guidelines on the exact lip shape while producing a vowel. F1 and F2 formant frequency analysis aids the identification of lip shape. The paper implemented lip-shape refinement for five English vowels. The speakers involved were English native speakers, Korean normal-hearing speakers and Korean hearing-impaired speakers. The work observed that lip-shape adjustment is very important for acquiring native-like sounds.
Anu V Anand, P Sobhana Devi, Jose Stephen and Bhadran V K (2012) applied a Malayalam Speech Recognition System for visually challenged people [19]. The MFCC method was used for feature extraction and a Hidden Markov Model (HMM) was used to build the acoustic model; samples were collected from 80 speakers. In this work, ASR with Speech-To-Text (STT) in the OpenOffice writer converts the visually challenged individual's speech to text, and Text-To-Speech (TTS) reads out the executed commands and documents. The system works in two modes, dictation mode and command mode, and offers a user-friendly interface for visually challenged people for easy document preparation.
Thasleema et al. (2007) built a k-NN pattern classifier for Malayalam vowel speech recognition. The Linear Predictive Coding (LPC) method was used for Malayalam vowel feature extraction [20], and the result was compared with a wavelet packet decomposition method. The study was conducted on only five Malayalam vowels: അ (a), ഇ (i), എ (e), ഒ (o) and ഉ (u). The overall recognition accuracy obtained on the five Malayalam vowels using the wavelet packet decomposition method was 74%, whereas the accuracy obtained using the LPC method was 94%.
Several studies have shown that students with autism spectrum disorder respond well to treatments or therapies that involve visual support such as pictures [21], videos [22] and computers [23]. The Picture Exchange Communication System (PECS) is an augmentative communication system frequently used with children with autism. The introduction of video modelling interventions led to an increase in both verbal and motor play responses in children with autism.
3. METHODOLOGY
The present work applies articulatory phonetics to 14 Malayalam vowels to identify the voice production impairment of autism-affected children, together with a learning assistive system for corrective measurement.
For pre-processing, the spectral noise gate technique is applied for noise reduction using Audacity. The feature extraction methods employed are MFCC, zero crossing and spectrogram analysis. The network is trained using a Deep Neural Network with an auto-encoder for the evaluation of articulation disorder. In the learning assistive interface, Euclidean distance is used to analyse the level of impairment in autistic speech.
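As a rough illustration of this comparison step, the sketch below scores a speech sample against a reference acoustic-model vector using Euclidean distance. The normalisation against the reference vector's norm is an assumption for illustration, since the paper does not state its exact scaling, and the function and variable names are hypothetical:

```python
import math

def impairment_percentage(sample_features, model_features):
    """Euclidean distance between a child's feature vector and the
    reference (acoustic-model) vector, scaled to a percentage of the
    reference vector's norm. The scaling is an illustrative assumption;
    the paper does not state its exact normalisation."""
    if len(sample_features) != len(model_features):
        raise ValueError("feature vectors must have the same length")
    dist = math.sqrt(sum((s - m) ** 2
                         for s, m in zip(sample_features, model_features)))
    ref_norm = math.sqrt(sum(m ** 2 for m in model_features)) or 1.0
    return 100.0 * dist / ref_norm

# identical vectors: no deviation from the model, i.e. 0% impairment
print(impairment_percentage([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

A larger percentage indicates a greater deviation of the articulated vowel from the reference model.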
3.1 Mel-Frequency Cepstral Coefficients (MFCC)
The articulated phoneme can be predicted accurately by identifying the shape of the vocal tract, because this shape determines the variations in the sound wave. The shape is enveloped in the short-term power spectrum, and the MFCC algorithm extracts features from that spectrum. Mel-Frequency Cepstral Coefficients (MFCC) are therefore considered one of the fastest and most prevalent feature extraction methods in ASR. The MFCC computation comprises eight steps, depicted in Figure 3. The pre-emphasis stage is employed to boost the amount of energy in the high frequencies so that the acoustic model can be
formed with quality information. The signals are framed into 30 ms frames with an overlap of 20 ms. Short frames are used on the assumption that the audio signal does not change much on a short time scale.
Figure 3. MFCC Workflow
The power-spectrum calculation of each frame is inspired by the cochlea (an organ in the ear), which vibrates at different spots depending on the frequency of the incoming sound. Depending on this vibration, nerves fire information to the brain about the frequencies present. The MFCC likewise identifies the frequencies present in each frame. The filter-bank concept is also adopted from the cochlea: it identifies the energy that exists in various frequency regions.
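The eight MFCC steps can be sketched end-to-end in plain NumPy. The frame sizes match the 30 ms / 20 ms-overlap setup described above, but the filter and coefficient counts (26 filters, 13 coefficients) are common defaults used for illustration rather than the paper's exact configuration, which returns 60 values:

```python
import numpy as np

def hz_to_mel(f):
    # standard mel-scale mapping: mel(f) = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(sig, sr=48000, frame_ms=30, hop_ms=10, n_filters=26, n_ceps=13):
    # 1. pre-emphasis boosts energy in the high frequencies
    sig = np.append(sig[0], sig[1:] - 0.97 * sig[:-1])
    # 2. framing: 30 ms frames, 10 ms hop (i.e. 20 ms overlap)
    flen, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n_frames = 1 + (len(sig) - flen) // hop
    frames = np.stack([sig[i * hop:i * hop + flen] for i in range(n_frames)])
    # 3. windowing
    frames *= np.hamming(flen)
    # 4. FFT power spectrum
    nfft = 2048
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # 5. triangular mel filter bank spanning 0 .. sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 6. log of the filter-bank energies
    feats = np.log(power @ fbank.T + 1e-10)
    # 7-8. DCT decorrelates the log energies; keep the first n_ceps values
    m = feats.shape[1]
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                    2 * np.arange(m) + 1) / (2 * m))
    return feats @ basis.T

# one second of a 440 Hz tone at the paper's 48 kHz sampling rate
coeffs = mfcc(np.sin(2 * np.pi * 440 * np.arange(48000) / 48000))
```

Each row of the result is the cepstral description of one 30 ms frame.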
3.2 ZERO-CROSSING RATE
The zero-crossing rate is measured by counting the number of times the amplitude of the speech signal crosses zero within a given window [25]. According to the presence or absence of vibration in the vocal cords, sounds can be classified as voiced or unvoiced. Voiced speech is produced by the vibration of the closed vocal tract driven by periodic air pressure at the glottis, and such segments show a low zero-crossing count. Unvoiced speech is produced by free air flow in the open vocal tract and shows a high zero-crossing rate (Figure 4). Vowel sounds are produced by open and unrestricted air flow from the vocal tract.
Figure 4. Zero crossing rate in unvoiced and voiced
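A minimal sketch of this measurement, assuming a simple frame-by-frame sign-change count; the low-frequency tone stands in for voiced speech and the high-frequency tone for unvoiced speech:

```python
import numpy as np

def zero_crossing_rate(sig, frame_len, hop):
    """Fraction of adjacent sample pairs whose signs differ,
    computed per analysis window."""
    rates = []
    for start in range(0, len(sig) - frame_len + 1, hop):
        frame = sig[start:start + frame_len]
        crossings = np.sum(np.abs(np.diff(np.signbit(frame).astype(int))))
        rates.append(crossings / (frame_len - 1))
    return np.array(rates)

sr = 48000
t = np.arange(sr) / sr
low_tone = np.sin(2 * np.pi * 100 * t)    # voiced-like: few sign changes
high_tone = np.sin(2 * np.pi * 4000 * t)  # unvoiced-like: many sign changes
zcr_low = zero_crossing_rate(low_tone, 1440, 480).mean()
zcr_high = zero_crossing_rate(high_tone, 1440, 480).mean()
```

As expected, the low-frequency (voiced-like) signal yields a far lower rate than the high-frequency one.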
3.3 SPECTROGRAM ANALYSIS
A spectrogram is a visual representation that displays the spectral data over time, with the amplitude of the frequency components denoted by different colours or, usually, different shades of gray. The horizontal axis represents the duration of the sound in seconds or
milliseconds, and the vertical axis shows the frequency values. The most important formant parameters can be identified by the degree of darkness in the spectrogram [13].
Vowels have fully developed formant patterns that can be classified on the basis of F1 and F2 analysis. These resonance frequencies are associated with the two cavities separated by the constriction in the vocal tract. F1 is the lowest frequency, associated with tongue height, which affects the space of the pharynx. F2 is the second-lowest frequency, associated with changes within the oral cavity correlated with lip, jaw or tongue retraction.
For example, the vowel ഇ (i) is produced with a front constriction, a large pharyngeal cavity and a small oral cavity; therefore ഇ (i) can be characterized by a low F1 around 400 Hz and a high F2 around 2500 Hz. The vowel അ (a) is produced with a back constriction, a much smaller pharyngeal cavity and a large oral cavity; therefore അ (a) can be characterized by a slightly higher F1 around 600 Hz and a much lower F2 around 1000 Hz (Figure 5). F1 and F2 values are sufficient for vowel classification, but F3 is also considered to improve accuracy and reliability; F3 analysis helps to differentiate between rounded and unrounded vowels.
Figure 5: Spectrogram representation of (a), (i) and (u)
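The spectral reading described above can be approximated with a short-time FFT. The sketch below picks the strongest spectral peak per frame of a synthetic 400 Hz tone, standing in for a low-F1 resonance; real formant tracking would typically use LPC analysis rather than raw peak picking:

```python
import numpy as np

sr = 48000
t = np.arange(int(0.5 * sr)) / sr
tone = np.sin(2 * np.pi * 400 * t)  # stands in for a low-F1 resonance

# short-time FFT: rows index time frames, columns index frequency bins
frame_len, hop = 1440, 480          # 30 ms windows, 10 ms hop
window = np.hamming(frame_len)
frames = [tone[i:i + frame_len] * window
          for i in range(0, len(tone) - frame_len + 1, hop)]
spec = np.abs(np.fft.rfft(frames, axis=1))
freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)

# crude per-frame spectral peak; LPC root-solving is the usual
# formant estimator in practice
peak_freq = freqs[np.argmax(spec, axis=1)]
mean_peak = float(peak_freq.mean())
```

For this synthetic tone, the per-frame peak sits at the 400 Hz bin, mirroring how a dark horizontal band in the spectrogram marks a formant.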
3.5 DEEP NEURAL NETWORK
A DNN is a class of ANN composed of multiple layers with fully connected weights, initialized by applying unsupervised or supervised pre-training techniques for solving classification problems in complex data [5]. The nodes in each layer are trained at a distinct level of feature abstraction in the feature hierarchy, which enables a DNN to handle very large, high-dimensional datasets with large sets of parameters [3].
In this work, unsupervised DNNs are an effective method to discover latent structures and patterns within unlabelled and unstructured data. Each node in the DNN learns structure automatically from the unlabelled data by reconstructing the size and dimension of the input; the network thus generates correlations between certain features and optimal results. The individual layer-wise training is achieved by a special type of non-linear method, the auto-encoder, without class labels [4]. Deep learning ends in a labelled softmax output layer that joins all the hidden layers together to form a deep network trained in a supervised fashion.
International Journal on Natural Language Computing (IJNLC) Vol. 6, No.5, October 2017
4. IMPLEMENTATION
4.1 DATA COLLECTION AND PRE-PROCESSING
The training data set has been collected from 30 normal students: 13 boys and 17 girls. For testing, vowel sounds are recorded from 10 students with AS: 3 boys and 7 girls. A Zoom H1 recorder is used for the recordings, which are collected in .wav format at a 48 kHz sampling rate with 24-bit resolution.
Undesired frequencies that appear in the speech signal and degrade its quality are considered as noise. The spectral noise gate technique is applied for noise reduction using Audacity. The noise reduction is set to 12 dB to reduce the noise to an acceptable level, and the frequency smoothing is set to 3 Hz to smooth the speech.
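A crude numpy sketch of spectral gating is shown below. This is an illustrative simplification, not Audacity's actual Noise Reduction algorithm: it gates a whole signal in one FFT pass against a separate noise recording, attenuating any bin that falls below the noise profile by the stated 12 dB.

```python
import numpy as np

def spectral_gate(x, noise_profile, reduction_db=12.0):
    """Attenuate FFT bins whose magnitude falls below the noise profile
    by `reduction_db` decibels (simplified single-pass noise gate)."""
    X = np.fft.rfft(x)
    threshold = np.abs(np.fft.rfft(noise_profile))
    gain = np.where(np.abs(X) < threshold, 10.0 ** (-reduction_db / 20.0), 1.0)
    return np.fft.irfft(gain * X, n=len(x))

rng = np.random.default_rng(0)
noise = 0.1 * rng.standard_normal(48000)                     # 1 s noise sample
clean = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)   # 1 s test tone
denoised = spectral_gate(clean + noise, noise)
```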
4.2 FEATURE EXTRACTION
The three algorithmic feature extraction procedures implemented in this work are described in Table 2.
Procedure 1
Input : Speech Signals
Output : 60 MFCC features
1. Boost the amount of energy in the high frequencies (pre-emphasis).
2. Divide the signal into 30 ms frames with an overlap of 20 ms and apply spectral analysis.
3. Multiply each frame by a window function: y[n] = x[n] * w[n].
4. Apply the FFT to each frame to convert the samples from the time domain to the frequency domain:
   X[k] = Σ_{n=0}^{N-1} x[n] e^(-j2πkn/N)
5. Apply a group of triangular band-pass filters, which simulate the characteristics of the human ear, to the spectrum.
6. Compute the logarithm of the squared magnitude of the output of the Mel filter bank, where the Mel scale is
   mel(f) = 2595 log10(1 + f/700)
7. Convert the log Mel spectrum back to the time domain using the Discrete Cosine Transform (DCT):
   c[n] = Σ_{m=1}^{M} log(S[m]) cos(πn(m − 0.5)/M)
8. Return 60 MFCC values.
Procedure 2
Input : Speech Signals
Output : Zero Crossing Rate and Standard Deviation
1. Set Sampling_rate = 48000.
2. Calculate the length of the window:
   Window_length = Frame_size X Sampling_rate
3. Calculate the number of frames:
   Frame_num = ((length(wave_file) − Window_length) / Sampling_rate) + 1
4. In each frame:
   Calculate the Zero Crossing Rate
   ZCR = (1/2) Σ_{n=1}^{N} | sign(x[n]) − sign(x[n−1]) |
   Calculate the zero crossing standard deviation
   std = sqrt( (1/K) Σ_{k=1}^{K} (ZCR_k − μ)² )
5. Return ZCR and std.
Procedure 3
Input : Speech Signals, TextGrid
Output : Formant 1, Formant 2, Formant 3
1. Set Interval = 3.
2. Open each sound file and the corresponding TextGrid:
   Get the label of the intervals with the number of tiers and number of intervals
   Label = Get label of interval : tiernum, intnum
   Calculate the midpoint
   Beg = Get start point : tiernum, intnum
   End = Get end point : tiernum, intnum
   Midpoint = Beg + ((End − Beg) / 2)
   Calculate the formant values at each interval
   F1 = Get peak frequency of formant 1 at time : midpoint, hertz
   F2 = Get peak frequency of formant 2 at time : midpoint, hertz
   F3 = Get peak frequency of formant 3 at time : midpoint, hertz
3. Return F1, F2 and F3 from each file.
Table 2: Algorithmic procedures for feature extraction
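The core computations in Procedures 1 and 2 can be sketched in a few lines of numpy. This is a minimal illustration over a single synthetic 30 ms frame; the full pipeline applies the same steps over all frames and adds the Mel filter bank and DCT stages.

```python
import numpy as np

def hz_to_mel(f):
    # Mel scale used for the triangular filter bank (Procedure 1, steps 5-6)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def zero_crossing_rate(frame):
    # Procedure 2: half the summed absolute differences of the sign sequence
    s = np.sign(frame)
    return 0.5 * np.sum(np.abs(np.diff(s)))

sr = 48000
n = int(0.030 * sr)                        # one 30 ms frame at 48 kHz
t = np.arange(n) / sr
frame = np.sin(2 * np.pi * 440 * t)        # 440 Hz test tone
windowed = frame * np.hamming(n)           # step 3: windowing
spectrum = np.abs(np.fft.rfft(windowed))   # step 4: FFT magnitude
peak_hz = np.argmax(spectrum) * sr / n     # dominant frequency of the frame
```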
The feature extraction algorithms created a feature vector from the processed signals. The
features that are extracted from a single vowelare listed in Table 3.
Feature Extraction Number of features
MFCC 60
Zero Crossing 2
Spectrogram Analysis 3
Total 65
Table 3: Features extracted from a single vowel
To create the feature vector, records are collected from the 30 normal students. 14 vowels are sampled from each dataset (30 X 14 = 420) and 65 features are retrieved from each sample. Therefore, the feature vector consists of 27,300 (420 X 65) features.
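The dimensions above can be checked directly:

```python
import numpy as np

n_speakers, n_vowels, n_features = 30, 14, 65
samples = n_speakers * n_vowels          # 420 vowel samples
X = np.zeros((samples, n_features))      # one 65-feature row per sample
assert X.size == 27300                   # total feature count, as stated
```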
4.3 AUTO-ENCODER
An auto-encoder is a greedy layer-wise approach for pre-training that consists of an encoder and a decoder. The encoder maps the input to a hidden representation, and the decoder maps this representation back to the original input (Figure 7).
Figure 7: Deep Auto – encoder Architecture
A stacked auto-encoder consists of multiple layers of sparse encoders, where the outputs of each layer are wired to the inputs of the successive layer. Consider a stacked auto-encoder with n layers, where W(k,1), W(k,2), b(k,1), b(k,2) are the parameters of the k-th auto-encoder.
The encoding step is given in forward order:

a(l) = f(z(l))    (1)
z(l+1) = W(l,1) a(l) + b(l,1)    (2)

The decoding step is given in reverse order:

a(n+l) = f(z(n+l))    (3)
z(n+l+1) = W(n−l,2) a(n+l) + b(n−l,2)    (4)

where the information of interest is stored in a(n), W denotes a weight matrix and b represents a bias vector. The features retrieved from the auto-encoder can be used for classification problems by passing a(n) to a softmax classifier. In the first layer, regularizers are applied to learn the sparse representation. The control parameters given to the regularizers are listed in Table 4.
Parameter Value
Auto-encoder 1 hidden size 65
Auto-encoder 2 hidden size 60
L2WeightRegularization 0.002
SparsityRegularization 4
SparsityProportion 0.02
DecoderTransferFunction purelin
Table 4. Auto-encoder parameters
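The stacked encoding pass of equations (1)-(2), with the layer sizes from Table 4, can be sketched in numpy. This shows only the forward (encoding) computation with random, untrained weights; the paper trains the encoders with the sparsity and L2 regularizers listed above, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):
    # Logistic transfer function for the encoder layers
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes from Table 4: 65 inputs -> 65 hidden -> 60 bottleneck
sizes = [65, 65, 60]
W = [rng.normal(0.0, 0.1, (m, n)) for n, m in zip(sizes, sizes[1:])]
b = [np.zeros(m) for m in sizes[1:]]

def encode(x):
    """Stacked encoding pass, eqs. (1)-(2): a(l+1) = f(W(l,1) a(l) + b(l,1))."""
    a = x
    for Wl, bl in zip(W, b):
        a = f(Wl @ a + bl)
    return a

bottleneck = encode(rng.normal(size=65))   # 60-dimensional bottleneck features
```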
Two auto-encoders are implemented in this work. The 65 coefficients (features extracted from each sound sample) are given to the first auto-encoder. The number of neurons in its hidden layer is 65 (equal to the number of features); therefore the first auto-encoder delivers an output of 65 X 1 dimensions. The extracted features from the first encoder are given as input to the second auto-encoder. The number of neurons in the second hidden layer is set to 60, which results in a compressed output of 60 X 1 dimensions (Figure 8). These features are considered as the bottleneck features for classification.
Figure 8. Implementation architecture of Auto-encoder
Finally, the softmax layer is trained in a supervised fashion to transform all the net activations into a series of values that can be interpreted as probabilities. The softmax layer is trained with the dataset and the target output. It applies the softmax function, or normalized exponential, to the input and classifies it into 14 classes.
5. RESULT AND DISCUSSION
The experiments are conducted to study impaired speech and articulation disorder in children with Asperger's Syndrome within the age limit of five to ten. As an initial work, this study is focused on 14 Malayalam vowels for 50 children, which includes normal and AS affected children. The normal audio is included to identify the differences in the articulation of children with AS. The accuracy is measured using true positive, true negative, false positive and false negative rates.

Accuracy = (Correctly predicted data / Total test data) X 100    (5)

Correctly predicted data = TP + TN
Total test data = TP + TN + FP + FN
TP - True Positive, TN - True Negative, FP - False Positive, FN - False Negative.
5.1 AUTISTIC ARTICULATORY PHONETICS
Articulatory phonetics is conducted to evaluate the articulation disorder in children with AS. This would help the learning assistive system to ascertain the articulation disorder faced by the autistic children. The network is trained with an unsupervised deep auto-encoder.
NETWORK TRAINING
The 14 Malayalam vowel sounds are collected from both AS affected students (20 speakers) and normal students (30 speakers), and 65 features are extracted from each sample.
The autistic speech datasets are able to achieve only 54% classification accuracy, whereas the normal students' speech dataset obtained 98%. Each autistic child faces unique articulation and voice impairment. Therefore, identifying a classification structure from their feature vector is difficult. The experiment reveals that records collected from normal students are preferable to train the network.
NETWORK TESTING
The network is initially trained with an unsupervised deep auto-encoder. The DNN with auto-encoder is able to achieve 56% test accuracy on the impaired (autistic) dataset. Each class obtained a different accuracy, ranging from 30% to 90%. This variation is based on the articulation requirements of each vowel. Table 5 describes the detailed evaluation result for each class and the accuracy obtained from the unsupervised DNN.
CLASS LETTER DNN ACCURACY %
1 അ(a) 90
2 ആ(a:) 80
3 ഇ(i) 70
4 ഈ(i:) 70
5 ഉ(u) 50
6 ഊ(u:) 50
7 ഋ(rr) 30
8 എ(e) 70
9 ഏ(e:) 50
10 ഐ(ai) 40
11 ഒ(o) 50
12 ഓ(o:) 40
13 ഔ(au) 30
14 അം(am) 70
Total 56%
Table 5: Test result of each vowel articulated by autistic children
The performance evaluation table (Table 5) and graph (Figure 9) explicate that children with AS struggle to produce back, high or mid monophthong vowels: ഉ (u), ഊ (u:), ഒ (o) and ഓ (o:). All the primary letters, such as അ (a), ഇ (i), ഉ (u), എ (e) and ഒ (o), achieve higher accuracy than their extended counterparts, such as ആ (a:), ഈ (i:), ഊ (u:), ഏ (e:) and ഓ (o:). Most of the autistic children are not able to identify the difference between a primary letter and its extended form; they articulate both sounds in the same manner. Misclassification occurs regularly between the back, lip-rounded vowels ഉ (u), ഊ (u:), ഒ (o) and ഓ (o:).
Figure 9: Performance evaluation graph of each vowel articulated by autistic children
The vowels ഐ (ai) and ഔ (au) are diphthongs. A diphthong shows two vowel qualities. The vowel ഐ (ai) is the combination of the two vowel sounds അ (a) and ഇ (i). The vowel ഔ (au) is the combination of the two vowel sounds അ (a) and ഉ (u). The diphthongs acquired the least accuracy, as they were misclassified as one of their component vowels. Most of the AS affected students are not able to produce both sounds together. The vowel sound ഋ (rr) is the combination of 'eee-rrr'. The ഋ (rr) is produced by restricted airflow that causes the vocal cords to vibrate; to shape this sound, the tip of the tongue is placed between the hard and soft palate. Therefore ഋ (rr) also attained low accuracy. Most of the autistic students are not able to articulate these letters (ഐ (ai), ഔ (au), ഋ (rr) and ഓ (o:)) properly. With these vowels excluded, the DNN system's performance reaches 65% accuracy (Table 6 and Figure 10).
DATASET ACCURACY
Students with AS 65%
Normal students 98%
Table 6. Performance of the Classifier
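The overall figures in Tables 5 and 6 follow directly from the per-class accuracies:

```python
# Per-class accuracies from Table 5, classes 1-14 in order
acc = [90, 80, 70, 70, 50, 50, 30, 70, 50, 40, 50, 40, 30, 70]
overall = sum(acc) / len(acc)
assert round(overall) == 56                  # matches the Table 5 total

# Dropping the four hardest letters: rr, ai, o: and au (classes 7, 10, 12, 13)
hard = {6, 9, 11, 12}                        # 0-based indices of those classes
rest = [a for i, a in enumerate(acc) if i not in hard]
assert sum(rest) / len(rest) == 65.0         # matches Table 6, students with AS
```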
Figure 10. Autistic vs Normal Speech Accuracy Graph
5.2 LEARNING ASSISTIVE SYSTEM FOR AUTISTIC CHILDREN (LASAC)
The LASAC provides speech training through an interactive interface designed around the common interests of autistic children. The interface consists of a Malayalam virtual keyboard, words and corresponding images for each vowel, an articulation tutorial for each vowel, and a voice recording and analysis tool. The DNN classifier is used to assist the Speech Language Pathologist (SLP) in impaired speech recognition. The DNN classifies the autistic speech input into the most suitable class according to the observed similarities. Apart from classification, the Euclidean Distance formula is used to analyse the impairment percentage in the autistic voice input. The result obtained from the Euclidean Distance is stored in a database for further evaluation and for tracking the improvement of the student's speech after training. The architecture of LASAC is depicted in Figure 11.
Figure 11. LASAC architecture
The straight-line distance between two points in Euclidean space is called the Euclidean distance. The Euclidean Distance is the most common formula to calculate the difference between two points [26]. It calculates the square root of the summed squared differences between the coordinates of a pair of objects. The distance formula is derived from the Pythagorean Theorem:

Distance = sqrt((x2 − x1)² + (y2 − y1)²)    (6)
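Eq. (6) generalizes unchanged to the 65-dimensional feature vectors compared in this work; a minimal sketch:

```python
import math

def euclidean(p, q):
    # Eq. (6): root of the summed squared coordinate differences,
    # written for vectors of any dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((1.0, 2.0), (4.0, 6.0)))  # 5.0 (a 3-4-5 triangle)
```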
This work provides an interactive learning assistive system with speech training for autistic children. To enhance the speech training, the Euclidean Distance formula is implemented to calculate the difference between the normal speech dataset and the autistic speech test dataset. Figure 12 shows the GUI result obtained after the Euclidean calculation.
Figure 12: GUI result of Euclidean Distance
The result describes that the given voice sample is 25% different from the feature vector labelled ആ (a:), which means the sample is 75% similar to the normal ആ (a:), and the DNN classifier classified the input voice as അ (a). The result is recorded in the database for future analysis.
Phoneme Similarity Remark
അ(a) 90% Good
ആ(a:) 75% Average
ഇ(i) 80% Average
ഈ(i:) 72% Average
ഉ(u) 66% Below Average
ഊ(u:) 62% Poor
ഋ(rr) 51% Poor
എ(e) 83% Average
ഏ(e:) 76% Average
ഐ(ai) 60% Poor
ഒ(o) 64% Below Average
ഓ(o:) 59% Poor
ഔ(au) 53% Poor
അം(am) 85% Good
Table 7. Similarity list achieved from an autistic student’s speech training
16. International Journal on Natural Language Computing (IJNLC) Vol. 6, No.5, October 2017
34
Table 7 shows the similarity list achieved from the speech training provided to an autistic student, with the corresponding remarks. This result helps to analyze the basic speech impairment percentage of the AS student. Guardians or teachers can focus further training on the vowels that show the least similarity.
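Selecting the vowels that need further training can be sketched from Table 7. Note that the remark thresholds below are inferred from the table and are hypothetical; the paper does not state the exact cut-offs.

```python
# Remark thresholds inferred from Table 7 (hypothetical cut-offs)
def remark(similarity):
    if similarity >= 85:
        return "Good"
    if similarity >= 70:
        return "Average"
    if similarity >= 63:
        return "Below Average"
    return "Poor"

# Similarity percentages from Table 7
scores = {"a": 90, "a:": 75, "i": 80, "i:": 72, "u": 66, "u:": 62,
          "rr": 51, "e": 83, "e:": 76, "ai": 60, "o": 64, "o:": 59,
          "au": 53, "am": 85}

# Vowels a guardian or teacher should prioritise for further training
focus = sorted(v for v, s in scores.items() if remark(s) == "Poor")
print(focus)
```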
6. CONCLUSION
This work couples the evaluation of articulation disorder in AS students with Malayalam vowel training and impaired voice analysis. The DNN with auto-encoder, an unsupervised machine learning technique, is used to analyse the articulation disorder in AS students in the age group of five to ten. Two auto-encoders are used, with 65 and 60 hidden layer neurons respectively, which helped achieve improved results through bottleneck feature extraction. This work is based on 14 Malayalam vowels. Though the network trained with 27,300 features has shown an average classification accuracy of 98% for normal children, the accuracy obtained for autistic children is only 65%.
An interactive interface is used to provide language and speech training. In speech training, the DNN classifier is used to assist the Speech Language Pathologist (SLP) in recognizing the autistic speech input, and the Euclidean Distance formula is used to enhance the speech training by evaluating the percentage of impairment. The feature vector with 65 features contains MFCC (60), Zero Crossing Rate (2) and spectrogram analysis (3).
The Centers for Disease Control and Prevention (CDC) reveals that 1 among 68 children is diagnosed with Autism Spectrum Disorder (ASD). In this scenario, technologies are required to support the communication and language development of these children. In future, this work can be extended to consonant identification along with more unique features. The maximum features can be combined, and with the help of bottleneck feature extraction it is possible to identify the optimum features that lead to better performance. As this work is socially relevant, we hope that it will inspire others to conduct more research in this area to support the communication and language development of autistic children.
REFERENCES
[1] Atal, B., and L. Rabiner. "A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition." IEEE Transactions on Acoustics, Speech, and Signal Processing 24, no. 3 (1976): 201-212.
[2] G.E. Hinton, S. Osindero, and Y.W. Teh, “A fast learning algorithm for deep belief nets,” Neural
computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[3] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layer-wise training of deep
networks,” Advances in neural information processing systems, vol. 19, pp. 153, 2007.
[4] M.A. Ranzato, F.J. Huang, Y.L. Boureau, and Y. LeCun. "Unsupervised learning of invariant feature hierarchies with applications to object recognition." Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on. IEEE, 2007, pp. 1–8.
[5] Deng, Li, Geoffrey Hinton, and Brian Kingsbury. "New types of deep neural network learning for
speech recognition and related applications: An overview." In Acoustics, Speech and Signal
Processing (ICASSP), 2013 IEEE International Conference on, pp. 8599-8603. IEEE, 2013.
[6] Bowler, Dermot M. "“Theory of Mind” in Asperger's Syndrome Dermot M. Bowler." Journal of Child
Psychology and Psychiatry 33, no. 5 (1992): 877-893.
[7] Ozonoff, Sally, Sally J. Rogers, and Bruce F. Pennington. "Asperger's syndrome: Evidence of an
empirical distinction from high‐functioning autism." Journal of Child Psychology and Psychiatry 32,
no. 7 (1991): 1107-1122.
[8] Ozonoff, Sally. "Reliability and validity of the Wisconsin Card Sorting Test in studies of autism."
Neuropsychology 9, no. 4 (1995): 491.
[9] Whalen, Christina, Debbie Moss, Aaron B. Ilan, Manya Vaupel, Paul Fielding, Kevin Macdonald, Shannon Cernich, and Jennifer Symon. "Efficacy of TeachTown: Basics computer-assisted intervention for the intensive comprehensive autism program in Los Angeles unified school district." Autism 14, no. 3 (2010): 179-197.
[10] Williams, Christine, Barry Wright, Gillian Callaghan, and Brian Coughlan. "Do children with autism
learn to read more readily by computer assisted instruction or traditional book methods? A pilot
study." Autism 6, no. 1 (2002): 71-91.
[11] Majeed, Sayf A., Hafizah Husain, Salina Abdul Samad, and Tariq F. Idbeaa. "Mel Frequency Cepstral
Coefficients (MFCC) Feature Extraction Enhancement in the Application of Speech Recognition: a
Comparison Study." Journal of Theoretical and Applied Information Technology 79, no. 1 (2015): 38.
[12] Hossan, Md Afzal, Sheeraz Memon, and Mark A. Gregory. "A novel approach for MFCC feature extraction." In Signal Processing and Communication Systems (ICSPCS), 2010 4th International Conference on, pp. 1-5. IEEE, 2010.
[13] Jiang, J. J., and Yu Zhang. "Nonlinear dynamic analysis of speech from pathological subjects."
Electronics Letters 38, no. 6 (2002): 294-295.
[14] Atal, B., and L. Rabiner. "A pattern recognition approach to voiced-unvoiced-silence classification
with applications to speech recognition." IEEE Transactions on Acoustics, Speech, and Signal
Processing 24, no. 3 (1976): 201-212.
[15] Dixit, Manasi, and Shaila Apte. "Speech evaluation with special focus on children suffering from apraxia of speech." Signal and Image Processing 4, no. 3 (2013): 57.
[16] Nidhyananthan, S. Selva, R. Shantha Selvakumari, and V. Shenbagalakshmi. "Contemporary speech/speaker recognition with speech from impaired vocal apparatus." In Communication and Network Technologies (ICCNT), 2014 International Conference on, pp. 198-202. IEEE, 2014.
[17] Ahmed, Irfan, Nasir Ahmad, Hazrat Ali, and Gulzar Ahmad. "The development of isolated words
pashto automatic speech recognition system." In Automation and Computing (ICAC), 2012 18th
International Conference on, pp. 1-4. IEEE, 2012.
[18] Han, Kyung-Im, Hye-Jung Park, and Kun-Min Lee. "Speech recognition and lip shape feature
extraction for English vowel pronunciation of the hearing-impaired based on SVM technique." In Big
Data and Smart Computing (BigComp), 2016 International Conference on, pp. 293-296. IEEE, 2016.
[19] Anand, Anu V., P. Shobana Devi, Jose Stephen, and V. K. Bhadran. "Malayalam Speech Recognition system and its application for visually impaired people." In India Conference (INDICON), 2012 Annual IEEE, pp. 619-624. IEEE, 2012.
[20] Thasleema, T. M., V. Kabeer, and N. K. Narayanan. "Malayalam vowel recognition based on linear predictive coding parameters and k-nn algorithm." Conference on Computational Intelligence and Multimedia Applications, 2007. International Conference on. Vol. 2. IEEE, 2007.
[21] Charlop‐Christy, Marjorie H., Michael Carpenter, Loc Le, Linda A. LeBlanc, and Kristen Kellet.
"Using the picture exchange communication system (PECS) with children with autism: Assessment of
PECS acquisition, speech, social‐communicative behavior, and problem behavior." Journal of applied
behavior analysis 35, no. 3 (2002): 213-231.
[22] D'Ateno, Patricia, Kathleen Mangiapanello, and Bridget A. Taylor. "Using video modeling to teach
complex play sequences to a preschooler with autism." Journal of Positive Behavior Interventions 5,
no. 1 (2003): 5-11.
[23] Hetzroni, Orit E., and Juman Tannous. "Effects of a computer-based intervention program on the communicative functions of children with autism." Journal of Autism and Developmental Disorders 34, no. 2 (2004): 95-113.
[24] Wong, Eddie, and Sridha Sridharan. "Comparison of linear prediction cepstrum coefficients and mel-frequency cepstrum coefficients for language identification." In Intelligent Multimedia, Video and Speech Processing, 2001. Proceedings of 2001 International Symposium on, pp. 95-98. IEEE, 2001.
[25] Djurić, Milenko B., and Željko R. Djurišić. "Frequency measurement of distorted signals using
Fourier and zero crossing techniques." Electric Power Systems Research 78, no. 8 (2008): 1407-1415.
[26] Wang, Liwei, Yan Zhang, and Jufu Feng. "On the Euclidean distance of images." IEEE Transactions on Pattern Analysis and Machine Intelligence 27, no. 8 (2005): 1334-1339.
AUTHORS
Leena G Pillai is currently working as a Research Associate under Dr. Elizabeth Sherly at the Indian Institute of Information Technology and Management-Kerala (IIITM-K), Technopark, Trivandrum, Kerala. She has completed an MPhil in Computer Science from Cochin University of Science And Technology (CUSAT), Kerala. Her research area is Automatic Speech Recognition, and for the last one year she has been focusing on speech analysis and recognition for autistic children.
Dr. Elizabeth Sherly is currently working as a Professor at the Indian Institute of Information Technology and Management-Kerala (IIITM-K), a Kerala Government organization in Technopark, Trivandrum, Kerala. With more than 25 years of experience in education and research in CS/IT, her major research work involves pattern recognition and data mining using soft computing. She holds a Ph.D in Computer Science from the University of Kerala in Artificial Neural Networks in Biological Control Systems, and is now actively pursuing research in Medical Image Processing, Computational Linguistics, Automatic Speech Recognition and Object Modeling Technologies.