Speech recognition has been an active research topic for more than 50 years. Interacting with the computer through speech is a particularly important research area for the disabled community, who face a variety of difficulties in using computers. Automatic Speech Recognition (ASR) is investigated separately for different languages because each language has its own specific features, and the need for an ASR system for the Tamil language has grown widely in the last few years. In this paper, a speech recognition system for individually spoken words in the Tamil language, using a multilayer feed-forward network, is presented. To implement the system, the input signal is first preprocessed using four types of filters, namely pre-emphasis, median, average, and Butterworth band-stop filters, in order to remove background noise and enhance the signal. The performance of these filters is measured by MSE and PSNR values, and the best filtered signal is taken as the input for the further stages of the ASR system.
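The filter-selection step above can be illustrated with a small sketch. The paper does not give its exact filter parameters, so the coefficients, noise level, and the choice of a simple moving-average smoother below are illustrative assumptions; the idea is only that each candidate filter's output is scored against a reference with MSE and PSNR, and the best-scoring output feeds the rest of the pipeline.

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def moving_average(x, k=5):
    """Simple k-point average filter, standing in for one denoising candidate."""
    return np.convolve(x, np.ones(k) / k, mode="same")

def mse(clean, filtered):
    """Mean squared error against the clean reference."""
    return np.mean((clean - filtered) ** 2)

def psnr(clean, filtered):
    """Peak signal-to-noise ratio in dB; higher means less residual distortion."""
    peak = np.max(np.abs(clean))
    return 10.0 * np.log10(peak ** 2 / mse(clean, filtered))

# Toy comparison: a sine corrupted by noise, smoothed by the average filter.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
smoothed = moving_average(noisy)
# The candidate whose output has the lowest MSE (highest PSNR) would be
# selected as input to the remaining ASR stages.
```

In a real comparison the median and Butterworth band-stop candidates would be scored the same way against the same reference.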
This document reviews techniques used in spoken-word recognition systems. It discusses popular feature extraction techniques like MFCC, LPC, DWT, WPD that are used to represent speech signals in a compact form before classification. Classification techniques discussed are ANN, HMM, DTW, and VQ. The document provides a brief overview of each technique and their advantages. It also presents the generalized workflow of a spoken-word recognition system including stages of speech acquisition, pre-emphasis, feature extraction, modeling, classification, and output of recognized text.
A Review On Speech Feature Techniques And Classification Techniques (Nicole Heredia)
This document discusses speech feature extraction and classification techniques for speech recognition systems. It provides an overview of common feature extraction methods like MFCC and LPC, and classification algorithms like ANN and SVM. MFCC mimics human auditory perception but provides weak power spectrum, while LPC is easy to calculate but does not capture information at a linear scale. ANN can learn from data but is complex for large datasets, while SVM is accurate and suitable for pattern recognition but requires fixed-length coefficients. The document evaluates these techniques and concludes that MFCC performance is more efficient than LPC for feature extraction in speech recognition.
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR (ijcseit)
The proposed system is a syllable-based Myanmar speech recognition system. There are three stages: Feature Extraction, Phone Recognition, and Decoding. In feature extraction, the system transforms the input speech waveform into a sequence of acoustic feature vectors, each vector representing the information in a small time window of the signal. The likelihood of the observed feature vectors given linguistic units (words, phones, subparts of phones) is then computed in the phone recognition stage. Finally, the decoding stage takes the Acoustic Model (AM), which consists of this sequence of acoustic likelihoods, plus a phonetic dictionary of word pronunciations, combined with the Language Model (LM). The system produces the most likely sequence of words as the output. The system creates the language model for Myanmar using syllable segmentation and a syllable-based n-gram method.
This document summarizes a research paper on developing a syllable-based speech recognition system for the Myanmar language. The proposed system has three main components: feature extraction, phone recognition, and decoding. Feature extraction transforms speech into acoustic feature vectors. Phone recognition computes likelihoods of acoustic observations given linguistic units like phones. Decoding uses acoustic and language models to find the most likely sequence of words. The paper discusses building acoustic and language models for Myanmar. The acoustic model is trained using Hidden Markov Models and Gaussian mixture models. The language model is an n-gram model built using syllable segmentation of text. Developing the first speech recognition system for Myanmar poses technical challenges due to its tonal syllabic structure.
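The syllable n-gram language model described above can be sketched compactly. The romanized syllables and the add-one smoothing below are illustrative assumptions (the paper's actual segmentation and smoothing scheme are not given here); the sketch only shows how bigram counts over segmented syllables turn into conditional probabilities.

```python
from collections import Counter

def bigram_lm(corpus):
    """Build unigram and bigram counts over syllable sequences (one list per utterance)."""
    unigrams, bigrams = Counter(), Counter()
    for syllables in corpus:
        padded = ["<s>"] + syllables + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, cur, vocab_size):
    """Add-one smoothed estimate of P(cur | prev)."""
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)

# Hypothetical romanized syllable corpus standing in for segmented Myanmar text.
corpus = [["min", "ga", "la", "ba"], ["min", "ga", "la", "ba"], ["ga", "la"]]
uni, bi = bigram_lm(corpus)
vocab = len(set(s for utt in corpus for s in utt)) + 2  # plus <s>, </s>
p = bigram_prob(uni, bi, "min", "ga", vocab)
```

During decoding, these LM probabilities would be combined with the acoustic likelihoods to rank candidate word sequences.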
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition... (TELKOMNIKA JOURNAL)
Speech recognition can be defined as the process of converting voice signals into sequences of words by applying a specific algorithm implemented in a computer program. Research on speech recognition in Indonesia is relatively limited. This paper studies which feature extraction method, Linear Predictive Coding (LPC) or Mel Frequency Cepstral Coefficients (MFCC), is better for speech recognition in the Indonesian language. This is important because a method that produces high accuracy for one language does not necessarily produce the same accuracy for other languages, since every language has different characteristics. This research therefore aims to help accelerate the use of automatic speech recognition for the Indonesian language. There are two main processes in speech recognition: feature extraction and recognition. The feature extraction methods compared in this study are LPC and MFCC, while recognition uses a Hidden Markov Model (HMM). The test results show that the MFCC method outperforms LPC in Indonesian-language speech recognition.
International journal of signal and image processing issues vol 2015 - no 1... (sophiabelthome)
This document presents a study on automatic speech recognition (ASR). It discusses the different types of ASR systems including speaker-dependent, speaker-independent, and speaker-adaptive systems. It also covers the different types of utterances that can be recognized, such as isolated words, connected words, continuous speech, and spontaneous speech. The document then describes the basic phases involved in ASR, including front-end analysis using techniques like pre-emphasis, framing, windowing and feature extraction. It also discusses back-end processing using acoustic and language models to map features to words. Hidden Markov models are presented as a commonly used acoustic modeling technique in ASR systems.
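The front-end stages named above (framing and windowing after pre-emphasis) can be sketched in a few lines. The 25 ms frame / 10 ms hop at 16 kHz and the Hamming window below are common conventions, not values taken from this document.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def window_frames(frames):
    """Apply a Hamming window to each frame to reduce spectral leakage."""
    return frames * np.hamming(frames.shape[1])

x = np.random.default_rng(1).standard_normal(16000)  # 1 s of dummy audio
frames = window_frames(frame_signal(x))
```

Each windowed frame would then go through feature extraction (e.g. MFCC) before being passed to the acoustic model.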
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique (CSCJournals)
An automatic speaker recognition system recognizes an unknown speaker among several reference speakers by making use of speaker-specific information in their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition technique. Our baseline speaker recognition system, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database, and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique, performing gender recognition before speaker recognition, to improve the accuracy and timeliness of the baseline speaker recognition system. Based on the experiments conducted on the MIT database, we demonstrate that the proposed system improves accuracy over the baseline by approximately 2%, while reducing computation time by more than 30%.
This document provides a course report on speech recognition using MATLAB. It includes an abstract, introduction, report outline, design of the system, process and results, flowchart of the experiment, simulation and analysis, discussion, and conclusion sections. The report details using MFCC and GMM algorithms for speech recognition, including signal preprocessing, parameter extraction, and training. Simulation results showed recognition rates approaching 100% for same speakers in quiet environments, but performance was impacted by noise and different speakers. The system demonstrated speech recognition capabilities in MATLAB but had limitations.
Intelligent Arabic letters speech recognition system based on mel frequency c... (IJECEIAES)
Speech recognition is one of the important applications of artificial intelligence (AI). Speech recognition aims to recognize spoken words regardless of who is speaking. The process involves extracting meaningful features from spoken words and then classifying these features into their classes. This paper presents a neural network classification system for Arabic letters and studies the effect of changing the multi-layer perceptron (MLP) artificial neural network (ANN) properties to obtain optimized performance. The proposed system consists of two main stages: first, the recorded spoken letters are transformed from the time domain into the frequency domain using the fast Fourier transform (FFT), and features are extracted using mel frequency cepstral coefficients (MFCC). Second, the extracted features are classified using an MLP ANN with the back-propagation (BP) learning algorithm. The obtained results show that the proposed system, along with the extracted features, can classify Arabic spoken letters using two neural network hidden layers with an accuracy of around 86%.
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T... (IJCSEA Journal)
This document analyzes speech recognition performance based on neural network layers and transfer functions. It describes a system setup that uses MFCC features and ANNs for acoustic modeling. Experiments were conducted by varying the number of neural network layers (1, 2, 3 layers) and transfer functions (linear, sigmoid, tangent sigmoid). Results showed that networks with linear transfer functions in all layers achieved the performance goal, while other configurations reached minimum gradient but not the goal. The best architecture was a 3-layer network with linear transfer functions.
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ... (ijcsit)
Speech processing is a crucial and intensive field of research in the development of robust and efficient speech recognition systems, but recognition accuracy still suffers from variation in context, speaker variability, and environmental conditions. In this paper, we present a curvelet-based Feature Extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the curvelet transform, which successfully reduces the computational complexity and the feature vector size; curvelets offer better accuracy and a varying window size, making them suitable for non-stationary signals. For word classification and recognition, a discrete hidden Markov model is used, as it accounts for the time distribution of speech signals. The HMM classification method attained maximum identification rates of 80.1% for informal phrases, 86% for scientific phrases, and 63.8% for control phrases. The objective of this study is to characterize the feature extraction and classification phases of a speech recognition system. The various approaches available for developing speech recognition systems are compared along with their merits and demerits. The statistical results show that recognition accuracy is increased by using the discrete curvelet transform over conventional methods.
This document presents a text-dependent speaker recognition system using neural networks that aims to improve recognition accuracy. It proposes changing the number of Mel Frequency Cepstral Coefficients (MFCCs) used in training. Voice Activity Detection is also used as a preprocessing step. Experimental results show recognition accuracy increases from 70.41% to a maximum of 92.91% as the number of MFCCs increases from 14 to 20, but then decreases with more MFCCs. The system is implemented on a Raspberry Pi for hardware acceleration.
Parameters Optimization for Improving ASR Performance in Adverse Real World N... (Waqas Tariq)
From existing research it has been observed that many techniques and methodologies are available for performing every step of an Automatic Speech Recognition (ASR) system, but the performance of a methodology (minimizing the Word Error Rate, WER, and maximizing the Word Accuracy Rate, WAR) does not depend only on the technique applied in that method. The research indicates that performance mainly depends on the category of the noise, the level of the noise, and the variable sizes of the window, frame, frame overlap, etc. considered in existing methods. The main aim of the work presented in this paper is to vary parameters such as window size, frame size, and frame overlap percentage, to observe the performance of the algorithms for various categories and levels of noise, and to train the system across all parameter sizes and categories of real-world noisy environments in order to improve the performance of the speech recognition system. This paper presents the results of Signal-to-Noise Ratio (SNR) and accuracy tests under variable parameter sizes. It is observed that it is very hard to evaluate the test results and decide on parameter sizes for improving ASR performance. Hence, this study further suggests feasible and optimal parameter sizes using a Fuzzy Inference System (FIS) for enhancing accuracy in adverse real-world noisy environmental conditions. This work will be helpful for the discriminative training of ubiquitous ASR systems for better Human-Computer Interaction (HCI).
Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, Ubiquitous ASR System, Human-Computer Interaction (HCI)
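The SNR metric used in tests like the one above has a standard definition that is easy to sketch. The sine-plus-noise signal and the specific noise levels below are illustrative assumptions, not data from the paper; the point is only that measured SNR falls as the noise level rises.

```python
import numpy as np

def snr_db(clean, noisy):
    """SNR in dB: ratio of signal power to the power of the residual (noisy - clean)."""
    noise = noisy - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(2)
clean = np.sin(2 * np.pi * np.arange(8000) * 300 / 8000)

snrs = []
for level in (0.05, 0.2, 0.8):  # increasing additive-noise levels
    noisy = clean + level * rng.standard_normal(clean.size)
    snrs.append(snr_db(clean, noisy))
# snrs decreases monotonically as the noise level rises
```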
General Kalman Filter & Speech Enhancement for Speaker Identification (ijcisjournal)
The presence of noise increases the dimension of the information. A noise suppression algorithm is developed by combining the General Kalman Filter with the Expectation-Maximization (EM) framework. This combination is helpful and effective in identifying the noise characteristics of an acoustic environment. Recursion between the Estimation step and the Maximization step enables the algorithm to deal with any noise model. The same speech enhancement procedure is applied in the preprocessing stage of a conventional speaker identification method. Due to the non-stationary nature of noise and speech, adaptive algorithms are required. The algorithm is first applied to the speech enhancement problem and then extended to the preprocessing step of speaker identification. The present work is compared, in terms of significant metrics, with existing and popular algorithms, and the results show that the developed algorithm dominates them.
This document describes a speaker independent Malayalam word recognition system using support vector machines. It extracts mel frequency cepstral coefficients (MFCCs) as features and compares the performance of linear and radial basis function kernels in the SVM classifier. An overall accuracy of 91.8% was achieved on a database containing isolated word utterances from different speakers, demonstrating the effectiveness of the MFCC features and RBF kernel for speaker independent speech recognition.
Speaker Identification & Verification Using MFCC & SVM (IRJET Journal)
This document discusses a method for speaker identification and verification using Mel Frequency Cepstral Coefficients (MFCC) and Support Vector Machines (SVM). MFCC is used to extract features from speech samples, representing the spectral properties of the voice. SVM is then used to match these features between speech samples to identify or verify speakers. The authors claim this method can identify speakers with up to 95% accuracy and is computationally efficient. It is proposed that MFCC and SVM outperform other techniques for speaker recognition tasks. The system is described as having applications in security, voice interfaces, and other voice-based systems.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document discusses an approach to extract features from speech signals using Mel Frequency Cepstral Coefficients (MFCC). MFCC is a popular feature extraction technique that models human auditory perception. It involves preprocessing the speech signal, framing it, applying the discrete cosine transform to the mel scale filter bank energies, and optionally adding delta and acceleration features. MFCC extraction was tested on isolated word samples to obtain coefficients representing the short-term power spectrum. These coefficients can then be used for applications like speech recognition by matching patterns in the feature space.
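The MFCC steps listed above (power spectrum, mel filter bank, log, DCT) can be sketched end-to-end for a single frame. The filter-bank size, FFT length, and the 1 kHz test tone below are common illustrative choices, not parameters taken from this document.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc_frame(frame, n_ceps=13, n_fft=512, sr=16000):
    """MFCCs for one pre-emphasized, windowed frame."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    energies = np.log(mel_filterbank(n_fft=n_fft, sr=sr) @ power + 1e-10)
    # Type-II DCT of the log filter-bank energies; keep the first n_ceps.
    n = energies.size
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * np.arange(n) + 1) / (2 * n))
    return dct @ energies

frame = np.hamming(400) * np.sin(2 * np.pi * np.arange(400) * 1000 / 16000)
ceps = mfcc_frame(frame)
```

Stacking these per-frame vectors (optionally with delta features appended) gives the feature matrix used for recognition.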
Speaker Recognition System using MFCC and Vector Quantization Approach (ijsrd.com)
This paper presents an approach to speaker recognition using frequency spectral information with Mel frequency for the improvement of speech feature representation in a Vector Quantization codebook based recognition approach. The Mel frequency approach extracts the features of the speech signal to get the training and testing vectors. The VQ Codebook approach uses training vectors to form clusters and recognize accurately with the help of LBG algorithm.
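The LBG codebook training mentioned above can be sketched as binary splitting followed by k-means-style refinement. The 2-D toy clusters and codebook size below are illustrative assumptions; real systems would cluster MFCC training vectors into a much larger codebook.

```python
import numpy as np

def lbg_codebook(vectors, size=4, eps=0.01, iters=20):
    """Grow a VQ codebook by binary splitting (LBG), refining with k-means steps."""
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split each codeword into a perturbed pair, then refine assignments.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            for k in range(len(codebook)):
                members = vectors[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

# Two well-separated clusters of toy 2-D "feature vectors".
rng = np.random.default_rng(3)
vectors = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
codebook = lbg_codebook(vectors, size=2)
```

At recognition time, a test vector is matched to the speaker whose codebook yields the smallest quantization distortion.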
Wavelet Based Noise Robust Features for Speaker Recognition (CSCJournals)
Extraction and selection of the best parametric representation of the acoustic signal is the most important task in designing any speaker recognition system. A wide range of possibilities exists for parametrically representing the speech signal, such as Linear Prediction Coding (LPC), Mel frequency cepstrum coefficients (MFCC), and others. MFCCs are currently the most popular choice for speaker recognition systems, though one of their shortcomings is that the signal is assumed to be stationary within the given time frame; MFCCs are therefore unable to analyze non-stationary signals and are not well suited to noisy speech. To overcome this problem, several researchers have used different types of AM-FM modulation/demodulation techniques for extracting features from the speech signal, and other approaches use wavelet filter banks for feature extraction. In this paper, a technique combining these approaches is proposed: features are extracted from the envelope of the signal and then passed through a wavelet filter bank. It is found that the proposed method outperforms existing feature extraction techniques.
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S... (IDES Editor)
In this paper, the improvement of an ASR system for the Hindi language, based on vector-quantized MFCC feature vectors and an HMM classifier, is discussed. MFCC features are usually pre-processed before being used for recognition; one such pre-processing step is to compute delta and delta-delta coefficients and append them to the MFCCs to form the feature vector. This paper focuses on isolated-word recognition of all Hindi digits (zero to nine). Performance of the system is evaluated by the Recognition Rate (RR). Combining the Delta MFCC (DMFCC) and Delta-Delta MFCC (DDMFCC) features yields approximately 2.5% further improvement in RR, with no additional computational cost. RR is found to be higher for speakers involved in the training phase than for speakers who were not, and word-wise RR is observed to be good for digits with distinct phones.
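The delta and delta-delta (DMFCC/DDMFCC) coefficients described above are conventionally computed as the least-squares slope of each cepstral coefficient over a small window of neighbouring frames. A minimal NumPy sketch, with an assumed window half-width N=2 (a common default, not a value stated in the paper):

```python
import numpy as np

def delta(features, N=2):
    """Delta coefficients: least-squares slope of each coefficient
    over a window of +-N frames (edge frames are padded by repetition)."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(features, ((N, N), (0, 0)), mode='edge')
    T = features.shape[0]
    return np.stack([
        sum(n * (padded[t + N + n] - padded[t + N - n])
            for n in range(1, N + 1)) / denom
        for t in range(T)])

mfcc = np.random.randn(100, 13)            # 100 frames x 13 MFCCs
d = delta(mfcc)                            # delta features
dd = delta(d)                              # delta-delta features
feature_vector = np.hstack([mfcc, d, dd])  # 39-dimensional vectors
```

Appending the deltas triples the feature dimension (13 to 39 here) but, as the abstract notes, the regression itself is cheap relative to MFCC extraction.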
This document summarizes the technical progress in speech recognition over the past few decades. It discusses the key components of a speech recognition system including speech analysis, feature extraction, language translation, and message understanding. The document also outlines the main challenges in speech recognition like variability in context, environmental conditions, speakers, and audio quality. Furthermore, it covers different types of speech recognition systems and their applications in various sectors such as education, medicine, transportation and more. The document concludes with an overview of the major approaches to speech recognition including acoustic-phonetic, pattern recognition, and artificial intelligence and discusses popular feature extraction methods used.
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ... - csandit
This document describes a study that developed a speech recognition system for recognizing spoken Malayalam digits. It used two wavelet-based feature extraction techniques - Discrete Wavelet Transforms (DWT) and Wavelet Packet Decomposition (WPD) - and evaluated their performance using a Naive Bayes classifier. DWT achieved 83.5% accuracy and WPD achieved 80.7% accuracy. To improve recognition accuracy, the study introduced a new technique called Discrete Wavelet Packet Decomposition (DWPD) that utilizes features from both DWT and WPD. DWPD achieved the highest accuracy of 86.2% along with the Naive Bayes classifier.
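Both the DWT and WPD features mentioned above are built from repeated two-band wavelet decompositions; in a DWT only the low-frequency (approximation) branch is decomposed further, while WPD decomposes both branches. A single decomposition level using the Haar wavelet, the simplest case (the study's actual wavelet family is not stated here), can be sketched as:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform:
    returns (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                       # pad to even length
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-frequency content
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-frequency content
    return approx, detail

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a, d = haar_dwt(signal)
```

Recursively applying `haar_dwt` to `a` alone gives a DWT tree; applying it to both `a` and `d` gives the full WPD tree, which is presumably how the combined DWPD feature set draws on both.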
HMM Classifier for Human Activity Recognition - CSEIJJournal
Rapid improvements in technology have drawn more attention to recognizing human activities from video, and this technological growth has made vision-based research more interesting and efficient than ever before. This paper presents a novel HMM (Hidden Markov Model) based approach for human activity recognition from video. Different HMM approaches are used to recognize human actions: thresholding and voting to automatically and effectively segment and recognize complex activities, and, for simple activities, an Elman Network (EN) and two hybrids of Neural Networks (NN) and HMMs, i.e. HMM-NN and NN-HMM.
IOT SOLUTIONS FOR SMART PARKING - SIGFOX TECHNOLOGY - CSEIJJournal
Sigfox technology has been a competitive product in the communication service provider market for approximately a decade. Widely implemented for smart parking solutions across various European countries, it has now gained traction in Germany as well. The technology's track record and reputation in the market demonstrate its effectiveness and reliability in addressing the communication needs of IoT applications, particularly vehicle parking systems. The study is conducted in the context of a city such as Berlin, Germany. The major challenge is how to make parking techniques more user-friendly, cost-effective and energy-efficient; the questions posed at the beginning of the paper are answered at the end via Sigfox and its comparison with related technologies such as LoRaWAN and Weightless. Future areas of research that are not clearly identified in this particular research area are also pointed out.
The paper also discusses the pros and cons of adaptive, emerging and existing technologies in terms of cloud, big data and data analytics, in tandem with Sigfox.
Similar to Isolated Word Recognition System For Tamil Spoken Language Using Back Propagation Neural Network Based On Lpcc Features
International Journal of Signal and Image Processing Issues vol 2015 - no 1... - sophiabelthome
This document presents a study on automatic speech recognition (ASR). It discusses the different types of ASR systems including speaker-dependent, speaker-independent, and speaker-adaptive systems. It also covers the different types of utterances that can be recognized, such as isolated words, connected words, continuous speech, and spontaneous speech. The document then describes the basic phases involved in ASR, including front-end analysis using techniques like pre-emphasis, framing, windowing and feature extraction. It also discusses back-end processing using acoustic and language models to map features to words. Hidden Markov models are presented as a commonly used acoustic modeling technique in ASR systems.
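The front-end stages listed above (pre-emphasis, framing, windowing) can be sketched in NumPy; the pre-emphasis coefficient 0.97 and the 25 ms/10 ms frame sizes at 16 kHz are common defaults assumed here for illustration, not values taken from the document:

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping Hamming-windowed frames
    (400 samples = 25 ms and 160 samples = 10 ms at 16 kHz)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * window

x = np.random.randn(16000)               # one second of audio at 16 kHz
frames = frame_and_window(preemphasis(x))
print(frames.shape)                      # (98, 400)
```

Each row of `frames` then goes to the feature extraction stage (MFCC, LPC, etc.) described in the abstract.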
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique - CSCJournals
Automatic speaker recognition systems recognize an unknown speaker among several reference speakers by making use of speaker-specific information in their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition technique. Our baseline speaker recognition system, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database, and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique, performing gender recognition before speaker recognition, to improve the accuracy and timeliness of the baseline speaker recognition system. Based on experiments conducted on the MIT database, we demonstrate that the proposed system improves accuracy over the baseline by approximately 2%, while reducing computational time by more than 30%.
This document provides a course report on speech recognition using MATLAB. It includes an abstract, introduction, report outline, design of the system, process and results, flowchart of the experiment, simulation and analysis, discussion, and conclusion sections. The report details using MFCC and GMM algorithms for speech recognition, including signal preprocessing, parameter extraction, and training. Simulation results showed recognition rates approaching 100% for same speakers in quiet environments, but performance was impacted by noise and different speakers. The system demonstrated speech recognition capabilities in MATLAB but had limitations.
Intelligent Arabic letters speech recognition system based on mel frequency c... - IJECEIAES
Speech recognition is one of the important applications of artificial intelligence (AI). Speech recognition aims to recognize spoken words regardless of who is speaking. The process involves extracting meaningful features from spoken words and then classifying these features into their classes. This paper presents a neural network classification system for Arabic letters and studies the effect of changing the properties of a multi-layer perceptron (MLP) artificial neural network (ANN) to obtain optimized performance. The proposed system consists of two main stages: first, the recorded spoken letters are transformed from the time domain into the frequency domain using the fast Fourier transform (FFT), and features are extracted using mel frequency cepstral coefficients (MFCC); second, the extracted features are classified using an MLP ANN with the back-propagation (BP) learning algorithm. The obtained results show that the proposed system, together with the extracted features, can classify Arabic spoken letters using two hidden layers with an accuracy of around 86%.
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T... - IJCSEA Journal
This document analyzes speech recognition performance based on neural network layers and transfer functions. It describes a system setup that uses MFCC features and ANNs for acoustic modeling. Experiments were conducted by varying the number of neural network layers (1, 2, 3 layers) and transfer functions (linear, sigmoid, tangent sigmoid). Results showed that networks with linear transfer functions in all layers achieved the performance goal, while other configurations reached minimum gradient but not the goal. The best architecture was a 3-layer network with linear transfer functions.
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ... - ijcsit
Speech processing is a crucial and intensive field of research in the development of robust and efficient speech recognition systems, but recognition accuracy still suffers from variation in context, speaker variability, and environmental conditions. In this paper, we present a curvelet-based Feature Extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the curvelet transform, which reduces computational complexity and feature vector size; curvelets also offer better accuracy and varying window sizes, making them suitable for non-stationary signals. For word classification and recognition, a discrete hidden Markov model is used, as it accounts for the time distribution of speech signals. The HMM classification method attained maximum identification rates of 80.1% for informal phrases, 86% for scientific phrases, and 63.8% for control words. The objective of this study is to characterize the feature extraction and classification phases of a speech recognition system. The various approaches available for developing speech recognition systems are compared along with their merits and demerits. The statistical results show that recognition accuracy is increased by using the discrete curvelet transform over conventional methods.
This document presents a text-dependent speaker recognition system using neural networks that aims to improve recognition accuracy. It proposes changing the number of Mel Frequency Cepstral Coefficients (MFCCs) used in training. Voice Activity Detection is also used as a preprocessing step. Experimental results show recognition accuracy increases from 70.41% to a maximum of 92.91% as the number of MFCCs increases from 14 to 20, but then decreases with more MFCCs. The system is implemented on a Raspberry Pi for hardware acceleration.
Parameters Optimization for Improving ASR Performance in Adverse Real World N... - Waqas Tariq
From existing research it has been observed that many techniques and methodologies are available for every step of an Automatic Speech Recognition (ASR) system, but performance (minimizing the Word Error Rate, WER, and maximizing the Word Accuracy Rate, WAR) does not depend only on the technique applied. The research indicates that performance mainly depends on the category of noise, the noise level, and the sizes of the window, frame, and frame overlap considered in existing methods. The main aim of the work presented in this paper is to vary parameters such as window size, frame size and frame overlap percentage, observe algorithm performance for various categories and levels of noise, and train the system across all parameter sizes and categories of real-world noisy environments to improve speech recognition performance. The paper presents the results of Signal-to-Noise Ratio (SNR) and accuracy tests with varying parameter sizes. It is observed that it is very hard to evaluate the test results and decide parameter sizes for ASR performance improvement. Hence, this study further suggests feasible, optimal parameter sizes using a Fuzzy Inference System (FIS) for enhancing accuracy in adverse real-world noisy environmental conditions. This work will be helpful for discriminative training of ubiquitous ASR systems for better Human-Computer Interaction (HCI). Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, Ubiquitous ASR System, Human-Computer Interaction (HCI)
General Kalman Filter & Speech Enhancement for Speaker Identification - ijcisjournal
The presence of noise increases the dimension of the information. A noise suppression algorithm is developed by combining the General Kalman Filter with an Estimate-Maximization (EM) framework. This combination is helpful and effective in identifying the noise characteristics of an acoustic environment, and recursion between the estimate and maximization steps enables the algorithm to handle any noise model. The same speech enhancement procedure is applied in the pre-processing stage of a conventional speaker identification method. Due to the non-stationary nature of noise and speech, adaptive algorithms are required. The algorithm is first applied to the speech enhancement problem and then extended to the pre-processing step of speaker identification. The present work is compared, in terms of significant metrics, with existing and popular algorithms, and the results show that the developed algorithm outperforms them.
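The paper's General Kalman Filter/EM combination is not reproduced here, but the underlying predict-update recursion can be illustrated with a scalar Kalman filter under an assumed AR(1) signal model; all parameter values below are illustrative, not taken from the paper:

```python
import numpy as np

def kalman_denoise(y, a=0.95, q=0.1, r=1.0):
    """Scalar Kalman filter for y[t] = x[t] + noise, assuming an
    AR(1) signal model x[t] = a*x[t-1] + w[t] with process
    variance q and measurement variance r."""
    x_est, p = 0.0, 1.0
    out = np.empty_like(y, dtype=float)
    for t, yt in enumerate(y):
        # predict step
        x_pred = a * x_est
        p_pred = a * a * p + q
        # update step
        k = p_pred / (p_pred + r)        # Kalman gain
        x_est = x_pred + k * (yt - x_pred)
        p = (1 - k) * p_pred
        out[t] = x_est
    return out

np.random.seed(0)
t = np.linspace(0, 1, 200)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.5 * np.random.randn(200)
denoised = kalman_denoise(noisy, q=0.05, r=0.25)
```

The EM framework in the paper would re-estimate `a`, `q` and `r` from the data between filtering passes rather than fixing them as done here.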
This document describes a speaker independent Malayalam word recognition system using support vector machines. It extracts mel frequency cepstral coefficients (MFCCs) as features and compares the performance of linear and radial basis function kernels in the SVM classifier. An overall accuracy of 91.8% was achieved on a database containing isolated word utterances from different speakers, demonstrating the effectiveness of the MFCC features and RBF kernel for speaker independent speech recognition.
Speaker Identification & Verification Using MFCC & SVM - IRJET Journal
This document discusses a method for speaker identification and verification using Mel Frequency Cepstral Coefficients (MFCC) and Support Vector Machines (SVM). MFCC is used to extract features from speech samples, representing the spectral properties of the voice. SVM is then used to match these features between speech samples to identify or verify speakers. The authors claim this method can identify speakers with up to 95% accuracy and is computationally efficient. It is proposed that MFCC and SVM outperform other techniques for speaker recognition tasks. The system is described as having applications in security, voice interfaces, and other voice-based systems.
This document discusses an approach to extract features from speech signals using Mel Frequency Cepstral Coefficients (MFCC). MFCC is a popular feature extraction technique that models human auditory perception. It involves preprocessing the speech signal, framing it, applying the discrete cosine transform to the mel scale filter bank energies, and optionally adding delta and acceleration features. MFCC extraction was tested on isolated word samples to obtain coefficients representing the short-term power spectrum. These coefficients can then be used for applications like speech recognition by matching patterns in the feature space.
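The MFCC pipeline described above (power spectrum, mel-scale filter bank, log, DCT) can be sketched for a single windowed frame; the filter count, FFT size, and number of kept coefficients are typical values assumed for illustration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mfcc_frame(frame, sr=16000, n_filters=26, n_ceps=13):
    """MFCCs for one windowed frame: power spectrum -> mel
    filterbank energies -> log -> DCT-II, keeping n_ceps coeffs."""
    n_fft = 512
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
    # triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    fbank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        up = (freqs - lo) / (mid - lo)
        down = (hi - freqs) / (hi - mid)
        fbank[i] = np.clip(np.minimum(up, down), 0, None)
    energies = np.log(fbank @ spec + 1e-10)
    # DCT-II of the log energies decorrelates the filterbank outputs
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                 / (2 * n_filters))
    return dct @ energies

frame = np.hamming(400) * np.random.randn(400)
coeffs = mfcc_frame(frame)
print(coeffs.shape)        # (13,)
```

The optional delta and acceleration features mentioned in the abstract would be computed afterwards from the per-frame coefficient sequence.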
Speaker Recognition System using MFCC and Vector Quantization Approach - ijsrd.com
This paper presents an approach to speaker recognition using frequency spectral information with Mel frequency to improve speech feature representation in a Vector Quantization (VQ) codebook based recognition approach. The Mel frequency approach extracts features of the speech signal to obtain the training and testing vectors, and the VQ codebook approach uses the training vectors to form clusters and recognize accurately with the help of the LBG algorithm.
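The LBG (Linde-Buzo-Gray) codebook training mentioned above grows the codebook by splitting: start from the global mean, perturb each centroid into two, and refine with k-means until the target size is reached. A minimal NumPy sketch (the perturbation `eps` and iteration count are assumed defaults):

```python
import numpy as np

def lbg_codebook(vectors, size=8, eps=0.01, iters=20):
    """LBG codebook training: start from the global mean, repeatedly
    split each centroid by a +-eps perturbation, then refine the
    doubled codebook with k-means."""
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):                 # k-means refinement
            dists = np.linalg.norm(vectors[:, None] - codebook[None],
                                   axis=2)
            labels = dists.argmin(axis=1)
            for k in range(len(codebook)):
                members = vectors[labels == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

train = np.random.randn(500, 12)   # e.g. 12-dim MFCC training vectors
cb = lbg_codebook(train, size=8)
print(cb.shape)                    # (8, 12)
```

At recognition time, a test utterance is scored against each speaker's codebook by its total quantization distortion, and the lowest-distortion codebook wins.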
Reliability Improvement with PSP of Web-Based Software Applications - CSEIJJournal
In diverse industrial and academic environments, the quality of the software has been evaluated using
different analytic studies. The contribution of the present work is focused on the development of a
methodology in order to improve the evaluation and analysis of the reliability of web-based software
applications. The Personal Software Process (PSP) was introduced in our methodology for improving the
quality of the process and the product. The Evaluation + Improvement (Ei) process is performed in our
methodology to evaluate and improve the quality of the software system. We tested our methodology in a
web-based software system and used statistical modeling theory for the analysis and evaluation of the
reliability. The behavior of the system under ideal conditions was evaluated and compared against the
operation of the system executing under real conditions. The results obtained demonstrated the
effectiveness and applicability of our methodology.
DATA MINING FOR STUDENTS' EMPLOYABILITY PREDICTION - CSEIJJournal
This study has been undertaken to predict student employability. Assessing student employability provides a method of integrating student abilities and employer business requirements, which is becoming an increasingly important concern for academic institutions. Improving student evaluation techniques for employability can help students to better understand business organizations and find the right one for them. The data for training the classification models is gathered through a survey in which students fill out a questionnaire indicating their abilities and academic achievement. This information may be used to determine their competency in a variety of skill categories, including soft skills, problem-solving skills and technical abilities. The goal of this research is to use data mining to predict student employability by considering factors such as the skills students have gained during their diploma studies and the time elapsed, relative to the knowledge they have captured, before they expect placement at the end of graduation. The most relevant skills for each job category were also identified. For the prediction of student employability, different data mining models such as KNN, Naive Bayes, and Decision Tree were evaluated, and the best model for this institute's student employability prediction was identified. Classification and association techniques were thus used and evaluated.
Call for Articles - Computer Science & Engineering: An International Journal ... - CSEIJJournal
Computer Science & Engineering: An International Journal (CSEIJ) is a bi-monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Computer Science & Computer Engineering. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of computer science and computer Engineering.
A Complexity Based Regression Test Selection Strategy - CSEIJJournal
Software is unequivocally the foremost and indispensable entity in this technologically driven world; therefore quality assurance, and in particular software testing, is a crucial step in the software development cycle. This paper presents an effective test selection strategy that uses a Spectrum of Complexity Metrics (SCM). Our aim is to increase the efficiency of the testing process by significantly reducing the number of test cases without a significant drop in test effectiveness. The strategy makes use of a comprehensive taxonomy of complexity metrics based on product level (class, method, statement) and its characteristics. We use a series of experiments based on three applications with a significant number of mutants to demonstrate the effectiveness of our selection strategy. For further evaluation, we compare our approach to boundary value analysis. The results show the capability of our approach to detect mutants as well as the seeded errors.
XML Encryption and Signature for Securing Web Services - CSEIJJournal
In this research, we have focused on the most challenging issue that Web Services face: how to secure their information. Web Services security can be guaranteed by employing security standards, which is the main focus of this research. Every proposed security design should take into account the security objectives of integrity, confidentiality, non-repudiation, authentication, and authorization. The proposed model describes SOAP messages and the way to secure their contents. Because the SOAP message is the core of information exchange in Web Services, this research has developed a security model to ensure e-business security. The essence of our model depends on XML Encryption and XML Signature to encrypt and sign SOAP messages. The proposed model aims to achieve high transaction speed and a strong level of security without jeopardizing the performance of information transmission.
Performance Comparison of PCA, DWT-PCA and LWT-PCA for Face Image Retrieval - CSEIJJournal
This paper compares the performance of face image retrieval systems based on discrete wavelet transforms and lifting wavelet transforms with principal component analysis (PCA). These techniques are implemented and their performance investigated using frontal facial images from the ORL database. Although the Discrete Wavelet Transform is effective in representing image features and is suitable for face image retrieval, it still encounters implementation problems, e.g. floating-point operations and decomposition speed. We use the advantages of the lifting scheme, a spatial approach for constructing wavelet filters, which provides a feasible alternative to the problems facing its classical counterpart. The lifting scheme has intriguing properties such as convenient construction, simple structure, integer-to-integer transform, low computational complexity and flexible adaptivity, revealing its potential in face image retrieval. Compared to PCA and DWT with PCA, the lifting wavelet transform with PCA requires less computation, while DWT-PCA gives a higher retrieval rate.
Data security and privacy are important to prevent the re-
veal, modification and unauthorized usage of sensitive information. The
introduction of using critical power devices for internet of things (IoTs),
e-commerce, e-payment, and wireless sensor networks (WSNs) has brought
a new challenge of security due to the low computation capability of sen-
sors. Therefore, the lightweight authenticated key agreement protocols
are important to protect their security and privacy. Several researches
have been published about authenticated key agreement. However, there
is a need of lightweight schemes that can fit with critical capability de-
vices. Addition to that, a malicious key generation center (KGC) can
become a threat to watch other users, i.e impersonate user by causing
the key escrow problem
Call for Papers - Computer Science & Engineering: An International Journal (C...CSEIJJournal
Computer Science & Engineering: An International Journal (CSEIJ) is a bi-monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Computer Science & Computer Engineering. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of computer science and computer Engineering.
Recommendation System for Information Services Adapted, Over Terrestrial Digi...CSEIJJournal
The development of digital television in Colombia has grown in last year’s, specially the digital terrestrial
television (DTT), which is an essential part to the projects of National Minister of ICT, thanks to the big
distribution and use of the television network and Internet in the country. This article explains how joining
different technologies like social networks, information adaptation and DTT, to get an application that
offers information services to users, based on their data, preferences, inclinations, use and interaction with
others users and groups inside the network.
Reconfiguration Strategies for Online Hardware Multitasking in Embedded SystemsCSEIJJournal
An intensive use of reconfigurable hardware is expected in future embedded systems. This means that the
system has to decide which tasks are more suitable for hardware execution. In order to make an efficient
use of the FPGA it is convenient to choose one that allows hardware multitasking, which is implemented by
using partial dynamic reconfiguration. One of the challenges for hardware multitasking in embedded
systems is the online management of the only reconfiguration port of present FPGA devices. This paper
presents different online reconfiguration scheduling strategies which assign the reconfiguration interface
resource using different criteria: workload distribution or task’ deadline. The online scheduling strategies
presented take efficient and fast decisions based on the information available at each moment. Experiments
have been made in order to analyze the performance and convenience of these reconfiguration strategies.
Performance Comparison and Analysis of Mobile Ad Hoc Routing ProtocolsCSEIJJournal
A mobile ad hoc network (MANET) is a wireless network that uses multi-hop peer-to-peer routing instead
of static network infrastructure to provide network connectivity. MANETs have applications in rapidly
deployed and dynamic military and civilian systems. The network topology in a MANET usually changes
with time. Therefore, there are new challenges for routing protocols in MANETs since traditional routing
protocols may not be suitable for MANETs. Researchers are designing new MANET routing protocols
and comparing and improving existing MANET routing protocols before any routing protocols are
standardized using simulations. However, the simulation results from different research groups are not
consistent with each other. This is because of a lack of consistency in MANET routing protocol models
and application environments, including networking and user traffic profiles. Therefore, the simulation
scenarios are not equitable for all protocols and conclusions cannot be generalized. Furthermore, it is
difficult for one to choose a proper routing protocol for a given MANET application. According to the
aforementioned issues, this paper focuses on MANET routing protocols. Specifically, my contribution
includes the characterization of different routing protocols and compare and analyze the performance of
different routing protocols.
Adaptive Stabilization and Synchronization of Hyperchaotic QI SystemCSEIJJournal
The hyperchaotic Qi system (Chen, Yang, Qi and Yuan, 2007) is one of the important models of four-
dimensional hyperchaotic systems. This paper investigates the adaptive stabilization and synchronization
of hyperchaotic Qi system with unknown parameters. First, adaptive control laws are designed to
stabilize the hyperchaotic Qi system to its equilibrium point at the origin based on the adaptive control
theory and Lyapunov stability theory. Then adaptive control laws are derived to achieve global chaos
synchronization of identical hyperchaotic Qi systems with unknown parameters. Numerical simulations
are shown to demonstrate the effectiveness of the proposed adaptive stabilization and synchronization
schemes.
An Energy Efficient Data Secrecy Scheme For Wireless Body Sensor NetworksCSEIJJournal
Data secrecy is one of the key concerns for wireless body sensor networks (WBSNs). Usually, a data
secrecy scheme should accomplish two tasks: key establishment and encryption. WBSNs generally face
more serious limitations than general wireless networks in terms of energy supply. To address this, in this
paper, we propose an energy efficient data secrecy scheme for WBSNs. On one hand, the proposed key
establishment protocol integrates node IDs, seed value and nonce seamlessly for security, then
establishes a session key between two nodes based on one-way hash algorithm SHA-1. On the other hand,
a low-complexity threshold selective encryption technology is proposed. Also, we design a security
selection patter exchange method with low-complexity for the threshold selection encryption. Then, we
evaluate the energy consumption of the proposed scheme. Our scheme shows the great advantage over
the other existing schemes in terms of low energy consumption.
To improve the QoS in MANETs through analysis between reactive and proactive ...CSEIJJournal
A Mobile Ad hoc NETwork (MANET), is a self-configuring infra structure less network of mobile devices
connected by wireless links. ad hoc is Latin and means "for this purpose". Each device in a MANET is free
to move independently in any direction, and will therefore change its links to other devices frequently. Each
must forward traffic unrelated to its own use, and therefore be a router. The primary challenge in building
a MANET is equipping each device to continuously maintain the information required to properly route
traffic. QOS is defined as a set of service requirements to be met by the network while transporting a
packet stream from source to destination. Intrinsic to the notion of QOS is an agreement or a guarantee by
the network to provide a set of measurable pre-specified service attributes to the user in terms of delay,
jitter, available bandwidth, packet loss, and so on. The analysis is mainly between proactive or table-driven
protocols like OLSR (Optimized Link State Routing) viz DSDV (Destination Sequenced Distance Vector) &
CGSR (Cluster Head Gateway Switch Routing) and reactive or source initiated routing protocols viz
AODV (Ad hoc on Demand distance Vector) & DSR (Dynamic Source Routing). The QoS analysis of the
above said protocols is simulated on NS2 and results are shown thereby.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
Isolated Word Recognition System for Tamil Spoken Language Using Back Propagation Neural Network Based on LPCC Features
Computer Science & Engineering: An International Journal (CSEIJ), Vol.1, No.4, October 2011
DOI : 10.5121/cseij.2011.1401
ISOLATED WORD RECOGNITION SYSTEM FOR
TAMIL SPOKEN LANGUAGE USING BACK
PROPAGATION NEURAL NETWORK BASED ON
LPCC FEATURES
Dr. V. Radha 1, Vimala C. 2, M. Krishnaveni 3
Department of Computer Science, Avinashilingam Institute for Home Science and
Higher Education for Women, Coimbatore, Tamil Nadu, India
1 radharesearch@yahoo.com
2 vimalac.au@gmail.com
3 krishnaveni.rd@gmail.com
ABSTRACT
Speech recognition has been an active research topic for more than 50 years. Interacting with the
computer through speech is one of the active scientific research fields, particularly for the disabled
community, who face a variety of difficulties in using the computer. Such research in Automatic Speech
Recognition (ASR) is investigated for different languages because each language has its specific features.
In particular, the need for an ASR system for the Tamil language has increased widely in the last few years. In
this paper, a speech recognition system for individually spoken words in the Tamil language using a multilayer
feed-forward network is presented. To implement the system, the input signal is first
preprocessed using four types of filters, namely preemphasis, median, average and Butterworth bandstop
filters, in order to remove the background noise and to enhance the signal. The performance of these filters
is measured based on MSE and PSNR values. The best filtered signal is taken as the input for the further
stages of the ASR system. The speech features, being the major part of a speech recognition system, are
analyzed and extracted via Linear Predictive Cepstral Coefficients (LPCC). These feature vectors are
given as the input to the feed-forward neural network for classifying and recognizing the Tamil spoken
word. Experiments are done with sample Tamil speech signals and the performance is measured based on
the Mean Square Error (MSE) rate. The adopted network with the above specified parameters has produced
the best result for a limited vocabulary.
KEYWORDS
Bandstop Filter, Linear Predictive Cepstral Coefficients, Feed-forward neural networks, Tamil Speech
Recognition, Mean Square Error, and Peak signal to Noise Ratio
1. INTRODUCTION
In recent years, with the new generation of computing technology, speech technology becomes
the next major innovation in man-machine interaction. Obviously such interface would yield
great benefits which can be accomplished with the help of Automatic Speech Recognition (ASR)
system. It is a process by which a machine identifies speech. It takes a human utterance as an
input and returns a string of words as output. Such research on ASR systems is primarily
developed for English language but for Indian languages it is still in earlier stage. Tamil is one of
the widely spoken languages of the world with more than 77 million speakers. Hence, there is an
urgent need for systems that interact in the Tamil language. Recently, the Neural Network (NN) [1] has been
considered one of the most successful information processing tools and has been widely used in
speech recognition [2][3]. The Multi-Layer Perceptron (MLP) has been increasingly used for
word recognition and also for other speech processing applications. The main objective of this
paper is to implement a classification and recognition system for isolated Tamil [6][11] spoken
words. To carry out this task, two important preprocessing [4][5] steps are done before feature
extraction [2][11]: filtering [4] and windowing. Among the four filters
implemented, the best filtered speech signal is chosen and fed as input. The filtered
outcomes are evaluated based on MSE and PSNR values. The popular feature extraction [2][10]
technique of LPCC is used for extracting specific features from the speech signal. Using LPCC,
24 feature vectors are extracted and given as the input to the network. Finally, speech
recognition [2][3] is implemented using feed-forward neural networks [1] and its performance
is measured based on the default parameter, MSE.
The paper is organized as follows. Section 2 gives details about the system overview. Section 3
explains the preprocessing [5] steps involved in the system. Section 4 gives details about the feature
extraction technique based on LPCC. Section 5 deals with the feed-forward neural network
technique and its performance on Tamil [6][11] speech signals. Section 6 explores the
performance evaluation of the adopted method. Finally, the conclusion is summarized in section 7
along with future work.
2. SYSTEM OVERVIEW
A variety of speech recognition [2][3] approaches are available, such as Neural Networks,
Hidden Markov Models, Bayesian networks and Dynamic Time Warping. Among these
approaches, Neural Networks (NNs) [1] have proven to be a powerful tool for solving problems of
prediction, classification and pattern recognition. Rather than being used in general-purpose
speech recognition applications, they can handle low quality, noisy data and speaker-independent
applications. Such systems can achieve greater accuracy than HMM-based systems, as long as
there is training data and the vocabulary is limited.
One of the most commonly used networks based on a supervised learning algorithm is the multilayer
feed-forward network, which is implemented in this paper for classifying and recognizing Tamil
spoken words [6][11]. In the Tamil language, the pronunciation of independent letters and of groups of
letters forming words does not differ. A Tamil speech recognition system [11] therefore does not
require the support of a dictionary. Thus the recognition process for Tamil speech [11] is fairly
simple compared to English.
Figure 1. System overview of speech recognition based on NN (Input Speech → Noise reduction with Band-Stop Filter → Speech frames → Feature extraction using LPCC → Feature Vectors / Training vectors → Pattern Matching using NN → Classification / Recognition → Output Text)
To implement the system, the speech data is first preprocessed using filtering, framing [4] and
windowing techniques. Subsequently, the enhanced signal is given as input to the LPCC
algorithm to extract features. These feature vectors, also called cepstral coefficients, are given as
input to the network. The network is then trained with these input vectors and the target
vectors. Finally, classification and recognition are done based on pattern matching. Figure 1
above demonstrates the overall structure of the system.
3. PREPROCESSING
To enhance the accuracy and efficiency of speech recognition, the speech signals are generally pre-processed
before further analysis. The selected dataset usually needs to be pre-processed prior to being fed
into a Neural Network. In this research work, filtering [4] and windowing are used as the
important preprocessing steps.
3.1. Filtering Techniques
One of the foremost preprocessing steps in speech processing is to apply filters to the signal. Filtering is
essentially used for speech enhancement: it reduces the background noise and improves the
quality of the speech. To reduce noise in the signal, four types of filters are used, namely
preemphasis, median, average and bandstop. All these filters were implemented and, based on the
experimental results, it was found that the bandstop filter performs best among them.
A band-stop filter [7] works to screen out frequencies that are within a certain range, giving easy
passage only to frequencies outside of that range. It is also called a band-elimination, band-reject,
or notch filter. Placing a low-pass filter in parallel with a high-pass filter forms a
band-stop filter. The range of frequencies that a band-stop filter [7] blocks is known as the 'stop
band', which is bounded by a lower cut-off frequency and a higher cut-off frequency. The frequency
of maximum attenuation is called the notch frequency [7]. Hence, among these filters,
the band-stop filter specifically works best for the Tamil speech signal.
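Three of the four filters above can be sketched in a few lines of pure Python. This is a minimal illustration, not the paper's implementation: the preemphasis coefficient 0.95 and the window length 3 are illustrative choices not given in the text, and the Butterworth band-stop filter is omitted since it requires a full IIR design (in practice it is usually obtained from a library such as scipy.signal).

```python
def preemphasis(x, a=0.95):
    """First-order high-pass: y[n] = x[n] - a*x[n-1] (a = 0.95 is illustrative)."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]

def median_filter(x, k=3):
    """Sliding-window median; window length k (odd), edges padded by repetition."""
    h = k // 2
    padded = [x[0]] * h + list(x) + [x[-1]] * h
    return [sorted(padded[n:n + k])[h] for n in range(len(x))]

def average_filter(x, k=3):
    """Sliding-window (moving) average with the same edge padding."""
    h = k // 2
    padded = [x[0]] * h + list(x) + [x[-1]] * h
    return [sum(padded[n:n + k]) / k for n in range(len(x))]
```

Note how the median filter removes an isolated spike entirely, while the moving average only spreads it out; this is why the two behave differently on impulsive background noise.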
Figure 2. Original signal and the resulting output signals after applying the filters
Both subjective and objective performance evaluations were done for the filters above. The
resulting signal is used for further processing in the system. Figure 2 shows the subjective
evaluation of the four filters.
3.2. Framing and windowing
Framing [4] and windowing are important preprocessing [4] steps in speech signal processing.
In this step, the speech signal is divided into frames of N samples, with adjacent frames being
separated by M samples (M < N). The first frame consists of the first N samples. The second frame
begins M samples after the first frame, and overlaps it by N - M samples, and so on. This process
continues until all the speech is accounted for within one or more frames. A typical value for N
is 256.
The next step in preprocessing [5] is to apply a window to each individual frame so as to
minimize the signal discontinuities at the beginning and end of each frame. Windowing is used for
minimizing the spectral distortion by tapering the signal to zero at the beginning
and the end of each frame. The window is defined as w(n), 0 ≤ n ≤ N - 1, where N is the number of
samples in each frame, and the result of windowing is the desired signal. Windowing is done
by using equation (1):

y_l(n) = x_l(n) w(n), 0 ≤ n ≤ N - 1 ------- (1)
Window choice is an important task in any signal processing application. Among the different
types of windows, such as triangular and Blackman windows, the Hamming window is best suited for speech
signal processing. Typically the Hamming window is defined as in equation (2):

w(n) = 0.54 - 0.46 cos(2πn / (N - 1)), 0 ≤ n ≤ N - 1 ------- (2)
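Equations (1) and (2) translate directly into a short framing-and-windowing sketch. N = 256 follows the text; the frame shift M = 100 is an illustrative assumption, since the exact shift is not specified here.

```python
import math

def frames(x, N=256, M=100):
    """Split a signal into frames of N samples; consecutive frames start
    M samples apart, i.e. overlap by N - M samples (M = 100 is illustrative)."""
    return [x[i:i + N] for i in range(0, len(x) - N + 1, M)]

def hamming(N):
    """Hamming window, equation (2): w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_frame(frame):
    """Equation (1): y(n) = x(n) * w(n), applied sample by sample."""
    w = hamming(len(frame))
    return [s * wn for s, wn in zip(frame, w)]
```

The window tapers each frame from 0.08 at both edges up to nearly 1.0 at the centre, which is exactly the smooth fade-in/fade-out that suppresses the frame-boundary discontinuities.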
4. FEATURE EXTRACTION USING LPCC
The motivation of feature extraction [2][11] is to convert the speech waveform to a parametric
representation at a lower information rate for further analysis. Speech signals have non-stationary
characteristics. As a result, speech waveforms are commonly divided into small frames (typically 5 ms to
40 ms) in which the signal characteristics are considered quasi-stationary, to allow for short-term
spectral analysis and feature extraction. A wide range of feature extraction techniques is
available, such as Linear Predictive Coefficients (LPC), Linear Predictive Cepstral Coefficients
(LPCC) and Mel Frequency Cepstral Coefficients (MFCC). In this research work, the
frequently used LPCC parameters are considered to determine the best feature set for the
Tamil [6][11] speech database.
LPC has been widely used in speech recognition systems. Linear predictive analysis of speech
has become the predominant technique for estimating the basic parameters of speech. The basic
idea behind linear predictive analysis is that a speech sample can be approximated as a linear
combination of past speech samples. By minimizing the sum of squared differences (over a
finite interval) between the actual speech samples and the predicted values, a unique set of
parameters or predictor coefficients can be determined. The predictor coefficients are then
transformed to a more robust set of parameters known as cepstral coefficients. These coefficients
form the basis for linear predictive analysis of speech. The analysis provides the capability of
computing the linear prediction model of speech over time.
A feature vector sequence for each frame is usually obtained from the LP parameters. To extract the
LPC features, the speech signal is blocked into frames of N samples and each frame is multiplied
by a Hamming window. Short-term autocorrelation analysis is performed to find the frame energy,
an important parameter for speech detection. After that, coefficients based on the Levinson-Durbin
recursion are extracted and then converted to Q cepstral coefficients, which are weighted by a
raised sine window. Finally, the observation vectors are derived. In this work, the feature vector is
composed of 12 cepstral coefficients and 12 difference cepstral coefficients. These feature vectors
are fed as input to the subsequent classification and feature matching stages. The
steps involved in LPCC can be seen in figure 3, which illustrates the
process of deriving the Linear Predictive Cepstral Coefficients.
Figure 3. Steps involved in LPCC Feature Extraction (Speech Signal → Frame Blocking → Windowing → Autocorrelation analysis → LP analysis based on Levinson-Durbin recursion → LPC feature vectors)
Figure 4 and table 1 show the 24 feature vectors derived from LPCC for the
sample Tamil speech signals [11]. These coefficients are taken as the input vector for the
network.
Figure 4. LPCC Feature Vectors
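The core of the LPCC pipeline (short-term autocorrelation, the Levinson-Durbin recursion, and the conversion of predictor coefficients to cepstral coefficients) can be sketched as below. This is the standard textbook form of each step, not code from the paper; the raised-sine cepstral weighting and the 12 difference coefficients are omitted for brevity.

```python
def autocorrelation(x, p):
    """Short-term autocorrelation r[k] = sum_n x[n]*x[n+k], for k = 0..p."""
    N = len(x)
    return [sum(x[n] * x[n + k] for n in range(N - k)) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Solve for predictor coefficients alpha[1..p] in
    x[n] ≈ sum_k alpha[k] * x[n-k], via the Levinson-Durbin recursion."""
    a = [1.0] + [0.0] * p            # coefficients of A(z) = 1 + a1 z^-1 + ...
    e = r[0]                          # prediction error energy
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / e                  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        e *= (1.0 - k * k)
    return [-c for c in a[1:]]        # predictor coefficients alpha[k] = -a[k]

def lpc_to_cepstrum(alpha):
    """Cepstral recursion: c[n] = alpha[n] + sum_{k=1}^{n-1} (k/n)*c[k]*alpha[n-k]."""
    p = len(alpha)
    c = [0.0] * (p + 1)               # c[0] unused; cepstral indices are 1-based
    for n in range(1, p + 1):
        acc = alpha[n - 1]
        for k in range(1, n):
            acc += (k / n) * c[k] * alpha[n - k - 1]
        c[n] = acc
    return c[1:]
```

As a sanity check, for a first-order autoregressive signal x[n] = 0.9·x[n-1] + noise, the recursion recovers a first predictor coefficient close to 0.9, and the first cepstral coefficient equals the first predictor coefficient.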
5. FEED FORWARD NEURAL NETWORK FOR TAMIL WORD
RECOGNITION
A feed-forward neural network [11] is a biologically inspired classification algorithm which falls
under the category of networks for classification and prediction, and it has widespread
applications in speech processing. It consists of a (possibly large) number of
simple neuron-like processing units organized in layers. Every unit in a layer is connected with
all the units in the previous layer. These connections are not all equal: each may have a different
strength or weight, and the weights on these connections encode the knowledge of the network. Such
networks usually consist of three to four layers of logically arranged neurons. The first and
last layers are the input and output layers respectively, and there are usually one or more hidden
layers in between. The term feed-forward means that information is only
allowed to "travel" in one direction: the output of one layer becomes the input of
the next layer, and so forth. Feed-forward networks are advantageous as they are among the fastest
models to execute.
In this network, the input layer takes part in the computation rather than merely receiving the
inputs. The raw data is processed and activation functions are applied in all layers. This process
continues until the data reaches the output neurons, where it is classified into a certain category. The
operation of this network can be divided into two phases:
1. The learning phase
2. The classification phase
During the learning phase, the weights in the FFNet are modified. All weights are modified in
such a way that when a pattern is presented, the output unit with the correct category will,
ideally, have the largest output value. In the classification phase the weights of the network are fixed.
A pattern presented as input is transformed from layer to layer until it reaches the output
layer. Classification then occurs by selecting the category associated with the output unit that
has the largest output value. In contrast to the learning phase, classification is very fast. The
experimental results are presented in the following section.
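The two phases can be sketched with a minimal pure-Python network mirroring the configuration described later in the paper (tan-sigmoid hidden layer, linear output layer, standard back-propagation with a fixed learning rate of 0.03). The layer sizes and training data below are illustrative toys, not the paper's 24-input, 2-output setup.

```python
import math, random

random.seed(1)

class FeedForwardNet:
    """One hidden tanh (tan-sigmoid) layer, linear output layer,
    trained by standard back-propagation with a fixed learning rate."""
    def __init__(self, n_in, n_hidden, n_out, lr=0.03):
        self.lr = lr
        self.w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]          # +1 for bias weight
        self.w2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        """Classification phase: propagate the pattern layer to layer."""
        self.xb = x + [1.0]
        self.h = [math.tanh(sum(w * v for w, v in zip(row, self.xb)))
                  for row in self.w1]
        self.hb = self.h + [1.0]
        self.y = [sum(w * v for w, v in zip(row, self.hb)) for row in self.w2]
        return self.y

    def train_step(self, x, target):
        """Learning phase: one back-propagation weight update."""
        y = self.forward(x)
        d_out = [yi - ti for yi, ti in zip(y, target)]          # linear output delta
        d_hid = [(1 - h * h) * sum(self.w2[o][j] * d_out[o]     # tanh' = 1 - tanh^2
                                   for o in range(len(d_out)))
                 for j, h in enumerate(self.h)]
        for o, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= self.lr * d_out[o] * self.hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= self.lr * d_hid[j] * self.xb[i]
        return sum(d * d for d in d_out) / len(d_out)           # per-sample MSE
```

After training, the predicted category is simply the output unit with the largest value, matching the classification rule described above.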
6. PERFORMANCE EVALUATION
Performance evaluation is done separately for the filtering techniques and for the neural network.
Mean Square Error (MSE) and Peak Signal to Noise Ratio (PSNR) values are used as the
performance measures to find the best filtered signal for feature extraction. The MSE value is
calculated by equation (3)
MSE = Σᵢ (yᵢ − ŷᵢ)² / (n − p) ---------- (3)
and PSNR value is calculated by the equation (4)
PSNR = 10 log₁₀ (R² / MSE) ---------- (4)
where ŷᵢ is the predicted value of yᵢ and R is the maximum possible value of the signal.
The average values of 10 speech samples were taken for the performance evaluation. Figures 5(a)
and (b) present the performance evaluation of the four filters based on MSE and PSNR values.
Among these filters, the bandstop filter offers the lowest MSE value and the highest peak signal-to-noise value.
Computer Science & Engineering: An International Journal (CSEIJ), Vol.1, No.4, October 2011
Figure 5 (a). Performance evaluation of filters based on MSE
Figure 5 (b). Performance evaluation of filters based on PSNR
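Equations (3) and (4) can be checked with a short sketch. The signals here are synthetic stand-ins for the paper's speech samples, and the parameter p and peak value R are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 8000)
clean = np.sin(2 * np.pi * 440 * t)                  # reference signal
noisy = clean + rng.normal(scale=0.3, size=t.size)   # unfiltered version

def mse(y, y_hat, p=0):
    """Equation (3): summed squared error over n - p terms; p is an
    assumed parameter count (0 here for a plain signal comparison)."""
    return np.sum((y - y_hat) ** 2) / (len(y) - p)

def psnr(y, y_hat, R=1.0):
    """Equation (4): PSNR = 10 * log10(R^2 / MSE); R is the assumed
    peak signal value."""
    return 10 * np.log10(R ** 2 / mse(y, y_hat))

# Lower MSE implies higher PSNR, which is why the filter with the least
# MSE in figure 5(a) is also the one with the highest PSNR in figure 5(b).
print("MSE :", mse(clean, noisy))
print("PSNR:", psnr(clean, noisy))
```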
In this research work, the input speech signal is preprocessed with the bandstop filter. The filtered
signal is divided into short frames, and a Hamming window is applied to reduce the discontinuities
between these frames. For the further process of creating and training a network, the input vector
consists of 24 LPCC coefficients. There are 24 neurons in the first layer and 2 neurons
in the second (output) layer. The transfer function used in the first layer is tan-sigmoid, and the
output layer transfer function is linear. With standard back-propagation, the learning rate is held
constant throughout training. The performance of the algorithm is very sensitive to the proper
setting of the learning rate: if it is set too high, the algorithm may oscillate and become unstable;
if it is too small, the algorithm will take too long to converge. This sensitivity can be reduced by
allowing the learning rate to change during the training process. For this research work a learning
rate of 0.03 is used and the epoch count was
limited to 300. Different epoch counts were employed in the network in order to evaluate the
performance. The experiments reveal that as the number of epochs increases, the error rate is
minimized and the performance improves. Figures 6(a) and (b) show the experimental results
of the actual and predicted values for a sample Tamil speech signal [11]. The performance goal is
achieved with the parameters specified for the network.
Figure 6(a). Actual and predicted values
Figure 6(b). Performance of the network
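The framing and windowing step described above, followed by a per-frame linear-prediction analysis, can be sketched as follows. The frame length, frame shift, and prediction order are illustrative assumptions, and this sketch computes plain LPC coefficients via the standard autocorrelation (Levinson-Durbin) method rather than the cepstral LPCC features used in the paper.

```python
import numpy as np

def frame_signal(signal, frame_len=200, frame_shift=80):
    """Split the signal into short overlapping frames and apply a Hamming
    window to each frame to reduce edge discontinuities."""
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    window = np.hamming(frame_len)
    return np.stack([
        signal[i * frame_shift : i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])

def lpc(frame, order=12):
    """Linear-prediction coefficients via the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.array([frame[:n - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]                      # drop the leading 1 of the predictor

# Stand-in for a bandstop-filtered speech signal (1 s at an assumed 8 kHz).
signal = np.random.default_rng(2).normal(size=8000)
frames = frame_signal(signal)
features = np.stack([lpc(f) for f in frames])   # one vector per frame
print(frames.shape, features.shape)
```

Each frame then yields a fixed-length feature vector that can be fed to the input layer of the network described above.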
The default performance evaluation measure for the adopted network is the mean square error. It
is computed by summing the squared differences between the predicted and actual values, and
then dividing the sum by the number of components. A threshold of 0.01 mean square error is
considered good. Our research work achieved a minimum MSE of 0.00515 for the Tamil speech
signal [11], well below the 0.01 threshold.
7. CONCLUSION
In recent years, neural networks have become a well-established technique for tackling complex
problems and tedious tasks such as speech recognition. Speech is a natural and simple
communication method for human beings; however, it is an extremely complex and difficult job to
make a computer respond to spoken commands. Recently there has been a significant need for
ASR systems to be developed in Tamil and other Indian languages. In this paper such an effort is
carried out for recognizing Tamil spoken words. To accomplish this task, feature extraction is
done after employing the required preprocessing techniques. The widely used LPCC method is
applied to extract the significant feature vectors from the enhanced speech signal, and these are
given as the input to the feed-forward neural network. The adopted network is trained with these
input and target vectors. The results with the specified parameters were found to be satisfactory
considering the small amount of training data. More isolated and continuous words are to be
trained and tested with this network in the future. This preliminary experiment will help to develop
an ASR system for the Tamil language using different approaches such as Hidden Markov Models
or other hybrid techniques.
ACKNOWLEDGEMENTS
The authors thank UGC Major Research Project, New Delhi, India, for funding the project on
“Automatic Tamil Voice Recognition Browser using Continuous Hidden Markov Model for
Visually impaired People”.
REFERENCES
[1] Abdul Manan Ahmad, Saliza Ismail, Den Fairol Samaon, (2004) "Recurrent Neural Network with
Backpropagation Through Time for Speech Recognition", International Symposium on
Communications and Information Technologies 2004 (ISCIT2004), Sapporo, Japan, October 26-29.
[2] Amin Ashouri Saheli, Gholam Ali Abdali, Amir Abolfazl suratgar,(2009) “Speech Recognition from
PSD using Neural Network”, Proceedings of the International MultiConference of Engineers and
Computer Scientists 2009, Vol I,IMECS 2009, March 18 - 20, 2009, Hong Kong.
[3] S K Hasnain, Azam Beg(2008), “A Speech Recognition System for Urdu Language”, in International
Multi-Topic Conference (IMTIC'08), Jamshoro, Pakistan, 2008, pp. 74-78.
[4] S K Hasnain, (2008) “Recognizing Spoken Urdu Numbers Using Fourier Descriptor and Neural
Networks with Matlab”, Second International Conference on Electrical Engineering,25-26 March
2008,University of Engineering and Technology, Lahore (Pakistan) (2008)
[5] Patricia Melin, Jerica Urias, Daniel Solano, Miguel Soto, Miguel Lopez, and Oscar Castillo, (2006)
“Voice Recognition with Neural Networks”, Type-2 Fuzzy Logic and Genetic Algorithms,
Engineering Letters, 13:2, EL_13_2_9 (Advance online publication: 4 August).
[6] Muthanantha Murugavel, (2007) "Speech Recognition Model for Tamil Stops", Proceedings of the
World Congress on Engineering 2007 Vol I, WCE 2007, July 2-4, 2007, London, U.K.
[7] Dr. V. Radha, Ms. Vimala. C and Ms. M. Krishnaveni, (2010) "Reconstruction Methodology for Tamil
Speech Recognition System", Proceedings of International Conference on Computing (ICC2010),
Advanced Computing Research Society, Institute for Defense Studies and Analysis in New Delhi,
pp 152-157, ISBN: 978-81-920305-1-7.
[8] S. Saraswathi, T.V. Geetha, (2010) "Design of language models at various phases of Tamil speech
recognition system", International Journal of Engineering, Science and Technology, Vol. 2, No. 5,
2010, pp. 244-257.
[9] S. Saraswathi and T.V. Geetha, (2004) “Implementation of Tamil Speech Recognition System Using
Neural Networks”, Lecture Notes in Computer Science, 2004, Volume 3285/2004, 169-176, DOI:
10.1007/978-3-540-30176-9_22.
[10] R. Thangarajan, (2008) “Word and Triphone Based Approaches in Continuous Speech Recognition
for Tamil Language”, WSEAS TRANSACTIONS on SIGNAL PROCESSING, Issue 3, Volume 4,
March 2008.
[11] N.Uma Maheswari, A.P.Kabilan, R.Venkatesh, (2010) “A Hybrid model of Neural Network
Approach for Speaker independent Word Recognition”, International Journal of Computer Theory
and Engineering, Vol.2, No.6, December, 2010,1793-8201.
Authors
Dr. V. Radha is Associate Professor in the Department of Computer Science,
Avinashilingam Institute for Home Science and Higher Education for Women,
Coimbatore, Tamil Nadu, India. She has more than 21 years of teaching experience and 7
years of research experience. Her areas of specialization include Image Processing,
Optimization Techniques, Voice Recognition and Synthesis, Speech and Signal
Processing, and RDBMS. She has more than 45 publications in national and international
journals and conferences. She has one Major Research Project funded by UGC. Email id:
radhasrimail@gmail.com
Ms. Vimala. C is currently pursuing a Ph.D and working as a Project Fellow for the UGC
Major Research Project in the Department of Computer Science, Avinashilingam
Institute for Home Science and Higher Education for Women. She has more than 2
years of teaching experience and 1 year of research experience. Her area of
specialization includes Speech Recognition and Synthesis. She has 9 publications in
national and international conferences.
Email id: vimalac.au@gmail.com
Ms. M. Krishnaveni has 5 years of research experience and works as a Research
Assistant in a Naval Research Board project. Her areas of specialization are Image
Processing, Pattern Recognition, and Neural Networks. She has 40 publications in
national and international journals and conferences. She has one Major Research Project
funded by UGC and one under DST. Email id: krishnaveni.rd@gmail.com