Speech processing is a crucial and intensive field of research in the development of robust and efficient speech recognition systems, yet recognition accuracy still suffers under variation of context, speaker variability, and environmental conditions. In this paper, we present a Curvelet-based Feature Extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the curvelet transform, which reduces computational complexity and feature-vector size; its varying window size makes it well suited to non-stationary signals. For word classification and recognition, a discrete Hidden Markov Model (HMM) is used, since it accounts for the temporal distribution of speech signals. The HMM classifier attained identification rates of 80.1% for informal phrases, 86% for scientific phrases, and 63.8% for control phrases. The objective of this study is to characterize the feature extraction and classification phases of a speech recognition system; the approaches available for developing such systems are compared along with their merits and demerits. The statistical results show that recognition accuracy is improved by using the discrete curvelet transform over conventional methods.
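The discrete-HMM classification step described above selects the most likely word by scoring hidden-state sequences against the observed feature symbols; as an illustration, a minimal Viterbi decoder for a discrete-observation HMM can be sketched as follows (a generic textbook sketch, not the authors' implementation; the parameters in the usage note are invented for demonstration):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for a discrete-observation HMM.
    pi: initial state probs (N,), A: transitions (N,N), B: emissions (N,M)."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))           # best log-prob of any path ending in state j at time t
    psi = np.zeros((T, N), dtype=int)  # back-pointers to the best predecessor state
    with np.errstate(divide="ignore"):  # log(0) = -inf is fine here
        logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # scores[i, j]: come from i, land in j
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + logB[:, obs[t]]
    path = [int(np.argmax(delta[-1]))]          # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

For example, with a two-state model that starts in state 0 and whose states prefer to emit symbols 0 and 1 respectively, decoding the sequence [0, 0, 1, 1, 1] recovers the state switch.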
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS (ijnlc)
The fundamental techniques used for man-machine communication include speech synthesis, speech recognition, and speech transformation. Feature extraction techniques provide a compressed representation of the speech signal. HNM analysis and synthesis provide high-quality speech with a small number of parameters. Dynamic time warping (DTW) is a well-known technique for aligning two multidimensional sequences: it locates an optimal match between the given sequences, and the improvement in alignment is estimated from the corresponding distances. The objective of this research is to investigate the effect of dynamic time warping on phrase-, word-, and phoneme-based alignments. Speech signals in the form of twenty-five phrases were recorded; the recorded material was segmented manually and aligned at the sentence, word, and phoneme level. The Mahalanobis distance (MD) was computed between the aligned frames. The investigation has shown better alignment in the HNM parametric domain, and it has been seen that effective speech alignment can be carried out even at the phrase level.
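As a sketch of the alignment technique this abstract describes, a minimal dynamic time warping over two feature sequences might look like the following (illustrative only; the paper's actual alignment works on HNM parameters with the Mahalanobis distance, whereas this sketch uses plain Euclidean frame distances):

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping between feature sequences x (T1, D) and y (T2, D).
    Returns the accumulated alignment cost and the optimal warping path."""
    T1, T2 = len(x), len(y)
    cost = np.full((T1 + 1, T2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    # backtrack the optimal warping path from the end of both sequences
    path, i, j = [], T1, T2
    while i > 0 or j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[T1, T2], path[::-1]
```

For two sequences that differ only by a repeated frame, the accumulated cost is zero and the path stretches over the repetition.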
Text independent speaker identification system using average pitch and forman... (ijitjournal)
The aim of this paper is to design a closed-set, text-independent speaker identification system using average pitch and speech features from formant analysis. The speech features in the speech signal are characterized by formant analysis (power spectral density). In this paper we design two methods: one for average pitch estimation based on autocorrelation, and the other for formant analysis. The average pitch of each speech signal is calculated and employed together with the formant analysis. From a performance comparison of the proposed method with some existing methods, it is evident that the speaker identification system designed with the proposed method is superior to the others.
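The average-pitch estimation step based on autocorrelation can be sketched roughly as follows (a generic illustration, not the authors' code; the 60-400 Hz search range is an assumed plausible pitch band):

```python
import numpy as np

def average_pitch(signal, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of a voiced signal via autocorrelation:
    find the strongest autocorrelation peak in the plausible lag range."""
    signal = signal - np.mean(signal)
    # keep only non-negative lags of the full autocorrelation
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo = int(fs / fmax)   # shortest plausible pitch period, in samples
    hi = int(fs / fmin)   # longest plausible pitch period
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag
```

For a 200 Hz sine sampled at 8 kHz, the strongest peak lands at a lag of 40 samples, giving 8000/40 = 200 Hz.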
Real Time Speaker Identification System - Design, Implementation and Validation (IDES Editor)
This paper presents the design, implementation, and validation of a PC-based prototype speaker recognition and verification system. The system is organized to receive a speech signal, extract its features, and recognize and verify a person using voice as the biometric. It is implemented to capture the speech signal from a microphone and compare it with a stored database using a filter-bank based closed-set speaker verification system. First, identification of the voice signals is done using an algorithm developed in MATLAB. Next, a PC-based prototype system is developed and validated in real time. Several tests were made on different sets of voice signals, and the performance and speed of the proposed system were measured in a real environment. The results confirm the suitability of the proposed system for various real-time applications.
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM) WITH CONDITIONAL RANDOM FIELDS (... (ijnlc)
This study investigates the effectiveness of knowledge named entity recognition in Online Judges (OJs). OJs lack classification of topics and are limited to IDs only; therefore a lot of time is consumed in finding programming problems, more specifically in knowledge entities. A Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Fields (CRF) model is applied for the recognition of knowledge named entities in the solution reports. For the test run, more than 2000 solution reports were crawled from the Online Judges and processed for the model output. The stability of the model is also assessed with the higher F1 value. The results obtained through the proposed BiLSTM-CRF model are more effective (F1: 98.96%) and efficient in lead time.
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition... (TELKOMNIKA JOURNAL)
Speech recognition can be defined as the process of converting voice signals into words by applying a specific algorithm implemented in a computer program. Research on speech recognition in Indonesia is relatively limited. This paper studies which feature extraction method is best among Linear Predictive Coding (LPC) and Mel Frequency Cepstral Coefficients (MFCC) for speech recognition in the Indonesian language. This is important because a method that produces high accuracy for a particular language does not necessarily produce the same accuracy for other languages, considering that every language has different characteristics. This research therefore aims to help accelerate the use of automatic speech recognition for the Indonesian language. There are two main processes in speech recognition: feature extraction and recognition. The feature extraction methods compared in this study are LPC and MFCC, while recognition uses a Hidden Markov Model (HMM). The test results showed that the MFCC method is better than LPC for Indonesian-language speech recognition.
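The MFCC feature extraction compared against LPC above follows the standard chain of pre-emphasis, framing, windowing, mel filterbank, and DCT; a compact numpy-only sketch of that chain (generic, not the paper's implementation; frame and filterbank sizes are conventional defaults) is:

```python
import numpy as np

def mfcc(signal, fs, n_fft=512, n_mels=26, n_ceps=13,
         frame_len=0.025, frame_step=0.010):
    """Compute MFCC features: pre-emphasis, framing, Hamming window,
    power spectrum, mel filterbank, log, and DCT-II."""
    # pre-emphasis boosts high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    flen, fstep = int(fs * frame_len), int(fs * frame_step)
    n_frames = 1 + max(0, (len(sig) - flen) // fstep)
    frames = np.stack([sig[i * fstep: i * fstep + flen] for i in range(n_frames)])
    frames = frames * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # triangular mel filterbank with edges spaced evenly on the mel scale
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(fs / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T
```

With 25 ms frames and a 10 ms step, a 0.5 s signal at 16 kHz yields 48 frames of 13 coefficients each.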
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique (CSCJournals)
An automatic speaker recognition system recognizes an unknown speaker among several reference speakers by making use of speaker-specific information in their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition technique. Our baseline speaker recognition system, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database, and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique that performs gender recognition before speaker recognition so as to improve the accuracy and timeliness of the baseline speaker recognition system. Based on experiments conducted on the MIT database, we demonstrate that the proposed system improves accuracy over the baseline by approximately 2% while reducing computational time by more than 30%.
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK (ijistjournal)
This paper describes a Hidden Markov Model-based Punjabi text-to-speech synthesis system (HTS), in which the speech waveform is generated from the Hidden Markov Models themselves, and applies it to Punjabi speech synthesis using the general speech synthesis architecture of HTK (HMM Tool Kit). This HMM-based TTS can be used in mobile phones for a stored phone directory or messages: text messages and the caller's identity in English are mapped to tokens in the Punjabi language, which are then concatenated to form speech according to certain rules and procedures.
To build the synthesizer, we recorded a speech database and phonetically segmented it, first extracting context-independent monophones and then context-dependent triphones. For example, for the word "bharat" the monophones are a, bh, t, etc., and a triphone is bh-a+r. These speech utterances and their phone-level transcriptions (monophones and triphones) are the inputs to the speech synthesis system. The system outputs the sequence of phonemes after resolving various ambiguities in phoneme selection using word network files; e.g., for the word "Tapas" the output phoneme sequence is ਤ,ਪ,ਸ instead of ਟ,ਪ,ਸ.
Comparison and Analysis Of LDM and LMS for an Application of a Speech (CSCJournals)
Most automatic speech recognition (ASR) systems are based on Gaussian mixture models, whose output depends on subphone states. We often measure and transform the speech signal into another form to enhance our ability to communicate. Speech recognition is the conversion of an acoustic waveform into its written equivalent message. The nature of the speech recognition problem depends heavily upon the constraints placed on the speaker, the speaking situation, and the message context. Various speech recognition systems are available; the system that detects the hidden conditions of speech is the best model. LMS is a simple algorithm used to reconstruct speech, and the linear dynamic model (LDM) is also used to recognize speech in a noisy atmosphere. This paper analyses and compares the LDM and a simple LMS algorithm, which can be used for speech recognition purposes.
Speech to text conversion for visually impaired person using µ law companding (iosrjce)
This paper presents the overall design and implementation of a DSP-based speech recognition and text conversion system. Speech is usually taken as a preferred mode of operation for human beings; this paper presents voice-oriented commands for conversion into text. We intend to compute the entire speech processing chain in real time, which involves simultaneously accepting input from the user and using software filters to analyse the data. The comparison is established using correlation and µ-law companding techniques. In this paper, voice recognition is carried out using MATLAB. The voice command is person-independent and is stored in the database with the help of the function keys. The real-time input speech is then processed in the speech recognition system, where the required features of the spoken words are extracted, filtered out, and matched against the existing samples stored in the database. The required MATLAB processes then convert the received data into text form.
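µ-law companding, the compression technique named above, maps samples through a logarithmic curve so that quiet speech receives finer resolution; a minimal continuous (non-quantised) sketch with the conventional µ = 255 is:

```python
import numpy as np

def mu_law_compress(x, mu=255):
    """µ-law companding: compress a signal in [-1, 1] so that low-amplitude
    samples get more resolution (G.711-style curve, µ = 255)."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255):
    """Exact inverse of mu_law_compress."""
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu
```

Compression followed by expansion recovers the original signal; a quiet sample such as 0.1 is boosted to roughly 0.59 before quantisation, which is where the resolution gain comes from.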
One of the most common and simpler feature extraction techniques is the Mel Frequency Cepstral Coefficient (MFCC), which extracts a feature vector from the signal. It is used with dynamic feature extraction and provides a high performance rate compared to earlier techniques such as LPC, but one of its major drawbacks is limited robustness. Another feature extraction technique is Relative Spectral (RASTA) filtering. In effect, the RASTA filter band-passes each feature coefficient; in both the log-spectral and cepstral domains, linear channel distortions appear as an additive constant. The high-pass portion of the equivalent band-pass filter suppresses the convolutional noise introduced in the channel, while the low-pass portion helps smooth frame-to-frame spectral changes. Compared to plain MFCC feature extraction, RASTA filtering reduces the impact of noise in the signal and provides higher robustness.
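The RASTA band-pass filtering described above is conventionally implemented with the Hermansky-Morgan transfer function H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1), applied along time to each log-spectral coefficient; a numpy-only sketch (assuming this classic filter, not any specific paper's variant):

```python
import numpy as np

def rasta_filter(logspec):
    """Apply the classic RASTA band-pass filter along the time axis of a
    (frames, bands) log-spectral feature matrix."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # FIR (high-pass) numerator
    a = 0.98                                          # IIR pole (low-pass part)
    T, D = logspec.shape
    out = np.zeros_like(logspec)
    for d in range(D):
        y_prev = 0.0
        for t in range(T):
            # FIR section over the last five frames (zero-padded at the start)
            x = [logspec[t - k, d] if t - k >= 0 else 0.0 for k in range(5)]
            y = float(np.dot(b, x)) + a * y_prev
            out[t, d] = y
            y_prev = y
    return out
```

Because the numerator sums to zero, a constant additive offset (a stationary channel distortion in the log domain) is filtered out: on constant input the output decays toward zero after the initial transient.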
Utterance Based Speaker Identification Using ANN (IJCSEA Journal)
In this paper we present the implementation of a speaker identification system using an artificial neural network with digital signal processing. The system is designed for text-dependent speaker identification of Bangla speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder, and the speech features are acquired by digital signal processing techniques. Speaker identification from frequency-domain data is performed using the backpropagation algorithm. Hamming and Blackman-Harris windows are used to investigate which gives better speaker identification performance, and endpoint detection of speech is employed to achieve high system accuracy.
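The Hamming and Blackman-Harris windows compared in the abstract trade main-lobe width against sidelobe leakage before the FFT; a small sketch of applying either window to a speech frame (standard window formulas, not the paper's code):

```python
import numpy as np

def windowed_spectrum(frame, window="hamming"):
    """Apply a Hamming or 4-term Blackman-Harris window to a frame before the
    FFT; windowing reduces spectral leakage from the frame's abrupt edges."""
    N = len(frame)
    n = np.arange(N)
    if window == "hamming":
        w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
    else:  # 4-term Blackman-Harris: much lower sidelobes, wider main lobe
        a = [0.35875, 0.48829, 0.14128, 0.01168]
        w = (a[0] - a[1] * np.cos(2 * np.pi * n / (N - 1))
                  + a[2] * np.cos(4 * np.pi * n / (N - 1))
                  - a[3] * np.cos(6 * np.pi * n / (N - 1)))
    return np.abs(np.fft.rfft(frame * w))
```

Either choice leaves a spectral peak at the tone's frequency bin; they differ in how much energy leaks into neighbouring bins.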
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR (ijcseit)
The proposed system is a syllable-based Myanmar speech recognition system with three stages: feature extraction, phone recognition, and decoding. In feature extraction, the system transforms the input speech waveform into a sequence of acoustic feature vectors, each representing the information in a small time window of the signal. The likelihood of the observed feature vectors given linguistic units (words, phones, subparts of phones) is then computed in the phone recognition stage. Finally, the decoding stage takes the acoustic model (AM), consisting of this sequence of acoustic likelihoods, plus a phonetic dictionary of word pronunciations, combined with the language model (LM), and produces the most likely sequence of words as output. The system creates the language model for Myanmar using syllable segmentation and a syllable-based n-gram method.
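The syllable-based n-gram language model in the final stage can be sketched, for the bigram case with add-one smoothing, as follows (the romanised syllable corpus is entirely hypothetical, for illustration only):

```python
from collections import Counter

def bigram_lm(corpus):
    """Train a syllable-level bigram language model with add-one smoothing.
    `corpus` is a list of utterances, each a list of syllable strings."""
    unigrams, bigrams = Counter(), Counter()
    for utt in corpus:
        syls = ["<s>"] + utt + ["</s>"]        # sentence boundary markers
        unigrams.update(syls)
        bigrams.update(zip(syls, syls[1:]))
    vocab = len(unigrams)
    def prob(prev, syl):
        # add-one (Laplace) smoothed conditional probability P(syl | prev)
        return (bigrams[(prev, syl)] + 1) / (unigrams[prev] + vocab)
    return prob

# hypothetical romanised syllable sequences, for illustration only
corpus = [["min", "ga", "la", "ba"], ["min", "ga", "la", "ba"], ["ga", "la"]]
p = bigram_lm(corpus)
```

The decoder would combine these LM probabilities with the acoustic likelihoods; here, a syllable pair seen in training scores higher than an unseen one, while smoothing keeps unseen pairs from getting zero probability.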
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
This proposed system is syllable-based Myanmar speech recognition system. There are three stages:
Feature Extraction, Phone Recognition and Decoding. In feature extraction, the system transforms the
input speech waveform into a sequence of acoustic feature vectors, each vector representing the
information in a small time window of the signal. And then the likelihood of the observation of feature
vectors given linguistic units (words, phones, subparts of phones) is computed in the phone recognition
stage. Finally, the decoding stage takes the Acoustic Model (AM), which consists of this sequence of
acoustic likelihoods, plus an phonetic dictionary of word pronunciations, combined with the Language
Model (LM). The system will produce the most likely sequence of words as the output. The system creates
the language model for Myanmar by using syllable segmentation and syllable based n-gram method.
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
This proposed system is syllable-based Myanmar speech recognition system. There are three stages:
Feature Extraction, Phone Recognition and Decoding. In feature extraction, the system transforms the
input speech waveform into a sequence of acoustic feature vectors, each vector representing the
information in a small time window of the signal. And then the likelihood of the observation of feature
vectors given linguistic units (words, phones, subparts of phones) is computed in the phone recognition
stage. Finally, the decoding stage takes the Acoustic Model (AM), which consists of this sequence of
acoustic likelihoods, plus an phonetic dictionary of word pronunciations, combined with the Language
Model (LM). The system will produce the most likely sequence of words as the output. The system creates
the language model for Myanmar by using syllable segmentation and syllable based n-gram method.
Text independent speaker identification system using average pitch and forman...ijitjournal
The aim of this paper is to design a closed-set text-independent Speaker Identification system using average
pitch and speech features from formant analysis. The speech features represented by the speech signal are
potentially characterized by formant analysis (Power Spectral Density). In this paper we have designed two
methods: one for average pitch estimation based on Autocorrelation and other for formant analysis. The
average pitches of speech signals are calculated and employed with formant analysis. From the performance
comparison of the proposed method with some of the existing methods, it is evident that the designed
speaker identification system with the proposed method is superior to others.
Real Time Speaker Identification System – Design, Implementation and ValidationIDES Editor
This paper presents design, implementation and
validation of a PC based Prototype speaker recognition and
verification system. This system is organized to receive speech
signal, find the features of speech signal, and recognize and
verify a person using voice as the biometric. The system is
implemented to capture the speech signal from microphone
and to compare it with the stored data base using filter-bank
based closed-set speaker verification system. At first, the
identification of the voice signals is done using an algorithm
developed in MATLAB. Next, a PC based prototype system is
developed and is validated in real time. Several tests were made
on different sets of voice signals, and measured the performance
and the speed of the proposed system in real environment. The
result confirmed the use of proposed system for various real
time applications.
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...ijnlc
This study investigates the effectiveness of Knowledge Named Entity Recognition in Online Judges (OJs). OJs are lacking in the classification of topics and limited to the IDs only. Therefore a lot of time is consumed in finding programming problems more specifically in knowledge entities.A Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Fields (CRF) model is applied for the recognition of knowledge named entities existing in the solution reports.For the test run, more than 2000 solution reports are crawled from the Online Judges and processed for the model output. The stability of the model is
also assessed with the higher F1 value. The results obtained through the proposed BiLSTM-CRF model are more effectual (F1: 98.96%) and efficient in lead-time.
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...TELKOMNIKA JOURNAL
Speech recognition can be defined as the process of converting voice signals into the ranks of the
word, by applying a specific algorithm that is implemented in a computer program. The research of speech
recognition in Indonesia is relatively limited. This paper has studied methods of feature extraction which is
the best among the Linear Predictive Coding (LPC) and Mel Frequency Cepstral Coefficients (MFCC) for
speech recognition in Indonesian language. This is important because the method can produce a high
accuracy for a particular language does not necessarily produce the same accuracy for other languages,
considering every language has different characteristics. Thus this research hopefully can help further
accelerate the use of automatic speech recognition for Indonesian language. There are two main
processes in speech recognition, feature extraction and recognition. The method used for comparison
feature extraction in this study is the LPC and MFCC, while the method of recognition using Hidden
Markov Model (HMM). The test results showed that the MFCC method is better than LPC in Indonesian
language speech recognition.
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueCSCJournals
Automatic speaker recognition system is used to recognize an unknown speaker among several reference speakers by making use of speaker-specific information from their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition. Our baseline speaker recognition system accuracy, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique by performing gender recognition before speaker recognition so as to improve the accuracy/timeliness of our baseline speaker recognition system. Based on the experiments conducted on the MIT database, we demonstrate that our proposed system improves the accuracy over the baseline system by approximately 2%, while reducing the computational time by more than 30%.
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKijistjournal
This paper describes an Hidden Markov Model-based Punjabi text-to-speech synthesis system (HTS), in which speech waveform is generated from Hidden Markov Models themselves, and applies it to Punjabi speech synthesis using the general speech synthesis architecture of HTK (HMM Tool Kit). This Hidden Markov Model based TTS can be used in mobile phones for stored phone directory or messages. Text messages and caller’s identity in English language are mapped to tokens in Punjabi language which are further concatenated to form speech with certain rules and procedures.
To build the synthesizer we recorded the speech database and phonetically segmented it, thus first extracting context-independent monophones and then context-dependent triphones. For e.g. for word bharat monophones are a, bh, t etc. & triphones are bh-a+r. These speech utterances and their phone level transcriptions (monophones and triphones) are the inputs to the speech synthesis system. System outputs the sequence of phonemes after resolving various ambiguities regarding selection of phonemes using word network files e.g. for the word Tapas the output phoneme sequence is ਤ,ਪ,ਸ instead of phoneme sequence ਟ,ਪ,ਸ .
Comparison and Analysis Of LDM and LMS for an Application of a SpeechCSCJournals
Most of the automatic speech recognition (ASR) systems are based on Guassian Mixtures model. The output of these models depends on subphone states. We often measure and transform the speech signal in another form to enhance our ability to communicate. Speech recognition is the conversion from acoustic waveform into written equivalent message information. The nature of speech recognition problem is heavily dependent upon the constraints placed on the speaker, speaking situation and message context. Various speech recognition systems are available. The system which detects the hidden conditions of speech is the best model. LMS is one of the simple algorithm used to reconstruct the speech and linear dynamic model is also used to recognize the speech in noisy atmosphere..This paper is analysis and comparison between the LDM and a simple LMS algorithm which can be used for speech recognition purpose.
Speech to text conversion for visually impaired person using µ law compandingiosrjce
The paper represents the overall design and implementation of DSP based speech recognition and
text conversion system. Speech is usually taken as a preferred mode of operation for human being, This paper
represent voice oriented command for converting into text. We intended to compute the entire speech processing
in real time. This involves simultaneously accepting the input from the user and using software filters to analyse
the data. The comparison was then to be established by using correlation and µ law companding techniques. In
this paper, voice recognition is carried out using MATLAB. The voice command is a person independent. The
voice command is stored in the data base with the help of the function keys. The real time input speech received
is then processed in the speech recognition system where the required feature of the speech words are extracted,
filtered out and matched with the existing sample stored in the database. Then the required MATLAB processes
are done to convert the received data and into text form.
One of the common and easier techniques of feature extraction is Mel Frequency Cestrum Coefficient (MFCC) which allows the signals to extract the feature vector. It is used by Dynamic Feature Extraction and provide high performance rate when compared to previous technique like LPC. But one of the major drawbacks in this technique is robustness. Another feature extraction technique is Relative Spectral (RASTA). In effect the RASTA filter band passes each feature coefficient and in both the log spectral and the Spectral domains appear linear channel distortions as an additive constant. The high-pass portions of the equivalent band pass filter effect the convolution noise introduced in the channel. The low-pass filtering helps in smoothing frame to frame spectral changes. Compared to MFCC feature extraction technique, RASTA filtering reduces the impact of the noise in signals and provides high robustness
Utterance Based Speaker Identification Using ANNIJCSEA Journal
In this paper we present the implementation of speaker identification system using artificial neural network with digital signal processing. The system is designed to work with the text-dependent speaker identification for Bangla Speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder. The speech features are acquired by the digital signal processing technique. The identification of speaker using frequency domain data is performed using back propagation algorithm. Hamming window and Blackman-Harris window are used to investigate better speaker identification performance. Endpoint detection of speech is developed in order to achieve high accuracy of the system.
Utterance Based Speaker Identification Using ANNIJCSEA Journal
In this paper we present the implementation of speaker identification system using artificial neural network with digital signal processing. The system is designed to work with the text-dependent speaker identification for Bangla Speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder. The speech features are acquired by the digital signal processing technique. The identification of speaker using frequency domain data is performed using backpropagation algorithm. Hamming window and Blackman-Harris window are used to investigate better speaker identification performance. Endpoint detection of speech is developed in order to achieve high accuracy of the system.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
This proposed system is syllable-based Myanmar speech recognition system. There are three stages: Feature Extraction, Phone Recognition and Decoding. In feature extraction, the system transforms the input speech waveform into a sequence of acoustic feature vectors, each vector representing the information in a small time window of the signal. And then the likelihood of the observation of feature vectors given linguistic units (words, phones, subparts of phones) is computed in the phone recognition stage. Finally, the decoding stage takes the Acoustic Model (AM), which consists of this sequence of acoustic likelihoods, plus an phonetic dictionary of word pronunciations, combined with the Language Model (LM). The system will produce the most likely sequence of words as the output. The system creates the language model for Myanmar by using syllable segmentation and syllable based n-gram method.
Combined feature extraction techniques and naive Bayes classifier for speech ... – csandit
Speech processing and the consequent recognition are important areas of digital signal processing, since speech allows people to communicate more naturally and efficiently. In this work, a speech recognition system is developed for recognizing digits in Malayalam. Features must be extracted from the speech, so the feature extraction method plays an important role in speech recognition. Here, front-end processing for extracting the features is performed using two wavelet-based methods, namely Discrete Wavelet Transforms (DWT) and Wavelet Packet Decomposition (WPD), and a naive Bayes classifier is used for classification. With the naive Bayes classifier, DWT produced a recognition accuracy of 83.5% and WPD an accuracy of 80.7%. This paper devises a new feature extraction method that improves the recognition accuracy: Discrete Wavelet Packet Decomposition (DWPD), which utilizes the hybrid features of both DWT and WPD. The performance of this new approach was evaluated, and it produced an improved recognition accuracy of 86.2% with the naive Bayes classifier.
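To make the wavelet front end concrete, here is a minimal hand-rolled Haar DWT that summarises a signal by per-subband log-energies, the kind of compact feature vector a naive Bayes classifier could consume. The Haar wavelet and the three decomposition levels are assumptions made for this sketch; the paper's actual mother wavelet and feature layout are not stated here.

```python
import numpy as np

def subband_energies(x, levels=3):
    """Haar DWT by repeated pairwise averaging/differencing;
    returns log-energy of each detail band plus the approximation."""
    feats, a = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        pairs = a[: len(a) // 2 * 2].reshape(-1, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # high-pass
        a = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)       # low-pass
        feats.append(np.log(np.sum(detail ** 2) + 1e-10))
    feats.append(np.log(np.sum(a ** 2) + 1e-10))
    return np.array(feats)

# A smooth signal concentrates energy in the approximation band,
# while a rapidly alternating one concentrates it in the first detail band.
f_low = subband_energies(np.ones(64))
f_high = subband_energies(np.tile([1.0, -1.0], 32))
print(f_low[-1] > f_low[0], f_high[0] > f_high[-1])  # → True True
```

Such subband energy vectors separate slowly and rapidly varying speech segments, which is what makes them usable as classifier inputs.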
This paper describes the development of an efficient speech recognition system using techniques such as Mel Frequency Cepstrum Coefficients (MFCC), Vector Quantization (VQ), and Hidden Markov Models (HMM). It explains how speaker recognition followed by speech recognition is used to recognize speech faster, more efficiently, and more accurately. MFCC is used to extract the characteristics from the input speech signal with respect to a particular word uttered by a particular speaker. An HMM is then applied to the quantized feature vectors to identify the word by evaluating the maximum log-likelihood value for the spoken word.
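The maximum log-likelihood decision described above can be sketched with the scaled forward algorithm for discrete HMMs: each word has its own model, and the recognizer picks the word whose model scores the quantized observation sequence highest. The two-state models, emission tables, and the words "one"/"two" below are invented for illustration.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm for a discrete HMM; returns the
    log-likelihood of the observation sequence under the model."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    ll = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then emit
        c = alpha.sum()
        ll += np.log(c)                 # accumulate scaling factors
        alpha = alpha / c
    return ll

# Two hypothetical word models sharing transitions but with different
# emission tables over VQ codebook symbols 0 and 1.
pi = np.array([0.5, 0.5])
A = np.array([[0.8, 0.2], [0.2, 0.8]])
B_one = np.array([[0.9, 0.1], [0.9, 0.1]])   # "one" mostly emits symbol 0
B_two = np.array([[0.1, 0.9], [0.1, 0.9]])   # "two" mostly emits symbol 1
obs = [0, 0, 1, 0]
best = max([("one", B_one), ("two", B_two)],
           key=lambda m: log_likelihood(obs, pi, A, m[1]))[0]
print(best)  # → one
```

The scaling step keeps the forward probabilities from underflowing on long utterances, which is why log-likelihoods are accumulated rather than raw probabilities multiplied.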
Isolated word recognition using LPC & vector quantization – eSAT Journals
Abstract: Speech recognition has always been regarded as a fascinating field in human-computer interaction, and it is one of the fundamental steps towards understanding human recognition and behavior. This paper explicates the theory and implementation of a speaker-dependent, real-time isolated word recognizer. The core approach is to first obtain feature vectors using LPC analysis, followed by vector quantization; the quantized vectors are then recognized by finding the minimum average distortion. All speech recognition systems contain two main phases, namely a training phase and a testing phase. In the training phase, the features of the words are extracted and stored as templates in a database; during the recognition phase, the extracted features are compared with the templates in the database. Vector quantization is used for generating the codebooks, and the recognition decision is made based on the matching score. MATLAB is used to implement this concept to achieve further understanding. Index Terms: Speech Recognition, LPC, Vector Quantization, Code Book.
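The minimum-average-distortion matching step can be sketched as follows: each word's codebook is a small set of centroid vectors, and the recognizer picks the word whose codebook is, on average, closest to the test utterance's feature vectors. The 2-D vectors and the words "yes"/"no" below are hypothetical stand-ins for LPC feature vectors.

```python
import numpy as np

def avg_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def recognise(features, codebooks):
    """Pick the word whose codebook yields minimum average distortion."""
    return min(codebooks, key=lambda w: avg_distortion(features, codebooks[w]))

# Hypothetical per-word codebooks of 2-D feature centroids.
cb = {"yes": np.array([[0.0, 0.0], [1.0, 1.0]]),
      "no":  np.array([[5.0, 5.0], [6.0, 6.0]])}
test_feats = np.array([[0.1, 0.1], [0.9, 1.1]])
print(recognise(test_feats, cb))  # → yes
```

In the full system, the codebooks themselves would be trained with an algorithm such as LBG/k-means over the training utterances' LPC vectors.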
Classification improvement of spoken Arabic language based on radial basis fu... – IJECEIAES
An important task in human-computer interaction is language recognition and classification. In the Arab world, there is a persistent need for Arabic spoken language recognition, to help those who have lost their upper limbs do what they want through speech-based computer interaction; yet Arabic automatic speech recognition (AASR) has not received the desired attention from researchers. In this paper, a Radial Basis Function (RBF) network is used to improve recognition of spoken Arabic letters. The recognition and classification process consists of three steps: preprocessing, feature extraction, and classification (recognition). Arabic Language Letters (ALL) are recognized by combining statistical features with a temporal radial basis function across different letter situations and noise conditions. Recognition rates from 90% to 99.375% were obtained with independent speakers; these results outperform earlier works by nearly 2.045%. The simulation was carried out using MATLAB 2015b.
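The RBF classification idea can be sketched in a few lines: hidden units respond with Gaussian bumps around learned centres, and a linear output layer combines the activations. The centres, widths, and identity weight matrix below are toy assumptions, not values from the paper.

```python
import numpy as np

def rbf_classify(x, centres, widths, weights):
    """Gaussian RBF network: hidden activations are Gaussian bumps
    around the centres; the output layer is a linear combination."""
    h = np.exp(-np.sum((centres - x) ** 2, axis=1) / (2 * widths ** 2))
    scores = weights @ h
    return int(np.argmax(scores)), h

# Two toy classes with one centre each; the weight matrix lets
# class i read off hidden unit i directly.
centres = np.array([[0.0, 0.0], [3.0, 3.0]])
widths = np.array([1.0, 1.0])
weights = np.eye(2)
label, _ = rbf_classify(np.array([0.2, -0.1]), centres, widths, weights)
print(label)  # → 0 (the input lies nearer the first centre)
```

In a trained network the centres would typically come from clustering the feature vectors and the output weights from least-squares fitting against the class labels.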
In the era of data-driven warfare, the integration of big data and machine learning (ML) techniques has
become paramount for enhancing defence capabilities. This research report delves into the applications of
big data and ML in the defence sector, exploring their potential to revolutionize intelligence gathering,
strategic decision-making, and operational efficiency. By leveraging vast amounts of data and advanced
algorithms, these technologies offer unprecedented opportunities for threat detection, predictive analysis,
and optimized resource allocation. However, their adoption also raises critical concerns regarding data
privacy, ethical implications, and the potential for misuse. This report aims to provide a comprehensive
understanding of the current state of big data and ML in defence, while examining the challenges and
ethical considerations that must be addressed to ensure responsible and effective implementation.
Cloud Computing, being one of the most recent innovative developments of the IT world, has been
instrumental not just to the success of SMEs but, through their productivity and innovative contribution to
the economy, has even made a remarkable contribution to the economic growth of the United States. To
this end, the study focuses on how cloud computing technology has impacted economic growth through
SMEs in the United States. Relevant literature connected to the variables of interest in this study was
reviewed, and secondary data was generated and utilized in the analysis section of this paper. The findings
of this paper revealed that there have been meaningful contributions that the usage of virtualization has
made in the commercial dealings of small firms in the United States, and this has also been reflected in the
economic growth of the country. This paper further revealed that as important as cloud-based software is,
some SMEs are still skeptical about how it can help improve their business and increase their bottom line
and hence have failed to adopt it. Apart from the SMEs, some notable large firms in different industries,
including information and educational services, have adopted cloud computing technology and hence
contributed to the economic growth of the United States. Lastly, findings from our inferential statistics
revealed that no discernible change has occurred in innovation between small and big businesses in the
adoption of cloud computing. Both categories of businesses adopt cloud computing in the same way, and
their contribution to the American economy has no significant difference in the usage of virtualization.
Energy-constrained Wireless Sensor Networks (WSNs) have garnered significant research interest in
recent years. Multiple-Input Multiple-Output (MIMO), or Cooperative MIMO, represents a specialized
application of MIMO technology within WSNs. This approach operates effectively, especially in
challenging and resource-constrained environments. By facilitating collaboration among sensor nodes,
Cooperative MIMO enhances reliability, coverage, and energy efficiency in WSN deployments.
Consequently, MIMO finds application in diverse WSN scenarios, spanning environmental monitoring,
industrial automation, and healthcare applications.
The AIRCC's International Journal of Computer Science and Information Technology (IJCSIT) is devoted to the fields of Computer Science and Information Systems. IJCSIT is an open-access, peer-reviewed scientific journal published in both electronic and print form. The mission of the journal is to publish original contributions in its field in order to propagate knowledge amongst its readers and to be a reference publication. IJCSIT publishes original research papers and review papers, as well as auxiliary material such as case studies and technical reports.
Demand for car parking grows with the number of car users, and with the increased use of smartphones and their applications, users prefer mobile phone-based solutions. This paper proposes a Smart Parking Management System (SPMS) built on Arduino components, an Android application, and IoT. It gives the client the ability to check available parking spaces and reserve a parking spot. IR sensors are used to detect whether a parking space is occupied. The occupancy data are transmitted via a Wi-Fi module to the server and retrieved by the mobile application, which offers many options attractively and at no cost to users, and lets the user check reservation details. With IoT technology, the smart parking system can be connected wirelessly to easily track available locations.
Computer-Assisted Language Learning (CALL) systems are computer-based tutoring systems that address linguistic skills. Adding intelligence to such systems relies mainly on Natural Language Processing (NLP) tools to diagnose student errors, especially in language grammar. However, most such systems do not model student competence in linguistic skills, especially for the Arabic language. In this paper, we deal with the basic grammar concepts of the Arabic language taught in the fourth grade of elementary school in Egypt, through the Arabic Grammar Trainer (AGTrainer), an intelligent CALL system. The implemented system trains students through questions that cover different concepts at different difficulty levels. A constraint-based student modeling (CBSM) technique is used as a short-term student model: CBSM defines the different grammar skills at a fine-grained level through the defined skill structures. The main contribution of this paper is the hierarchical representation of the system's basic grammar skills as domain knowledge. That representation is used as a mechanism for efficiently checking constraints in order to model student knowledge, diagnose student errors, and identify their causes. In addition, the satisfied constraints, the number of trials the student takes to answer each question, and a fuzzy-logic decision system are used to determine the student's learning level for each lesson as a long-term model. The evaluation results showed the system's effectiveness for learning, in addition to the satisfaction of students and teachers with its features and abilities.
In the realm of computer security, the importance of efficient and reliable user authentication methods has
become increasingly critical. This paper examines the potential of mouse movement dynamics as a
consistent metric for continuous authentication. By analysing user mouse movement patterns in two
contrasting gaming scenarios, "Team Fortress" and "Poly Bridge," we investigate the distinctive
behavioral patterns inherent in high-intensity and low-intensity UI interactions. The study extends beyond
conventional methodologies by employing a range of machine learning models. These models are carefully
selected to assess their effectiveness in capturing and interpreting the subtleties of user behavior as
reflected in their mouse movements. This multifaceted approach allows for a more nuanced and
comprehensive understanding of user interaction patterns. Our findings reveal that mouse movement
dynamics can serve as a reliable indicator for continuous user authentication. The diverse machine
learning models employed in this study demonstrate competent performance in user verification, marking
an improvement over previous methods used in this field. This research contributes to the ongoing efforts to
enhance computer security and highlights the potential of leveraging user behavior, specifically mouse
dynamics, in developing robust authentication systems.
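The behavioral features underlying such mouse-dynamics models can be sketched as simple per-trajectory statistics; speed and turning-angle summaries of this kind are common inputs to the classifiers, though the paper's exact feature set is not specified here, and the sample trajectory below is invented.

```python
import numpy as np

def movement_features(points, timestamps):
    """Summarise a mouse trajectory with speed and turning statistics,
    the kind of per-session features a classifier could consume."""
    pts = np.asarray(points, dtype=float)
    t = np.asarray(timestamps, dtype=float)
    step = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # distance per sample
    dt = np.diff(t)
    speed = step / np.maximum(dt, 1e-9)                   # guard zero intervals
    angles = np.arctan2(np.diff(pts[:, 1]), np.diff(pts[:, 0]))
    turn = np.abs(np.diff(angles))                        # direction changes
    return {"mean_speed": speed.mean(),
            "max_speed": speed.max(),
            "mean_turn": turn.mean() if len(turn) else 0.0}

# A straight drag at constant speed: uniform velocity, no turning.
feats = movement_features([(0, 0), (10, 0), (20, 0)], [0.0, 0.1, 0.2])
print(feats["mean_turn"])  # → 0.0
```

High-intensity interaction (fast aiming in a shooter) and low-intensity interaction (deliberate puzzle play) would separate along exactly these speed and curvature dimensions.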
This research aims to further understanding in the field of continuous authentication using behavioural biometrics. We contribute a novel dataset that encompasses the gesture data of 15 users playing Minecraft on a Samsung tablet, each for a duration of 15 minutes. Utilizing this dataset, we employed machine learning (ML) binary classifiers, namely Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Classifier (SVC), to determine the authenticity of specific user actions. Our most robust model was SVC, which achieved an average accuracy of approximately 90%, demonstrating that touch dynamics can effectively distinguish users. However, further studies are needed to make this a viable option for authentication systems. Our dataset is available at: https://github.com/AuthenTech2023/authentech-repo
This paper discusses the capabilities and limitations of GPT-3, a state-of-the-art language model, in the context of text understanding. We begin by describing the architecture and training process of GPT-3, and provide an overview of its impressive performance across a wide range of natural language processing tasks, such as language translation, question answering, and text completion. Throughout this research project, a summarizing tool was also created to help us retrieve content from any type of document, specifically IELTS Reading Test data in this project. We also aimed to improve the accuracy of the summarizing, as well as the question-answering capabilities of GPT-3, via long text
Image segmentation and classification tasks in computer vision have proven to be highly effective using neural networks, specifically Convolutional Neural Networks (CNNs). These tasks have numerous practical applications, such as in medical imaging, autonomous driving, and surveillance. CNNs are capable of learning complex features directly from images and achieving outstanding performance across several datasets. In this work, we have utilized three different datasets to investigate the efficacy of various pre-processing and classification techniques in accurately segmenting and classifying different structures within MRI and natural images. We have utilized both sample-gradient and Canny edge detection methods for pre-processing, and K-means clustering has been applied to segment the images. Image augmentation improves the size and diversity of datasets for training the models for image classification. This work highlights transfer learning's effectiveness in image classification using CNNs and VGG-16, and provides insights into the selection of pre-trained models and hyperparameters for optimal performance. We propose a comprehensive approach for image segmentation and classification, incorporating pre-processing techniques, the K-means algorithm for segmentation, and deep learning models such as CNN and VGG-16 for classification.
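The K-means segmentation step can be sketched on pixel intensities alone: cluster the intensities into k groups and return a label map. The deterministic min/max-spread initialisation and the synthetic two-tone "image" below are assumptions made for this sketch; a real pipeline would cluster richer per-pixel features.

```python
import numpy as np

def kmeans_segment(image, k=2, iters=10):
    """Plain K-means on pixel intensities: returns a label map of
    the same shape as the image, one label per cluster."""
    pixels = image.reshape(-1, 1).astype(float)
    # Deterministic initialisation: centres spread over the intensity range.
    centres = np.linspace(pixels.min(), pixels.max(), k)[:, None]
    for _ in range(iters):
        labels = np.argmin(np.abs(pixels - centres.T), axis=1)  # assign
        for j in range(k):
            if np.any(labels == j):
                centres[j, 0] = pixels[labels == j].mean()      # update
    return labels.reshape(image.shape)

# Synthetic "image": dark left half, bright right half.
img = np.zeros((4, 8))
img[:, 4:] = 200.0
seg = kmeans_segment(img, k=2)
print(len(np.unique(seg)))  # → 2: the two halves get distinct labels
```

The resulting label map is what downstream classification models would consume as a segmentation mask.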
The security of Electric Vehicle (EV) charging has gained momentum with the increase in EV adoption over the past few years. Mobile applications have been integrated into EV charging systems, which mainly use a cloud-based platform to host their services and data. Like many complex systems, cloud systems are susceptible to cyberattacks if proper measures are not taken by the organization to secure them. In this paper, we explore the security of key components in the EV charging infrastructure, including the mobile application and its cloud service. We conducted an experiment that initiated a Man-in-the-Middle attack between an EV app and its cloud services. Our results showed that it is possible to launch attacks against the connected infrastructure by taking advantage of vulnerabilities that may have substantial economic and operational ramifications for the EV charging ecosystem. We conclude by providing mitigation suggestions and future research directions.
This paper describes the outcome of an attempt to implement the same transitive closure (TC) algorithm
for Apache MapReduce running on different Apache Hadoop distributions. Apache MapReduce is a
software framework used with Apache Hadoop, which has become the de facto standard platform for
processing and storing large amounts of data in a distributed computing environment. The research
presented here focuses on the variations observed among the results of an efficient iterative transitive
closure algorithm when run against different distributed environments. The results from these comparisons
were validated against the benchmark results from OYSTER, an open source Entity Resolution system. The
experiment results highlighted the inconsistencies that can occur when using the same codebase with
different implementations of MapReduce.
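The iterative transitive closure algorithm at the heart of the study can be sketched with a semi-naive iteration: each round joins only the newly derived pairs against the base edge set, mirroring the join performed by each MapReduce pass. This is a single-machine sketch of the general technique, not the paper's distributed implementation.

```python
def transitive_closure(edges):
    """Semi-naive iterative transitive closure over a set of
    (src, dst) pairs; each round joins only the new pairs."""
    edges = set(edges)
    closure = set(edges)
    delta = set(edges)
    while delta:
        # Join the frontier against the base edges: (a,b) + (b,d) -> (a,d).
        derived = {(a, d) for (a, b) in delta for (c, d) in edges if b == c}
        delta = derived - closure      # keep only genuinely new pairs
        closure |= delta
    return closure

tc = transitive_closure({(1, 2), (2, 3), (3, 4)})
print(sorted(tc))  # all six reachable pairs on the chain 1→2→3→4
```

In the MapReduce setting, each `while` iteration becomes one job, and differences between Hadoop distributions can surface in how those per-iteration joins and deduplications behave.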
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf – Kamal Acharya
The College Bus Management System is developed entirely in Visual Basic .NET and connects to an MS SQL Server database, a secure and well-matched combination of front-end and back-end technologies. The application uses a flat user interface, an attractive interface style in 2017, and gives priority to system functionality. It manages student details, driver details, bus details, bus routes, bus fees, and more. The application has a single admin unit: the admin manages the entire application and logs in with an admin username and password. It is designed for both large and small colleges and is user-friendly for non-computer people, who can easily learn to manage it within hours; access is secured by the admin. The system generates reports from the entered data, which the admin can view and download in Excel format, since Excel-formatted reports make the income and expenses of the college bus easy to understand. The application is mainly developed for Windows users; in 2017, 73% of enterprises used the Windows operating system, so it installs easily for all Windows users. The developed application is very small and consumes very little disk space, so the user can allocate only minimal local disk space for it.
Automobile Management System Project Report.pdf – Kamal Acharya
The proposed project is developed to manage automobiles at an automobile dealer company. The main modules in this project are login, automobile management, customer management, sales, complaints, and reports. The first module is login: the automobile showroom owner must log in to use the application. The username and password are verified; if they are correct, the next form opens, and if not, an error message is shown.
When a customer searches for an automobile that is available, they are taken to a page showing its details, including automobile name, automobile ID, quantity, price, etc. The Automobile Management System is useful for maintaining automobiles and customers effectively, and hence helps establish a good relationship between the customer and the automobile organization. It contains various customized modules for maintaining automobile and stock information accurately and safely.
When an automobile is sold to a customer, stock is reduced automatically; when a new purchase is made, stock is increased automatically. While selecting automobiles for sale, the software automatically checks the total available stock of that particular item; if the stock is less than 5, it notifies the user to purchase more of that item.
When the user tries to sell items that are not in stock, the system prompts the user that the stock is insufficient. Customers of this system can search for an automobile and purchase one easily. Meanwhile, the stock of automobiles can be maintained accurately by the automobile shop manager, overcoming the drawbacks of the existing system.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx – R&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Event Management System Vb Net Project Report.pdfKamal Acharya
In present era, the scopes of information technology growing with a very fast .We do not see any are untouched from this industry. The scope of information technology has become wider includes: Business and industry. Household Business, Communication, Education, Entertainment, Science, Medicine, Engineering, Distance Learning, Weather Forecasting. Carrier Searching and so on.
My project named “Event Management System” is software that store and maintained all events coordinated in college. It also helpful to print related reports. My project will help to record the events coordinated by faculties with their Name, Event subject, date & details in an efficient & effective ways.
In my system we have to make a system by which a user can record all events coordinated by a particular faculty. In our proposed system some more featured are added which differs it from the existing system such as security.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 3, June 2018
DOI: 10.5121/ijcsit.2018.10304
CURVELET BASED SPEECH RECOGNITION
SYSTEM IN NOISY ENVIRONMENT:
A STATISTICAL APPROACH
Nidamanuru Srinivasa Rao1, Chinta Anuradha2, Dr. S. V. Naga Sreenivasu3
1 Scholar, Department of CSE, Acharya Nagarjuna University, Guntur, AP
2 Asst. Prof., Department of CSE, VR Siddhartha Engineering College, Vijayawada, AP
3 Professor, Dept. of CS, Narasaraopeta, Guntur, AP
ABSTRACT
Speech processing is a crucial and intensive field of research in the development of robust and efficient speech recognition systems. However, speech recognition accuracy still suffers from variation of context, speaker variability, and environmental conditions. In this paper, we present a curvelet-based Feature Extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the characteristics of the curvelet transform, which effectively reduces the computational complexity and the feature vector size; curvelets also offer better accuracy and a varying window size, making them suitable for non-stationary signals. For better word classification and recognition, a discrete hidden Markov model is used, since it accounts for the time distribution of speech signals. The HMM classification method attained maximum accuracy in terms of identification rate, with detection rates of 80.1% for informal speech, 86% for scientific phrases, and 63.8% for control commands. The objective of this study is to characterize the feature extraction methods and the classification phase of a speech recognition system. The various approaches available for developing speech recognition systems are compared along with their merits and demerits. The statistical results show that signal recognition accuracy is increased by using the discrete curvelet transform over conventional methods.
KEYWORDS
Speech Signals, Curvelets, HMM, Bayesian Networks, Recognition System.
1. INTRODUCTION
Automatic Speech Recognition (ASR) is the capability of machines to recognize words and phrases from spoken language and translate them into a machine-readable format [2]. The major interest of speech recognition is to develop methods and systems for speech input to machines. Speech is the primary means of communication between humans, and this motivates research efforts to make speech a feasible mode of human-computer interaction [3]. Speech signals are non-stationary in nature, and speech recognition is a complex task due to multiple sources of variability such as gender, pronunciation, emotional state, accent, articulation, nasality, pitch, and volume. Background noise and other types of disturbance also make speech processing complex. The throughput of a speech processing system is measured in terms of detection accuracy [4]. With the fast evolution of computer hardware, software and speech recognition methods, speech recognition has become a major technology in computer information processing. It is widely used in voice-activated telephone exchanges, weather forecasting, banking services, computer-aided language learning, video games, transcription, robotics, audio and video search, household applications, and language learning applications [4].
A speech recognition system is divided into two stages: feature extraction and feature classification. The feature extraction method plays a crucial role in the speech recognition process. Isolated word/sentence recognition requires the extraction of features from the recorded utterances, followed by a training period [6]. The most extensively used feature extraction methods are the Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), Linear Predictive Coding (LPC), Mel Frequency Cepstral Coefficients (MFCCs), RASTA filtering, and the hybrid DWPD algorithm. A relatively new technique used in the field of signal processing for feature extraction is the curvelet transform. Classification methods used in the speech identification process include Gaussian Mixture Models (GMMs), Vector Quantization (VQ), HMMs and ANNs [6]. Current speech recognition schemes are based on Hidden Markov Models over the spectral features of speech frames. The main reason HMMs are well known and widely used is that their parameters can be automatically learned or trained, and the learning techniques are simple and computationally feasible. Figure 1 describes the primary tasks of a speech identification system, which consist of feature extraction, database, network training, testing and decoding. The detection process is shown below:
Figure 1. Basic model of speech recognition system
In this research, a novel method for speech recognition is presented. The method is based on the extraction of curvelet parameters from the original speech signal; curvelet coefficients are calculated for clean and noisy speech signals respectively. A Hidden Markov Model is used for classification and recognition. The objective is to demonstrate the better performance of the new feature extraction method over traditional approaches.
The rest of the paper is organized as follows. Section 2 discusses the speech recognition system. The curvelet transform technique for feature extraction is discussed in Section 3. Section 4 discusses the Hidden Markov Model for classification and pattern recognition. In Section 5, the proposed methodology is introduced and statistical results are given. Finally, Section 6 summarizes the concluding remarks.
2. SPEECH RECOGNITION SYSTEM
Speech recognition is the system for automatically detecting the spoken words of person based on
data in speech signal. The progress of speech recognition system needs concentration in defining
dissimilar types of speech classes, feature extraction methods, speech signal classifiers and
performance evaluation. The major criterion for the good quality speech identification system is
the selection of feature extraction method which acting a major role in accuracy of the system.
Speech recognition process can be separated into dissimilar classes based on type of speech
utterances are capable to recognize. Mathematically Speech signal system is defined as follow:
Let the clean speech y(e) be degraded by additive noise u(e); the corrupted signal x(e) can then be expressed as
x(e) = y(e) + u(e)
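As a quick sketch, the additive-noise model above can be simulated with made-up data (a sinusoid standing in for clean speech, Gaussian noise for u(e)):

```python
import numpy as np

# Sketch of the additive-noise model x(e) = y(e) + u(e): a clean signal
# y degraded by noise u gives the corrupted signal x (all data made up).
rng = np.random.default_rng(0)
e = np.arange(1000)
y = np.sin(2 * np.pi * e / 100)           # clean speech stand-in y(e)
u = 0.1 * rng.standard_normal(e.size)     # additive noise u(e)
x = y + u                                 # corrupted signal x(e)

# Signal-to-noise ratio in dB for the simulated signal.
snr_db = 10 * np.log10(np.sum(y**2) / np.sum(u**2))
```
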
The objective of speech restoration is to estimate the clean speech y(e) from x(e) when there is little or no prior knowledge of u(e). The continuous speech waveform is first divided into frames. For each spoken word, let O = o1 o2 · · · oτ be a sequence of parameter vectors, where ot is observed at time t ∈ {1, . . . , τ}. Given a dictionary D with words wi ∈ D, the recognition problem is
W˜ = arg max W {P(W|O)}…………………………… (1)
where W˜ is the recognised word, P is the probability measure and W = w1 . . . wk is a word sequence. Bayes' rule transforms (1) into a suitable calculable form:
P(W|O) = P(O|W) P(W) / P(O)
where P(O|W) represents the acoustic model and P(W) the language model; P(O) can be ignored. Thus, for a specified set of prior probabilities P(W), the probable spoken word depends only on the likelihood P(O|wi). On simplification, (1) becomes
W˜ = arg max W {P(O|W) P(W)}
In the log domain, the total likelihood is calculated as
log P(O, W) = log P(O|W) + s log P(W) + |W| log p
where |W| is the length of the word sequence W, s is the language model scale factor (LMSF) and p is the word insertion penalty (WIP).
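A minimal sketch of the log-domain scoring just described, with hypothetical acoustic/language-model log-probabilities and hypothetical values for the scale factor s and insertion penalty p:

```python
import math

# Toy illustration of the log-domain total likelihood:
# score = log P(O|W) + s*log P(W) + |W|*log p,
# where s is the language model scale factor and p the word insertion
# penalty. All numbers below are made up for illustration.
def total_log_likelihood(log_acoustic, log_lm, num_words, s=10.0, p=0.5):
    return log_acoustic + s * log_lm + num_words * math.log(p)

# Score two hypothetical word sequences and pick the best (arg max).
hypotheses = {
    "turn on the light": (-120.0, -4.2, 4),   # (log P(O|W), log P(W), |W|)
    "turn of the light": (-118.5, -6.9, 4),
}
best = max(hypotheses, key=lambda w: total_log_likelihood(*hypotheses[w]))
```

Here the language model term penalizes the ungrammatical hypothesis even though its acoustic score is slightly higher.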
Speech recognition systems fall into different classes: i) Isolated Words: these systems require each utterance to have silence (no audio signal) on both sides of the sample window, and accept a single word at a time. ii) Connected Words: these permit separate utterances to be spoken together with a minimal pause between them. iii) Continuous Speech: these systems allow people to speak in an almost natural way; continuous speech recognizers are hard to build as they need extra effort to determine word boundaries. iv) Spontaneous Speech: recognition of spontaneous speech must be able to handle natural speech characteristics such as words run together and slight stutters.
Figure 2. A Basic Model of Speech Recognition System
Figure 2 describes a fundamental model of a speech recognition system; it represents the different stages of the system, consisting of pre-processing, feature extraction and classification. The pre-processing stage conditions the input signal. After pre-processing, the feature extraction stage extracts the vectors necessary for the modelling stage. These vectors must be robust for better accuracy.
The classification stage recognizes the speech text using the extracted features; it incorporates the syntax and semantics of the language, which helps the classifier recognize the input [1].
3. CURVELET BASED FEATURE EXTRACTION
Feature extraction is a central phase in speech computation; it is responsible for converting the speech signal into a stream of feature vector coefficients. These coefficients carry the information necessary for the identification of a given utterance. Every speaker has unique attributes contained in the spoken words. These attributes can be extracted by a wide range of feature extraction methods and employed in the speech recognition process [7]. The feature vectors of speech signals are typically extracted using spectral analysis methods such as Mel-frequency cepstral coefficients, linear predictive coding, and curvelet and wavelet transforms. Using curvelets, the computational complexity and the feature vector size are reduced, accuracy improves, and the varying window size makes them suitable for non-stationary signals. Thus the curvelet transform is an elegant tool for the analysis of non-stationary signals like speech; among all these techniques, the curvelet transform gives better system accuracy.
The discrete curvelet transform (DCT) has good signal properties; it is applicable to many real signals and is computationally efficient. To obtain a time-scale representation of a signal, the DCT uses digital filtering. Let f(x1, x2) be a function defined for x1 = 0, 1, ..., N1 − 1 and x2 = 0, 1, ..., N2 − 1; based on this function, we define the discrete Fourier transform [7]. The discrete curvelet coefficients are obtained by integrating f against the curvelet waveforms:
c(j, l, k) = Σ x1, x2 f(x1, x2) φ j,l,k (x1, x2)
where k = (k1, k2) and φ j,l,k is the curvelet at scale j with direction l and spatial shift k.
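To make the underlying 2-D discrete Fourier transform concrete, the following sketch writes it out directly from the definition over x1 = 0..N1−1, x2 = 0..N2−1 and checks it against NumPy's FFT on a small made-up array:

```python
import numpy as np

# 2-D discrete Fourier transform written out from its definition,
# F[k1,k2] = sum over x1,x2 of f[x1,x2] * exp(-2*pi*i*(k1*x1/N1 + k2*x2/N2)),
# checked against numpy's fft2. f is a small hypothetical 2-D array.
def dft2(f):
    N1, N2 = f.shape
    F = np.zeros((N1, N2), dtype=complex)
    for k1 in range(N1):
        for k2 in range(N2):
            for x1 in range(N1):
                for x2 in range(N2):
                    F[k1, k2] += f[x1, x2] * np.exp(
                        -2j * np.pi * (k1 * x1 / N1 + k2 * x2 / N2))
    return F

f = np.arange(16.0).reshape(4, 4)
assert np.allclose(dft2(f), np.fft.fft2(f))
```
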
In the decomposition, the original signal passes through two filters, a low-pass filter and a high-pass filter, and emerges as two signals. The output of the low-pass filter is called the approximation coefficients and the output of the high-pass filter the detail coefficients. In speech signals, the low-frequency components characterize a signal more than its high-frequency components, and thus the low-frequency part obtained with h[m] is more important than the high-frequency part obtained with g[m]. The high-pass and low-pass filtering of the signal is given by the following equations [5]:
Xhigh[k] = Σ m x[m] g[2k − m]
Xlow[k] = Σ m x[m] h[2k − m]
where Xhigh are the detail coefficients and Xlow are the approximation coefficients, the outputs of the high-pass and low-pass filters obtained by sub-sampling by a factor of 2. This filtering is continued until the required level is reached, according to the Mallat algorithm. The DCT decomposition tree and pseudocode are given in Figure 3 [5].
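One level of the filter-and-subsample step just described can be sketched as follows, using simple Haar-style filters as a stand-in for the actual filter pair (the data and filters below are illustrative only):

```python
import numpy as np

# Sketch of one level of the filter-and-downsample analysis step:
# convolve with a low-pass filter h and a high-pass filter g, then
# keep every second sample (sub-sampling by a factor of 2).
h = np.array([0.5, 0.5])    # low-pass (approximation) filter h[m]
g = np.array([0.5, -0.5])   # high-pass (detail) filter g[m]

def analysis_step(x, h, g):
    x_low = np.convolve(x, h)[1::2]    # approximation coefficients Xlow
    x_high = np.convolve(x, g)[1::2]   # detail coefficients Xhigh
    return x_low, x_high

x = np.array([4.0, 6.0, 10.0, 12.0])
x_low, x_high = analysis_step(x, h, g)
# x_low holds pairwise averages, x_high pairwise half-differences.
```

In the Mallat scheme, `analysis_step` would then be applied again to `x_low` until the required decomposition level is reached.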
Figure 3. DCT Decomposition Tree
function y = curvelet( st, R, md )
% Inverse-transform each scale of the curvelet structure st and map the
% result back to Cartesian coordinates.
if ~exist( 'md', 'var' )
    md = 0;    % default mode when none is supplied
end
l = size( st, 1 );
if md == 0
    zp = kkt_iso_idwt( st(1,:), R, md );
elseif md == 1
    zp = kkt_iso_idwt( st{1}, R, md );
end
for ii = 2:l
    if md == 0
        zp(ii,:) = kkt_iso_idwt( st(ii,:), R, md );
    elseif md == 1
        zp(ii,:) = kkt_iso_idwt( st{ii}, R, md );
    end
end
yr = rectopolar_2_cart( zp );    % rectopolar grid back to Cartesian
y = i_kkt_2( yr );
[l1, l2] = size( y );
y = y(1:l1-1, 1:l2-1);           % drop the padding row and column
4. PATTERN CLASSIFICATION AND RECOGNITION
There are three basic problems of HMMs that must be solved before the models can be used in real-world applications. The first is the evaluation problem: computing the probability Q(O/λ) that the observation sequence O was produced by the model λ. This probability can be obtained using forward propagation, which recursively estimates the forward variable:
αt(j) = [ Σ i αt−1(i) aij ] bj(Ot)
Then Q(O/λ) is obtained by summing the forward variables at the end point, Q(O/λ) = Σ i αT(i). Backward propagation can also be used to solve this problem. Unlike the forward pass, backward propagation goes backward in time; at each instant it calculates the backward variable:
βt(i) = Σ j aij bj(Ot+1) βt+1(j)
Finally, Q(O/λ) is also obtained by combining the forward and backward variables.
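The forward recursion can be sketched for a tiny discrete HMM; the two-state parameters below are made up for illustration:

```python
import numpy as np

# Sketch of the forward procedure for a tiny 2-state discrete HMM with
# hypothetical parameters: A = transition matrix [aij], B = observation
# probabilities [bj(k)], pi = initial state distribution.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],    # state 0 emits symbol 0 with prob 0.9
              [0.2, 0.8]])   # state 1 emits symbol 1 with prob 0.8
pi = np.array([0.5, 0.5])

def forward(obs):
    alpha = pi * B[:, obs[0]]            # initialization: alpha_1(i)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # recursion: alpha_t(j)
    return alpha.sum()                   # termination: Q(O | lambda)

likelihood = forward([0, 1, 0])          # probability of symbol sequence
```
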
The second problem is the decoding problem: predicting the state sequence S that generated O. The Viterbi algorithm [8] provides the solution. It starts from the initial instant t = 1 and, for every instant t, computes δt(i) for every state i, keeping the state with the maximum
δt(i) = max p1,p2,...,pt−1 Q(p1, p2, . . . , pt−1, pt = i, O1 O2 . . . Ot | λ)
using the recursion δt(j) = max 1≤i≤M [δt−1(i) aij] bj(Ot). When the algorithm reaches the last instant t = T, it keeps the state with maximum δT. Finally, the Viterbi algorithm back-tracks the sequence of states using the pointer kept at every instant t. The third problem is learning: adjusting the model parameters so as to maximise Q(O | λ). The Baum-Welch method [8] is widely used for this problem; it re-estimates the model parameters using the forward and backward variables. The principal components of a large-vocabulary continuous speech recogniser are described in Figure 4.
Figure 4. Architecture of HMM-based Recognizer
The input audio waveform from a microphone is converted into a sequence of fixed-size acoustic vectors Y1:E = y1,...,yE. The decoder then attempts to discover the sequence of words w1:L = w1,...,wL most likely to have generated Y, i.e. the decoder tries to find
wˆ = arg max w {P(w|Y )}…………………………… (15)
Since P(w|Y ) is hard to model directly, Bayes' rule is used to transform (15) into the equivalent problem:
wˆ = arg max w {p(Y |w) P (w)}………………………...(16)
The likelihood p(Y |w) is determined by an acoustic model and the prior P(w) is represented by a language model. The basic unit of sound modelled by the acoustic model is the phone.
The above equations are used for the speech identification process generated by the model. Suppose S represents a speech signal in the system; S corresponds to a path in the syntactic network. The first stage is to convert S into a sequence of acoustic vectors using the same feature extraction method used for training, obtaining a sequence of observations O. The most likely path maximizes the probability of observing O under the model, Q(O/λ). This probability can be found using either the forward algorithm or the Viterbi algorithm. Any system employing HMMs uses three fundamental algorithms: evaluation, training, and classification. The classification algorithm enables recognition of any unknown utterance by identifying the unknown observation sequence, choosing the class most likely to have generated it. The training algorithm is responsible for storing the data collected for a specific language. The evaluation algorithm computes the probability of an observation sequence for the matching process. The classification algorithm is employed for given observations O = O1, O2, O3, ..., OE; the chosen class is computed using the following equations [10]:
Nˆclass = arg max class Q(Nclass | O)
Applying Bayes' rule, the probability Q(Nclass | O) is computed as
Q(Nclass | O) = Q(O | Nclass) Q(Nclass) / Q(O)
In the recognition process, the probability of each word is calculated in order to match a specific word against the others in the vocabulary table. The elements of the observation sequence are denoted by Oe and the elements of the state sequence by Se.
Using this structure, we can characterize the system with a set of state transition probabilities aij = Pr{transition from state i to state j} and state-dependent observation probabilities bj(Oe) = Pr{Oe | Se = j}. When the observations come from a discrete, finite set U of size N, we can characterize the conditional probability of observing each element of that set as bjk = Pr{Oe = Uk | Se = j}, where Uk ∈ U. Based on these definitions, an HMM with M states can be characterized by a set of parameters λ = {π, A, B}, where A = [aij] is the M x M state transition matrix, B = [bjk] is the M x N observation matrix, and π = (1, 0, ..., 0) is the 1 x M initial state vector. This set of parameters λ is used in the forward-backward algorithm as shown in (19), (20) and (21), which efficiently computes the probability of an observation sequence of length E from the forward probabilities and backward probabilities. The Baum-Welch algorithm provides the set of re-estimation formulas shown in (22) and (23); each formula yields a new estimate of the corresponding parameter.
The following algorithm for HMM-based IOS is used for finding the best sequence of hidden states.
Figure 5. HMM algorithm (IOS) used for finding the optimal sequence of hidden states
function VITERBI(observations of len E, state-graph of len M) returns best-path
    create a path probability matrix viterbi[M+2, E]
    for each state s from 1 to M do                          ; initialization step
        viterbi[s,1] ← a0,s ∗ bs(o1)
        backpointer[s,1] ← 0
    for each time step e from 2 to E do                      ; recursion step
        for each state s from 1 to M do
            viterbi[s,e] ← max s'=1..M viterbi[s',e−1] ∗ as',s ∗ bs(oe)
            backpointer[s,e] ← argmax s'=1..M viterbi[s',e−1] ∗ as',s
    viterbi[qF,E] ← max s=1..M viterbi[s,E] ∗ as,qF          ; termination step
    backpointer[qF,E] ← argmax s=1..M viterbi[s,E] ∗ as,qF   ; termination step
    return the backtrace path by following backpointers to states back in time from backpointer[qF,E] [9]
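The Viterbi recursion and backtrace above can be sketched for a tiny discrete HMM; the two-state parameters below are made up for illustration:

```python
import numpy as np

# Sketch of the Viterbi algorithm for a tiny 2-state discrete HMM with
# hypothetical parameters: A = transitions, B = emissions, pi = initial.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def viterbi(obs):
    delta = pi * B[:, obs[0]]                  # initialization
    backptr = []
    for o in obs[1:]:
        scores = delta[:, None] * A            # delta_{t-1}(i) * a_ij
        backptr.append(scores.argmax(axis=0))  # best predecessor per state
        delta = scores.max(axis=0) * B[:, o]   # recursion: delta_t(j)
    path = [int(delta.argmax())]               # termination
    for bp in reversed(backptr):               # backtrace via pointers
        path.append(int(bp[path[-1]]))
    return path[::-1]

best_states = viterbi([0, 0, 1, 1])            # most likely state sequence
```
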
5. PERFORMANCE OF THE SYSTEM IN NOISY ENVIRONMENTS
The throughput of a speech identification system is often presented in terms of the speed and accuracy of the system. Accuracy may be calculated as word error rate (WER), and speed is measured in terms of the real-time factor [3]. Single Word Error Rate (SWER) and Command Success Rate (CSR) are other measures of accuracy. In this section, we present experiments to validate our approach. We use curvelet coefficients as feature vectors and a three-state HMM with two Gaussian mixtures. Speech recognition systems are very expensive; consequently, using the HMM recognizer greatly reduces the cost of these systems. In this HMM-based speech recognition system, five audio sets, BJW, JLS, JRM, LPN and RM2, are modelled with HMMs [8]. In recognition, a larger number of HMM states yields a better recognition rate or accuracy. Tables 1, 2 and 3 present the recognition-rate percentages for speech recognition.
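The word error rate mentioned above is conventionally computed as the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the reference length. A small sketch with hypothetical sentences:

```python
# Sketch: word error rate (WER) via edit distance between a reference
# transcript and a recognizer hypothesis (example sentences made up).
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

rate = wer("turn on the light", "turn on a light")  # one substitution
```
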
Table 1: Percentage Accuracy for five states of HMM (N=5)
Dataset  Total tests  Correct  Errors  Accuracy
BJW 85 64 21 75.29%
JLS 85 65 20 76.47%
JRM 85 62 23 72.94%
LPN 85 59 26 69.41%
RM2 85 63 22 74.11%
Table 2: Percentage Accuracy for seven states of HMM (N=7)
Dataset  Total tests  Correct  Errors  Accuracy
BJW 85 68 21 80.00%
JLS 85 69 20 81.17%
JRM 85 66 23 77.64%
LPN 85 63 26 74.11%
RM2 85 67 22 78.82%
Table 3: Percentage Accuracy for ten states of HMM (N=10)
Dataset  Total tests  Correct  Errors  Accuracy
BJW 85 72 21 84.70%
JLS 85 70 20 82.35%
JRM 85 69 23 81.17%
LPN 85 66 26 77.64%
RM2 85 71 22 83.52%
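The accuracy column in Tables 1-3 is simply the number of correctly recognized utterances over the total test count, as a quick check confirms:

```python
# Quick check of the accuracy column in Tables 1-3:
# accuracy (%) = 100 * correct / total tests.
def accuracy(correct, total):
    return round(100 * correct / total, 2)

bjw_n5 = accuracy(64, 85)    # Table 1, BJW row -> 75.29
jls_n10 = accuracy(70, 85)   # Table 3, JLS row -> 82.35
```
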
Figure 6. Comparison of Recognition Rates of Speech Recognition System for N=5, 7, 10
Over the past few decades, extensive research has been carried out on possible improvements to automatic speech recognition (ASR) systems. Among the most important algorithms in the area of ASR is the curvelet transform combined with hidden Markov models. Although many other methods are also available, such as wavelet-based transforms and artificial neural networks with support vector machine classifiers, which are becoming more popular, this paper presents a comparative study of the diverse approaches proposed for ASR. The following table shows the performance comparison of different feature classifiers for the speech recognition process.
Table 4. Performance comparison of different feature classifier for speech recognition system
Classifier  Precision  Recall  Specificity  F-Measure
Naive Bayesian Approach  94.69%  88.50%  84.20%  0.92
Support Vector Machine  93.33%  84.80%  84.04%  0.88
Artificial Neural Network  93.02%  87.00%  86.04%  0.90
Vector Quantization  92.02%  85.32%  83.01%  0.91
Gaussian Mixture Model  91.23%  82.32%  84.12%  0.89
Dynamic Time Warping  90.54%  81.54%  85.23%  0.93
Template Based Approach  92.31%  85.41%  82.12%  0.89
Pattern Recognition Model  94.52%  80.12%  86.32%  0.94
Acoustic Model  91.45%  82.12%  84.12%  0.96
Hidden Markov Method  98.63%  90.12%  83.14%  0.95
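The F-measure is conventionally the harmonic mean of precision and recall, F = 2PR/(P+R), quoted as a fraction rather than a percentage; a quick sketch for the Naive Bayesian row gives a value close to the table's entry:

```python
# Sketch: F-measure as the harmonic mean of precision and recall,
# F = 2PR/(P+R), expressed as a fraction (not a percentage).
def f_measure(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Naive Bayesian row of Table 4: precision 94.69%, recall 88.50%.
f_nb = f_measure(0.9469, 0.8850)   # ~0.91, close to the 0.92 listed
```
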
Figure 7. Performance comparison of dissimilar feature classifier for speech recognition process
The table below shows a comparison of the various classifiers.
Table 5. Comparison of various classifiers
S. No Classifier Recognition Accuracy (%)
1 Naive Bayesian Approach 84.13
2 Support Vector Machine 88.25
3 Artificial Neural Network 90.23
4 Vector Quantization 90.14
5 Gaussian Mixture Model 90.32
6 Dynamic Time Warping 91.45
7 Template Based Approach 91.13
8 Pattern Recognition Model 93.25
9 Acoustic Model 92.35
10 Hidden Markov Method 97.72
6. CONCLUSION
Nowadays, everyday speech recognition is a big challenge, since deformation and noise in the environment are present and hold back the recognition process. A speech recognition system contains four stages: analysis, feature extraction, modelling and matching. A lot of research has been done in the field of speech recognition, but speech recognition systems are still not a hundred percent accurate, and every approach has its own benefits and drawbacks. In particular, speech recognition accuracy still suffers from variation of context, speaker variability, and environmental conditions. In this paper, we developed a curvelet-based feature extraction (CFE) method for speech detection in noisy backgrounds in which the input speech signal is decomposed into different frequency channels using the characteristics of the curvelet transform. Feature extraction is a vital stage in speech processing; using curvelet transforms, the computational complexity and the feature vector size are successfully reduced. Curvelets have better accuracy and a varying window size, because of which they are suitable for non-stationary signals. Thus the curvelet transform is an elegant tool for the analysis of non-stationary signals like speech. Various feature extraction methods can be used for different kinds of applications, and although these techniques share many common applications, among them the curvelet transform gives better system accuracy. A discrete hidden Markov model can be used for classification for better word recognition, as it considers the time distribution of speech signals. The different approaches for developing speech recognition systems are compared with their merits and demerits. The HMM classification method achieved the highest accuracy in terms of identification rate, with recognition rates of 80.1% for conversational speech, 86% for scientific phrases, and 63.8% for control commands. The objective of this paper is to summarize the CFE method and the classification stages of a speech recognition system. The statistical results show that signal recognition accuracy is increased by using the discrete curvelet transform over conventional methods.
REFERENCES
[1] A. Madan and Divya Gupta, "Speech Feature Extraction and Classification: A Comparative Review", International Journal of Computer Applications (0975-8887), Volume 90, No 9, March 2014.
[2] Nidhi Desai, Prof. Kinnal Dhameliya and Prof. Vijayendra Desai, "Feature Extraction and Classification Techniques for Speech Recognition: A Review", International Journal of Emerging Technology and Advanced Engineering, ISSN 2250-2459, Volume 3, Issue 12, December 2013.
[3] Nidhi Srivastava and Dr. Harsh Dev, "Speech Recognition using MFCC and Neural Networks", International Journal of Modern Engineering Research (IJMER), March 2007.
[4] Bishnu Prasad Das and Ranjan Parekh, "Recognition of Isolated Words using Features based on LPC, MFCC, ZCR and STE, with Neural Network Classifiers", International Journal of Modern Engineering Research (IJMER), Vol. 2, Issue 3, May-June 2012.
[5] Sonia Sunny, David Peter S and K. Poulose Jacob, "Design of a Novel Hybrid Algorithm for Improved Speech Recognition with Support Vector Machines Classifier", International Journal of Emerging Technology and Advanced Engineering, Vol. 3, pp. 249-254, June 2013.
[6] M. C. Shrotriya, Omar Farooq and Z. A. Abbasi, "Hybrid Wavelet based LPC Features for Hindi Speech Recognition", International Journal of Information and Communication Technology, March 2008.
[7] Subhani Shaik and Shaik Subhani, "Detection and Classification of Power Quality Disturbances using Curvelet Transform and Support Vector Machine", International Conference on Information Communication and Embedded Systems (ICICES 2016), pp. 1-8, DOI: 10.1109/ICICES.2016.7518870.
[8] Su Myat Mon and Hla Myo Tun, "Speech-To-Text Conversion (STT) System Using Hidden Markov Model (HMM)", International Journal of Scientific & Technology Research, Volume 4, Issue 06, June 2015, ISSN 2277-8616.
[9] Daniel Jurafsky and James H. Martin, Speech and Language Processing, Draft of August 7.
AUTHORS
Mr. N. Srinivasa Rao received his B.Tech degree from JNTU Hyderabad and his M.Tech degree from Acharya Nagarjuna University, Guntur, A.P. His areas of interest are image processing, computer networks and data mining.
Ms. Ch. Anuradha received her B.Tech degree from Acharya Nagarjuna University, Guntur, and her M.Tech degree from JNTUK, Kakinada. She has 4 years of industry experience and 5 years of teaching experience. She has published 8 papers in national and international journals. Her areas of interest are image processing, data mining and network security.
Dr. S. V. N. Naga Srinivasu received his Ph.D. from Acharya Nagarjuna University, Guntur, Andhra Pradesh. At present he is working as a research supervisor at Nagarjuna University, Guntur. He has published a number of papers in national and international journals. His areas of interest are image processing, computer networks, software engineering and data mining.