Speech processing and recognition are important areas of digital signal processing, since speech allows people to communicate more naturally and efficiently. In this work, a speech recognition system is developed for recognizing digits in Malayalam. Because features must be extracted from speech before it can be recognized, the feature extraction method plays an important role in speech recognition. Here, front-end processing for extracting the features is performed using two wavelet-based methods, namely the Discrete Wavelet Transform (DWT) and Wavelet Packet Decomposition (WPD). A Naive Bayes classifier is used for classification. With this classifier, DWT produced a recognition accuracy of 83.5% and WPD an accuracy of 80.7%. This paper devises a new feature extraction method intended to improve recognition accuracy: a hybrid approach called Discrete Wavelet Packet Decomposition (DWPD), which combines the features of both DWT and WPD. Evaluated with the same Naive Bayes classifier, the new approach produced an improved recognition accuracy of 86.2%.
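The abstract gives no implementation details, but the difference between the two front ends can be sketched with a one-level Haar step (the simplest wavelet, standing in for whatever wavelet family the paper actually used; all function names here are illustrative): DWT recursively splits only the approximation band, WPD splits every band, and a hybrid feature vector can simply concatenate both sets of subband log-energies.

```python
import numpy as np

def haar_step(x):
    """One level of the Haar wavelet transform:
    return (approximation, detail) coefficients."""
    x = x[: len(x) // 2 * 2]                     # force even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def dwt_features(signal, levels=3):
    """DWT: recursively split only the approximation band;
    features are the log energies of each subband."""
    feats, a = [], np.asarray(signal, float)
    for _ in range(levels):
        a, d = haar_step(a)
        feats.append(np.log(np.sum(d**2) + 1e-12))
    feats.append(np.log(np.sum(a**2) + 1e-12))
    return np.array(feats)

def wpd_features(signal, levels=3):
    """WPD: recursively split *every* band, giving a full
    binary tree of 2**levels subbands at the final level."""
    bands = [np.asarray(signal, float)]
    for _ in range(levels):
        bands = [half for b in bands for half in haar_step(b)]
    return np.array([np.log(np.sum(b**2) + 1e-12) for b in bands])

# A hybrid "DWPD"-style vector could concatenate both views:
sig = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 256))
hybrid = np.concatenate([dwt_features(sig), wpd_features(sig)])
print(hybrid.shape)   # (4 + 8,) = (12,)
```

With 3 levels the DWT yields 4 subbands and the WPD a full tree of 8, so the hybrid vector has 12 entries; a real system would compute this per frame rather than per utterance.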
In this paper we present the implementation of a speaker identification system using an artificial neural network with digital signal processing. The system is designed for text-dependent speaker identification of Bangla speech. Utterances of specific Bangla words are recorded with an audio wave recorder, and the speech features are acquired through digital signal processing. Speaker identification from the frequency-domain data is performed with the backpropagation algorithm. Hamming and Blackman-Harris windows are compared to investigate which yields better identification performance, and endpoint detection of speech is employed to achieve high system accuracy.
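As a rough illustration of the front end described above (the frame size and energy threshold are assumed values, and numpy's Blackman window stands in for the Blackman-Harris window), a minimal sketch of energy-based endpoint detection followed by windowed FFT features might look like:

```python
import numpy as np

def endpoint_detect(signal, frame_len=256, threshold=0.01):
    """Trim leading/trailing silence: keep the span between the
    first and last frame whose mean energy exceeds the threshold."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy = np.mean(frames**2, axis=1)
    active = np.where(energy > threshold)[0]
    if active.size == 0:
        return signal
    return signal[active[0] * frame_len : (active[-1] + 1) * frame_len]

def fft_features(signal, frame_len=256, window="hamming"):
    """Window each frame and keep the log magnitude spectrum."""
    win = np.hamming(frame_len) if window == "hamming" else np.blackman(frame_len)
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len) * win
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-12)

rng = np.random.default_rng(0)
speech = np.concatenate([np.zeros(512),                # leading silence
                         rng.standard_normal(1024),    # stand-in "speech"
                         np.zeros(512)])               # trailing silence
trimmed = endpoint_detect(speech)
feats = fft_features(trimmed)
print(trimmed.shape, feats.shape)   # (1024,) (4, 129)
```

The resulting per-frame spectra are the kind of frequency-domain vectors that would be fed to a backpropagation-trained network.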
Utterance Based Speaker Identification Using ANN (IJCSEA Journal)
This document summarizes a research paper on speaker identification using artificial neural networks. The paper presents a speaker identification system that uses digital signal processing and ANN techniques. Speech features are extracted from utterances using FFT and windowing. These features are used to train a multi-layer perceptron network to classify speakers. The system was tested on Bangla speech and achieved accurate identification of speakers from their utterances.
IJRET : International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
This document discusses feature extraction techniques for isolated word speech recognition. It begins with an introduction to digital speech processing and speech recognition models. The main part of the document compares two common feature extraction techniques: Mel Frequency Cepstral Coefficients (MFCC) and Relative Spectral (RASTA) filtering. MFCC extracts compact feature vectors from the signal and provides high performance but lacks robustness. RASTA filtering reduces the impact of noise in signals and provides high robustness by band-passing feature coefficients in both the log spectral and spectral domains. The document details the MFCC feature extraction process, which involves framing, windowing, the fast Fourier transform, mel filtering, the discrete cosine transform, and computing the cepstral coefficients.
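The MFCC steps listed above can be sketched end to end. This is a minimal illustrative pipeline, not any paper's implementation; the sample rate, frame size, hop, and filter counts are assumed values, and the DCT-II is written out by hand to avoid external dependencies.

```python
import numpy as np

def hz_to_mel(f):  return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m):  return 700 * (10 ** (m / 2595) - 1)

def mfcc(signal, sr=8000, frame_len=256, hop=128, n_mels=20, n_ceps=13):
    # 1. framing + Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. magnitude spectrum via FFT
    spec = np.abs(np.fft.rfft(frames, axis=1))      # (n_frames, frame_len//2 + 1)
    # 3. triangular mel filterbank
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, frame_len // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    # 4. log mel energies
    logmel = np.log(spec @ fbank.T + 1e-12)
    # 5. DCT-II to decorrelate -> cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)   # 1 s of a 440 Hz tone
print(mfcc(sig).shape)   # (61, 13): 61 frames, 13 coefficients each
```

Production systems usually add pre-emphasis, liftering, and delta coefficients on top of this skeleton.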
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document summarizes the technical progress in speech recognition over the past few decades. It discusses the key components of a speech recognition system including speech analysis, feature extraction, language translation, and message understanding. The document also outlines the main challenges in speech recognition like variability in context, environmental conditions, speakers, and audio quality. Furthermore, it covers different types of speech recognition systems and their applications in various sectors such as education, medicine, transportation and more. The document concludes with an overview of the major approaches to speech recognition including acoustic-phonetic, pattern recognition, and artificial intelligence and discusses popular feature extraction methods used.
Efficient Speech Emotion Recognition using SVM and Decision Trees (IRJET Journal)
This document discusses efficient speech emotion recognition using support vector machines and decision trees. It summarizes a research paper that extracted speech features like variance, standard deviation, energy and pitch from an emotional speech corpus containing 535 speech segments expressing seven emotions. The extracted features were used to train and test an SVM classifier for emotion recognition. The classifier achieved an average accuracy of 85% across training and test sets at recognizing the seven emotions. Feature selection techniques were used to address the curse of dimensionality caused by the large number of extracted features.
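The SVM/decision-tree classification step is not reproduced here, but the features named in the summary are simple to compute. A hedged sketch follows, with pitch estimated by a basic autocorrelation peak pick; the sample rate and F0 search range are assumed values.

```python
import numpy as np

def pitch_autocorr(frame, sr, fmin=75, fmax=400):
    """Estimate pitch as the lag of the autocorrelation peak
    within a plausible F0 range."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

def emotion_features(segment, sr=8000):
    """Per-segment features of the kind named in the paper:
    variance, standard deviation, energy, and pitch."""
    return np.array([np.var(segment),
                     np.std(segment),
                     np.sum(segment**2),
                     pitch_autocorr(segment, sr)])

sr = 8000
tone = np.sin(2 * np.pi * 200 * np.arange(sr) / sr)   # 200 Hz "voiced" segment
f = emotion_features(tone, sr)
print(round(f[3]))   # pitch estimate, close to 200
```

Stacking such vectors over many segments gives the feature matrix an SVM would be trained on; the curse-of-dimensionality issue the summary mentions arises when many more such statistics are appended.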
Hindi digits recognition system on speech data collected in different natural... (csandit)
This paper presents a baseline digit speech recognizer for the Hindi language. The recording environment differs for each speaker, since the data is collected in their respective homes; the differences include vehicle horn noise in road-facing rooms, internal background noise such as opening doors, and silence in other rooms. All these recordings are used for training the acoustic model, which is trained on audio data from 8 speakers. The vocabulary size of the recognizer is 10 words. The HTK toolkit is used for building the acoustic model and evaluating the recognition rate. The efficiency of the recognizer developed on the recorded data is shown at the end of the paper, and possible directions for future research are suggested.
This document provides an overview of speech recognition systems and recent progress in the field. It discusses different types of speech recognition including isolated word, connected word, continuous speech, and spontaneous speech. Various techniques used in speech recognition are also summarized, such as simulated evolutionary computation, artificial neural networks, fuzzy logic, Kalman filters, and Hidden Markov Models. The document reviews several papers published between 2004-2012 that studied speech recognition methods including using dynamic spectral subband centroids, Kalman filters, biomimetic computing techniques, noise estimation, and modulation filtering. It concludes that Hidden Markov Models combined with MFCC features provide good recognition results and are suitable for large vocabulary, speaker-independent, continuous speech recognition.
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ... (ijcsit)
Speech processing is a crucial and intensive field of research in the development of robust and efficient speech recognition systems, but recognition accuracy still suffers under variation of context, speaker variability, and environmental conditions. This paper presents a curvelet-based feature extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the curvelet transform, which reduces computational complexity and feature vector size; its varying window size yields better accuracy and makes it suitable for non-stationary signals. For word classification and recognition, a discrete hidden Markov model (HMM) is used, since it accounts for the time distribution of speech signals. The HMM classifier attained identification rates of 80.1% for informal phrases, 86% for scientific phrases, and 63.8% for control phrases. The objective of this study is to characterize the feature extraction and classification phases of a speech recognition system. The various approaches available for developing speech recognition systems are compared along with their merits and demerits, and the statistical results show that recognition accuracy is increased by using the discrete curvelet transform over conventional methods.
This document reviews techniques used in spoken-word recognition systems. It discusses popular feature extraction techniques like MFCC, LPC, DWT, WPD that are used to represent speech signals in a compact form before classification. Classification techniques discussed are ANN, HMM, DTW, and VQ. The document provides a brief overview of each technique and their advantages. It also presents the generalized workflow of a spoken-word recognition system including stages of speech acquisition, pre-emphasis, feature extraction, modeling, classification, and output of recognized text.
This document discusses speech recognition techniques. It begins by defining biometrics and how speech can be used as a biometric for identity authentication. It describes how speech recognition aims to extract lexical information independently of the speaker, while speaker recognition focuses on extracting the identity of the speaker. The document then discusses feature extraction using MFCC and modeling speech using neural networks. It provides an overview of pattern recognition techniques including statistical and structural approaches. Finally, it discusses implementation details such as preprocessing, framing, windowing and feature extraction of speech signals.
This document summarizes research on speaker recognition in noisy environments. It begins with an introduction discussing the goals of speaker identification and verification and their applications. It then provides details on the basic components of a speaker recognition system, including feature extraction and classification. The document focuses on methods for modeling noise, including generating multiple noisy training conditions and focusing matching on unaffected features. Experimental results are shown through snapshots of a prototype system interface that allows adding and recognizing speakers based on voice samples. The system is able to identify speakers in the presence of noise by comparing features to stored codebooks generated during training.
IRJET - Speech to Speech Translation System (IRJET Journal)
1. The document describes a speech-to-speech translation system that aims to facilitate conversations between people speaking different languages.
2. It discusses the architecture of the proposed system, which includes modules for speech input, speech recognition, translation, grammar correction, text-to-speech synthesis, and speech output.
3. The document also reviews related work on speech recognition, translation, and text-to-speech systems. It outlines the implementation status of the different modules in the proposed system and possibilities for future improvement, such as supporting additional languages.
Marathi Isolated Word Recognition System using MFCC and DTW Features (IDES Editor)
This paper presents a Marathi database and an isolated word recognition system based on Mel-frequency cepstral coefficients (MFCC) and Dynamic Time Warping (DTW) as features. For feature extraction, the Marathi speech database was designed using the Computerized Speech Lab. The database consists of the Marathi vowels, isolated words starting with each vowel, and simple Marathi sentences. Each word was repeated three times by each of the 35 speakers. The paper presents the comparative recognition accuracy of DTW and MFCC.
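DTW is the alignment step that lets two utterances of different durations be compared. A textbook scalar-sequence sketch is below; in practice the inputs would be frames of MFCC vectors compared with a Euclidean local distance, and the quadratic table would usually be banded for speed.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping: each cell adds the local
    distance to the cheapest of the three allowed predecessors."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# The same "word" spoken at two speeds aligns far more cheaply
# than two genuinely different signals:
t = np.linspace(0, 1, 50)
slow = np.sin(2 * np.pi * t)          # 50-sample version
fast = np.sin(2 * np.pi * t[::2])     # 25-sample, time-compressed version
print(dtw_distance(slow, fast) < dtw_distance(slow, -slow))   # True
```

A recognizer of this kind stores one or more templates per word and labels a test utterance with the template of minimum DTW cost.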
International journal of signal and image processing issues vol 2015 - no 1... (sophiabelthome)
This document presents a study on automatic speech recognition (ASR). It discusses the different types of ASR systems including speaker-dependent, speaker-independent, and speaker-adaptive systems. It also covers the different types of utterances that can be recognized, such as isolated words, connected words, continuous speech, and spontaneous speech. The document then describes the basic phases involved in ASR, including front-end analysis using techniques like pre-emphasis, framing, windowing and feature extraction. It also discusses back-end processing using acoustic and language models to map features to words. Hidden Markov models are presented as a commonly used acoustic modeling technique in ASR systems.
Classification improvement of spoken Arabic language based on radial basis fu... (IJECEIAES)
This document summarizes a research paper that aimed to improve the classification of spoken Arabic letters using a Radial Basis Function (RBF) neural network. The paper proposes a three-step approach: 1) preprocessing the speech signals, which includes removing noise and segmenting the signals; 2) extracting statistical features from the preprocessed signals, such as the zero-crossing rate and MFCCs; and 3) classifying the letters using an RBF neural network. The researchers tested different parameters and obtained classification accuracies of 90-99.375%, an improvement over prior works. They concluded that combining statistical features with an RBF neural network provided recognition rates over 1.845% better than other methods for Arabic speech classification.
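Neither the paper's exact features nor its network configuration are reproduced here. As a hedged sketch under assumed parameters, a zero-crossing-rate feature and a tiny RBF network whose output weights are fitted by least squares on toy two-class data can look like:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose sign differs."""
    return np.mean(np.abs(np.diff(np.sign(frame))) > 0)

def rbf_train(X, y, centers, gamma=1.0):
    """RBF network: hidden layer of Gaussian units at fixed centers,
    linear output weights fitted by least squares."""
    H = np.exp(-gamma * ((X[:, None, :] - centers[None]) ** 2).sum(-1))
    W, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W

def rbf_predict(X, centers, W, gamma=1.0):
    H = np.exp(-gamma * ((X[:, None, :] - centers[None]) ** 2).sum(-1))
    return H @ W

# ZCR of a low-frequency tone is small (few sign changes per sample):
print(zero_crossing_rate(np.sin(np.linspace(0, 20 * np.pi, 200))))

# Toy 2-class problem standing in for two spoken letters:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([0.0] * 20 + [1.0] * 20)
centers = X[::5]                       # a few training points as centers
W = rbf_train(X, y, centers)
pred = (rbf_predict(X, centers, W) > 0.5).astype(float)
print((pred == y).mean())              # training accuracy (should be high)
```

A real system would replace the 2-D toy points with per-segment feature vectors (ZCR, MFCC statistics) and pick centers by clustering rather than subsampling.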
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor (Waqas Tariq)
In recent decades speech-interactive systems have gained increasing importance. The performance of an ASR system depends mainly on the availability of a large speech corpus. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach, which requires a large speech corpus with sentence- or phoneme-level transcription of the utterances. The transcriptions must also cover varied speech orders so that the recognizer can build models for all the sounds present. For Telugu, however, because of the language's complex nature, a very large, well-annotated speech database is very difficult to build. It is difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or even millions of word forms. A significant part of the grammar that is handled by syntax in English (and similar languages) is handled within morphology in Telugu: phrases comprising several words (tokens) in English map onto a single word in Telugu. Telugu is also phonetic in nature in addition to being morphologically rich, which is why speech technology developed for English cannot be applied to it directly. This paper highlights work carried out in an attempt to build a voice-enabled text editor with automatic term suggestion. The main claim of the paper is a recognition enhancement process developed by the authors for highly inflecting, morphologically rich languages. The method increases speech recognition accuracy with a greatly reduced corpus size, and it adapts Telugu words to the database dynamically, resulting in growth of the corpus.
Development of text to speech system for Yoruba language (Alexander Decker)
This document describes the development of a text-to-speech (TTS) system for the Yoruba language. It begins with background on TTS systems and an overview of previous work developing TTS for other languages but not extensively for Yoruba. The authors then describe the architecture and design of the Yoruba TTS system they developed using a concatenative synthesis method. This includes analyzing the phonology and syllable structure of Yoruba, and developing components for syllable identification, prosody assignment, and speech signal processing. An evaluation of the system found 70% of respondents found it usable.
An Improved Approach for Word Ambiguity Removal (Waqas Tariq)
Word ambiguity removal is the task of resolving the ambiguity of a word, i.e., identifying its correct sense in an ambiguous sentence. This paper describes a model that uses a part-of-speech tagger and three categories for word sense disambiguation (WSD). Improving human-computer interaction calls for such disambiguation, and to this end supervised and unsupervised methods are combined. The WSD algorithm finds an efficient and accurate sense of a word based on domain information, and the accuracy of this work is evaluated with the aim of finding the best-suited domain of a word.
Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word Sense Disambiguation
The document provides an overview of techniques for secure speech communication, including speech coding, speaker identification, and encryption/decryption. It discusses various speech coding techniques like waveform coding, parametric coding, and hybrid coding that can compress speech signals while maintaining quality. It also describes speaker identification methods using hashing to authenticate users. For encryption, it outlines symmetric techniques like AES that use a shared key, and asymmetric techniques like RSA that use public/private key pairs. The goal is to integrate these methods to provide a high level of security for speech communication by removing redundancy, authenticating speakers, and strongly encrypting signals.
Financial Transactions in ATM Machines using Speech Signals (IJERA Editor)
Speech is the most natural and simplest way of communication, and speech recognition is a fascinating application of digital signal processing with many real-world uses. In this paper, a speech recognition system is developed for Automated Teller Machines (ATMs) using Wavelet Packet Decomposition (WPD) and Artificial Neural Networks (ANN). Speech signals are one-dimensional and random in nature. The ATM communicates with the customer using stored speech samples, and the user communicates with the machine using spoken digits. Daubechies wavelets are employed for feature extraction, and a multilayer neural network trained with the backpropagation algorithm is used for classification. The proposed method is implemented for 500 speakers uttering the 10 spoken digits in English. The experimental results show a good recognition accuracy of 87.38% and demonstrate the efficiency of combining these two techniques.
Efficient Speech Emotion Recognition using SVM and Decision TreesIRJET Journal
This document discusses efficient speech emotion recognition using support vector machines and decision trees. It summarizes a research paper that extracted speech features like variance, standard deviation, energy and pitch from an emotional speech corpus containing 535 speech segments expressing seven emotions. The extracted features were used to train and test an SVM classifier for emotion recognition. The classifier achieved an average accuracy of 85% across training and test sets at recognizing the seven emotions. Feature selection techniques were used to address the curse of dimensionality caused by the large number of extracted features.
Hindi digits recognition system on speech data collected in different natural...csandit
This paper presents a baseline digits speech recognizer for Hindi language. The recording environment is different for all speakers, since the data is collected in their respective homes. The different environment refers to vehicle horn noises in some road facing rooms, internal background noises in some rooms like opening doors, silence in some rooms etc. All these recordings are used for training acoustic model. The Acoustic Model is trained on 8 speakers’ audio data. The vocabulary size of the recognizer is 10 words. HTK toolkit is used for building
acoustic model and evaluating the recognition rate of the recognizer. The efficiency of the recognizer developed on recorded data, is shown at the end of the paper and possible directions for future research work are suggested.
This document provides an overview of speech recognition systems and recent progress in the field. It discusses different types of speech recognition including isolated word, connected word, continuous speech, and spontaneous speech. Various techniques used in speech recognition are also summarized, such as simulated evolutionary computation, artificial neural networks, fuzzy logic, Kalman filters, and Hidden Markov Models. The document reviews several papers published between 2004-2012 that studied speech recognition methods including using dynamic spectral subband centroids, Kalman filters, biomimetic computing techniques, noise estimation, and modulation filtering. It concludes that Hidden Markov Models combined with MFCC features provide good recognition results and are suitable for large vocabulary, speaker-independent, continuous speech recognition.
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...ijcsit
Speech processing is a crucial and intensive field of research in the development of robust and efficient speech recognition systems, but recognition accuracy still suffers under variation of context, speaker variability, and environmental conditions. In this paper, we present a curvelet-based Feature Extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the characteristics of the curvelet transform, which successfully reduces the computational complexity and the feature vector size; the varying window size also makes curvelets well suited to non-stationary signals. For word classification and recognition, a discrete hidden Markov model (HMM) is used, as it accounts for the time distribution of speech signals. The HMM classification method attained detection rates of 80.1% for informal phrases, 86% for scientific phrases, and 63.8% for control phrases. The objective of this study is to characterize the feature extraction methods and the classification phase of a speech recognition system. The various approaches available for developing speech recognition systems are compared along with their merits and demerits. The statistical results show that recognition accuracy is increased by using the discrete curvelet transform over conventional methods.
This document reviews techniques used in spoken-word recognition systems. It discusses popular feature extraction techniques like MFCC, LPC, DWT, WPD that are used to represent speech signals in a compact form before classification. Classification techniques discussed are ANN, HMM, DTW, and VQ. The document provides a brief overview of each technique and their advantages. It also presents the generalized workflow of a spoken-word recognition system including stages of speech acquisition, pre-emphasis, feature extraction, modeling, classification, and output of recognized text.
This document discusses speech recognition techniques. It begins by defining biometrics and how speech can be used as a biometric for identity authentication. It describes how speech recognition aims to extract lexical information independently of the speaker, while speaker recognition focuses on extracting the identity of the speaker. The document then discusses feature extraction using MFCC and modeling speech using neural networks. It provides an overview of pattern recognition techniques including statistical and structural approaches. Finally, it discusses implementation details such as preprocessing, framing, windowing and feature extraction of speech signals.
This document summarizes research on speaker recognition in noisy environments. It begins with an introduction discussing the goals of speaker identification and verification and their applications. It then provides details on the basic components of a speaker recognition system, including feature extraction and classification. The document focuses on methods for modeling noise, including generating multiple noisy training conditions and focusing matching on unaffected features. Experimental results are shown through snapshots of a prototype system interface that allows adding and recognizing speakers based on voice samples. The system is able to identify speakers in the presence of noise by comparing features to stored codebooks generated during training.
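The codebook matching described above can be sketched with a toy vector-quantization setup. Everything below is an illustrative assumption rather than a detail from the paper: the small k-means routine, the 2-D Gaussian clusters standing in for two speakers' feature vectors, and all parameter values.

```python
import numpy as np

def train_codebook(features, k=4, iters=20, seed=0):
    # Enrollment: fit a small k-means codebook to one speaker's features.
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each feature vector to its nearest codeword
        d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
        labels = d.argmin(axis=1)
        # move each codeword to the centroid of its assigned vectors
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = features[labels == j].mean(axis=0)
    return codebook

def avg_quantization_error(features, codebook):
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()

def identify(features, codebooks):
    # The claimed speaker is the enrolled one whose codebook explains
    # the test features with the smallest average quantization error.
    return int(np.argmin([avg_quantization_error(features, cb)
                          for cb in codebooks]))

rng = np.random.default_rng(1)
spk0 = rng.normal(0.0, 0.5, (200, 2))   # stand-in features, speaker 0
spk1 = rng.normal(3.0, 0.5, (200, 2))   # stand-in features, speaker 1
books = [train_codebook(spk0), train_codebook(spk1)]
test = rng.normal(3.0, 0.5, (50, 2)) + rng.normal(0, 0.2, (50, 2))  # noisy speaker 1
print(identify(test, books))  # → 1
```

Because identification compares average distortion rather than exact matches, mild additive noise on the test features (as in the last line) still yields the correct speaker.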
IRJET- Speech to Speech Translation SystemIRJET Journal
1. The document describes a speech-to-speech translation system that aims to facilitate conversations between people speaking different languages.
2. It discusses the architecture of the proposed system, which includes modules for speech input, speech recognition, translation, grammar correction, text-to-speech synthesis, and speech output.
3. The document also reviews related work on speech recognition, translation, and text-to-speech systems. It outlines the implementation status of the different modules in the proposed system and possibilities for future improvement, such as supporting additional languages.
Marathi Isolated Word Recognition System using MFCC and DTW FeaturesIDES Editor
This paper presents a Marathi database and an isolated word recognition system based on Mel-frequency cepstral coefficients (MFCC) and Dynamic Time Warping (DTW) as features. For feature extraction, a Marathi speech database has been designed using the Computerized Speech Lab. The database consists of the Marathi vowels, isolated words starting with each vowel, and simple Marathi sentences. Each word has been repeated three times by each of the 35 speakers. This paper presents the comparative recognition accuracy of DTW and MFCC.
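The DTW matching step can be sketched in a few lines. This is a simplification of what such a recognizer actually does: it warps scalar sequences rather than full MFCC vector sequences, and applies no warping-window constraint.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences,
    computed with a full cost matrix (no warping window)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A time-stretched copy of a sequence aligns with zero cost,
# which a plain point-by-point comparison cannot achieve.
x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
y = np.array([0.0, 1.0, 1.0, 2.0, 1.0, 0.0])  # same contour, stretched
print(dtw_distance(x, x))  # → 0.0
print(dtw_distance(x, y))  # → 0.0
```

This tolerance to local time stretching is exactly why DTW suits isolated-word matching, where the same word is rarely spoken at the same rate twice.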
International journal of signal and image processing issues vol 2015 - no 1...sophiabelthome
This document presents a study on automatic speech recognition (ASR). It discusses the different types of ASR systems including speaker-dependent, speaker-independent, and speaker-adaptive systems. It also covers the different types of utterances that can be recognized, such as isolated words, connected words, continuous speech, and spontaneous speech. The document then describes the basic phases involved in ASR, including front-end analysis using techniques like pre-emphasis, framing, windowing and feature extraction. It also discusses back-end processing using acoustic and language models to map features to words. Hidden Markov models are presented as a commonly used acoustic modeling technique in ASR systems.
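The front-end phases listed above (pre-emphasis, framing, windowing) can be sketched compactly. The frame length, hop size, and pre-emphasis coefficient below are common illustrative choices, not values from the document.

```python
import numpy as np

def preprocess(signal, frame_len=256, hop=128, alpha=0.97):
    """Front-end sketch: pre-emphasis, framing, Hamming windowing."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split into overlapping frames of frame_len samples, hop apart
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Taper each frame with a Hamming window to reduce spectral leakage
    return frames * np.hamming(frame_len)

x = np.random.randn(1024)          # stand-in for a sampled utterance
frames = preprocess(x)
print(frames.shape)                # → (7, 256)
```

Each row of the result is one windowed frame, ready for the feature-extraction stage (e.g. an FFT followed by cepstral analysis).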
Classification improvement of spoken arabic language based on radial basis fu...IJECEIAES
This document summarizes a research paper that aimed to improve classification of spoken Arabic language letters using the Radial Basis Function (RBF) neural network. The paper proposes a three-step approach: 1) preprocessing the speech signals which includes removing noise and segmenting the signals, 2) extracting statistical features from the preprocessed signals like zero-crossing rate and MFCCs, and 3) classifying the letters using an RBF neural network. The researchers tested different parameters and found classification accuracy improved from 90-99.375% compared to prior works. They concluded that combining statistical features with RBF neural networks provided over 1.845% better recognition rates than other methods for Arabic speech classification.
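One of the statistical features mentioned, the zero-crossing rate, is simple to compute. The 100 Hz tone below is a made-up illustration, not data from the paper.

```python
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ;
    # tends to be low for voiced speech, high for unvoiced sounds/noise.
    s = np.sign(frame)
    s[s == 0] = 1            # count exact zeros as positive
    return float(np.mean(s[1:] != s[:-1]))

t = np.arange(8000) / 8000.0              # one second at 8 kHz
tone = np.sin(2 * np.pi * 100 * t)        # 100 Hz tone: ~200 crossings
print(zero_crossing_rate(tone))
```

A pure tone crosses zero twice per cycle, so the rate above is roughly 200/8000; a noisy or unvoiced segment would score far higher, which is what makes ZCR a useful discriminating feature.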
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
In recent decades speech interactive systems have gained increasing importance. The performance of an ASR system mainly depends on the availability of a large corpus of speech. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires a large speech corpus with sentence- or phoneme-level transcription of the speech utterances. The transcriptions must also cover speech in different orders so that the recognizer can build models for all the sounds present. But for the Telugu language, because of its complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or even millions of word forms. A significant part of the grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu: phrases comprising several words (that is, tokens) in English map onto a single word in Telugu. The Telugu language is phonetic in nature in addition to being rich in morphology, which is why speech technology developed for English cannot be applied directly to the Telugu language. This paper highlights the work carried out in an attempt to build a voice-enabled text editor with the capability of automatic term suggestion. The main claim of the paper is the recognition enhancement process we developed for highly inflecting, morphologically rich languages. This method results in increased speech recognition accuracy with a much-reduced corpus size. It also adapts Telugu words to the database dynamically, resulting in growth of the corpus.
Development of text to speech system for yoruba languageAlexander Decker
This document describes the development of a text-to-speech (TTS) system for the Yoruba language. It begins with background on TTS systems and an overview of previous work developing TTS for other languages but not extensively for Yoruba. The authors then describe the architecture and design of the Yoruba TTS system they developed using a concatenative synthesis method. This includes analyzing the phonology and syllable structure of Yoruba, and developing components for syllable identification, prosody assignment, and speech signal processing. An evaluation of the system found 70% of respondents found it usable.
An Improved Approach for Word Ambiguity RemovalWaqas Tariq
Word ambiguity removal is a task of removing ambiguity from a word, i.e. correct sense of word is identified from ambiguous sentences. This paper describes a model that uses Part of Speech tagger and three categories for word sense disambiguation (WSD). Human Computer Interaction is very needful to improve interactions between users and computers. For this, the Supervised and Unsupervised methods are combined. The WSD algorithm is used to find the efficient and accurate sense of a word based on domain information. The accuracy of this work is evaluated with the aim of finding best suitable domain of word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word sense disambiguation
The document provides an overview of techniques for secure speech communication, including speech coding, speaker identification, and encryption/decryption. It discusses various speech coding techniques like waveform coding, parametric coding, and hybrid coding that can compress speech signals while maintaining quality. It also describes speaker identification methods using hashing to authenticate users. For encryption, it outlines symmetric techniques like AES that use a shared key, and asymmetric techniques like RSA that use public/private key pairs. The goal is to integrate these methods to provide a high level of security for speech communication by removing redundancy, authenticating speakers, and strongly encrypting signals.
Financial Transactions in ATM Machines using Speech SignalsIJERA Editor
Speech is the natural and simplest way of communication, and speech recognition is a fascinating application of Digital Signal Processing with many real-world applications. In this paper, a speech recognition system is developed for Automated Teller Machines (ATMs) using Wavelet Packet Decomposition (WPD) and Artificial Neural Networks (ANN). Speech signals are one-dimensional and random in nature. The ATM communicates with the customer using stored speech samples, and the user communicates with the machine using spoken digits. Daubechies wavelets are employed here, and a multilayer neural network trained with the backpropagation algorithm is used for classification. The proposed method is implemented for 500 speakers uttering 10 spoken digits in English. The experimental results show a good recognition accuracy of 87.38%, demonstrating the efficiency of combining these two techniques.
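As a sketch of the wavelet side of such a front end only: below is a one-level Haar DWT (the simplest Daubechies wavelet, db1) used recursively to build subband-energy features. The recursion depth and the choice of subband energies as the feature vector are illustrative assumptions, not the paper's exact WPD configuration.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: pairwise averages (approximation)
    and differences (detail), scaled to preserve signal energy."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # pad odd-length input
        x = np.append(x, 0.0)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def subband_energies(x, levels=3):
    """Recursively decompose the approximation band and return the
    energy of each subband -- a compact feature vector of the kind
    wavelet front ends feed to a classifier."""
    feats = []
    for _ in range(levels):
        x, d = haar_dwt(x)
        feats.append(np.sum(d ** 2))    # detail-band energy per level
    feats.append(np.sum(x ** 2))        # final approximation energy
    return np.array(feats)

sig = np.sin(2 * np.pi * np.arange(64) / 8.0)
print(subband_energies(sig))
```

Because the transform is orthonormal, the subband energies sum exactly to the signal's total energy, so the feature vector is a lossless energy partition across frequency bands.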
An access control model of virtual machine securitycsandit
Virtualization technology has become a hot IT technology with the popularity of Cloud Computing. However, new security issues arise with it; specifically, resource sharing and data communication in virtual machines are of most concern. In this paper an access control model is proposed which combines the Chinese Wall and BLP models. The BLP multi-level security model is introduced with corresponding improvements based on the PCW (Prioritized Chinese Wall) security model. This model can be used to safely control the resources and event behaviors in virtual machines. Experimental results show its effectiveness and safety.
An approach for software effort estimation using fuzzy numbers and genetic al...csandit
One of the most critical tasks during the software development life cycle is estimating the effort and time involved in developing the software product. Estimation may be performed in many ways, such as expert judgment, algorithmic effort estimation, machine learning, and analogy-based estimation. Analogy-based software effort estimation is the process of identifying one or more historical projects that are similar to the project being developed and then deriving estimates from them. Analogy-based estimation is integrated with fuzzy numbers in order to improve the performance of software project effort estimation during the early stages of the software development lifecycle; because of the uncertainty associated with attribute measurement and data availability, fuzzy logic is introduced in the proposed model. But hardly any historical project is exactly the same as the project being estimated, so in most cases even the most similar project still has a similarity distance from the project being estimated. Therefore, the effort needs to be adjusted when the most similar project has a similarity distance from the project being estimated. To adjust the reused effort, we build an adjustment mechanism whose algorithm derives the optimal adjustment on the reused effort using a Genetic Algorithm. The proposed model, combining fuzzy logic to estimate software effort in the early stages with a Genetic Algorithm based adjustment mechanism, may come close to the correct effort estimation.
Auto mobile vehicle direction in road traffic using artificial neural networkscsandit
This document presents a study on using an artificial neural network to analyze automobile vehicle routing in traffic conditions. Specifically, it aims to select the optimal route given inputs like distance, traffic volume, number of signals, road conditions and travel time. A backpropagation neural network with 5 input nodes, 10 hidden nodes and 5 output nodes representing 5 potential routes is created. The network is trained on 180 combinations of the 5 input parameters to learn the best route. Testing shows the backpropagation network achieves 91.2% accuracy in route selection, outperforming a network without backpropagation. The study concludes the network provides an effective approach for dynamic vehicle routing analysis, though some assumptions could be improved.
Performance of different classifiers in speech recognitioneSAT Journals
Abstract Speech is the most natural means of communication among human beings, and speech processing and recognition have been intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker-independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the signals are tuned by removing the noise using a wavelet denoising method based on soft thresholding. The features are extracted from the signals using Discrete Wavelet Transforms (DWT), which are well suited to processing non-stationary signals like speech due to their multi-resolution, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem, so the feature vector set obtained is classified using three classifiers, namely Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes, which are capable of handling multiple classes. During the classification stage, the input feature vector data is trained using information relating to known patterns and then tested using the test data set. The performances of these classifiers are evaluated based on recognition accuracy. All three methods produced good recognition accuracy: DWT with ANN produced a recognition accuracy of 89%, DWT with SVM 86.6%, and DWT with Naive Bayes 83.5%. ANN is found to be the best among the three methods. Index Terms: Speech Recognition, Soft Thresholding, Discrete Wavelet Transforms, Artificial Neural Networks, Support Vector Machines, Naive Bayes Classifier.
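The soft-thresholding step in such a denoising front end shrinks wavelet coefficients toward zero. A minimal sketch follows; the threshold value used here is arbitrary (in practice something like Donoho's universal threshold, σ√(2 ln N), is estimated from the data rather than fixed by hand).

```python
import numpy as np

def soft_threshold(coeffs, t):
    # Shrink every coefficient toward zero by t; any coefficient
    # whose magnitude is below t is zeroed outright.
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

c = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
print(soft_threshold(c, 1.0))   # small coefficients vanish, large ones shrink
```

Unlike hard thresholding, the surviving coefficients are also reduced by t, which avoids the discontinuities that hard thresholding introduces in the reconstructed signal.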
A Review On Speech Feature Techniques And Classification TechniquesNicole Heredia
This document discusses speech feature extraction and classification techniques for speech recognition systems. It provides an overview of common feature extraction methods like MFCC and LPC, and classification algorithms like ANN and SVM. MFCC mimics human auditory perception but provides weak power spectrum, while LPC is easy to calculate but does not capture information at a linear scale. ANN can learn from data but is complex for large datasets, while SVM is accurate and suitable for pattern recognition but requires fixed-length coefficients. The document evaluates these techniques and concludes that MFCC performance is more efficient than LPC for feature extraction in speech recognition.
A survey on Enhancements in Speech RecognitionIRJET Journal
This document discusses enhancements in speech recognition and provides an overview of the history and basic model of speech recognition. It summarizes key enhancements researchers have made to improve speech recognition, especially in noisy environments. The basic model of speech recognition involves speech input, preprocessing using techniques like MFCCs, classification models like RNNs and HMMs, and output of a transcript. Researchers are working to develop robust speech recognition that can understand speech in any environment.
Robust Speech Recognition Technique using Mat labIRJET Journal
The document proposes a new robust speech recognition technique using MATLAB. It takes an audio signal as input, eliminates noise, and matches patterns to recognize the input. It uses Fourier transforms to extract data from the audio, performs noise elimination, converts the noise-eliminated data to patterns, and matches the patterns to those stored in a database. Patterns are represented as states in a weighted automaton. The technique achieves 90% accuracy in recognizing words and is well-suited for voice command systems.
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...TELKOMNIKA JOURNAL
Speech recognition can be defined as the process of converting voice signals into sequences of words by applying a specific algorithm implemented in a computer program. Research on speech recognition in Indonesia is relatively limited. This paper studies which feature extraction method is best, between Linear Predictive Coding (LPC) and Mel Frequency Cepstral Coefficients (MFCC), for speech recognition in the Indonesian language. This is important because a method that produces high accuracy for a particular language does not necessarily produce the same accuracy for other languages, considering that every language has different characteristics. Thus this research can hopefully help accelerate the use of automatic speech recognition for the Indonesian language. There are two main processes in speech recognition: feature extraction and recognition. The feature extraction methods compared in this study are LPC and MFCC, while recognition uses a Hidden Markov Model (HMM). The test results showed that the MFCC method is better than LPC for Indonesian-language speech recognition.
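As a sketch of the LPC side of this comparison: the autocorrelation method with Levinson-Durbin recursion, applied to a synthetic first-order autoregressive signal. The model order and the synthetic signal are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def lpc(x, order):
    # Autocorrelation-method LPC: solve the Yule-Walker equations
    # with the Levinson-Durbin recursion.
    x = np.asarray(x, dtype=float)
    r = np.array([x[: len(x) - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]   # forward-prediction residual
        k = -acc / err                        # reflection coefficient
        a[1:i] = a[1:i] + k * a[1:i][::-1]    # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k                    # shrink the prediction error
    return a

# x[n] = 0.9 x[n-1] + e[n], so an order-1 model should recover ~[1, -0.9].
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
x = np.zeros(5000)
for n in range(1, 5000):
    x[n] = 0.9 * x[n - 1] + e[n]
print(lpc(x, 1))   # close to [1.0, -0.9]
```

The returned vector is the prediction-error filter A(z); in a recognizer, such coefficients (or cepstra derived from them) computed per frame form the feature vectors that MFCC is compared against.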
A Review Paper on Speech Based Emotion Detection Using Deep LearningIRJET Journal
This document reviews techniques for speech-based emotion detection using deep learning. It discusses how deep learning techniques have been proposed as an alternative to traditional methods for speech emotion recognition. Feature extraction is an important part of speech emotion recognition, and deep learning can help minimize the complexity of extracted features. The document surveys related work on speech emotion recognition using techniques like deep neural networks, convolutional neural networks, recurrent neural networks, and more. It examines the limitations of current approaches and the potential for deep learning to improve speech-based emotion detection.
Modeling of Speech Synthesis of Standard Arabic Using an Expert Systemcsandit
This document describes an expert system for speech synthesis of Standard Arabic text. It involves two main stages: 1) creation of a sound database and 2) text-to-speech transformation. The transformation process involves phonetic orthographic transcription of the text and then generating voice signals corresponding to the transcribed phonetic sequence. The expert system uses a knowledge base containing sound data and rewriting rules. It transcribes text using graphemes as basic units and then concatenates sound units from the database to synthesize speech. Tests achieved a 96% success rate in pronouncing sentences correctly. Future work aims to improve prosody and develop fully automatic signal segmentation.
Performance Calculation of Speech Synthesis Methods for Hindi languageiosrjce
The document compares the performance of two speech synthesis methods - unit selection and hidden Markov model (HMM) - for Hindi language. It finds that unit selection results in higher quality synthesized speech than HMM based on both subjective and objective quality measurements. Subjective measurements using mean opinion scores show unit selection receives higher average ratings. Objective measurements of mean square error and peak signal-to-noise ratio also indicate unit selection introduces less distortion compared to the original speech samples.
Isolated Word Recognition System For Tamil Spoken Language Using Back Propaga...CSEIJJournal
Speech recognition has been an active research topic for more than 50 years. Interacting with the computer through speech is an active scientific research field, particularly for the disabled community, who face a variety of difficulties in using the computer. Research in Automatic Speech Recognition (ASR) is investigated for different languages because each language has its own specific features, and the need for an ASR system for the Tamil language has increased widely in the last few years. In this paper, a speech recognition system for individually spoken words in the Tamil language using a multilayer feed-forward network is presented. To implement the system, the input signal is first preprocessed using four types of filters, namely pre-emphasis, median, average, and Butterworth bandstop filters, in order to remove background noise and enhance the signal. The performance of these filters is measured based on MSE and PSNR values, and the best filtered signal is taken as the input for the subsequent stages of the ASR system.
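The MSE and PSNR scores used to rank the filters can be computed directly. One point to hedge: PSNR conventions differ, and the peak definition below (maximum absolute amplitude of the clean reference) is one common choice, assumed here rather than taken from the paper.

```python
import numpy as np

def mse(reference, processed):
    # Mean squared error between the clean reference and a filtered signal.
    return float(np.mean((np.asarray(reference) - np.asarray(processed)) ** 2))

def psnr(reference, processed):
    # Peak signal-to-noise ratio in dB; higher means closer to the reference.
    m = mse(reference, processed)
    peak = float(np.max(np.abs(reference)))
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

clean = np.sin(np.linspace(0, 2 * np.pi, 1000))
noisy = clean + 0.01 * np.random.default_rng(0).standard_normal(1000)
print(mse(clean, noisy), psnr(clean, noisy))
```

To compare filters as the paper does, each candidate's output would replace `noisy` above: the filter with the lowest MSE (equivalently the highest PSNR) against the reference wins.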
The project was started with the sole aim that the design should be able to recognize the voice of a person by analyzing the speech signal. The simulation is done in MATLAB. The design of the project is based on using Linear Prediction filter Coefficients (LPC) and Principal Component Analysis (PCA, via princomp) for speech signal analysis. Sample collection is accomplished by using a microphone to record male/female speech. After executing the program, the speech is analyzed by the analysis part of the MATLAB code, and the design should be able to judge whether the recorded speech signal is the same as the desired output.
This document provides a course report on speech recognition using MATLAB. It includes an abstract, introduction, report outline, design of the system, process and results, flowchart of the experiment, simulation and analysis, discussion, and conclusion sections. The report details using MFCC and GMM algorithms for speech recognition, including signal preprocessing, parameter extraction, and training. Simulation results showed recognition rates approaching 100% for same speakers in quiet environments, but performance was impacted by noise and different speakers. The system demonstrated speech recognition capabilities in MATLAB but had limitations.
This document describes a speaker independent Malayalam word recognition system using support vector machines. It extracts mel frequency cepstral coefficients (MFCCs) as features and compares the performance of linear and radial basis function kernels in the SVM classifier. An overall accuracy of 91.8% was achieved on a database containing isolated word utterances from different speakers, demonstrating the effectiveness of the MFCC features and RBF kernel for speaker independent speech recognition.
Speech emotion recognition with light gradient boosting decision trees machineIJECEIAES
Speech emotion recognition aims to identify the emotion expressed in the speech by analyzing the audio signals. In this work, data augmentation is first performed on the audio samples to increase the number of samples for better model learning. The audio samples are comprehensively encoded as the frequency and temporal domain features. In the classification, a light gradient boosting machine is leveraged. The hyperparameter tuning of the light gradient boosting machine is performed to determine the optimal hyperparameter settings. As the speech emotion recognition datasets are imbalanced, the class weights are regulated to be inversely proportional to the sample distribution where minority classes are assigned higher class weights. The experimental results demonstrate that the proposed method outshines the state-of-the-art methods with 84.91% accuracy on the Berlin database of emotional speech (emo-DB) dataset, 67.72% on the Ryerson audio-visual database of emotional speech and song (RAVDESS) dataset, and 62.94% on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
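The class-weight regulation described above (weights inversely proportional to class sample counts) can be sketched as follows. The emotion labels and counts are made up for illustration, and the normalization is the common "balanced" convention, n_samples / (n_classes × count), assumed rather than quoted from the paper.

```python
import numpy as np

def inverse_frequency_weights(labels):
    # Weight each class by n_samples / (n_classes * class_count), so
    # minority classes get proportionally larger weights.
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Imbalanced toy label set: 10 angry, 30 happy, 60 neutral samples.
y = np.array(["angry"] * 10 + ["happy"] * 30 + ["neutral"] * 60)
print(inverse_frequency_weights(y))
```

The rarest class ("angry") receives the largest weight, so its misclassifications cost the model proportionally more during training, which is exactly the imbalance correction the abstract describes.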
This document discusses a fast algorithm for noisy speaker recognition using artificial neural networks. It summarizes the following:
- The algorithm first records voice patterns from speakers via a noisy channel and applies noise removal techniques.
- It then extracts features using Mel Frequency Cepstral Coefficients (MFCC) and reduces features using Principal Component Analysis.
- The reduced feature vectors are classified using an artificial neural network classifier.
- Experimental results showed the proposed algorithm achieved about 99% accuracy on average, which is higher than other methods, and was also faster.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
An optimized approach to voice translation on mobile phoneseSAT Journals
Abstract Current voice translation tools and services use natural language understanding and natural language processing to convert words. However, these parsing methods concentrate more on capturing keywords and translating them, completely neglecting the considerable amount of processing time involved. In this paper, we are suggesting techniques that can optimize the processing time thereby increasing the throughput of voice translation services. Techniques like template matching, indexing frequently used words using probability search and session-based cache can considerably enhance processing times. More so, these factors become all the more important when we need to achieve real-time translation on mobile phones. Keywords:-Optimized voice translation, mobile client, speech recognition, language interpretation, template matching, probability search, session- based cache.
Performance estimation based recurrent-convolutional encoder decoder for spee...karthik annam
This document discusses a proposed Recurrent-Convolutional Encoder-Decoder (R-CED) network for speech enhancement. The R-CED network aims to overcome challenges with existing methods by estimating the a priori and posteriori signal-to-noise ratios to separate noise from speech. The R-CED consists of convolutional layers with increasing and decreasing numbers of filters to encode and decode features. Performance will be evaluated using metrics like PESQ, STOI, CER, MSE, SNR, and SDR. The proposed method aims to improve speech enhancement accuracy and recover enhanced speech quality compared to other techniques.
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...IRJET Journal
This document presents research on developing an automatic lip reading system using convolutional neural networks. The system takes in video frames of a speaker's face without audio and classifies the words or phrases being spoken. The researchers preprocessed the data by detecting faces in video frames and cropping them. They then trained a CNN model on concatenated frames. Their model achieved 80.44% accuracy on the test set in classifying 10 words and 10 phrases from 17 speakers. The researchers concluded the model could be improved by addressing overfitting to unseen speakers with a larger dataset and regularization techniques.
IRJET - A Robust Sign Language and Hand Gesture Recognition System using Conv...IRJET Journal
This document presents a robust sign language and hand gesture recognition system using convolutional neural networks. The system captures image frames and processes them through various neural network layers to classify hand gestures into letters, numbers, or other symbols. It segments the hand from images using color thresholds and edge detection. The images then undergo preprocessing like resizing before being classified by the CNN. The CNN is trained on a large dataset to accurately recognize gestures in sign language and provide text output to bridge communication between deaf and non-signing individuals. The system achieved good results classifying several alphabet letters but could be expanded to recognize word combinations.
Similar to Combined feature extraction techniques and naive bayes classifier for speech recognition (20)
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
2. 156 Computer Science & Information Technology (CS & IT)
areas like Signal Processing, Natural Language Processing, Statistics etc. [3]. Speech signals are
non-stationary in nature, and many factors make the recognition of speech a complex task:
speakers differ in gender, emotional state, accent, pronunciation, articulation, nasality, pitch,
volume and speaking rate [4]. Background noise, additive noise and other disturbances may also
affect the performance of a speech recognition system.
Usually the performance of a speech recognition system is measured in terms of recognition
accuracy. There has been a lot of research on speech recognition in languages like English,
Chinese, Arabic, Bengali, Tamil and Hindi, but only a few works have been reported in
Malayalam. Compared to a speaker dependent speech recognition system, which involves a single
speaker, developing an efficient speaker independent system is a difficult task, since it involves
recognizing the speech patterns of a group of people. As speech recognition involves different
stages, namely pre-processing, feature extraction and pattern classification, the performance of
the overall system depends on the techniques selected for these stages. Speech signals are usually
affected by different additive as well as background noise, so pre-processing has to be done to
remove the noise from the signals. Here, pre-processing is done using a wavelet denoising
method based on soft thresholding [5]. Feature extraction is done using two wavelet based
methods, DWT and WPD, chosen for the good time and frequency resolution properties of
wavelets. The performance of both methods in classifying the digits is evaluated using the Naive
Bayes classifier, and a new hybrid algorithm is developed by combining the features obtained
from both methods.
The paper is organized as follows. Section 2 gives a brief description of the problem definition
and the methodology used. The digits database is explained in section 3, and the method used for
preprocessing is illustrated in section 4. Section 5 elaborates on the feature extraction techniques
used in this work, followed by the classification technique in section 6. A detailed evaluation of
the experiments and the results obtained is given in section 7. The conclusion is given in the last
section.
2. STATEMENT OF THE PROBLEM AND ARCHITECTURE OF THE SYSTEM
ASR enables people to communicate more naturally and effectively without an interface in
between. In this work, a speech recognition system is designed for recognizing speaker
independent spoken digits in Malayalam, since research in Malayalam is still in its infancy. The
captured speech signals are to be converted to a set of parameters. Feature extraction is the front
end processing and the most important part of speech recognition, since it plays an important role
in separating one speech sound from another, so developing new feature extraction techniques
has been an active area of research for many years. Most speech-based studies rely on spectral
analysis of speech signals using Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive
Coding (LPC) etc. Results from most of these works reveal that the dimensionality and
computational complexity of the resulting feature vectors are high. In wavelet transforms, the
size of the feature vector is smaller than in other methods, which reduces the computational
complexity, and more and more studies are now being done on wavelets.
The main objective of this work is to design a more efficient speech recognition system which
performs better than the existing methods. Since wavelets are found to be good in speech
recognition, here two wavelet based feature extraction methods, DWT and WPD, are used, and
their performance in recognizing the created digits database is compared and analyzed. Naive
Bayes classifiers are used to obtain the classification performance of the features. Both methods
are found to be good at recognizing speech. But since feature extraction is the front end
processing part of a speech recognition system, designing a new algorithm with improved feature
vectors is a challenging research problem. So a new feature extraction method is developed by
utilizing the features obtained from both WPD and DWT to improve the recognition rate.
In this work, the digits database is first created, since there is no standard database available in
Malayalam. The captured speech signals are then pre-processed using wavelet denoising
techniques, which remove the noise and so prepare the signals for feature extraction. After
denoising, the signals are presented to the feature extraction techniques, namely DWT, WPD and
the proposed DWPD. The extracted features are then given for pattern classification using the
Naive Bayes classifier. Finally, these techniques are compared and their performance is evaluated
based on recognition accuracy.
3. DIGITS DATABASE
A new database of spoken Malayalam digits is created using 200 speakers aged between 6 and
70, each uttering the 10 Malayalam digits. The 200 speakers, comprising 80 male speakers, 80
female speakers and 40 children, were entrusted with the task of recording the speech samples.
Male and female speech differ in pitch, frequency, phonetics and many other factors owing to
physiological as well as psychological differences. A high quality studio-recording microphone
at a sampling rate of 8 kHz (band limited to 4 kHz) is used for recording the speech samples. The
ten Malayalam digits from 0 to 9 are recorded under the same configuration, so the database
consists of a total of 2000 utterances. The spoken digits are numbered and stored in the
appropriate classes in the database. The spoken digits and their International Phonetic Alphabet
(IPA) transcriptions are shown in Table 1.
Table 1. Numbers Stored in the Database and their IPA Format

Digit   IPA format      English translation
0       /pu:dƷıəṁ/      Zero
1       /onnə/          One
2       /r˄ndə/         Two
3       /mu:nnə/        Three
4       /na:lə/         Four
5       /˄ndƷe/         Five
6       /aːrə/          Six
7       /eiƷə/          Seven
8       /ettə/          Eight
9       /onp˄θə/        Nine
4. PRE-PROCESSING USING WAVELET DENOISING
Speech signals are often degraded by additive or convolutive noise present in the background. In
order to remove this background noise, the signals are denoised during the pre-processing stage,
before the features are extracted. Different techniques are available for speech enhancement; in
this work, we have used wavelet denoising algorithms to reduce the noise present in the signal.
Wavelet denoising uses thresholding functions to limit the values of the wavelet coefficients. The
most widely used thresholding functions in wavelet denoising are the hard and the soft
thresholding functions [6]. Hard thresholding sets to zero any coefficient whose absolute value is
lower than the threshold. Soft thresholding also sets to zero the coefficients whose absolute
values are lower than the threshold, and then shrinks the nonzero coefficients towards zero. Hard
and soft thresholding can be defined as

X_hard = X if |X| > τ, and X_hard = 0 if |X| ≤ τ    (1)

X_soft = sign(X)(|X| - τ) if |X| > τ, and X_soft = 0 if |X| ≤ τ    (2)

where X denotes the wavelet coefficients and τ represents the threshold value. Here we have used
the soft thresholding technique. The threshold value can be obtained using different methods. The
universal threshold derived by Donoho and Johnstone [7] for white Gaussian noise under a mean
square error criterion is used in this work and is given by

τ = σ √(2 log N)    (3)

where σ denotes the standard deviation of the noise and N denotes the length of the signal.
The algorithm used for denoising consists of three main steps.
• Apply the wavelet transform up to 8 levels to the noisy signal to produce the noisy wavelet coefficients.
• Select an appropriate threshold limit and apply soft thresholding to the detail wavelet coefficients.
• Apply the inverse discrete wavelet transform to the thresholded wavelet coefficients. This produces the denoised signal.
5. FEATURE EXTRACTION TECHNIQUES USED
Every speech signal has different individual characteristics embedded in it, and these
characteristics can be extracted using a wide range of feature extraction techniques. A brief
description of the feature extraction techniques used in this work, namely DWT, WPD and the
newly proposed hybrid algorithm DWPD, is given below.
5.1 Discrete Wavelet Transforms
DWT is a more recent, popular and computationally efficient technique which is used in different
areas like signal processing, image processing etc. One of the main characteristics of the wavelet
transforms is that it uses windows of different sizes. It adopts broad windows at low frequencies
and narrow windows at high frequencies, thus leading to an optimal time–frequency resolution in
all frequency ranges [8]. DWT and WPD use digital filtering techniques to obtain a time-scale
representation of the signals. DWT is defined by the following equation.

W(j, k) = Σ_n x(n) 2^(-j/2) Ψ(2^(-j) n - k)    (4)

where Ψ(t) is the basic analyzing function called the mother wavelet. In DWT, the one
dimensional speech signal passes through two discrete-time low pass and high pass quadrature
mirror filters, which produce the corresponding wavelet coefficients: two signals called the
approximation coefficients and the detail coefficients [9]. In speech signals, the low frequency
components h[n] are of greater importance than the high frequency components g[n], as the low
frequency components characterize a signal more than its high frequency components [10]. The
successive high pass and low pass filtering of the signal is given by

Y_high[k] = Σ_n x[n] g[2k - n]    (5)

Y_low[k] = Σ_n x[n] h[2k - n]    (6)

where Y_high (the detail coefficients) and Y_low (the approximation coefficients) are the outputs
of the filters obtained by subsampling by 2. According to the Mallat algorithm, the filtering
process is continued until the desired level is reached [11].
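A single stage of this filter bank can be sketched as follows, assuming Haar analysis filters for brevity (the paper's db4 filters would be used the same way): the signal is convolved with h[n] and g[n], and each output is subsampled by 2.

```python
import numpy as np

h = np.array([1, 1]) / np.sqrt(2)    # low-pass (approximation) filter h[n]
g = np.array([1, -1]) / np.sqrt(2)   # high-pass (detail) filter g[n]

def analysis_stage(x):
    """One level of filtering: convolve, then keep every second sample."""
    y_low = np.convolve(x, h)[1::2]   # approximation coefficients
    y_high = np.convolve(x, g)[1::2]  # detail coefficients
    return y_low, y_high

x = np.arange(8, dtype=float)
cA, cD = analysis_stage(x)
print(len(cA), len(cD))  # 4 4: each output is half the input length
```

Because the Haar pair is orthonormal, the energy of the input is preserved across cA and cD, which is one reason the subsampled outputs lose no information and the stage can be iterated on cA in the Mallat fashion.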
5.2 Wavelet Packet Decomposition
DWT performs a one-sided dyadic tree decomposition of signals, whereas WPD allows a full
dyadic tree analysis in which the approximation as well as the detail coefficients are decomposed
iteratively up to a chosen level [12]. WPD is thus a more flexible and detailed method than DWT,
and it also produces good time and frequency resolutions, so a more detailed time-scale analysis
can be achieved. The main characteristic of WPD is that it uses long time intervals for low-
frequency information and short time intervals for high-frequency information [13].
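The difference from DWT can be made concrete with a short sketch: at every level, every node is split again, producing a full binary tree of 2^L sub-bands at level L. Haar filters are again assumed for brevity.

```python
import numpy as np

h = np.array([1, 1]) / np.sqrt(2)    # low-pass filter
g = np.array([1, -1]) / np.sqrt(2)   # high-pass filter

def wpd(x, level):
    """Return the list of 2**level sub-band coefficient arrays."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:  # every node is split, not only the approximation
            next_nodes.append(np.convolve(node, h)[1::2])
            next_nodes.append(np.convolve(node, g)[1::2])
        nodes = next_nodes
    return nodes

sig = np.random.default_rng(1).normal(size=64)
bands = wpd(sig, level=3)
print(len(bands), len(bands[0]))  # 8 8: eight sub-bands of eight coefficients
```

A plain DWT at level 3 would instead yield only 4 coefficient sets (one approximation and three detail bands), since only the low-pass branch is ever split.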
5.3 Proposed Hybrid algorithm
Wavelets are a powerful and extremely useful method for speech recognition. A wavelet
transform decomposes a signal into sub-bands with low frequency components which contain the
characteristics of a signal and high frequency components which are related with noise and
disturbance in a signal [14]. The features of the signals can be retained if the high frequency
contents are removed. This reduces the noise in the signal [15]. But sometimes the high frequency
components may contain useful features of the signal. The main drawback of DWT is that it
cannot decompose the high frequency band into more partitions. Although WPD can achieve this
decomposition, it is also applied to the low frequency band signals, which mainly contain the
desired signal, causing unnecessary computational complexity. To overcome the limitations of
DWT and WPD, we propose a new algorithm built by
combining the features of both DWT and WPD. The outline of the proposed DWPD algorithm is
given below.
• The speech signal is decomposed into a low frequency band signal and a high frequency band signal; that is, one level of decomposition is performed.
• 7 scales of DWT are then applied on the low frequency components, and 7 scales of WPD are applied on the high frequency components.
• The features obtained from both decompositions are combined together to form the new feature vector set.
The wavelet decomposition trees of DWT, WPD and DWPD up to 3 levels are shown in figure 1.
Fig. 1. DWT decomposition, WPD decomposition and DWPD decomposition
The proposed new hybrid algorithm DWPD has the following advantages.
• It decomposes the high frequency band into more partitions.
• It reduces the computational complexity.
• It improves the recognition rate.
6. CLASSIFICATION USING NAIVE BAYES CLASSIFIER
Since speech recognition is a multiclass classification problem, Naive Bayes classifiers are used
because they can handle multiclass problems. The Naive Bayes classifier is based on Bayes'
theorem and is a simple and efficient probabilistic classification method built on supervised
learning. For each class value, it estimates the probability that a given instance belongs to that
class [16]. The feature items in one class are assumed to be independent of the other attribute
values, an assumption called class conditional independence [17]. The Naive Bayes classifier
needs only a small amount of training data to estimate the parameters for classification. The
classifier is stated as
P(A|B) = P(B|A) · P(A) / P(B)    (7)
where P(A) is the prior or marginal probability of A; P(A|B) is the conditional probability of A
given B, called the posterior probability; P(B|A) is the conditional probability of B given A; and
P(B) is the prior or marginal probability of B, which acts as a normalizing constant. This
expresses mathematically how the conditional probability of event A given B is related to the
conditional probability of B given A. The probability value of the winning class dominates over
that of the others [18].
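As an illustration of this classification stage, a minimal sketch with scikit-learn's GaussianNB is given below. The Gaussian variant and the synthetic 16-dimensional feature vectors are assumptions made for the example, as the paper does not name an implementation or a Naive Bayes variant.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 10, 50, 16        # ten digit classes
# Synthetic feature vectors standing in for the wavelet features
X = np.vstack([rng.normal(loc=c, size=(n_per_class, n_features))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

clf = GaussianNB().fit(X, y)        # estimates P(class) and per-feature likelihoods
proba = clf.predict_proba(X[:1])    # posterior P(class | features) via Bayes' theorem
print(proba.shape, np.isclose(proba.sum(), 1.0))  # (1, 10) True
```

The predicted class is simply the one whose posterior probability dominates, matching equation (7) with the class-conditional independence assumption applied to the 16 features.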
7. EXPERIMENTS AND PERFORMANCE EVALUATION
In this work, we have used wavelets for feature extraction. Different wavelets are available for
signal analysis, so the first step is to select a suitable wavelet family and wavelet. Here the
Daubechies wavelets, which are found to be efficient in signal processing applications, are used;
among the Daubechies family, the db4 mother wavelet is used for feature extraction since it gives
better results [19]. After pre-processing, the speech samples are successively decomposed into
approximation and detail coefficients. Another consideration is the level up to which the
decomposition is performed. For DWT, the low frequency components from level 8 are used to
create the feature vectors. In WPD, both the high frequency and low frequency components are
decomposed up to level 8. The number of features obtained is 16 using DWT and 20 using WPD.
In the proposed DWPD, the features from both DWT and WPD are combined. Classification
using the Naive Bayes classifier involves separating the data into training and test sets. In all
three methods, the database is divided into 70% of the data for training, 15% for validation and
15% for testing.
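The 70/15/15 protocol can be sketched as two successive stratified splits. The feature matrix here is synthetic (2000 rows of 16-dimensional vectors, mirroring the database size and the DWT feature count), and scikit-learn is an assumed tool, not one named in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(200, 16)) for c in range(10)])
y = np.repeat(np.arange(10), 200)   # 10 digit classes, 200 utterances each

# First carve off 70% for training, then split the remaining 30% in half.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.70, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=0)

clf = GaussianNB().fit(X_train, y_train)
print(len(X_train), len(X_val), len(X_test))  # 1400 300 300
print(round(clf.score(X_test, y_test), 2))    # held-out recognition accuracy
```

Stratifying both splits keeps the ten digit classes equally represented in the training, validation and test portions, which matters when accuracy is reported as a single overall figure.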
In this work, better results are obtained at level 8 of the decomposition. The original signal and
the 8th level decomposition coefficients of the spoken word Poojyam (zero) using DWT are given
in figure 2.

Fig. 2. Decomposition of digit Poojyam at 8th level using DWT

The original signal and the 8th level decomposition coefficients of the spoken digit Poojyam
(zero) using WPD are given in figure 3.

Fig. 3. Decomposition of digit Poojyam at 8th level using WPD
All these methods produced good recognition accuracies after classification. The overall
recognition accuracies obtained using these methods are shown in table 2.
Table 2. Comparison of results

Feature Extraction Method    Recognition Accuracy (%)
DWT                          83.50
WPD                          80.70
DWPD                         86.20
8. CONCLUSION
In this work, we have exploited the powerful features of wavelets in developing a novel speech
recognition system for speaker independent digits recognition in Malayalam. Here, two major
wavelet based feature extraction techniques such as DWT and WPD are used for feature
extraction and the performance of these methods are evaluated. Since speech recognition is a
pattern recognition problem, Naive Bayes classifiers are used for classification. The two methods
produced accuracies of 83.5% and 80.7% respectively. The ultimate aim of a speech recognition
system is to attain the maximum recognition rate, so although both methods produced good
results, an enhanced method called DWPD is developed by combining the features of DWT and
WPD. An accuracy of 86.2% is obtained using this hybrid structure. The soft thresholding
technique is applied to the signal to reduce noise before feature extraction. The experimental results
show that this hybrid architecture using DWT and WPD along with Naive Bayes classifier could
effectively extract the features from the speech signal. As an extension of this study, the
performance of this proposed method can be analyzed using other classifiers like Support Vector
Machines, Artificial Neural Networks etc.
REFERENCES
[1] Y. Ajami Alotaibi (2005) Investigating Spoken Arabic Digits in Speech Recognition Setting.
Information Sciences 173:115-139
[2] C. Kurian, K. Balakrishnan (2009) Speech Recognition of Malayalam Numbers. World Congress on
Nature and Biologically Inspired Computing 1475-1479
[3] B. Gold, N. Morgan (2002) Speech and Audio Signal Processing. John Wiley and Sons, New York
[4] Speech recognition. http://www.learnartificialneuralnetworks.com/speechrecognition.html
[5] Danie Rasetshwane, J. Robert Boston, Ching-Chung Li (2006) Identification of Speech Transients
Using Variable Frame Rate Analysis and Wavelet Packets. Proc. of the 28th IEEE EMBS Annual
International Conference 1:1727-1730
[6] Yasser Ghanbari, Mohammad Reza Karami (2006) A new Approach for Speech Enhancement based
on the Adaptive Thresholding of the Wavelet Packets. Speech Communication 48:927–940
[7] D. L. Donoho (1995) De-noising by Soft Thresholding. IEEE Transactions on Information Theory
41:613-627
[8] Elif Derya Ubeyil (2009) Combined Neural Network model Employing Wavelet Coefficients for
ECG Signals Classification. Digital Signal Processing 19:297-308
[9] S. Chan Woo, C.Peng Lin, R. Osman (2001) Development of a Speaker Recognition System using
Wavelets and Artificial Neural Networks. Proc. of Int. Symposium on Intelligent Multimedia, Video
and Speech Processing 413-416
[10] S. Kadambe, P. Srinivasan (1994) Application of Adaptive Wavelets for Speech. Optical Engineering
33:2204-2211
[11] S. G. Mallat (1989) A Theory for Multiresolution Signal Decomposition: The Wavelet
Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11:674-693
[12] R. R. Coifman, M. V. Wickerhauser (1992) Entropy based Algorithm for best Basis Selection. IEEE
Transactions on Information Theory 38:713-718
[13] Wu Ting, Yan Guo-zheng, Yang Bang-hua et al (2008) EEG Feature Extraction based on Wavelet
Packet Decomposition for Brain Computer Interface. Measurement 41:618-625
[14] B. C. Li, J. S. Luo (2003) Wavelet Analysis and Its Applications. Electronics Engineering Press,
Beijing, China
[15] Zhen-li Wang, Jie Yang, Xiong-wei Zhang (2006) Combined Discrete Wavelet Transform and
Wavelet Packet Decomposition for Speech Enhancement. Proc. of 8th International Conference on
Signal Processing
[16] Li Dan, Liu Lihua, Zhang Zhaoxin (2013) Research of Text Categorization on WEKA. Proc. of Third
International Conference on Intelligent System Design and Engineering Applications 1129-1131
[17] J. Han, M. Kamber (2000) Data Mining Concepts and Techniques. Morgan Kauffman Publishers
[18] Laszlo Toth, Andras Kocsor, Janos Csirik (2005) On Naive Bayes in Speech Recognition. Int. J.
Appl. Math. Comput. Sci. 15:287–294
[19] Sonia Sunny, David Peter S, K Poulose Jacob (2013) Performance Analysis of Different Wavelet
Families in Recognizing Speech. International Journal of Engineering Trends and Technology
4:512-517