This document discusses feature extraction techniques for isolated-word speech recognition. It begins with an introduction to digital speech processing and speech recognition models. The main part of the document compares two common feature extraction techniques: Mel Frequency Cepstral Coefficients (MFCC) and Relative Spectral (RASTA) filtering. MFCC extracts feature vectors from the signal and provides high performance but lacks robustness; RASTA filtering reduces the impact of noise and provides high robustness by band-pass filtering feature coefficients in both the log-spectral and spectral domains. The document details the MFCC feature extraction process, which involves steps such as framing, windowing, the fast Fourier transform, mel filtering, the discrete cosine transform, and computation of the cepstral coefficients.
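The MFCC pipeline just listed (framing, windowing, FFT, mel filtering, DCT) can be sketched in NumPy. This is a minimal illustrative implementation, not code from the document; the frame length, hop size, filter count, and test tone are assumed values:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    # 1. Framing: slice the signal into overlapping frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # 2. Windowing: taper each frame with a Hamming window.
    frames = frames * np.hamming(frame_len)
    # 3. FFT: magnitude spectrum of each frame.
    n_fft = 512
    spec = np.abs(np.fft.rfft(frames, n_fft))
    # 4. Mel filtering: triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    energies = np.log(spec @ fbank.T + 1e-10)
    # 5. DCT-II: decorrelate log filterbank energies; keep n_ceps coefficients.
    n = energies.shape[1]
    dct = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :] * np.arange(n_ceps)[:, None])
    return energies @ dct.T

fs = 16000
t = np.arange(fs) / fs
feats = mfcc(np.sin(2 * np.pi * 440 * t), fs)
print(feats.shape)   # (98, 13): one 13-coefficient vector per frame
```

With a 400-sample frame and 160-sample hop, one second of 16 kHz audio yields 98 frames of 13 coefficients each.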
This document reviews techniques used in spoken-word recognition systems. It discusses popular feature extraction techniques like MFCC, LPC, DWT, WPD that are used to represent speech signals in a compact form before classification. Classification techniques discussed are ANN, HMM, DTW, and VQ. The document provides a brief overview of each technique and their advantages. It also presents the generalized workflow of a spoken-word recognition system including stages of speech acquisition, pre-emphasis, feature extraction, modeling, classification, and output of recognized text.
Comparison and Analysis of LDM and LMS for an Application of a Speech (CSCJournals)
Most automatic speech recognition (ASR) systems are based on Gaussian mixture models, whose outputs depend on subphone states. We often measure and transform the speech signal into another form to enhance our ability to communicate. Speech recognition is the conversion of an acoustic waveform into its written equivalent message. The nature of the speech recognition problem is heavily dependent upon the constraints placed on the speaker, the speaking situation, and the message context. Various speech recognition systems are available; the system that detects the hidden conditions of speech is the best model. LMS is one of the simplest algorithms used to reconstruct speech, and the linear dynamic model (LDM) is also used to recognize speech in noisy environments. This paper analyzes and compares the LDM and a simple LMS algorithm that can be used for speech recognition.
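As a concrete illustration of the LMS side of this comparison, here is a minimal LMS adaptive noise canceller in NumPy. It is a sketch under assumed parameters, not the paper's implementation; the step size `mu`, the tap count, and the synthetic signals are invented for the example:

```python
import numpy as np

def lms_filter(noisy, reference, n_taps=8, mu=0.01):
    """Least-mean-squares adaptive filter: estimates the noise in `noisy`
    from a correlated `reference` and subtracts it, leaving the cleaned signal."""
    w = np.zeros(n_taps)
    out = np.zeros(len(noisy))
    for n in range(n_taps, len(noisy)):
        x = reference[n - n_taps + 1:n + 1][::-1]   # most recent reference samples
        y = w @ x                                   # current noise estimate
        e = noisy[n] - y                            # error = cleaned sample
        w += 2 * mu * e * x                         # stochastic-gradient weight update
        out[n] = e
    return out

rng = np.random.default_rng(0)
t = np.arange(4000) / 8000.0
clean = np.sin(2 * np.pi * 200 * t)
noise = rng.normal(0.0, 0.5, len(t))
cleaned = lms_filter(clean + noise, noise)
# After the filter adapts, the residual error against the clean tone
# should be far below the raw noise power (~0.25 here).
err = np.mean((cleaned[2000:] - clean[2000:]) ** 2)
print(err)
```

Because the reference is correlated with the additive noise but not with the speech, the filter converges toward cancelling only the noise component.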
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ... (csandit)
This document describes a study that developed a speech recognition system for spoken Malayalam digits. It used two wavelet-based feature extraction techniques, the Discrete Wavelet Transform (DWT) and Wavelet Packet Decomposition (WPD), and evaluated their performance with a Naive Bayes classifier. DWT achieved 83.5% accuracy and WPD achieved 80.7%. To improve recognition accuracy, the study introduced a new technique called Discrete Wavelet Packet Decomposition (DWPD) that combines features from both DWT and WPD; DWPD achieved the highest accuracy, 86.2%, with the Naive Bayes classifier.
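A one-level Haar transform is the simplest concrete instance of the wavelet decompositions mentioned above. The study's actual wavelet family and feature set are not specified here; this sketch uses Haar filters and log band energies purely for illustration:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform: splits a signal into
    approximation (low-pass) and detail (high-pass) halves."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                       # pad to even length
        x = np.append(x, 0.0)
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail

def dwt_features(x, levels=3):
    """DWT-style features: log energy of the detail band at each level,
    plus the final approximation band."""
    feats = []
    for _ in range(levels):
        x, d = haar_dwt(x)
        feats.append(np.log(np.sum(d ** 2) + 1e-10))
    feats.append(np.log(np.sum(x ** 2) + 1e-10))
    return np.array(feats)

sig = np.sin(2 * np.pi * np.arange(256) / 8.0)   # toy stand-in for a speech frame
print(dwt_features(sig))                         # 4 values: 3 detail bands + 1 approx
```

The orthonormal Haar filters conserve energy across levels, so the band energies partition the signal energy, which is what makes them usable as compact features.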
The document discusses speech processing using MATLAB. It involves analyzing speech sounds through components like normalizers and segmenters, converting the sounds to a code, and then synthesizing speech by reproducing recorded elements. There are many applications ranging from limited to full communication systems. Speech signals are usually processed digitally as a form of digital signal processing. Aspects include acquiring, manipulating, storing, transferring and outputting speech signals.
This document discusses speech recognition techniques. It begins by defining biometrics and how speech can be used as a biometric for identity authentication. It describes how speech recognition aims to extract lexical information independently of the speaker, while speaker recognition focuses on extracting the identity of the speaker. The document then discusses feature extraction using MFCC and modeling speech using neural networks. It provides an overview of pattern recognition techniques including statistical and structural approaches. Finally, it discusses implementation details such as preprocessing, framing, windowing and feature extraction of speech signals.
This document provides an overview of speech recognition systems and recent progress in the field. It discusses different types of speech recognition including isolated word, connected word, continuous speech, and spontaneous speech. Various techniques used in speech recognition are also summarized, such as simulated evolutionary computation, artificial neural networks, fuzzy logic, Kalman filters, and Hidden Markov Models. The document reviews several papers published between 2004-2012 that studied speech recognition methods including using dynamic spectral subband centroids, Kalman filters, biomimetic computing techniques, noise estimation, and modulation filtering. It concludes that Hidden Markov Models combined with MFCC features provide good recognition results and are suitable for large vocabulary, speaker-independent, continuous speech recognition.
In this paper we present the implementation of a speaker identification system using an artificial neural network with digital signal processing. The system is designed for text-dependent speaker identification of Bangla speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder, and the speech features are acquired by digital signal processing techniques. Speaker identification using frequency-domain data is performed with the back-propagation algorithm. Hamming and Blackman-Harris windows are used to investigate which gives better speaker identification performance, and endpoint detection of speech is applied to achieve high system accuracy.
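Energy-based endpoint detection of the kind mentioned above can be sketched as follows. The frame length and threshold ratio are assumptions for the example, not values from the paper:

```python
import numpy as np

def detect_endpoints(signal, frame_len=160, threshold_ratio=0.1):
    """Short-time-energy endpoint detection: returns (start, end) sample indices
    of the region whose frame energy exceeds a fraction of the peak frame energy."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)
    threshold = threshold_ratio * energy.max()
    active = np.where(energy > threshold)[0]
    if len(active) == 0:
        return 0, len(signal)
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# Toy utterance: low-level noise, a burst of "speech", low-level noise.
rng = np.random.default_rng(1)
sig = np.concatenate([0.01 * rng.normal(size=800),
                      np.sin(2 * np.pi * 300 * np.arange(1600) / 8000.0),
                      0.01 * rng.normal(size=800)])
start, end = detect_endpoints(sig)
print(start, end)   # 800 2400
```

Trimming the silent head and tail before feature extraction is what lets the classifier see only speech frames, which is the accuracy gain the abstract refers to.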
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLAB (sipij)
1) The document describes the development of a speaker-dependent speech recognition system using MATLAB. It uses Gaussian mixture models for acoustic modeling and mel-frequency cepstral coefficients for feature extraction.
2) The system is designed to recognize isolated digits 0-9. Voice activity detection is performed to detect segments of speech. Various windowing functions are evaluated to reduce spectral leakage during feature extraction.
3) 13 MFCCs plus energy are extracted from each 30ms frame with 10ms shift. The first and second derivatives are also calculated to capture dynamic information, resulting in a 39-dimensional feature vector.
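The delta and delta-delta computation described in step 3 is commonly done with a regression over neighboring frames. A sketch follows; the window size `N=2` is an assumption, and the 13 static coefficients are random placeholders (13 static values with their deltas and delta-deltas give the 39-dimensional vector):

```python
import numpy as np

def deltas(feats, N=2):
    """Delta coefficients by the standard regression formula
    d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n} n^2),
    with edge frames repeated as padding."""
    padded = np.pad(feats, ((N, N), (0, 0)), mode='edge')
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:len(feats) + N + n] -
                    padded[N - n:len(feats) + N - n])
               for n in range(1, N + 1)) / denom

# 13 static coefficients per frame; deltas and delta-deltas triple that to 39.
static = np.random.default_rng(2).normal(size=(98, 13))
full = np.hstack([static, deltas(static), deltas(deltas(static))])
print(full.shape)   # (98, 39)
```

The deltas approximate the local time derivative of each coefficient track, which is the "dynamic information" the summary mentions.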
Utterance Based Speaker Identification Using ANN (IJCSEA Journal)
This document summarizes a research paper on speaker identification using artificial neural networks. The paper presents a speaker identification system that uses digital signal processing and ANN techniques. Speech features are extracted from utterances using FFT and windowing. These features are used to train a multi-layer perceptron network to classify speakers. The system was tested on Bangla speech and achieved accurate identification of speakers from their utterances.
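The Hamming and Blackman-Harris windows investigated in this work differ mainly in their spectral sidelobe levels, which is why they are compared for identification performance. A quick numerical comparison (the 4-term Blackman-Harris coefficients used here are the standard published ones; the window length is an arbitrary choice):

```python
import numpy as np

def blackman_harris(n):
    """4-term Blackman-Harris window: a wider main lobe than Hamming,
    but far lower sidelobes (less spectral leakage)."""
    a = [0.35875, 0.48829, 0.14128, 0.01168]
    k = np.arange(n)
    return (a[0] - a[1] * np.cos(2 * np.pi * k / (n - 1))
                 + a[2] * np.cos(4 * np.pi * k / (n - 1))
                 - a[3] * np.cos(6 * np.pi * k / (n - 1)))

def peak_sidelobe_db(window, n_fft=4096):
    """Highest spectral sidelobe relative to the main-lobe peak, in dB."""
    spectrum = np.abs(np.fft.rfft(window, n_fft))
    spectrum /= spectrum.max()
    i = 1
    while spectrum[i + 1] < spectrum[i]:   # walk down to the first null
        i += 1
    return 20 * np.log10(spectrum[i:].max())

n = 256
h_db = peak_sidelobe_db(np.hamming(n))
bh_db = peak_sidelobe_db(blackman_harris(n))
print(h_db)    # roughly -43 dB
print(bh_db)   # far lower, below -80 dB
```

Lower sidelobes mean less leakage between frequency bins of the FFT features, at the cost of slightly blurrier frequency resolution.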
This document summarizes research on speaker recognition in noisy environments. It begins with an introduction discussing the goals of speaker identification and verification and their applications. It then provides details on the basic components of a speaker recognition system, including feature extraction and classification. The document focuses on methods for modeling noise, including generating multiple noisy training conditions and focusing matching on unaffected features. Experimental results are shown through snapshots of a prototype system interface that allows adding and recognizing speakers based on voice samples. The system is able to identify speakers in the presence of noise by comparing features to stored codebooks generated during training.
Voice recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.
This document describes how to build a simple, yet complete and representative automatic speaker recognition system. Such a speaker recognition system has potential in many security applications. For example, users have to speak a PIN (Personal Identification Number) in order to gain access to the laboratory door, or users have to speak their credit card number over the telephone line to verify their identity. By checking the voice characteristics of the input utterance, using an automatic speaker recognition system similar to the one that we will describe, the system is able to add an extra level of security.
Speech Analysis and Synthesis Using Vocoder (IJTET Journal)
Abstract: In this paper, speech analysis and synthesis using a vocoder are proposed. Voice conversion systems do not create new speech signals; they transform existing ones. The proposed speech vocoding differs from speech coding: the aim is to analyze the speech signal and represent it with fewer bits, so that bandwidth efficiency can be increased, and then to synthesize the speech signal from the received bits of information. Three aspects of analysis are discussed: pitch refinement, spectral envelope estimation, and maximum voiced frequency estimation. A quasi-harmonic analysis model can be used to implement a pitch refinement algorithm that improves the accuracy of the spectral estimation, and a harmonic-plus-noise model reconstructs the speech signal from the parameters. The goal is to achieve the highest possible resynthesis quality using the lowest possible number of bits to transmit the speech signal. Future work aims at incorporating phase information into the analysis and modeling process and at synthesizing these three aspects at different pitch periods.
This document analyzes speech coding algorithms for Hindi and English languages. It discusses Linear Predictive Coding (LPC), an algorithm that accurately estimates speech parameters and represents speech signals at reduced bit rates while preserving quality. The paper proposes a voice-excited LPC algorithm and implements it on Hindi and English male and female voices. It analyzes tradeoffs between bit rates, delay, signal-to-noise ratio, and complexity. The results show low bit-rates and better signal-to-noise ratio with this algorithm.
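The parameter estimation at the heart of LPC can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is an illustrative implementation, verified on a synthetic second-order autoregressive signal rather than real speech:

```python
import numpy as np

def lpc(signal, order):
    """LPC by the autocorrelation method with Levinson-Durbin recursion.
    Returns coefficients a (a[0] == 1) such that
    signal[n] is predicted by -sum_{k>=1} a[k] * signal[n-k]."""
    r = np.array([signal[:len(signal) - k] @ signal[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err                      # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)                # remaining prediction-error power
    return a, err

# Synthetic AR(2) signal: x[n] = 0.75*x[n-1] - 0.5*x[n-2] + e[n],
# so the true LPC coefficients are a = [1, -0.75, 0.5].
rng = np.random.default_rng(3)
x = np.zeros(20000)
e = rng.normal(size=20000)
for n in range(2, 20000):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + e[n]
a, err = lpc(x, 2)
print(a)   # close to [1, -0.75, 0.5]
```

Transmitting only these few coefficients (plus excitation parameters) instead of the waveform is what gives LPC its reduced bit rate.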
The following resources come from the 2009/10 B.Sc in Media Technology and Digital Broadcast (course number 2ELE0076) from the University of Hertfordshire. All the mini projects are designed as level two modules of the undergraduate programmes.
This document provides an overview of automatic speech recognition systems. It begins with an introduction that defines automatic speech recognition as the real-time transcription of spoken language into text. It then includes a block diagram showing the main components, and describes the goal of accurately converting speech signals to text independently of speaker or device. Applications discussed include smart phones, artificial intelligence systems, home automation, and computers. The document also covers related technologies, benefits like hands-free use, and concludes that this technology is beneficial for both public and private sectors.
Voice Identification and Recognition System, Matlab (Sohaib Tallat)
This project was built using the MFCC approach, with the code embedded in a graphical user interface; a standalone application was then produced using MATLAB's deployment tools.
COLEA: A MATLAB Tool for Speech Analysis (Rushin Shah)
This document describes COLEA, a MATLAB tool for speech analysis. COLEA allows users to analyze speech signals in both the time and frequency domains. It provides various speech analysis features including waveform display, spectrogram display, formant analysis, and comparison of waveforms using distance measures like SNR, cepstrum, weighted cepstrum, Itakura-Saito, and weighted likelihood ratio. The document explains how to install and use COLEA's various analysis functions, such as linear predictive coding analysis, FFT analysis, and segmentation and filtering of speech signals.
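As an example of the distance measures listed above, segmental SNR between a clean and a processed waveform can be computed like this. The frame length and test signals are assumptions for the sketch, not COLEA's internals:

```python
import numpy as np

def segmental_snr(clean, processed, frame_len=160):
    """Segmental SNR: the average per-frame SNR (in dB) between a clean
    reference and a processed/degraded signal."""
    n = min(len(clean), len(processed)) // frame_len * frame_len
    c = clean[:n].reshape(-1, frame_len)
    p = processed[:n].reshape(-1, frame_len)
    noise = c - p
    snr = 10 * np.log10(np.sum(c ** 2, axis=1) /
                        (np.sum(noise ** 2, axis=1) + 1e-12) + 1e-12)
    return snr.mean()

rng = np.random.default_rng(4)
clean = np.sin(2 * np.pi * 300 * np.arange(1600) / 8000.0)
noisy = clean + 0.1 * rng.normal(size=1600)
snr_db = segmental_snr(clean, noisy)
print(snr_db)   # around 17 dB for this noise level
```

Averaging per-frame SNRs rather than computing one global ratio weights quiet and loud regions more evenly, which tracks perceived quality better.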
This is a presentation on speech recognition systems (automated speech recognition). I hope it will be helpful for anyone searching for a presentation on this technology.
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S... (IDES Editor)
In this paper, improvement of an ASR system for the Hindi language, based on vector-quantized MFCC feature vectors and an HMM classifier, is discussed. MFCC features are usually pre-processed before being used for recognition; one such pre-processing step is to compute delta and delta-delta coefficients and append them to the MFCCs to create the feature vector. This paper focuses on all digits in Hindi (zero to nine), based on an isolated-word structure. Performance of the system is evaluated by the Recognition Rate (RR). Combining the Delta MFCC (DMFCC) feature with the Delta-Delta MFCC (DDMFCC) feature shows approximately 2.5% further improvement in the RR, with no additional computational cost. The RR for speakers involved in the training phase is found to be better than that for speakers who were not involved in training. Word-wise RR is observed to be good for some digits with distinct phones.
Voice Recognition Based Automation System for Medical Applications and for Ph... (IRJET Journal)
This document describes a voice recognition-based automation system for medical applications and physically challenged patients. The system uses a voice recognition model, an Arduino microcontroller, relays, LEDs, buzzers, and a motor to control an adjustable bed. Voice commands are recognized using techniques like MFCC and HMM and used to control devices via the Arduino. The system is intended to allow paralyzed patients to control devices such as lights, alarms, and their bed using only voice commands, for increased independence. Testing showed the system can recognize commands and control devices with 99% accuracy under suitable conditions.
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition... (TELKOMNIKA JOURNAL)
Speech recognition can be defined as the process of converting voice signals into a sequence of words by applying a specific algorithm implemented in a computer program. Research on speech recognition in Indonesia is relatively limited. This paper studies which feature extraction method, Linear Predictive Coding (LPC) or Mel Frequency Cepstral Coefficients (MFCC), is best for speech recognition in the Indonesian language. This matters because a method that produces high accuracy for one language does not necessarily produce the same accuracy for other languages, considering that every language has different characteristics; this research therefore hopefully helps accelerate the use of automatic speech recognition for Indonesian. There are two main processes in speech recognition: feature extraction and recognition. The feature extraction methods compared in this study are LPC and MFCC, while recognition uses a Hidden Markov Model (HMM). The test results showed that the MFCC method is better than LPC for Indonesian-language speech recognition.
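The HMM recognition stage used in studies like this one ultimately reduces to finding the most likely hidden-state path for an observation sequence. Here is a toy discrete-HMM Viterbi decoder; all probabilities are invented for illustration and real systems use continuous (e.g. GMM) emission densities over MFCC vectors:

```python
import numpy as np

def viterbi(obs, log_start, log_trans, log_emit):
    """Viterbi decoding: most likely hidden-state path for a discrete HMM,
    computed in log space to avoid numerical underflow."""
    T, S = len(obs), len(log_start)
    delta = np.zeros((T, S))            # best log-probability ending in each state
    back = np.zeros((T, S), dtype=int)  # backpointers for path recovery
    delta[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans      # (prev_state, cur_state)
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(S)] + log_emit[:, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy 2-state model: state 0 tends to emit symbol 0, state 1 symbol 1.
log = np.log
start = log(np.array([0.6, 0.4]))
trans = log(np.array([[0.8, 0.2],
                      [0.2, 0.8]]))
emit = log(np.array([[0.9, 0.1],
                     [0.1, 0.9]]))
print(viterbi([0, 0, 1, 1, 1], start, trans, emit))   # [0, 0, 1, 1, 1]
```

In a word recognizer, one such model is trained per word and the word whose model scores the observations highest is output.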
Automatic speech recognition system using deep learning (Ankan Dutta)
This document describes the development of an automatic speech recognition system using deep learning techniques. It discusses extracting MFCC features from audio signals and using a convolutional neural network for feature extraction, followed by a Gaussian mixture model-hidden Markov model for recognition. It also describes implementing a speech recognition system using the Kaldi toolkit on a digits dataset consisting of 10 speakers, as well as an automatic speaker recognition system using MFCC features and K-nearest neighbors classification. The speech recognition system achieved an accuracy of 72% and the speaker recognition system achieved 80% accuracy on the digits dataset.
This document outlines a proposed study on developing a voice password-based speaker verification system. It will explore methods for modeling speakers with limited data, such as artificially generating multiple utterances from short speech segments. The study aims to reduce phonetic variability between training and test data. It will also examine score normalization techniques, comparing cohort-centric normalization typically used to a proposed speaker-centric approach. The goal is to build a text-independent voice password system that can reliably verify identities from short speech samples, improving security and enabling remote access applications.
This document summarizes a speaker recognition system (SRS). The SRS performs speaker identification and verification. Speaker identification determines which registered speaker provided an utterance by extracting features such as mel-frequency cepstral coefficients and comparing them. Speaker verification accepts or rejects an identity claim by clustering training vectors from an enrollment session into speaker-specific codebooks using vector quantization. Applications of the SRS include banking by phone, voice dialing, voice mail, and security control.
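The vector-quantization codebook step described above can be sketched with a small k-means trainer and a distortion-based decision. The toy 2-D "features" stand in for real cepstral vectors, and the codebook size and cluster positions are invented for the example:

```python
import numpy as np

def train_codebook(vectors, k=4, iters=20, seed=0):
    """Vector quantization: a k-means codebook of k centroids over one
    speaker's training feature vectors."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codeword, then recompute centroids.
        d = np.linalg.norm(vectors[:, None] - codebook[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = vectors[labels == j].mean(axis=0)
    return codebook

def quantization_distortion(vectors, codebook):
    """Average distance from each vector to its nearest codeword; a claimed
    identity is accepted when this distortion is low enough."""
    d = np.linalg.norm(vectors[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()

# Toy feature clouds for two speakers drawn from different clusters.
rng = np.random.default_rng(5)
speaker_a = rng.normal(0.0, 0.3, size=(200, 2))
speaker_b = rng.normal(3.0, 0.3, size=(200, 2))
cb_a = train_codebook(speaker_a)
print(quantization_distortion(speaker_a, cb_a) <
      quantization_distortion(speaker_b, cb_a))   # True
```

A test utterance is scored against each enrolled speaker's codebook; the matching speaker yields the lowest quantization distortion.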
Utterance Based Speaker Identification Using ANNIJCSEA Journal
This document summarizes a research paper on speaker identification using artificial neural networks. The paper presents a speaker identification system that uses digital signal processing and ANN techniques. Speech features are extracted from utterances using FFT and windowing. These features are used to train a multi-layer perceptron network to classify speakers. The system was tested on Bangla speech and achieved accurate identification of speakers from their utterances.
Utterance Based Speaker Identification Using ANNIJCSEA Journal
In this paper we present the implementation of speaker identification system using artificial neural network with digital signal processing. The system is designed to work with the text-dependent speaker identification for Bangla Speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder. The speech features are acquired by the digital signal processing technique. The identification of speaker using frequency domain data is performed using back propagation algorithm. Hamming window and Blackman-Harris window are used to investigate better speaker identification performance. Endpoint detection of speech is developed in order to achieve high accuracy of the system.
This document summarizes research on speaker recognition in noisy environments. It begins with an introduction discussing the goals of speaker identification and verification and their applications. It then provides details on the basic components of a speaker recognition system, including feature extraction and classification. The document focuses on methods for modeling noise, including generating multiple noisy training conditions and focusing matching on unaffected features. Experimental results are shown through snapshots of a prototype system interface that allows adding and recognizing speakers based on voice samples. The system is able to identify speakers in the presence of noise by comparing features to stored codebooks generated during training.
Voice recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.
This document describes how to build a simple, yet complete and representative automatic speaker recognition system. Such a speaker recognition system has potential in many security applications. For example, users have to speak a PIN (Personal Identification Number) in order to gain access to the laboratory door, or users have to speak their credit card number over the telephone line to verify their identity. By checking the voice characteristics of the input utterance, using an automatic speaker recognition system similar to the one that we will describe, the system is able to add an extra level of security.
Speech Analysis and synthesis using VocoderIJTET Journal
Abstract— In this paper, I proposed a speech analysis and synthesis using a vocoder. Voice conversion systems do not create new speech signals, but just transform existing one. The proposed speech vocoding is different from speech coding. To analyze the speech signal and represent it with less number of bits, so that bandwidth efficiency can be increased. The Synthesis of speech signal from the received bits of information. In this paper three aspects of analysis have been discussed: pitch refinement, spectral envelope estimation and maximum voiced frequency estimation. A Quasi-harmonic analysis model can be used to implement a pitch refinement algorithm which improves the accuracy of the spectral estimation. Harmonic plus noise model to reconstruct the speech signal from parameter. Finally to achieve the highest possible resynthesis quality using the lowest possible number of bits to transmit the speech signal. Future work aims at incorporating the phase information into the analysis and modeling process and also synthesis these three aspects in different pitch period.
This document analyzes speech coding algorithms for Hindi and English languages. It discusses Linear Predictive Coding (LPC), an algorithm that accurately estimates speech parameters and represents speech signals at reduced bit rates while preserving quality. The paper proposes a voice-excited LPC algorithm and implements it on Hindi and English male and female voices. It analyzes tradeoffs between bit rates, delay, signal-to-noise ratio, and complexity. The results show low bit-rates and better signal-to-noise ratio with this algorithm.
The following resources come from the 2009/10 B.Sc in Media Technology and Digital Broadcast (course number 2ELE0076) from the University of Hertfordshire. All the mini projects are designed as level two modules of the undergraduate programmes.
This document provides an overview of automatic speech recognition systems. It begins with an introduction that defines automatic speech recognition as the real-time transcription of spoken language into text. It then includes a block diagram showing the main components, and describes the goal of accurately converting speech signals to text independently of speaker or device. Applications discussed include smart phones, artificial intelligence systems, home automation, and computers. The document also covers related technologies, benefits like hands-free use, and concludes that this technology is beneficial for both public and private sectors.
Voice Identification And Recognition System, MatlabSohaib Tallat
A simple yet complex approach to modern sophistication.
Made this project using the MFCC approach and then embedding the code to a Graphical User Interface. In the end made a standalone application for the program using deployment tools of matlab
COLEA : A MATLAB Tool for Speech AnalysisRushin Shah
This document describes COLEA, a MATLAB tool for speech analysis. COLEA allows users to analyze speech signals in both the time and frequency domains. It provides various speech analysis features including waveform display, spectrogram display, formant analysis, and comparison of waveforms using distance measures like SNR, cepstrum, weighted cepstrum, Itakura-Saito, and weighted likelihood ratio. The document explains how to install and use COLEA's various analysis functions, such as linear predictive coding analysis, FFT analysis, and segmentation and filtering of speech signals.
This is a ppt on speech recognition system or automated speech recognition system. I hope that it would be helpful for all the people searching for a presentation on this technology
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S... – IDES Editor
In this paper, improvement of an ASR system for the Hindi language, based on vector-quantized MFCC feature vectors and an HMM classifier, is discussed. MFCC features are usually pre-processed before being used for recognition. One such pre-processing step is to compute delta and delta-delta coefficients and append them to the MFCCs to form the feature vector. This paper focuses on all the digits in Hindi (zero to nine), using an isolated-word structure. Performance of the system is evaluated by recognition rate (RR). Combining the delta MFCC (DMFCC) feature with the delta-delta MFCC (DDMFCC) feature yields approximately 2.5% further improvement in RR, with no additional computational cost involved. The RR for speakers who took part in the training phase is found to be better than that for speakers who did not. Word-wise RR is observed to be good for digits with distinct phones.
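The delta and delta-delta pre-processing described in this abstract can be sketched compactly. The following is a minimal Python illustration of the standard regression formula for delta coefficients; the paper itself publishes no code, and the window length N=2 is an assumption:

```python
def delta(frames, N=2):
    """Compute delta coefficients from a list of per-frame feature vectors.

    Standard regression: d_t = sum_{n=1..N} n*(c_{t+n} - c_{t-n}) / (2*sum n^2),
    with edge frames padded by repeating the first/last frame.
    """
    T = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    deltas = []
    for t in range(T):
        d = [0.0] * dim
        for n in range(1, N + 1):
            prev = frames[max(t - n, 0)]      # clamp at the start of the utterance
            nxt = frames[min(t + n, T - 1)]   # clamp at the end
            for k in range(dim):
                d[k] += n * (nxt[k] - prev[k]) / denom
        deltas.append(d)
    return deltas
```

Delta-delta is simply `delta` applied to the deltas; concatenating [c; Δc; ΔΔc] triples the feature dimension at negligible cost, consistent with the paper's "no additional computational cost" observation.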
Voice Recognition Based Automation System for Medical Applications and for Ph... – IRJET Journal
This document describes a voice recognition-based automation system for medical applications and physically challenged patients. The system uses a voice recognition model, Arduino microcontroller, relays, LEDs, buzzers, and a motor to control an adjustable bed. Voice commands are recognized using techniques like MFCC and HMM and used to control devices via the Arduino. The system is intended to allow paralyzed patients to control devices like lights, alarms, and their bed using only voice commands for increased independence. Testing showed the system can accurately recognize commands and control devices with 99% accuracy under suitable conditions.
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition... – TELKOMNIKA JOURNAL
Speech recognition can be defined as the process of converting voice signals into sequences of words by applying a specific algorithm implemented in a computer program. Research on speech recognition in Indonesia is relatively limited. This paper studies which feature extraction method, Linear Predictive Coding (LPC) or Mel Frequency Cepstral Coefficients (MFCC), is best for speech recognition in the Indonesian language. This matters because a method that produces high accuracy for one language does not necessarily produce the same accuracy for another, since every language has different characteristics. This research therefore aims to help accelerate the adoption of automatic speech recognition for Indonesian. There are two main processes in speech recognition: feature extraction and recognition. The feature extraction methods compared in this study are LPC and MFCC, while recognition uses a Hidden Markov Model (HMM). The test results showed that the MFCC method outperforms LPC for Indonesian-language speech recognition.
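The key difference between the two methods compared above is MFCC's perceptually motivated frequency warping, which can be written down directly. This is a hedged Python sketch of the common 2595·log10(1 + f/700) mel formulation; exact constants vary slightly between toolkits:

```python
import math

def hz_to_mel(f_hz):
    # Mel scale: roughly linear below ~1 kHz, logarithmic above,
    # mimicking human pitch perception.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse mapping, used when placing the center frequencies
    # of the triangular mel filterbank back on the Hz axis.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

Linearly spacing points between `hz_to_mel(0)` and `hz_to_mel(sample_rate / 2)` and mapping them back with `mel_to_hz` gives the filterbank center frequencies used in MFCC extraction; LPC, by contrast, works on the linear-frequency prediction residual.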
Automatic speech recognition system using deep learning – Ankan Dutta
This document describes the development of an automatic speech recognition system using deep learning techniques. It discusses extracting MFCC features from audio signals and using a convolutional neural network for feature extraction, followed by a Gaussian mixture model-hidden Markov model for recognition. It also describes implementing a speech recognition system using the Kaldi toolkit on a digits dataset consisting of 10 speakers, as well as an automatic speaker recognition system using MFCC features and K-nearest neighbors classification. The speech recognition system achieved an accuracy of 72% and the speaker recognition system achieved 80% accuracy on the digits dataset.
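The K-nearest-neighbors decision rule used for speaker recognition in the summary above is simple enough to sketch. This is a minimal, illustrative Python version (not the authors' code; `k`, the distance metric, and the data layout are assumptions):

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify a query feature vector by majority vote among its
    k nearest training examples.

    `train` is a list of (feature_vector, label) pairs; distance is
    squared Euclidean, which preserves nearest-neighbor ordering.
    """
    neighbors = sorted(
        train,
        key=lambda fv: sum((a - b) ** 2 for a, b in zip(fv[0], query))
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

In a speaker recognition setting, each training pair would be an averaged MFCC vector tagged with a speaker identity, and the query would be the MFCC vector of the test utterance.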
This document outlines a proposed study on developing a voice password-based speaker verification system. It will explore methods for modeling speakers with limited data, such as artificially generating multiple utterances from short speech segments. The study aims to reduce phonetic variability between training and test data. It will also examine score normalization techniques, comparing cohort-centric normalization typically used to a proposed speaker-centric approach. The goal is to build a text-independent voice password system that can reliably verify identities from short speech samples, improving security and enabling remote access applications.
This document summarizes a speech recognition system (SRS). SRS uses speech identification and verification. Speech identification determines which registered speaker provided an utterance by extracting features like mel-frequency cepstrum coefficients and comparing them. Speech verification accepts or rejects an identity claim by clustering training vectors from an enrollment session into speaker-specific codebooks using vector quantization. Applications of SRS include banking by phone, voice dialing, voice mail, and security control.
This document provides an introduction to building modern websites using HTML5 and CSS3. It discusses several new features in HTML5, including semantic elements, the <canvas> element for 2D drawing, <audio> and <video> elements for multimedia, local storage for offline applications, and other new elements and APIs. The tutorial assumes an intermediate level of experience with HTML, CSS, and JavaScript and provides code examples to demonstrate how to implement these new features.
Para ser bom professor é preciso, sim, ter vocação – Flavio Farah
Anyone intending to teach should first assess whether they have: 1) a genuine interest in teaching and 2) the set of specific aptitudes this occupation requires. If the candidate has both, they can acquire the competencies (knowledge and skills) needed for teaching by enrolling in a good professional training course.
DOSSIER: Stress and emotional disorders: anxiety disorder, obsessive-compulsive disorder, somatoform disorder, dissociative disorders, mood states, and psychosomatic disorder
O risco de votar em candidatos populistas – Flavio Farah
The document discusses the risks of voting for populist candidates, defining three modes of populism: political, economic, and administrative. It describes the characteristics of political and economic populism and how these styles of government can lead to negative outcomes. The text also analyzes some past Brazilian presidents and identifies those who exhibited populist traits.
Diabetic retinopathy, also known as diabetic eye disease, is damage to the retina caused by diabetes; it can eventually lead to blindness. By analyzing and detecting vasculature structures in retinal images, diabetes can be detected at advanced stages by comparing the state of the retinal blood vessels. In the blood-vessel classification approach, computer-based retinal image analysis is used to extract the vessels. The stationary wavelet transform (SWT) is used to extract features from the fundus image, and classification is performed using a Support Vector Machine (SVM). SVM has become an essential machine-learning method for the detection and classification of particular patterns in medical images, and is used in a wide range of applications for its ability to detect patterns in experimental databases. If vessels are present, they are extracted by segmentation. Mathematical morphology and K-means clustering are used to segment the vessels: a smoothing operation based on mathematical morphology enhances the blood vessels and suppresses background information, and the enhanced image is then segmented with the K-means clustering algorithm so that disease can be detected more easily.
GMAW shielding gases can be argon, a mix of argon and carbon dioxide, or carbon dioxide. Argon produces low spatter but also low weld penetration and power consumption with high arc stability. Carbon dioxide results in high spatter but the highest weld penetration and power consumption with low arc stability. A mix of 80% argon and 20% carbon dioxide provides a balance between the qualities of the pure gases.
IBM Cloud Orchestrator: Adding Persistence Storage to Cloud Instances – Paulraj Pappaiah
IBM Cloud Orchestrator allows users to add persistent storage to cloud instances. The process involves searching the self-service catalog for storage volumes, creating a new volume by selecting a region, availability zone, size and other parameters, and then submitting the request. Users can then verify the new volume, attach it to a server, and format and mount the volume. Finally, the volume can be seen as an attached storage resource from within the VMware vSphere client.
The document discusses various forms and conventions used in media products like music videos. It analyzes the student's music video and compares elements to other well-known music videos. Key elements discussed include camera techniques like tracking shots and handheld shots, editing techniques like jump cuts, sets, lighting, costumes, and branding elements like the logo, website, and album packaging design. Many conventions from indie/rock genre music videos influenced the student's media project.
This document provides information about cockroaches and disease-carrying true bugs. It discusses the biology and life cycles of common cockroach species and their medical importance, describing how they can transmit pathogens and cause allergies. Details are given on identifying features and control methods for major bug types, including bed bugs and kissing bugs that transmit Chagas disease. The kissing bugs section focuses on the Triatoma genus, their role as vectors of Trypanosoma cruzi, and epidemiology of Chagas disease in the Americas.
A Review On Speech Feature Techniques And Classification Techniques – Nicole Heredia
This document discusses speech feature extraction and classification techniques for speech recognition systems. It provides an overview of common feature extraction methods like MFCC and LPC, and classification algorithms like ANN and SVM. MFCC mimics human auditory perception but provides weak power spectrum, while LPC is easy to calculate but does not capture information at a linear scale. ANN can learn from data but is complex for large datasets, while SVM is accurate and suitable for pattern recognition but requires fixed-length coefficients. The document evaluates these techniques and concludes that MFCC performance is more efficient than LPC for feature extraction in speech recognition.
International journal of signal and image processing issues vol 2015 - no 1... – sophiabelthome
This document presents a study on automatic speech recognition (ASR). It discusses the different types of ASR systems including speaker-dependent, speaker-independent, and speaker-adaptive systems. It also covers the different types of utterances that can be recognized, such as isolated words, connected words, continuous speech, and spontaneous speech. The document then describes the basic phases involved in ASR, including front-end analysis using techniques like pre-emphasis, framing, windowing and feature extraction. It also discusses back-end processing using acoustic and language models to map features to words. Hidden Markov models are presented as a commonly used acoustic modeling technique in ASR systems.
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ... – ijcsit
Speech processing is a crucial and intensive field of research in the development of robust and efficient speech recognition systems, but recognition accuracy still suffers from variation in context, speaker variability, and environmental conditions. In this paper, we present a curvelet-based feature extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the characteristics of the curvelet transform, which successfully reduces computational complexity and feature-vector size; curvelets also offer better accuracy and a varying window size, making them suitable for non-stationary signals. For word classification and recognition, a discrete hidden Markov model is used, as it accounts for the time distribution of speech signals. The HMM classifier attained maximum accuracy, in terms of identification rate, of 80.1% for informal phrases, 86% for scientific phrases, and 63.8% for control phrases. The objective of this study is to characterize the feature extraction and classification phases of a speech recognition system. The various approaches available for developing speech recognition systems are compared along with their merits and demerits. The statistical results show that recognition accuracy is increased by using the discrete curvelet transform over conventional methods.
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T... – IJCSEA Journal
This document analyzes speech recognition performance based on neural network layers and transfer functions. It describes a system setup that uses MFCC features and ANNs for acoustic modeling. Experiments were conducted by varying the number of neural network layers (1, 2, 3 layers) and transfer functions (linear, sigmoid, tangent sigmoid). Results showed that networks with linear transfer functions in all layers achieved the performance goal, while other configurations reached minimum gradient but not the goal. The best architecture was a 3-layer network with linear transfer functions.
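The three transfer functions compared in that study are simple scalar maps. As a minimal Python sketch of their usual definitions (the experiments themselves were presumably run in a neural-network toolbox, so this is illustrative only):

```python
import math

def linear(x):
    # Identity transfer function ("purelin" in some toolboxes)
    return x

def logsig(x):
    # Logistic sigmoid: squashes activations into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tansig(x):
    # Tangent sigmoid: squashes activations into (-1, 1)
    return math.tanh(x)
```

A layer's output is the chosen transfer function applied element-wise to the weighted sum of its inputs, so stacking layers with `linear` everywhere keeps the whole network a linear map, which is relevant to interpreting the reported result.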
Speaker Recognition System using MFCC and Vector Quantization Approach – ijsrd.com
This paper presents an approach to speaker recognition using frequency spectral information with Mel frequency for the improvement of speech feature representation in a Vector Quantization codebook based recognition approach. The Mel frequency approach extracts the features of the speech signal to get the training and testing vectors. The VQ Codebook approach uses training vectors to form clusters and recognize accurately with the help of LBG algorithm.
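The LBG codebook training mentioned above is essentially binary-splitting k-means. The following Python sketch is illustrative only; the split factor `eps` and iteration count are assumptions, and the paper's own implementation is not shown:

```python
def lbg_codebook(vectors, size, eps=0.01, iters=20):
    """Train a VQ codebook by the LBG binary-splitting algorithm.

    Start from the global centroid, repeatedly split each codeword
    into two perturbed copies, then refine with k-means iterations.
    `size` should be a power of two.
    """
    dim = len(vectors[0])
    centroid = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    codebook = [centroid]
    while len(codebook) < size:
        # Split every codeword into (1 + eps) and (1 - eps) copies
        codebook = [[c * (1 + s) for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Assign each training vector to its nearest codeword
            clusters = [[] for _ in codebook]
            for v in vectors:
                i = min(range(len(codebook)),
                        key=lambda j: sum((v[k] - codebook[j][k]) ** 2
                                          for k in range(dim)))
                clusters[i].append(v)
            # Move each codeword to the centroid of its cluster
            for i, cl in enumerate(clusters):
                if cl:
                    codebook[i] = [sum(v[k] for v in cl) / len(cl)
                                   for k in range(dim)]
    return codebook
```

In the recognition stage, the test utterance's MFCC vectors are quantized against each enrolled speaker's codebook, and the speaker whose codebook yields the lowest total distortion is selected.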
Speech Recognized Automation System Using Speaker Identification through Wire... – IOSR Journals
This document describes a speech recognized automation system using speaker identification through wireless communication. The system uses a speech processor and MATLAB coding with MFCC algorithms to perform speech recognition and speaker identification. It then wirelessly controls electrical devices based on speech commands. Testing showed 80-85% accuracy for the actual speaker and lower (10-20%) accuracy for other speakers. Future work could involve improving speaker recognition accuracy as the number of speakers increases.
Speech Recognized Automation System Using Speaker Identification through Wire... – IOSR Journals
This paper discusses the methodology for a project named "Speech Recognized Automation System using Speaker Identification through Wireless Communication". It presents the design of an automation system using wireless communication and speaker recognition implemented in MATLAB code; MATLAB's straightforward programming interface makes it an ideal tool for the speech analysis in this project. The automation system is useful for home appliances as well as in industry. The paper discusses the overall design of the wireless automation system that was built and implemented. Speech recognition centers on matching speech commands stored in a MATLAB database against the incoming voice command of the speaker. The Mel Frequency Cepstral Coefficient (MFCC) algorithm is used to recognize the speaker's speech and to extract speech features. The system uses low-power RF ZigBee transceiver modules, which are relatively cheap, for wireless communication. It is intended to control lights, fans, and other electrical appliances in a home or office using speech commands such as "Light" and "Fan". Further, if security is not a big issue, the speech processor can control the appliances without speaker identification.
A survey on Enhancements in Speech Recognition – IRJET Journal
This document discusses enhancements in speech recognition and provides an overview of the history and basic model of speech recognition. It summarizes key enhancements researchers have made to improve speech recognition, especially in noisy environments. The basic model of speech recognition involves speech input, preprocessing using techniques like MFCCs, classification models like RNNs and HMMs, and output of a transcript. Researchers are working to develop robust speech recognition that can understand speech in any environment.
This document provides a course report on speech recognition using MATLAB. It includes an abstract, introduction, report outline, design of the system, process and results, flowchart of the experiment, simulation and analysis, discussion, and conclusion sections. The report details using MFCC and GMM algorithms for speech recognition, including signal preprocessing, parameter extraction, and training. Simulation results showed recognition rates approaching 100% for same speakers in quiet environments, but performance was impacted by noise and different speakers. The system demonstrated speech recognition capabilities in MATLAB but had limitations.
This document presents a text-dependent speaker recognition system using neural networks that aims to improve recognition accuracy. It proposes changing the number of Mel Frequency Cepstral Coefficients (MFCCs) used in training. Voice Activity Detection is also used as a preprocessing step. Experimental results show recognition accuracy increases from 70.41% to a maximum of 92.91% as the number of MFCCs increases from 14 to 20, but then decreases with more MFCCs. The system is implemented on a Raspberry Pi for hardware acceleration.
This document summarizes a research paper on developing a speech-to-text conversion system for visually impaired people using μ-law companding. The system uses MATLAB to analyze input speech signals, extract features, filter noise, and match signals to samples stored in a database to convert speech to text. A graphical user interface was created to input speech and display recognition results. The system achieved real-time speech recognition and conversion to text with high accuracy using μ-law companding techniques for signal processing and correlation comparisons to the stored samples.
Speech to text conversion for visually impaired person using µ law companding – iosrjce
The paper presents the overall design and implementation of a DSP-based speech recognition and text conversion system. Speech is usually the preferred mode of operation for human beings, and this paper presents voice-oriented commands for conversion into text. We intended to perform the entire speech processing in real time, simultaneously accepting input from the user and using software filters to analyze the data. The comparison is then established using correlation and µ-law companding techniques. In this paper, voice recognition is carried out using MATLAB. The voice commands are person-independent and are stored in the database with the help of the function keys. The real-time input speech is then processed in the speech recognition system, where the required features of the spoken words are extracted, filtered, and matched against the existing samples stored in the database. The required MATLAB processing then converts the received data into text form.
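µ-law companding itself is a one-line amplitude mapping that boosts low-amplitude detail before quantization. As a hedged Python sketch of the standard compander (µ = 255 as in G.711 telephony; the paper's exact parameters are not stated):

```python
import math

MU = 255.0  # standard value in North American G.711 telephony

def mu_law_compress(x, mu=MU):
    """Compress a sample in [-1, 1]: F(x) = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu)."""
    sign = 1.0 if x >= 0 else -1.0
    return sign * math.log(1.0 + mu * abs(x)) / math.log(1.0 + mu)

def mu_law_expand(y, mu=MU):
    """Inverse mapping, restoring the original amplitude range."""
    sign = 1.0 if y >= 0 else -1.0
    return sign * ((1.0 + mu) ** abs(y) - 1.0) / mu
```

Compressing before comparison makes quiet portions of the command waveform contribute more evenly to the correlation score, which is presumably why companding is paired with correlation matching in the described system.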
This document summarizes the technical progress in speech recognition over the past few decades. It discusses the key components of a speech recognition system including speech analysis, feature extraction, language translation, and message understanding. The document also outlines the main challenges in speech recognition like variability in context, environmental conditions, speakers, and audio quality. Furthermore, it covers different types of speech recognition systems and their applications in various sectors such as education, medicine, transportation and more. The document concludes with an overview of the major approaches to speech recognition including acoustic-phonetic, pattern recognition, and artificial intelligence and discusses popular feature extraction methods used.
Speaker Identification & Verification Using MFCC & SVM – IRJET Journal
This document discusses a method for speaker identification and verification using Mel Frequency Cepstral Coefficients (MFCC) and Support Vector Machines (SVM). MFCC is used to extract features from speech samples, representing the spectral properties of the voice. SVM is then used to match these features between speech samples to identify or verify speakers. The authors claim this method can identify speakers with up to 95% accuracy and is computationally efficient. It is proposed that MFCC and SVM outperform other techniques for speaker recognition tasks. The system is described as having applications in security, voice interfaces, and other voice-based systems.
Voice Recognition Based Automation System for Medical Applications and for Ph... – IRJET Journal
This document describes a voice recognition-based automation system for medical applications and physically challenged patients. The system uses a voice recognition model, Arduino microcontroller, relays, LEDs, buzzers, and a motor to control an adjustable bed. Voice commands are recognized using techniques like MFCC and HMM and used to control devices via the Arduino. The system is intended to allow paralyzed patients to control devices like lights, alarms, and their bed using only voice commands for increased independence. Testing showed the system provided accurate voice recognition under various conditions.
This document provides an overview of speech recognition systems and recent progress in the field. It discusses different types of speech recognition including isolated word, connected word, continuous speech, and spontaneous speech. Various techniques used in speech recognition are also summarized, such as simulated evolutionary computation, artificial neural networks, fuzzy logic, Kalman filters, and Hidden Markov Models. The document reviews several papers published between 2004-2012 that studied speech recognition methods including using dynamic spectral subband centroids, Kalman filters, biomimetic computing techniques, noise estimation, and modulation filtering. It concludes that Hidden Markov Models combined with MFCC features provide good recognition results for large vocabulary, speaker-independent, continuous speech recognition.
A comparison of different support vector machine kernels for artificial speec... – TELKOMNIKA JOURNAL
As the emergence of voice biometrics provides enhanced security and convenience, voice-biometric applications such as speaker verification are gradually replacing less secure authentication techniques. However, automatic speaker verification (ASV) systems are exposed to spoofing attacks, especially artificial speech attacks, which can be generated in large quantities in a short time using state-of-the-art speech synthesis and voice conversion algorithms. Despite the extensive use of the support vector machine (SVM) in recent work, no study has investigated the performance of different SVM settings for artificial speech detection. In this paper, the performance of different SVM settings for artificial speech detection is investigated, with the objective of identifying the appropriate SVM kernels. An experiment was conducted to find the appropriate combination of the proposed features and SVM kernels. Experimental results showed that the polynomial kernel detects artificial speech effectively, with an equal error rate (EER) of 1.42% when applied to the presented handcrafted features.
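The SVM kernels being compared are fixed similarity functions over feature vectors. As a minimal Python sketch of the usual parameterizations (the study's exact `gamma`, `coef0`, and `degree` settings are not given here, so the defaults below are assumptions):

```python
import math

def linear_kernel(x, y):
    # K(x, y) = <x, y>
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    # K(x, y) = (gamma * <x, y> + coef0) ** degree,
    # the parameterization used by common SVM libraries
    return (gamma * linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)
```

Swapping the kernel changes only how the SVM measures similarity between handcrafted feature vectors; the training procedure is otherwise unchanged, which is what makes this kind of kernel comparison straightforward to run.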
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique – CSCJournals
Automatic speaker recognition system is used to recognize an unknown speaker among several reference speakers by making use of speaker-specific information from their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition. Our baseline speaker recognition system accuracy, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique by performing gender recognition before speaker recognition so as to improve the accuracy/timeliness of our baseline speaker recognition system. Based on the experiments conducted on the MIT database, we demonstrate that our proposed system improves the accuracy over the baseline system by approximately 2%, while reducing the computational time by more than 30%.
Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficient – ijait
Development of Malayalam speech recognition systems is still in its infancy, although much work has been done in other Indian languages. In this paper we present the first work on a speaker-independent Malayalam isolated-speech recognizer based on Perceptual Linear Predictive (PLP) cepstral coefficients and a Hidden Markov Model (HMM). The performance of the developed system has been evaluated with different numbers of HMM states. The system is trained with 21 male and female speakers in the age group of 19 to 41 years, and obtained an accuracy of 99.5% on unseen data.
Similar to [IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P (20)
These days heart diseases, including an increased risk of heart attacks, are more and more common. The proposed system uses sensors that detect a person's heart rate through heartbeat sensing even when the person is at home. The sensor is interfaced to a microcontroller that checks the heart-rate readings and transmits them over the internet. The user may set upper and lower heart-beat limits; after these limits are set, the system starts monitoring, and as soon as the patient's heart beat goes outside them, it sends an alert to the controller, which transmits it over the internet to notify doctors as well as concerned users. Whenever a user logs on for monitoring, the system also displays the patient's live heart rate, so relatives can monitor the heart rate and receive an immediate heart-attack alert from anywhere, and the person can be saved in time. Developments in Internet of Things (IoT) technology allow people to control a variety of high-tech equipment in daily life, one example being the ease of checking health with a phone, tablet, or laptop. The work mainly focuses on safety measures for both driver and vehicle using three types of sensors: a heartbeat sensor, a traffic-light sensor, and a level sensor, where the heartbeat sensor constantly monitors the driver's heart rate and helps prevent accidents through IoT-based control.
The success of the cloud computing paradigm is due to its on-demand, self-service, pay-by-use nature. Public-key encryption with keyword search applies only to circumstances where keyword ciphertext can be retrieved by a specific user, and it supports only single-keyword matching. In existing searchable encryption schemes, either the communication mode is one-to-one or only single-keyword search is supported. This paper proposes a searchable encryption scheme that is attribute-based and supports multi-keyword search. Searchable encryption is a primitive that not only protects the data privacy of data owners but also enables data users to search over the encrypted data. Most existing searchable encryption schemes are in the single-user setting; only a few address the multiple-data-user setting, i.e., encrypted data sharing. Among these, most early techniques depend on a trusted third party with interactive search protocols or require cumbersome key management. To remedy these defects, the most recent approaches borrow ideas from attribute-based encryption to enable attribute-based keyword search (ABKS).
1. International Journal of Engineering and Techniques - Volume 1 Issue 6, Nov – Dec 2015
ISSN: 2395-1303 http://www.ijetjournal.org Page 108
RESEARCH ARTICLE OPEN ACCESS

A Comparative Study on Feature Extraction Technique for Isolated Word Speech Recognition

Easwari.N1, Ponmuthuramalingam.P2
1,2 (PG & Research Department of Computer Science, Government Arts College, Coimbatore)

Abstract:
One of the most common and simplest feature extraction techniques is the Mel Frequency Cepstral Coefficient (MFCC), which extracts a feature vector from the speech signal. It is used with dynamic feature extraction and provides a high performance rate compared with earlier techniques such as LPC, but a major drawback of this technique is its lack of robustness. Another feature extraction technique is Relative Spectral (RASTA) filtering. In effect, the RASTA filter band-passes each feature coefficient; in both the log-spectral and spectral domains, linear channel distortions appear as an additive constant. The high-pass portion of the equivalent band-pass filter suppresses the convolutional noise introduced in the channel, while the low-pass portion smooths frame-to-frame spectral changes. Compared with MFCC feature extraction, RASTA filtering reduces the impact of noise in the signal and provides high robustness.

Keywords — Automatic Speech Recognition, MFCC, RASTA, Isolated Word, Speech Chain, Signal-to-Noise Ratio.

I. INTRODUCTION
Digital speech signal processing is the process of converting one representation of a speech signal into another so as to uncover various mathematical or practical properties of the signal and to carry out the processing needed to solve both fundamental and deep problems of interest. The digital speech processing chain has two main models: the Speech Production (Generation) Model, which deals with the acoustic waveform, and the Speech Perception (Recognition) Model, which deals with the spectral representation used in the recognition process. Digital speech processing is used to achieve reliability, flexibility, accuracy, real-time implementation on low-cost digital speech processing chips, ease of integration with multimedia and data, and encryptability/security of the data and its representations via suitable techniques.

The overall process of producing and recognizing speech, that is, converting the speech signal from the device or human and understanding the message, is the speech chain. In other words, the process of converting speech signals into an acoustic waveform is speech processing. Speech production is the process of converting text from the messenger into an acoustic waveform; speech perception is the process of analyzing the acoustic waveform into an understandable message.

Fig. 1 Cyclic Representation of Speech Chain

The speech chain comprises speech production, auditory feedback to the speaker,
speech transmission, and speech perception and understanding by the listener. The message passed from a speaker to a listener in the speech chain is represented at different levels. The speech chain produces continuous and discrete outputs, at information rates ranging from about 30 kbps down to 50 bps, from the continuous and discrete inputs of the signals. In practical applications, the use of real-world speech leads to issues of noise and distortion [1].

Automatic speech recognition is a method by which a computer maps acoustic speech signals into text. It proceeds in two phases: the training phase, in which speech signals are recorded, represented as parameters, and stored in a database; and the recognition phase, in which features are extracted from the speech signals and matched against the already stored templates. Speech recognition, also known as automatic speech recognition or computer speech recognition, means the computer understanding a voice and performing any required task, or the ability to match a voice against a provided or acquired vocabulary. A speech recognition system consists of a microphone for the person to speak into, speech recognition software, a computer to take and interpret the speech, a good-quality sound card for input and/or output, and proper, good pronunciation [1]. The major considerations in developing an efficient speech recognition system are higher recognition accuracy, a low word error rate, and addressing the issues of variability in the source. Speech recognition methodologies have four different stages, dealing respectively with analysis of the speech signals, feature extraction algorithms, identification of the signal, and matching of the related word.
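The training and recognition phases described above can be sketched as a minimal nearest-template classifier. This is only an illustration under loud assumptions: the two-word vocabulary and the three-dimensional feature vectors are made up, and a real system would extract MFCC or RASTA features and use a proper model (e.g. HMM or DTW) rather than a single Euclidean comparison.

```python
import numpy as np

# Training phase: each vocabulary word is represented as parameters
# (here, hypothetical 3-dimensional feature vectors) stored in a database.
templates = {
    "yes": np.array([1.0, 0.2, 0.1]),
    "no":  np.array([0.1, 0.9, 0.4]),
}

def recognize(features):
    """Recognition phase: match the extracted feature vector against the
    stored templates and return the closest word (Euclidean distance)."""
    return min(templates, key=lambda w: np.linalg.norm(templates[w] - features))

print(recognize(np.array([0.9, 0.3, 0.1])))  # closest to the "yes" template
```

The same structure carries over to real systems: only the feature extractor and the distance/model change.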
II. FEATURE EXTRACTION
Feature extraction is the process of removing unwanted and redundant information and finding the set of properties, called the parameters of the utterance, by processing the signal waveform of the utterance. These parameters are the features. Feature extraction is performed after preprocessing and produces a meaningful representation of the speech signal; it is the most important part of the speech recognition system, since it is what distinguishes one speech sound from another. First of all, various speech samples of each word of the vocabulary are recorded by different speakers. After the speech samples are collected, they are converted from analog to digital form by sampling at a frequency of 16 kHz. Sampling means recording the speech signal at regular intervals. The collected data is then quantized, if required, to eliminate noise in the speech samples. The collected speech samples are then passed through the feature extraction, feature training, and feature testing stages. Feature extraction transforms the incoming sound into an internal representation such that it is possible to reconstruct the original signal from it.
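The sampling and quantization steps just described can be illustrated on a synthetic signal. The 16 kHz sampling rate comes from the text; the 440 Hz tone and the 16-bit quantization depth are illustrative assumptions standing in for recorded speech.

```python
import numpy as np

fs = 16_000                    # sampling frequency from the text: 16 kHz
n = round(0.01 * fs)           # 10 ms of signal -> 160 samples
t = np.arange(n) / fs          # sample instants at regular intervals
x = 0.8 * np.sin(2 * np.pi * 440 * t)   # a 440 Hz tone standing in for speech

# Uniform quantization to 16-bit integers, as in typical PCM recording
x_q = np.round(x * 32767).astype(np.int16)

print(len(t))      # number of samples in 10 ms at 16 kHz
print(x_q.dtype)   # quantized sample type
```

Each sample is taken 1/16000 s after the previous one, which is exactly what "recording the speech signal at regular intervals" means.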
There are various techniques to extract features, such as MFCC (Mel Frequency Cepstral Coefficient), PLP (Perceptual Linear Prediction), RASTA (RelAtive SpecTrAl filtering), LPC (Linear Predictive Coding), PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), ICA (Independent Component Analysis), wavelets, and so on, but the most widely used is MFCC [2].
A. Process of Feature Extraction
The steps of feature extraction are applied to the speech signal; after filtering, the cepstral coefficient values are the features. The most common feature extraction technique is MFCC, which provides more accurate results than the others.
• Sampling and quantization: The speech signals are sampled and quantized.
• Pre-emphasis: The speech samples are sent through a high-pass filter to amplify the frequencies above 1 kHz in the spectrum, because hearing is more perceptive in this region.
• Frame blocking: The speech samples are blocked into frames of N samples (amounting to a time period of 10-30 ms) with an overlap of some samples between frames.
• Fast Fourier Transform (FFT): An FFT is performed on each of the frames to obtain the magnitude values of the frequency response.
• Triangular band-pass filtering: The magnitude frequency response is multiplied by a set of triangular band-pass filters to get the log energy value of each filter. The positions of these filters are evenly spaced along the mel frequency scale.
• Discrete cosine transform (DCT): The DCT is applied to the logarithm of the energy obtained from the triangular band-pass filters [3].
In speech recognition, feature extraction
requires much attention because the main
recognition process depends heavily on this phase.
Among the different feature extraction techniques,
the common technique MFCC and the alternative
RASTA are discussed briefly.
B. Mel Frequency Cepstral Coefficients
Mel Frequency Cepstral Coefficients
(MFCCs) are a feature widely used in automatic
speech and speaker recognition. They were
introduced by Davis and Mermelstein in the 1980's,
and have been state-of-the-art ever since. Prior to
the introduction of MFCCs, Linear Prediction
Coefficients (LPCs) and Linear Prediction Cepstral
Coefficients (LPCCs) were the main feature
type for automatic speech recognition (ASR). The
extraction of the best parametric representation of
acoustic signals is an important task to produce
better recognition performance.
The efficiency of this phase is important for
the next phase since it affects its behavior. MFCC is
based on human hearing perception, which cannot
clearly perceive frequencies over 1 kHz. In other
words, MFCC is based on the known variation of the
human ear's critical bandwidth with frequency.
MFCC uses filters that are spaced linearly at low
frequencies below 1000 Hz and logarithmically
above 1000 Hz. A subjective pitch scale, the Mel
frequency scale, is used to capture the important
phonetic characteristics of speech.
MFCC consists of seven computational steps. Each
step has its function and mathematical approaches
as discussed briefly in the following:
Fig. 2 Block Diagram of MFCC
Step 1: Pre-emphasis
This step passes the signal through a filter
which emphasizes higher frequencies, increasing
the energy of the signal at higher frequencies:

Y[n] = X[n] - 0.95 X[n - 1] (1)

With a = 0.95, 95% of any one sample is presumed
to originate from the previous sample.
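A minimal sketch of this pre-emphasis step (in Python with NumPy; the paper itself reports using MATLAB, so this is only illustrative):

```python
import numpy as np

def pre_emphasis(x, a=0.95):
    """Equation (1): y[n] = x[n] - a * x[n-1], a simple high-pass filter."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                 # first sample has no predecessor
    y[1:] = x[1:] - a * x[:-1]
    return y

# A flat (purely low-frequency) signal is attenuated to 0.05 after the first sample:
print(pre_emphasis([1.0, 1.0, 1.0, 1.0]))
```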
Step 2: Framing
Framing is the process of segmenting the speech
samples obtained from analog-to-digital conversion
(ADC) into small frames with a length within the
range of 20 to 40 msec. The voice signal is divided
into frames of N samples, with adjacent frames
separated by M samples (M < N). Typical values used
are M = 100 and N = 256.
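Using the typical values above (N = 256, M = 100), framing can be sketched as follows (NumPy sketch, not the paper's MATLAB implementation):

```python
import numpy as np

def frame_signal(x, N=256, M=100):
    """Split x into overlapping frames of N samples; adjacent frames start M apart (M < N)."""
    n_frames = 1 + (len(x) - N) // M
    # Index matrix: row f selects samples f*M .. f*M + N - 1
    idx = np.arange(N)[None, :] + M * np.arange(n_frames)[:, None]
    return x[idx]

x = np.arange(1000, dtype=float)
frames = frame_signal(x)
print(frames.shape)   # (8, 256): 8 frames of 256 samples, overlapping by N - M = 156
```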
Step 3: Hamming windowing
The Hamming window is used as the window shape,
considering the next block in the feature extraction
processing chain, and integrates all the closest
frequency lines. If the window is defined as
W(n), 0 ≤ n ≤ N-1, where
N = number of samples in each frame,
Y(n) = output signal,
X(n) = input signal,
W(n) = Hamming window,
then the result of windowing the signal is shown below:
Y(n) = X(n) * W(n) (2)
W(n) = 0.54 - 0.46 cos(2πn / (N - 1)), 0 ≤ n ≤ N - 1 (3)
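Equations (2) and (3) correspond directly to NumPy's built-in Hamming window; a short sketch:

```python
import numpy as np

N = 256                                      # samples per frame
n = np.arange(N)
# Equation (3): W(n) = 0.54 - 0.46 cos(2*pi*n / (N-1)), 0 <= n <= N-1
w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
assert np.allclose(w, np.hamming(N))         # identical to NumPy's definition

# Equation (2): windowing is an element-wise product Y(n) = X(n) * W(n)
frame = np.random.randn(N)
windowed = frame * w
```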
Step 4: Fast Fourier Transform
The FFT converts each frame of N samples from the
time domain into the frequency domain. The Fourier
Transform converts the convolution of the
glottal pulse U[n] and the vocal tract impulse
response H[n] in the time domain into a product in
the frequency domain. This statement supports the
equation below:

Y(w) = FFT[h(t) * X(t)] = H(w) · X(w) (4)

where X(w), H(w) and Y(w) are the Fourier
Transforms of X(t), H(t) and Y(t) respectively.
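A sketch of this step: the magnitude FFT of a windowed frame. The 1 kHz test tone, 8 kHz sample rate and 512-point FFT are illustrative assumptions, not values from the paper:

```python
import numpy as np

fs, n_fft = 8000, 512
t = np.arange(256) / fs
frame = np.hamming(256) * np.sin(2 * np.pi * 1000 * t)   # windowed 1 kHz tone
mag = np.abs(np.fft.rfft(frame, n=n_fft))                # one-sided magnitude spectrum
peak_hz = np.argmax(mag) * fs / n_fft
print(peak_hz)   # the spectral peak sits at the tone's frequency, 1000.0 Hz
```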
Step 5: Mel Filter Bank Processing
The frequencies range in FFT spectrum is
very wide and voice signal does not follow the
linear scale. Mel scale filter bank, from a set of
triangular filters that are used to compute a
weighted sum of filter spectral components so that
the output of process approximates to a Mel scale.
Each filter’s magnitude frequency response is
triangular in shape and equal to unity at the centre
frequency and decrease linearly to zero at centre
frequency of two adjacent filters. Then, each filter
output is the sum of its filtered spectral components.
After that the following equation is used to compute
the Mel for given frequency f in HZ:
F (Mel) = [2595 * log 10 [1 + f ] 700 ] (5)
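Equation (5) and its inverse can be sketched as follows; the choice of 20 filters and the 0-4000 Hz band (for 8 kHz sampling) are assumptions for illustration:

```python
import numpy as np

def hz_to_mel(f):
    """Equation (5): F(mel) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of equation (5)."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

# Centre frequencies of an assumed bank of 20 triangular filters, evenly spaced
# on the mel scale between 0 and 4000 Hz:
n_filters = 20
edges_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(4000.0), n_filters + 2)
centres_hz = mel_to_hz(edges_mel[1:-1])
print(round(float(hz_to_mel(1000.0))))   # the scale maps 1 kHz to ~1000 mel by design
```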
Step 6: Discrete Cosine Transform
This process converts the log Mel
spectrum into the time domain using the Discrete
Cosine Transform (DCT). The result of the conversion
is called the Mel Frequency Cepstrum Coefficients. The
set of coefficients is called an acoustic vector.
Therefore, each input utterance is transformed into
a sequence of acoustic vectors.
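A sketch of this step using SciPy's DCT; the 20 filter-bank channels and the convention of keeping 13 coefficients are common choices assumed here, not mandated by the text:

```python
import numpy as np
from scipy.fft import dct

# Assumed: log energies from 20 triangular mel filters for one frame (dummy values here).
log_energies = np.log(np.arange(1.0, 21.0))
# DCT-II of the log filter-bank energies; keep the first 13 as the MFCCs.
mfcc = dct(log_energies, type=2, norm='ortho')[:13]
print(mfcc.shape)   # (13,)
```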
Step 7: Delta Energy and Delta Spectrum
The voice signal and the frames changes,
such as the slope of a formant at its transitions.
Therefore, there is a need to add features related to
the change in cepstral features over time . 13 delta
or velocity features (12 cepstral features plus
energy), and 39 features a double delta or
acceleration feature are added. The energy in a
frame for a signal x in a window from time sample
t1 to time sample t2, is represented at the equation
below:
Energy = ∑ X 2
[t] (6)
Each of the 13 delta features represents the change
between frames in the equation 8 corresponding
cepstral or energy feature, while each of the 39
double delta features represents the change between
frames in the corresponding delta features.
( ) =
( ) ( )
(7)
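Equation (7) applied along the frame axis; repeating the boundary frames is an assumed convention, since the text does not specify edge handling:

```python
import numpy as np

def delta(features):
    """Equation (7) along the frame axis: d(t) = (c(t+1) - c(t-1)) / 2.
    Boundary frames are repeated (an assumed convention)."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode='edge')
    return (padded[2:] - padded[:-2]) / 2.0

static = np.random.randn(100, 13)   # assumed: 100 frames x (12 MFCCs + energy)
d = delta(static)                   # 13 velocity features
dd = delta(d)                       # 13 acceleration (double-delta) features
full = np.hstack([static, d, dd])   # 13 + 13 + 13 = 39 features per frame
print(full.shape)                   # (100, 39)
```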
The result of the conversion is called the Mel
Frequency Cepstrum Coefficients. The set of
coefficients is called an acoustic vector; therefore,
each input utterance is transformed into a sequence
of acoustic vectors. The speech waveform is
cropped to remove silence or acoustical interference
that may be present at the beginning or end of the
sound file. The windowing block minimizes the
discontinuities of the signal by tapering the
beginning and end of each frame to zero. The FFT
block converts each frame from the time domain to
the frequency domain. In the Mel-frequency
wrapping block, the signal is mapped onto the
Mel spectrum to mimic human hearing. In the final
step, the cepstrum, the Mel-spectrum scale is
converted back to the standard frequency scale. This
spectrum provides a good representation of the
spectral properties of the signal, which is key for
representing and recognizing characteristics of the
speaker [4].
C. Features of MFCC
MFCC commonly allows remote person
authentication. It reduces the frequency information
of the speech signal to a small number of
coefficients and is easy and relatively fast to compute.
It reduces the influence of low-energy components
and has better anti-noise ability than other vocal
tract parameters such as LPC. The reason MFCC is
most frequently used for extracting features is that
it is nearest to actual human auditory speech
perception. MFCC is used to automatically
recognize verbal information spoken
into a telephone, in airline reservation systems, in
voice recognition systems for security, etc.
D. Limitations of MFCC
MFCC is noise sensitive, and it is common to
normalize MFCC values in speech recognition
systems to reduce the influence of noise. Some
attention is also needed to reduce the influence of
low-energy components. Other feature
representations are conceivably more invariant to
background noise and could capture characteristics
in the signal where MFCCs tend to fail. Because the
feature space of MFCC obtained using the DCT is
not directly dependent on the speech data, the
observed signal with noise does not show good
performance without noise suppression methods.
The performance of the Mel Frequency Cepstrum
Coefficients (MFCC) may be affected by the number
of filters, the shape of the filters, the way the filters
are spaced and the way the power spectrum is
warped. MFCC values are not very robust in the
presence of additive noise, and so it is common to
normalize them in speech recognition systems to
reduce the influence of noise [9].
E. Relative Spectral Processing (RASTA)
Speech recognition is the process of
decoding the linguistic message in speech. The
speech signal reflects the movement of the vocal tract.
The rate of change of non-linguistic components in
speech often lies outside the typical rate of change
of the vocal tract shape. The RelAtive SpecTrAl
(RASTA) technique suppresses the spectral
components that change more slowly or quickly
than the typical range of change of speech. RASTA
is implemented to improve recognition performance
in the presence of convolutional and additive noise.
Fig. 3 Block Diagram of RASTA Filtering
To compensate for linear channel distortions,
the analysis library provides the power to perform
RASTA filtering. The RASTA filter is used either in
the log spectral or the cepstral domain; in effect, the
RASTA filter band-passes every feature coefficient.
Linear channel distortions appear as an additive
constant in both the log spectral and the cepstral
domains. The high-pass portion of the equivalent
bandpass filter alleviates the effect of convolutional
noise introduced in the channel, while the low-pass
filtering helps in smoothing frame-to-frame spectral
changes [5].
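A sketch of band-passing each feature trajectory in the log spectral domain. The filter coefficients below follow the classic RASTA transfer function used in common implementations (numerator proportional to [2, 1, 0, -1, -2], a single pole around 0.94-0.98); they are an assumption, not taken from this paper:

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_spectra, pole=0.94):
    """Band-pass each coefficient's trajectory over time (axis 0 = frames)."""
    b = np.array([0.2, 0.1, 0.0, -0.1, -0.2])   # high-pass numerator (zero gain at DC)
    a = np.array([1.0, -pole])                  # low-pass pole for smoothing
    return lfilter(b, a, log_spectra, axis=0)

# A constant log-spectral offset (a linear channel distortion) is removed:
frames = np.ones((200, 20))     # constant log spectrum: 200 frames x 20 bands
out = rasta_filter(frames)
print(float(np.abs(out[-1]).max()))   # ~0 once the filter settles
```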
F. Advantages of RASTA
RASTA is a band-pass filtering technique
designed to reduce the impact of noise as well as to
enhance speech. It is generally used for speech
signals that have background noise, i.e. noisy
speech. It removes slow-varying environmental
variations as well as fast variations in artifacts. The
technique does not depend on the choice of
microphone or the position of the microphone
relative to the mouth, and so it is robust. It captures
the frequencies with low modulations that
correspond to speech. RASTA gives a better
performance ratio.
G. Limitations of RASTA
The technique causes a minor degradation
in performance for clean data, but it also cuts the
error in half for the filtered case.
III. EVALUATION OF FEATURE EXTRACTION TECHNIQUE
Isolated word recognition requires a quiet
gap between each utterance on both sides of the
sample window. It accepts single words or single
utterances at a time, alternating between "listen"
and "non-listen" states. "Isolated utterance" might
be a better name for this class. Pre-recorded
datasets are used for extracting features from the
speech signal.
H. MFCC for Isolated Word
Feature extraction is usually a non-invertible
(lossy) transformation. Making an analogy with
filter banks, such a transformation does not allow
perfect reconstruction: given only the features, it is
not possible to reconstruct the original speech used
to generate them. Computational complexity and
robustness are two primary reasons to accept losing
information. Increasing the accuracy of the
parametric representation by increasing the number
of parameters increases complexity and eventually
does not lead to a better result due to robustness
issues: the greater the number of parameters in a
model, the longer the required training sequence.
Speech is usually segmented into frames of 20
to 30 ms, and the analysis window is shifted by 10
ms. Each frame is converted to 12 MFCCs plus a
normalized energy parameter. The first and second
derivatives (D's and DD's) of the MFCCs and energy
are estimated, resulting in 39 numbers representing
each frame. Assuming a sample rate of 8 kHz, every
10 ms the feature extraction module delivers 39
numbers to the modeling stage. This operation with
overlap among frames is equivalent to taking 80
speech samples without overlap and representing
them by 39 numbers. In fact, assuming each speech
sample is represented by one byte and each feature
by four bytes (a float number), the parametric
representation expands 80 bytes of speech to
156 bytes. If a sample rate of 16 kHz is assumed,
the 39 parameters represent 160 samples. For
higher sample rates, it is intuitive that 39
parameters do not allow reconstructing the speech
samples. In any case, the goal here is not speech
compression but obtaining features suitable for
speech recognition [6].
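The frame-rate and byte arithmetic can be checked explicitly (39 features of 4 bytes each per 10 ms shift):

```python
# At 8 kHz, a 10 ms frame shift covers 80 one-byte samples,
# while 39 four-byte float features occupy 39 * 4 = 156 bytes.
sample_rate = 8000
shift_s = 0.010
samples_per_shift = int(sample_rate * shift_s)
feature_bytes = 39 * 4
print(samples_per_shift, feature_bytes)   # 80 156
```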
The technique is implemented by loading the
signal into the MFCC code; after filtering, the
decoded signals are used to calculate the features.
The calculation provides the range of the mean
(from the mean calculation), the peak, the MFCC
filter outputs (from the MFCC coding), the pitch,
and finally the feature vectors, which are quantized
using the delta energy.
I. RASTA for Isolated Word
The same pre-recorded speech dataset is
used with the spectral filtering feature extraction
technique. To obtain better noise suppression for
communication systems, the fixed RASTA filters
were replaced by a bank of non-causal FIR Wiener-
like filters. The output of each filter is given as

Ŝi(k) = Σ Wi(j) Yi(k - j), summed over j = -M ... M

Here, Ŝi(k) is the estimate of the clean speech in
frequency bin i and frame index k, Yi(k) is the noisy
speech spectrum, Wi(j) are the weights of the filter,
and M is the order of the filter. In this method the
weights Wi(j) are obtained such that Ŝi(k) is the
least-squares estimate of the clean speech Si(k) for
each frequency bin i. The order M = 10 corresponds
to 21-tap non-causal filters. The filters were
designed based on optimization on 2 minutes of
speech of a male speaker recorded at 8 kHz
sampling over a public analog cellular line from a
relatively quiet library. The published response of
the filters corresponding to bins in the frequency
range 300 Hz to 2300 Hz is a band-pass filter
emphasizing modulation frequencies around 6-8 Hz.
Filters corresponding to the 150-250 Hz and
2700-4000 Hz regions are low-gain, low-pass filters
with a cut-off frequency of 6 Hz. For very low
frequency bins (0-100 Hz) the filters have a flat
frequency response with 0 dB gain [8].
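The per-bin filtering equation above can be sketched as follows; the identity-weight check is illustrative only, since the real weights come from the least-squares design on clean/noisy recordings described in the text:

```python
import numpy as np

def wiener_like_filter(Y, W):
    """Per-bin non-causal FIR filtering: S_hat_i(k) = sum_j W_i(j) * Y_i(k - j),
    j = -M..M. Y: (n_frames, n_bins) noisy spectra; W: (2M+1, n_bins) weights."""
    S = np.zeros_like(Y)
    for i in range(Y.shape[1]):
        # 'same'-mode convolution realizes the centred (2M+1)-tap filter
        S[:, i] = np.convolve(Y[:, i], W[:, i], mode='same')
    return S

# M = 10 gives the 21-tap filters described in the text. As a sanity check,
# identity weights (a unit tap at j = 0) pass the input through unchanged.
M, n_bins = 10, 8
W = np.zeros((2 * M + 1, n_bins))
W[M, :] = 1.0
Y = np.random.randn(50, n_bins)
S = wiener_like_filter(Y, W)
```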
When speech signals are analyzed in spectral
form, the analyzed signals are first compressed by a
static nonlinearity. The banks of compressed signals
are then passed through linear bandpass filters, after
which they are expanded by a static nonlinearity,
with optional further processing such as
decompression implemented if needed. The RASTA
method differs from the other
feature calculation methods by using a filter with a
broader pass-band. RASTA processing performs
filtering between two static nonlinearities that are
not necessarily the inverse of one another. The
features can be viewed as a special case of temporal
RASTA processing of the trajectories of cepstral
coefficients.
IV. RESULTS AND DISCUSSIONS
The speech signals for isolated words were
derived by recording the words. An isolated
utterance HMM recognizer was used for the
experiment. The same speech signals were used
with both extraction techniques and the results were
compared. As a first step, the isolated word speech
signals were filtered using MFCC coefficients and
the results were noted. As a second step, the same
speech signals were filtered using RASTA filters
and the results were noted. The performance was
analyzed using MATLAB. The performance metrics
considered were signal-to-noise ratio, accuracy and
time. The RASTA technique produced the more
efficient results. The left side of Fig. 4 shows the
speech signals extracted with MFCC; the right side
shows the extracted, filtered, noiseless speech
signals from RASTA filtering.
Fig.4 Filtered Signal of MFCC and RASTA
Compared with the MFCC filters, RASTA cuts off
the frequencies handled by short-term cepstral mean
subtraction. The main difference between MFCC
and RASTA processing in the log spectral domain is
that RASTA merely removes the short-term
components of the log spectrum and enhances the
spectral transitions. A speech signal which is filtered by
both extraction filters is shown in Fig. 4. After
applying MFCC, the sampled signals along with
their noise are compared with the RASTA-filtered
signals. With the RASTA technique, the feature
vector is extracted better than with the MFCC
technique. The following figure shows the clean
data compared for MFCC and RASTA. Two speech
signals were processed with both the MFCC and
RASTA techniques. When the first speech signal is
filtered, the results are plotted in a graph on the
basis of signal-to-noise ratio and the error rate is
given as a percentage. After that, the second speech
signal is filtered using the spectral filter and its
error rate is also plotted in the graph. In the graph,
"o" denotes the signal without peak and "+" denotes
the signal with peak.
Fig.5 Comparative Results of MFCC and RASTA
V. CONCLUSIONS
In an automatic speech recognition system,
feature extraction includes converting speech
signals to digital form, measuring important
characteristics of the signal such as energy or
frequency, and augmenting these measurements
with meaningful derived features of the uttered
speech signal for use in recognition. Short sections
of the speech signal are isolated and given for
processing. Speech processing is the ability to
convert machine language into speech signals. For
extracting features from speech, techniques such as
Linear Predictive Coding, Perceptual Linear
Prediction, Mel Frequency Cepstrum Coefficients,
Wavelets and RelAtive SpecTrAl filtering are used.
Mel-scale Frequency Cepstrum Coefficients (MFCC)
is the most frequently used for speech recognition.
This is because MFCC considers the sensitivity of
the human ear at different frequencies and hence is
appropriate for speech recognition. RelAtive
SpecTrAl filtering (RASTA) is an improved feature
extraction technique used to enhance speech
recorded in a noisy environment. In RASTA, the
time trajectories of the representations of the
speech signal are band-pass filtered. Initially it was
used only to reduce the impact of noise in the
speech signal, but now it is also used to enhance the
signal directly. From the comparative study, the
extraction of speech signals works more effectively
with the two techniques MFCC and RASTA than
with other extraction techniques. The speech
signals were evaluated on the basis of signal-to-
noise ratio, frequency and error rate. The results
show that the RASTA filtering technique provides
better outcomes than MFCC. This research work
can be extended in the following directions: the two
techniques can be combined to achieve high
accuracy and robustness against the fundamental
problem of background noise in speech signals, and
the work can be extended toward complete,
accurate applications with a focus on hearing-
impaired people.
REFERENCES
1. Lawrence R. Rabiner and Ronald W., "Introduction to Digital Speech Processing", Foundations and Trends in Signal Processing, Vol. 1, Nos. 1-2 (2007), pp. 1-194.
2. Kishori R. Ghule and R. Deshmukh, "Feature Extraction Techniques for Speech Recognition: A Review", International Journal of Scientific & Engineering Research, Vol. 6, Issue 5, May 2015.
3. Vibha Tiwari, "MFCC and Its Applications in Speaker Recognition", International Journal on Emerging Technologies, 1(1): 19-22, 2010.
4. Lindasalwa Muda, Mumtaj Begam and I. Elamvazuthi, "Voice Recognition Algorithms Using Mel Frequency Cepstrum Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques", Journal of Computing, Vol. 2, Issue 3, March 2010.
5. Hynek Hermansky, "RASTA Processing of Speech", IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 4, Oct. 1994.
6. Neha P. Dhole and Ajay A. Gurjar, "Detection of Speech Under Stress Using Spectral Analysis", International Journal of Research in Engineering and Technology, ISSN: 2319-1163, Vol. 2, Issue 4, Apr. 2013.
7. Bhupinder Singh, Rupinder Kaur, Nidhi Devgun and Ramandeep Kaur, "The Process of Feature Extraction in Automatic Speech Recognition System for Computer Machine Interaction with Humans: A Review", International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 2, Issue 2, Feb. 2012.
8. Pratik K. Kurzekar, Ratnadeep R. Deshmukh, Vishal B. Waghmare and Pukhraj P. Shrishrimal, "A Comparative Study of Feature Extraction Techniques for Speech Recognition System", International Journal of Innovative Research in Science, Engineering and Technology, ISSN: 2319-8753, Vol. 3, Issue 12, Dec. 2014.
9. Anchal Katyal, Amanpreet Kaur and Jasmeen Gil, "Punjabi Speech Recognition of Isolated Words Using Compound EEMD & Neural Network", International Journal of Soft Computing and Engineering, ISSN: 2231-2307, Vol. 4, Issue 1, March 2014.