This paper discusses the methodology of a project named "Speech Recognized Automation System using Speaker Identification through Wireless Communication". The project presents the design of an automation system that combines wireless communication with speaker recognition implemented in MATLAB code; MATLAB's straightforward programming interface makes it an ideal tool for the project's speech analysis. The automation system is useful for home appliances as well as in industry. This paper discusses the overall design of a wireless automation system that was built and implemented. Speech recognition centers on matching the speaker's incoming voice commands against speech commands stored in a MATLAB database. The Mel Frequency Cepstral Coefficient (MFCC) algorithm is used to extract speech features and recognize the speaker's speech. The system uses low-power RF ZigBee transceiver modules, which are relatively cheap, for wireless communication. It is intended to control lights, fans, and other electrical appliances in a home or office using speech commands such as "Light" and "Fan". Further, if security is not a major concern, the speech processor can control the appliances without speaker identification.
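As a rough illustration of the command-matching step described above, here is a minimal nearest-template sketch in Python. The command names and feature values are hypothetical; a real system would compare MFCC vectors extracted from audio, often with a more robust distance than plain Euclidean:

```python
import math

def euclidean(a, b):
    # Euclidean distance between two equal-length feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_command(features, templates):
    # Return the command whose stored template vector is closest to the
    # incoming feature vector (templates: dict of name -> vector)
    return min(templates, key=lambda name: euclidean(features, templates[name]))

# Hypothetical stored templates for two spoken commands
templates = {"light": [0.9, 0.1, 0.4], "fan": [0.2, 0.8, 0.5]}
print(match_command([0.85, 0.15, 0.42], templates))  # -> light
```

In practice the database side would hold one template per enrolled command (and, for speaker identification, per speaker), and the distance would be computed over whole MFCC sequences rather than a single vector.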
IRJET - Smart Home System using Voice Recognition (IRJET Journal)
This document describes a smart home system that uses voice recognition to control home appliances. The system extracts features from voice commands using MFCC and HMM algorithms via the Google API. These features are converted to text and used to control devices connected to a microcontroller via relays. The system was able to successfully turn lights and fans on and off through voice commands, making it useful for elderly or disabled users to control home appliances with just their voice.
This document presents a text-dependent speaker recognition system using neural networks that aims to improve recognition accuracy. It proposes varying the number of Mel Frequency Cepstral Coefficients (MFCCs) used in training, with Voice Activity Detection as a preprocessing step. Experimental results show recognition accuracy increases from 70.41% to a maximum of 92.91% as the number of MFCCs increases from 14 to 20, but then decreases with more MFCCs. The system is implemented in hardware on a Raspberry Pi.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition... (TELKOMNIKA Journal)
Speech recognition can be defined as the process of converting voice signals into words by applying a specific algorithm implemented in a computer program. Research on speech recognition in Indonesia is relatively limited. This paper studies which feature extraction method, Linear Predictive Coding (LPC) or Mel Frequency Cepstral Coefficients (MFCC), is better for speech recognition in the Indonesian language. This matters because a method that produces high accuracy for one language does not necessarily produce the same accuracy for other languages, since every language has different characteristics. This research can therefore help accelerate the adoption of automatic speech recognition for Indonesian. There are two main processes in speech recognition: feature extraction and recognition. The feature extraction methods compared in this study are LPC and MFCC, while recognition uses a Hidden Markov Model (HMM). The test results show that MFCC outperforms LPC for Indonesian-language speech recognition.
This document discusses feature extraction techniques for isolated word speech recognition. It begins with an introduction to digital speech processing and speech recognition models. The main part of the document compares two common feature extraction techniques: Mel Frequency Cepstral Coefficients (MFCC) and Relative Spectral (RASTA) filtering. MFCC extracts feature vectors from signals and provides high performance but lacks robustness. RASTA filtering reduces the impact of noise and provides high robustness by band-passing feature coefficients in both the log spectral and spectral domains. The document details the MFCC feature extraction process, which involves framing, windowing, fast Fourier transform, mel filtering, discrete cosine transform, and calculating the cepstral coefficients.
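The MFCC pipeline described above (framing, windowing, FFT, mel filtering, DCT) can be sketched in pure Python. This is a minimal, single-frame illustration with a naive DFT and small, assumed parameter values (8 mel filters, 5 coefficients, 8 kHz sampling), not a production front end:

```python
import math, cmath

def hz_to_mel(f):
    # Standard mel-scale mapping used in MFCC front ends
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def dft_mag(frame):
    # Naive O(N^2) DFT magnitude spectrum (fine for a short demo frame)
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2 + 1)]

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    lo, hi = hz_to_mel(0.0), hz_to_mel(sr / 2.0)
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft // 2) * mel_to_hz(m) / (sr / 2.0)) for m in mels]
    banks = []
    for i in range(1, n_filters + 1):
        fb = [0.0] * (n_fft // 2 + 1)
        for k in range(bins[i - 1], bins[i]):          # rising slope
            fb[k] = (k - bins[i - 1]) / (bins[i] - bins[i - 1])
        for k in range(bins[i], bins[i + 1]):          # falling slope
            fb[k] = (bins[i + 1] - k) / (bins[i + 1] - bins[i])
        banks.append(fb)
    return banks

def mfcc(frame, sr, n_filters=8, n_ceps=5):
    # Hamming window -> magnitude spectrum -> mel energies -> log -> DCT-II
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spec = dft_mag(windowed)
    energies = [sum(f * s for f, s in zip(fb, spec))
                for fb in mel_filterbank(n_filters, n, sr)]
    logs = [math.log(e + 1e-10) for e in energies]
    return [sum(l * math.cos(math.pi * c * (i + 0.5) / n_filters)
                for i, l in enumerate(logs)) for c in range(n_ceps)]

frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(64)]
coeffs = mfcc(frame, 8000)
print(len(coeffs))  # -> 5
```

Real implementations add pre-emphasis, use an FFT instead of the naive DFT, and typically keep 12-13 coefficients per frame.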
This document discusses speaker recognition techniques and applications. It provides an overview of common feature extraction methods for speaker recognition systems such as Linear Predictive Coding (LPC), Perceptual Linear Predictive Coefficients (PLPC), Mel Frequency Cepstral Coefficients (MFCC), and Gamma Tone Frequency Cepstral Coefficients (GFCC). It compares these techniques and finds that MFCC and PLPC are most often used due to being derived from models of human auditory processing. GFCC has benefits of higher frequency resolution and noise robustness compared to MFCC. Speaker recognition has applications in security systems for access control.
IRJET - Study of Effect of PCA on Speech Emotion Recognition (IRJET Journal)
This document discusses speech emotion recognition using principal component analysis (PCA). It analyzes speech features such as mel frequency cepstral coefficients, pitch, energy, and formant frequency from the Berlin database, which contains emotions including anger, sadness, happiness, and fear. PCA is applied to reduce the feature dimension and decorrelate the features. A support vector machine classifier is then used to classify emotions based on the PCA-processed features. Results show that applying PCA improves classification accuracy compared to not using PCA, with average accuracy increasing from 64.5% to 68%.
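To illustrate the dimensionality-reduction step, here is a minimal 2-D PCA sketch in Python, using the closed-form eigendecomposition of the 2x2 covariance matrix. The data points are made up; real emotion features would be high-dimensional and handled with a general eigensolver:

```python
import math

def pca_2d(points):
    # Project 2-D points onto their principal axis (1-D reduced feature)
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Largest eigenvalue of the covariance matrix [[cxx, cxy], [cxy, cyy]]
    tr, det = cxx + cyy, cxx * cyy - cxy * cxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # Corresponding eigenvector; (cxy, lam - cxx) is valid when cxy != 0
    vx, vy = cxy, lam - cxx
    norm = math.hypot(vx, vy) or 1.0
    vx, vy = vx / norm, vy / norm
    return [(p[0] - mx) * vx + (p[1] - my) * vy for p in points]

data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
proj = pca_2d(data)
print(len(proj))  # -> 4 one-dimensional projections
```

The projections keep the direction of greatest variance while dropping the orthogonal component, which is the decorrelation-and-reduction effect the paper exploits before the SVM stage.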
This document discusses and compares different feature extraction methods for speaker identification, including Mel Frequency Cepstral Coefficients (MFCC), Inverse Mel Frequency Cepstral Coefficients (IMFCC), and Linear Predictive Cepstral Coefficients (LPCC). It analyzes these features extracted using Gaussian filters and modeled with Gaussian Mixture Models (GMM) and Universal Background Models (UBM). The document finds that fusing MFCC and IMFCC features outperforms using LPCC features alone based on tests on the TIMIT database.
IRJET - Threat Prediction using Speech Analysis (IRJET Journal)
This document describes a proposed system to analyze audio files and detect potential threats by performing speech recognition and sentiment analysis. The system would have three main steps: 1) Convert speech to text using machine learning algorithms like recurrent neural networks, 2) Perform sentiment analysis on the text using natural language processing techniques like naïve Bayes classification to determine if the text is positive, negative, or neutral, 3) Calculate an overall threat percentage and issue a warning if it exceeds a threshold. The goal is to automate audio surveillance and analysis to reduce time and human error compared to manual processes. Artificial neural networks would be used for both speech recognition and sentiment classification.
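Step 2 above, sentiment classification with naive Bayes, can be sketched as follows. The training samples and word lists are purely illustrative, not the paper's data:

```python
import math
from collections import Counter, defaultdict

def train(samples):
    # samples: list of (word_list, label); collect counts for naive Bayes
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for words, label in samples:
        label_counts[label] += 1
        word_counts[label].update(words)
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, label_counts, vocab

def classify(words, model):
    # Log-space naive Bayes with add-one (Laplace) smoothing
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Toy labeled data standing in for transcribed audio
samples = [("bomb attack threat".split(), "negative"),
           ("kill danger attack".split(), "negative"),
           ("happy good meeting".split(), "positive"),
           ("nice friendly chat".split(), "positive")]
model = train(samples)
print(classify("attack danger".split(), model))  # -> negative
```

A threat percentage as described in step 3 could then be derived from the proportion (or posterior weight) of negative classifications across an audio file.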
Financial Transactions in ATM Machines using Speech Signals (IJERA Editor)
Speech is the natural and simplest way of communication and Speech Recognition is a fascinating application of Digital Signal Processing which has many real-world applications. In this paper, a speech recognition system is developed for Automated Teller Machines (ATMs) using Wavelet Packet Decomposition (WPD) and Artificial Neural Networks (ANN). Speech signals are one-dimensional and are random in nature. ATM machines communicate with the customers using the stored speech samples and the user communicates with the machine using spoken digits. Daubechies wavelets are employed here. A multilayer neural network trained with back propagation training algorithm is used for classification purpose. The proposed method is implemented for 500 speakers uttering 10 spoken digits in English. The experimental results show good recognition accuracy of 87.38% and the efficiency of combining these two techniques
Estimating the quality of digitally transmitted speech over satellite communi... (Alexander Decker)
This document discusses methods for estimating the quality of digitally transmitted speech over satellite
communication channels. It proposes a methodology that divides the speech signal into segments and calculates the
segmental signal-to-noise ratio (SNR) for each segment. This allows assessing changes in speech quality over time
through statistical modeling. It also considers the spectral characteristics of the speech signal, providing a better
measure than just signal power. The document reviews other existing objective and subjective methods for quality
estimation and argues that the proposed segmental SNR approach has advantages over these other approaches.
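The segmental SNR described above can be sketched directly: the signal is split into fixed-length segments, an SNR in dB is computed per segment, and the values are averaged. The segment length (160 samples, i.e. 20 ms at 8 kHz) and the test signals below are assumptions for illustration:

```python
import math

def segmental_snr(clean, degraded, seg_len=160):
    # Average per-segment SNR (dB) between a clean reference and a
    # degraded copy; a trailing partial segment is ignored
    snrs = []
    for start in range(0, len(clean) - seg_len + 1, seg_len):
        s = clean[start:start + seg_len]
        d = degraded[start:start + seg_len]
        sig = sum(x * x for x in s)
        noise = sum((x - y) ** 2 for x, y in zip(s, d))
        if noise > 0 and sig > 0:
            snrs.append(10.0 * math.log10(sig / noise))
    return sum(snrs) / len(snrs) if snrs else float("inf")

# 200 Hz tone at 8 kHz plus a small alternating distortion
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(800)]
noisy = [x + 0.01 * ((-1) ** t) for t, x in enumerate(clean)]
print(round(segmental_snr(clean, noisy), 1))  # -> 37.0
```

Averaging per-segment values rather than taking one global SNR is what lets the method track quality changes over time, as the abstract notes.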
Utterance Based Speaker Identification Using ANN (IJCSEA Journal)
This document summarizes a research paper on speaker identification using artificial neural networks. The paper presents a speaker identification system that uses digital signal processing and ANN techniques. Speech features are extracted from utterances using FFT and windowing. These features are used to train a multi-layer perceptron network to classify speakers. The system was tested on Bangla speech and achieved accurate identification of speakers from their utterances.
Utterance Based Speaker Identification Using ANN (IJCSEA Journal)
In this paper we present the implementation of speaker identification system using artificial neural network with digital signal processing. The system is designed to work with the text-dependent speaker identification for Bangla Speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder. The speech features are acquired by the digital signal processing technique. The identification of speaker using frequency domain data is performed using back propagation algorithm. Hamming window and Blackman-Harris window are used to investigate better speaker identification performance. Endpoint detection of speech is developed in order to achieve high accuracy of the system.
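The two analysis windows the paper compares, Hamming and Blackman-Harris, can be generated directly from their standard definitions; a short Python sketch:

```python
import math

def hamming(n):
    # Hamming window of length n
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
            for i in range(n)]

def blackman_harris(n):
    # 4-term Blackman-Harris window (stronger sidelobe suppression
    # than Hamming, at the cost of a wider main lobe)
    a = (0.35875, 0.48829, 0.14128, 0.01168)
    return [a[0]
            - a[1] * math.cos(2 * math.pi * i / (n - 1))
            + a[2] * math.cos(4 * math.pi * i / (n - 1))
            - a[3] * math.cos(6 * math.pi * i / (n - 1))
            for i in range(n)]

w = hamming(5)
print(round(w[2], 2))  # window peak at the center -> 1.0
```

Multiplying each speech frame by one of these windows before the frequency-domain analysis is what tapers the frame edges and reduces spectral leakage, which is why the choice of window affects identification performance.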
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS (IJNLC)
The fundamental techniques used for man-machine communication include speech synthesis, speech recognition, and speech transformation. Feature extraction techniques provide a compressed representation of speech signals. HNM analysis and synthesis provide high-quality speech with a small number of parameters. Dynamic time warping is a well-known technique for aligning two multidimensional sequences: it locates an optimal match between the given sequences, and the improvement in alignment is estimated from the corresponding distances. The objective of this research is to investigate the effect of dynamic time warping on phrase-, word-, and phoneme-based alignments. Speech signals in the form of twenty-five phrases were recorded; the recorded material was segmented manually and aligned at sentence, word, and phoneme level. The Mahalanobis distance (MD) was computed between the aligned frames. The investigation shows better alignment in the HNM parametric domain, and effective speech alignment can be carried out even at phrase level.
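Dynamic time warping as described above can be sketched for simple 1-D sequences. This is the textbook DTW cost computation, not the HNM-parameter version used in the paper:

```python
def dtw_distance(a, b):
    # Classic dynamic time warping cost between two 1-D sequences,
    # filled in bottom-up over a (len(a)+1) x (len(b)+1) cost table
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # best of diagonal match, insertion, and deletion moves
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    return cost[n][m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # -> 0.0 (warped copies align)
```

For speech alignment, each sequence element would be a frame-level feature vector (e.g. HNM parameters or MFCCs) and `abs(...)` would be replaced by a vector distance such as the Mahalanobis distance the paper uses.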
The Proposed System for Indoor Location Tracking (Editor IJCATR)
Indoor location tracking systems are used to locate people or objects in buildings and closed areas. For example, finding co-workers in a large office building, locating customers within a shopping mall, and locating patients in a hospital are a few applications of indoor location tracking systems. Indoor tracking capability opens up multiple possibilities. To address this need, this paper describes the implementation of a Bluetooth-based indoor location tracking system that utilizes the Bluetooth modules integrated in today's mobile phones to determine and display the location of individuals in a building. The proposed system targets location tracking/monitoring and marketing applications for those who want to locate individuals carrying mobile phones and advertise products and services.
The document describes a proposed vocal code system that allows programmers to write code using speech instead of typing. It aims to help programmers who suffer from repetitive strain injuries or other disabilities that make typing difficult. The system uses speech recognition technology to convert speech to text and then generates valid Java code based on the spoken words. It breaks the system down into modules for the graphical user interface, speech to text conversion, and code generation. It also discusses the technical approaches used, including hidden Markov models and MFCC feature extraction for speech recognition. The goal is to make programming more accessible and reduce physical strain for disabled programmers.
The document analyzes the phenomenon of silent calls (SCs) in mobile networks. Key findings:
1. Measurements of over 50,000 calls found 0.11% experienced a total SC where the receiving party heard silence the entire time.
2. The main causes of total SCs were found to be interference in the radio link and issues during handovers between network technologies.
3. Even with significant loss of speech signal, estimated speech quality was relatively good due to packet loss concealment techniques, with unacceptable quality defined as over 17% loss of speech.
Adaptive wavelet thresholding with robust hybrid features for text-independe... (IJECE, IAES)
The robustness of a speaker identification (SID) system over an additive-noise channel is crucial for real-world applications. In SID systems, the features extracted from each speech frame are an essential factor in building a reliable identification system. In clean environments the identification system works well; in noisy environments, additive noise degrades it. To address the problem of additive noise and achieve high accuracy, this paper presents a feature extraction algorithm based on speech enhancement and combined features. A wavelet-thresholding pre-processing stage and feature warping (FW) are used with two combined features, power normalized cepstral coefficients (PNCC) and gammatone frequency cepstral coefficients (GFCC), to improve the robustness of the identification system against different types of additive noise. A universal background model Gaussian mixture model (UBM-GMM) is used for feature matching between the claimed and actual speakers. The results show that the proposed feature extraction algorithm improves identification performance compared with conventional features over most noise types and SNR ratios.
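The wavelet-thresholding pre-processing stage relies on a thresholding operator applied to the wavelet coefficients. Here is a minimal Python sketch of standard soft thresholding; the coefficient values and threshold are illustrative, and the paper's full enhancement chain (transform, adaptive threshold selection, inverse transform) is not reproduced:

```python
def soft_threshold(coeffs, t):
    # Soft-thresholding operator for denoising: zero out coefficients
    # with magnitude <= t, and shrink the rest toward zero by t
    out = []
    for c in coeffs:
        if c > t:
            out.append(c - t)
        elif c < -t:
            out.append(c + t)
        else:
            out.append(0.0)
    return out

print(soft_threshold([1.0, -0.2, 0.05, -1.5], 0.5))
# -> [0.5, 0.0, 0.0, -1.0]
```

The intuition is that small wavelet coefficients are mostly noise while large ones carry speech structure, so shrinking everything and discarding the small values suppresses additive noise before feature extraction.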
Speech to text conversion for visually impaired person using µ law companding (IOSR-JCE)
This paper presents the overall design and implementation of a DSP-based speech recognition and text conversion system. Speech is usually the preferred mode of operation for human beings, and this paper presents voice commands converted into text. We intended to compute the entire speech processing chain in real time, which involves simultaneously accepting input from the user and using software filters to analyse the data. The comparison is established using correlation and µ-law companding techniques. Voice recognition is carried out using MATLAB, and the voice commands are speaker independent. Commands are stored in the database with the help of function keys. The real-time input speech is then processed in the speech recognition system, where the required features of the spoken words are extracted, filtered, and matched against the samples stored in the database. The required MATLAB processes then convert the received data into text form.
This document reviews techniques used in spoken-word recognition systems. It discusses popular feature extraction techniques like MFCC, LPC, DWT, WPD that are used to represent speech signals in a compact form before classification. Classification techniques discussed are ANN, HMM, DTW, and VQ. The document provides a brief overview of each technique and their advantages. It also presents the generalized workflow of a spoken-word recognition system including stages of speech acquisition, pre-emphasis, feature extraction, modeling, classification, and output of recognized text.
IRJET - Comparative Analysis of Emotion Recognition System (IRJET Journal)
This document summarizes a research paper that analyzes different machine learning models for speech emotion recognition using the SAVEE dataset. It compares CNN, RFC, XGBoost and SVM classifiers using MFCC features extracted from the audio clips. The CNN model achieved the highest accuracy for emotion recognition when using MFCC features. The paper aims to better understand human emotional states through analyzing dependencies in speech data using these machine learning techniques.
Real Time Speaker Identification System - Design, Implementation and Validation (IDES Editor)
This paper presents the design, implementation, and validation of a PC-based prototype speaker recognition and verification system. The system receives a speech signal, extracts its features, and recognizes and verifies a person using voice as the biometric. It is implemented to capture the speech signal from a microphone and compare it with the stored database using a filter-bank based closed-set speaker verification system. First, identification of the voice signals is done using an algorithm developed in MATLAB. Next, a PC-based prototype system is developed and validated in real time. Several tests were made on different sets of voice signals, and the performance and speed of the proposed system were measured in a real environment. The results confirmed the suitability of the proposed system for various real-time applications.
Speaker Identification & Verification Using MFCC & SVM (IRJET Journal)
This document discusses a method for speaker identification and verification using Mel Frequency Cepstral Coefficients (MFCC) and Support Vector Machines (SVM). MFCC is used to extract features from speech samples, representing the spectral properties of the voice. SVM is then used to match these features between speech samples to identify or verify speakers. The authors claim this method can identify speakers with up to 95% accuracy and is computationally efficient. It is proposed that MFCC and SVM outperform other techniques for speaker recognition tasks. The system is described as having applications in security, voice interfaces, and other voice-based systems.
This document summarizes research on speaker recognition technologies. It discusses how speaker recognition can be used for biometric authentication by analyzing a person's voiceprint. It reviews literature on MFCC-GMM models for text-independent speaker verification and the use of speaker recognition in biometric security systems. The document also outlines the basic components of a speaker recognition system, including enrollment, feature extraction using MFCCs, and verification through comparison to stored voice templates using algorithms like GMM.
The document discusses key concepts in data communication including the five main components of a data communication system (message, sender, receiver, transmission medium, protocol). It also defines different transmission modes (simplex, half-duplex, full-duplex), signal types (analog, digital), networking devices (modem, hub, switch, router), IP address classes, the 7 layers of the OSI model, and common modulation methods and transmission impairments.
The document discusses mobile phone sensing applications and some of the key challenges. It focuses on how sensing capabilities on phones can enable inferencing about users' activities, social contexts and conversations. It also discusses the architectural design of mobile sensing systems including splitting classification between phones and servers, implementing duty cycles to save energy, and ensuring privacy through local processing and control over data sharing. Maintaining user privacy remains an important open challenge, especially as sensors can capture information about nearby third parties.
Friction Stir Welding of Steels: A Review Paper (IOSR Journals)
Friction Stir Welding (FSW) is a solid-state joining process that can be applied to a number of materials including aluminium, magnesium, copper, and steels. A number of studies have been conducted on friction stir welding of steels, and the focus of this paper is a comprehensive review of the work that has been done. Ultra-low carbon, low carbon, medium carbon, high carbon, and ultra-high carbon steels have been considered, and several aspects of FSW of steels are outlined: tools, mechanical properties, and microstructure. It was determined that carbon content, welding speed, and rotational speed affect both the mechanical properties and the microstructure of the joint.
Influence of Joint Clearance on Kinematic and Dynamic Parameters of Mechanism (IOSR Journals)
Kinematic joints in the dynamic analysis of multi-body mechanical systems are usually assumed to be ideal or perfect. However, in a real mechanism, kinematic joint clearance is always present. Such clearance is necessary for component assembly and to allow relative motion between the connected bodies, and it is inevitable due to manufacturing tolerances, material deformation, wear, and imperfections. The presence of joint clearance degrades the performance of mechanical systems by virtue of the contact and impact forces that take place at the joint. Contact analysis is a computational bottleneck in mechanical design where the contact changes, and manual analysis is time-consuming and error-prone. To address these problems, a geometric contact analysis method based on kinematic simulation using CAD software is developed. An equivalent kinematic linkage mechanism is constructed according to the contact position of the pin-and-hole assembly. Results of kinematic and dynamic analysis of a four-bar linkage with joint clearance show that the contribution of joint forces at slower input speed also degrades the performance of the mechanism.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
IRJET - Threat Prediction using Speech AnalysisIRJET Journal
This document describes a proposed system to analyze audio files and detect potential threats by performing speech recognition and sentiment analysis. The system would have three main steps: 1) Convert speech to text using machine learning algorithms like recurrent neural networks, 2) Perform sentiment analysis on the text using natural language processing techniques like naïve Bayes classification to determine if the text is positive, negative, or neutral, 3) Calculate an overall threat percentage and issue a warning if it exceeds a threshold. The goal is to automate audio surveillance and analysis to reduce time and human error compared to manual processes. Artificial neural networks would be used for both speech recognition and sentiment classification.
Financial Transactions in ATM Machines using Speech SignalsIJERA Editor
Speech is the natural and simplest way of communication and Speech Recognition is a fascinating application of Digital Signal Processing which has many real-world applications. In this paper, a speech recognition system is developed for Automated Teller Machines (ATMs) using Wavelet Packet Decomposition (WPD) and Artificial Neural Networks (ANN). Speech signals are one-dimensional and are random in nature. ATM machines communicate with the customers using the stored speech samples and the user communicates with the machine using spoken digits. Daubechies wavelets are employed here. A multilayer neural network trained with back propagation training algorithm is used for classification purpose. The proposed method is implemented for 500 speakers uttering 10 spoken digits in English. The experimental results show good recognition accuracy of 87.38% and the efficiency of combining these two techniques
Estimating the quality of digitally transmitted speech over satellite communi... (Alexander Decker)
This document discusses methods for estimating the quality of digitally transmitted speech over satellite
communication channels. It proposes a methodology that divides the speech signal into segments and calculates the
segmental signal-to-noise ratio (SNR) for each segment. This allows assessing changes in speech quality over time
through statistical modeling. It also considers the spectral characteristics of the speech signal, providing a better
measure than just signal power. The document reviews other existing objective and subjective methods for quality
estimation and argues that the proposed segmental SNR approach has advantages over these other approaches.
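As a rough sketch of the segmental SNR measure described above (the 20 ms frame length at 8 kHz and the epsilon guard against silent frames are illustrative assumptions, not values from the paper):

```python
import math

def segmental_snr(clean, degraded, frame_len=160, eps=1e-12):
    """Average per-frame SNR in dB, as in segmental-SNR quality estimation.

    `clean` and `degraded` are equal-length sample lists; a frame length
    of 160 samples (20 ms at 8 kHz) is an illustrative assumption.
    """
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        sig = sum(s * s for s in clean[start:start + frame_len])
        err = sum((s - d) ** 2 for s, d in
                  zip(clean[start:start + frame_len],
                      degraded[start:start + frame_len]))
        snrs.append(10.0 * math.log10((sig + eps) / (err + eps)))
    return sum(snrs) / len(snrs)
```

Averaging per-frame SNRs weights quiet segments as heavily as loud ones, which is what lets the segmental measure track quality changes over time better than a single global SNR.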
Utterance Based Speaker Identification Using ANN (IJCSEA Journal)
This document summarizes a research paper on speaker identification using artificial neural networks. The paper presents a speaker identification system that uses digital signal processing and ANN techniques. Speech features are extracted from utterances using FFT and windowing. These features are used to train a multi-layer perceptron network to classify speakers. The system was tested on Bangla speech and achieved accurate identification of speakers from their utterances.
Utterance Based Speaker Identification Using ANN (IJCSEA Journal)
In this paper we present the implementation of a speaker identification system using an artificial neural network with digital signal processing. The system is designed for text-dependent speaker identification of Bangla speech. Utterances of specific Bangla words are recorded from each speaker using an audio wave recorder, and the speech features are acquired through digital signal processing. Speaker identification on the frequency-domain data is performed using the backpropagation algorithm. Hamming and Blackman-Harris windows are compared to investigate which gives better identification performance, and endpoint detection of speech is employed to achieve high system accuracy.
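The windowing and frequency-domain feature extraction steps above can be sketched as follows (a direct O(N²) DFT stands in for a library FFT call, and the 64-sample frame size is illustrative):

```python
import cmath, math

def hamming(n_samples):
    """Hamming window coefficients for one analysis frame."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def fft_magnitudes(frame):
    """Magnitude spectrum of one Hamming-windowed frame via a direct DFT.

    Only the first half of the bins is returned, since the spectrum of a
    real signal is symmetric.
    """
    n_samples = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n_samples))]
    return [abs(sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n in range(n_samples)))
            for k in range(n_samples // 2)]
```

Swapping `hamming` for a Blackman-Harris window, as the paper does, changes only the coefficient formula; the rest of the pipeline is unchanged.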
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS (ijnlc)
The fundamental techniques used for man-machine communication include speech synthesis, speech recognition, and speech transformation. Feature extraction techniques provide a compressed representation of the speech signal. Harmonic plus noise model (HNM) analysis and synthesis provides high-quality speech with fewer parameters. Dynamic time warping (DTW) is a well-known technique for aligning two multidimensional sequences: it locates an optimal match between the given sequences, and the improvement in alignment is estimated from the corresponding distances. The objective of this research is to investigate the effect of dynamic time warping on phrase-, word-, and phoneme-based alignments. Speech signals in the form of twenty-five phrases were recorded; the recorded material was segmented manually and aligned at sentence, word, and phoneme level. The Mahalanobis distance (MD) was computed between the aligned frames. The investigation showed better alignment in the HNM parametric domain, and it was seen that effective speech alignment can be carried out even at phrase level.
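The dynamic time warping step described above can be sketched as follows (a caller-supplied per-frame distance stands in for the Mahalanobis distance, which additionally needs a covariance estimate):

```python
def dtw_distance(seq_a, seq_b, dist):
    """Classic dynamic-time-warping alignment cost between two sequences.

    `dist` is any per-frame distance function; the paper uses the
    Mahalanobis distance, while a plain absolute/Euclidean distance can
    stand in here.
    """
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # insertion
                                    cost[i][j - 1],      # deletion
                                    cost[i - 1][j - 1])  # match
    return cost[rows][cols]
```

The accumulated cost at the final cell is the alignment quality measure; a lower cost after parameter changes indicates improved alignment.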
The proposed System for Indoor Location Tracking (Editor IJCATR)
Indoor location tracking systems are used to locate people or objects in buildings and other closed areas. Finding co-workers in a large office building, locating customers within a shopping mall, and locating patients in a hospital are a few applications of indoor location tracking, and the capability opens up many more possibilities. To address this need, this paper describes the implementation of a Bluetooth-based indoor location tracking system that uses the Bluetooth modules integrated in today's mobile phones to determine and display the location of individuals in a building. The proposed system targets location tracking/monitoring and marketing applications for anyone who wants to locate individuals carrying mobile phones and advertise products and services to them.
The document describes a proposed vocal code system that allows programmers to write code using speech instead of typing. It aims to help programmers who suffer from repetitive strain injuries or other disabilities that make typing difficult. The system uses speech recognition technology to convert speech to text and then generates valid Java code based on the spoken words. It breaks the system down into modules for the graphical user interface, speech to text conversion, and code generation. It also discusses the technical approaches used, including hidden Markov models and MFCC feature extraction for speech recognition. The goal is to make programming more accessible and reduce physical strain for disabled programmers.
The document analyzes the phenomenon of silent calls (SCs) in mobile networks. Key findings:
1. Measurements of over 50,000 calls found 0.11% experienced a total SC where the receiving party heard silence the entire time.
2. The main causes of total SCs were found to be interference in the radio link and issues during handovers between network technologies.
3. Even with significant loss of speech signal, estimated speech quality was relatively good due to packet loss concealment techniques, with unacceptable quality defined as over 17% loss of speech.
Adaptive wavelet thresholding with robust hybrid features for text-independe... (IJECEIAES)
The robustness of a speaker identification (SID) system over an additive-noise channel is crucial for real-world applications. In SID systems, the features extracted from each speech frame are an essential factor in building a reliable identification system. In clean environments the identification system works well; in noisy environments, additive noise degrades it. To eliminate the problem of additive noise and to achieve high identification accuracy, a feature extraction algorithm based on speech enhancement and combined features is presented. In this paper, a wavelet-thresholding pre-processing stage and feature warping (FW) are used with two combined features, power normalized cepstral coefficients (PNCC) and gammatone frequency cepstral coefficients (GFCC), to improve the robustness of the identification system against different types of additive noise. A universal background model Gaussian mixture model (UBM-GMM) is used for feature matching between the claimed and actual speakers. The results showed performance improvements for the proposed feature extraction algorithm compared with conventional features over most noise types and SNR ratios.
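The wavelet-thresholding pre-processing stage can be illustrated by its core operation, soft thresholding of the wavelet coefficients (the wavelet decomposition and reconstruction steps, and any threshold selection rule, are omitted here):

```python
def soft_threshold(coeffs, threshold):
    """Soft-threshold a list of (wavelet) coefficients: shrink each one
    toward zero by `threshold`, zeroing anything smaller in magnitude.

    Small coefficients are assumed to carry mostly noise, so removing
    them while shrinking the rest denoises the reconstructed signal.
    """
    out = []
    for c in coeffs:
        if c > threshold:
            out.append(c - threshold)
        elif c < -threshold:
            out.append(c + threshold)
        else:
            out.append(0.0)
    return out
```

In an adaptive scheme such as the one described above, `threshold` would be chosen per subband from a noise estimate rather than fixed in advance.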
Speech to text conversion for visually impaired person using µ law companding (iosrjce)
The paper presents the overall design and implementation of a DSP-based speech recognition and text conversion system. Speech is usually the preferred mode of operation for human beings, and this paper presents voice-oriented commands for conversion into text. We intended to compute the entire speech processing chain in real time, simultaneously accepting input from the user and using software filters to analyse the data. The comparison is then established using correlation and µ-law companding techniques. In this paper, voice recognition is carried out using MATLAB. The voice commands are person-independent and are stored in the database with the help of the function keys. The real-time input speech is then processed in the speech recognition system, where the required features of the spoken words are extracted, filtered, and matched against the samples stored in the database. The required MATLAB processing then converts the received data into text form.
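The µ-law companding referred to above follows the standard logarithmic compression curve; a minimal sketch (µ = 255, the G.711 value, is an assumption here, as the paper does not state it):

```python
import math

def mu_law_compress(sample, mu=255.0):
    """µ-law compand one sample in [-1, 1]: boost small amplitudes so
    quiet speech detail survives comparison/quantization."""
    sign = 1.0 if sample >= 0 else -1.0
    return sign * math.log(1.0 + mu * abs(sample)) / math.log(1.0 + mu)

def mu_law_expand(compressed, mu=255.0):
    """Inverse companding back to the linear amplitude domain."""
    sign = 1.0 if compressed >= 0 else -1.0
    return sign * ((1.0 + mu) ** abs(compressed) - 1.0) / mu
```

Companding before correlation makes the match less sensitive to loudness differences between the stored template and the live utterance.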
This document reviews techniques used in spoken-word recognition systems. It discusses popular feature extraction techniques like MFCC, LPC, DWT, WPD that are used to represent speech signals in a compact form before classification. Classification techniques discussed are ANN, HMM, DTW, and VQ. The document provides a brief overview of each technique and their advantages. It also presents the generalized workflow of a spoken-word recognition system including stages of speech acquisition, pre-emphasis, feature extraction, modeling, classification, and output of recognized text.
IRJET- Comparative Analysis of Emotion Recognition System (IRJET Journal)
This document summarizes a research paper that analyzes different machine learning models for speech emotion recognition using the SAVEE dataset. It compares CNN, RFC, XGBoost and SVM classifiers using MFCC features extracted from the audio clips. The CNN model achieved the highest accuracy for emotion recognition when using MFCC features. The paper aims to better understand human emotional states through analyzing dependencies in speech data using these machine learning techniques.
Real Time Speaker Identification System – Design, Implementation and Validation (IDES Editor)
This paper presents the design, implementation and validation of a PC-based prototype speaker recognition and verification system. The system is organized to receive a speech signal, extract its features, and recognize and verify a person using voice as the biometric. It is implemented to capture the speech signal from a microphone and to compare it with a stored database using a filter-bank based closed-set speaker verification system. First, identification of the voice signals is done using an algorithm developed in MATLAB. Next, a PC-based prototype system is developed and validated in real time. Several tests were made on different sets of voice signals, and the performance and speed of the proposed system were measured in a real environment. The results confirmed the suitability of the proposed system for various real-time applications.
Speaker Identification & Verification Using MFCC & SVM (IRJET Journal)
This document discusses a method for speaker identification and verification using Mel Frequency Cepstral Coefficients (MFCC) and Support Vector Machines (SVM). MFCC is used to extract features from speech samples, representing the spectral properties of the voice. SVM is then used to match these features between speech samples to identify or verify speakers. The authors claim this method can identify speakers with up to 95% accuracy and is computationally efficient. It is proposed that MFCC and SVM outperform other techniques for speaker recognition tasks. The system is described as having applications in security, voice interfaces, and other voice-based systems.
This document summarizes research on speaker recognition technologies. It discusses how speaker recognition can be used for biometric authentication by analyzing a person's voiceprint. It reviews literature on MFCC-GMM models for text-independent speaker verification and the use of speaker recognition in biometric security systems. The document also outlines the basic components of a speaker recognition system, including enrollment, feature extraction using MFCCs, and verification through comparison to stored voice templates using algorithms like GMM.
The document discusses key concepts in data communication including the five main components of a data communication system (message, sender, receiver, transmission medium, protocol). It also defines different transmission modes (simplex, half-duplex, full-duplex), signal types (analog, digital), networking devices (modem, hub, switch, router), IP address classes, the 7 layers of the OSI model, and common modulation methods and transmission impairments.
The document discusses mobile phone sensing applications and some of the key challenges. It focuses on how sensing capabilities on phones can enable inferencing about users' activities, social contexts and conversations. It also discusses the architectural design of mobile sensing systems including splitting classification between phones and servers, implementing duty cycles to save energy, and ensuring privacy through local processing and control over data sharing. Maintaining user privacy remains an important open challenge, especially as sensors can capture information about nearby third parties.
Friction Stir Welding of steels: A Review Paper (IOSR Journals)
Friction Stir Welding (FSW) is a solid-state joining process that can be applied to a number of materials including aluminium, magnesium, copper and steels. A number of studies have been conducted on friction stir welding of steels, and the focus of this paper is a comprehensive review of the work that has been done. Ultra-low carbon, low carbon, medium carbon, high carbon and ultra-high carbon steels are considered, and several aspects of FSW of steels are outlined: tools, mechanical properties and microstructure. It was determined that carbon content, welding speed and rotational speed affect both the mechanical properties and the microstructure of the joint.
Influence of joint clearance on kinematic and dynamic parameters of mechanism (IOSR Journals)
Kinematic joints in the dynamic analysis of multi-body mechanical systems are usually assumed ideal or perfect. In a real mechanism, however, joint clearance is always present: it is necessary for component assembly and for relative motion between the connected bodies, and it is inevitable due to manufacturing tolerances, material deformation, wear, and imperfections. The presence of joint clearance degrades the performance of mechanical systems by virtue of the contact and impact forces that take place at the joint. Contact analysis is a computational bottleneck in mechanical design where the contact changes, and manual analysis is time-consuming and prone to error. To address these problems, a geometrical contact analysis method based on kinematic simulation using CAD software is developed. An equivalent kinematic linkage mechanism is constructed according to the contact position of the pin-and-hole assembly. Results of kinematic and dynamic analysis of a four-bar linkage with joint clearance show that the contribution of joint forces at slower input speeds also degrades the performance of the mechanism.
1) The document proposes a new algorithm called FPMBM (Frequent Pattern Mining with Boolean Matrix) to address some challenges with existing frequent pattern mining algorithms like Apriori.
2) FPMBM uses a Boolean matrix representation of a vertically formatted transactional database to calculate support counts and generate candidate frequent patterns more efficiently, without multiple database scans.
3) Experimental results on several datasets show that FPMBM outperforms other algorithms like Apriori, DHP, ECLAT, and FP-Growth by requiring less execution time, especially for larger datasets and longer patterns.
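The Boolean-matrix idea can be illustrated with bit columns and bitwise ANDs. This is a toy sketch of support counting for item pairs under that representation, not the FPMBM algorithm itself:

```python
def mine_pairs(transactions, items, min_support):
    """Count pair supports via a Boolean matrix: each item becomes a bit
    column over the transactions, and the support of a candidate pair is
    the popcount of the AND of its columns, so no further database scans
    are needed after the single pass that builds the columns."""
    columns = {}
    for item in items:
        bits = 0
        for row, transaction in enumerate(transactions):
            if item in transaction:
                bits |= 1 << row      # set the bit for this transaction
        columns[item] = bits
    frequent = {}
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            support = bin(columns[a] & columns[b]).count("1")
            if support >= min_support:
                frequent[(a, b)] = support
    return frequent
```

Longer candidate patterns extend the same idea by ANDing more columns together, which is why the approach scales well to long patterns.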
This document proposes a method for clustering related Chinese text cases using a combination of Fuzzy K-Means (FKM) and Canopy clustering algorithms. It first discusses factors that can indicate related cases, such as location, time, and evidence. It then describes representing Chinese text cases as vectors using word segmentation and dimensionality reduction. FKM clustering is applied to group related cases but has limitations; Canopy clustering is therefore used to estimate the optimal number of clusters before applying FKM, overcoming those limitations and improving efficiency. While this method addresses the challenges of clustering Chinese cases, the results may not be very accurate and the method could be improved.
This document summarizes an intermediate guardianship based routing protocol proposed to improve the performance of the Ad-hoc On-Demand Distance Vector (AODV) routing protocol in mobile ad hoc networks. The proposed protocol aims to balance energy consumption among nodes by only allowing nodes with sufficient energy to participate in route requests. It also designates intermediate nodes as "guardians" to relay messages when the source or destination nodes have low energy or link failures occur, in order to increase the probability of message delivery. The document provides background on AODV and related work, and describes how the intermediate guardianship approach extends AODV to address energy efficiency and reliability issues in intermittent and low-power ad hoc networks.
This document describes a design for mobile robot navigation using simultaneous localization and mapping (SLAM) and an adaptive tracking controller with particle swarm optimization in indoor environments. An adaptive fuzzy tracking controller is designed using 9 fuzzy rules to calculate a reference path for navigation between a starting and goal point. Particle swarm optimization is then used to optimize and reduce the time required for the calculated path. The controller is simulated in two indoor environments containing obstacles, and particle swarm optimization is shown to reduce navigation time compared to without its use. This approach allows for efficient mobile robot navigation in indoor monitoring applications.
This document summarizes a research paper that proposes using a logistic regression classifier trained with stochastic gradient descent to predict Twitter users' personalities from their tweets. It begins with an abstract of the paper and an introduction on personality prediction from social media. It then provides more detail on the anatomy of the research, including defining personality prediction from Twitter, its applications, and the general process of using machine learning for the task. Next, it reviews several previous studies on personality prediction from Twitter and social networks, noting their approaches, findings and limitations. It identifies remaining research gaps, such as the need for improved linguistic analysis of tweets and more robust/scalable predictive models. Finally, it proposes using a logistic regression classifier as the personality prediction model to address
The document describes a regression test suite prioritization technique using the Hill Climbing algorithm. It aims to reorder test cases in a test suite to maximize fault detection within a given time limit. The technique records execution times of individual test cases and uses the Hill Climbing algorithm with parameters like the test suite, all possible permutations of test cases, time limit, and functions to calculate time and fitness. The algorithm iteratively makes small improvements to the ordering to find a local optimum that detects the most faults within the time limit. The technique intelligently considers both test execution time and fault detection information to effectively prioritize test cases.
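A minimal sketch of hill climbing over test orderings follows; the adjacent-swap neighbourhood, the fitness function, and the iteration budget are all illustrative assumptions rather than details from the paper:

```python
def hill_climb_order(test_ids, fitness, iterations=200):
    """Hill climbing over test-case orderings: repeatedly try swapping
    two adjacent tests and keep the swap only if `fitness` improves.

    `fitness` maps an ordering to a score (higher is better), e.g. a
    weighted reward for detecting faults early within the time limit.
    """
    order = list(test_ids)
    if len(order) < 2:
        return order
    best = fitness(order)
    for step in range(iterations):
        i = step % (len(order) - 1)          # cycle through swap positions
        candidate = list(order)
        candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
        score = fitness(candidate)
        if score > best:                     # greedy: accept improvements only
            order, best = candidate, score
    return order
```

Because only improving moves are accepted, the result is a local optimum; the technique's quality rests on how well the fitness function encodes execution time and fault detection.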
This document summarizes a research paper that proposes a new method for reducing grid current total harmonic distortion in a grid-connected wind energy conversion system using a permanent magnet synchronous generator. The method uses the grid-side inverter, which normally injects power into the grid, to also eliminate high-order harmonics generated by a nonlinear load connected to the grid. Simulation results on a system with a 20 kW permanent magnet synchronous generator show that when a nonlinear load is connected, the proposed harmonic elimination method significantly reduces the total harmonic distortion of the grid current compared to when it is not used. The maximum power point tracking algorithm employed also achieves over 97% tracking efficiency across different wind speeds tested.
This document discusses identifying fault prone modules in software. It begins by classifying metrics that have been used for fault prediction into three categories: requirement metrics, design metrics, and code metrics. These metrics are extracted from the requirements, design documents, and source code, respectively. While each type of metric provides useful information, no single metric can accurately predict fault prone modules. The document then proposes a new compound metric that combines metrics from multiple phases of the software development life cycle to improve fault prone module identification.
This document summarizes a study on the scattering effects on terrestrial free space optical signals in tropical weather conditions. It analyzes the impact of different types of scattering, such as Rayleigh, Mie, and non-selective scattering caused by rain, fog and haze, on the performance of a 2 km terrestrial free space optical link. It finds that a 1550 nm optical source experiences the least attenuation and determines that it can support up to 5 Gbps transmission with a bit error rate below 10⁻⁹ under different weather scenarios (clear sky, light rain, medium rain and heavy rain) through simulation in OptiSystem 7.0. The study concludes that a 1550 nm continuous wave laser with 200 mW power, 10 mm transmitter aperture and
This document summarizes a study on plastic zone size and effective distance under mixed mode fracture using a volumetric approach. U-notched circular ring specimens made of 45CDS6 steel were subjected to compression loading with notch radii ranging from 0.15-2mm and angles from 0-33 degrees. Finite element analysis was conducted to determine stress distributions. Two methods were used to evaluate plastic zone size - the volumetric method relating it to effective stress and notch stress intensity factor, and the von Mises yield criterion. The plastic zone sizes determined from both methods showed good agreement. A new model was proposed to evaluate plastic zone size under mixed mode fracture conditions.
This case study investigates time and cost overruns in the construction of a steel plant in Asia. The original project was estimated to cost $52.52 million and be completed by 2006. However, delays in site selection, changes to the project scope due to market changes, and delays caused by contractors and consultants led to a completion date of 2008 and increased costs of $62.36 million. Several factors contributed to the delays, including a 16-month delay in selecting the project site, which also necessitated changes to the project scope. Piling work took 11 months longer than scheduled due to understaffing by the contractor. Designs from the project consultants were also delayed by 12 months.
The document presents the results of an experimental investigation comparing the surface quality of different glass fiber reinforced plastic (GFRP) composites when undergoing end milling. Specifically, it examines the effect of cutting parameters like spindle speed, feed rate, and depth of cut on the resultant force, surface roughness, and delamination of uni-directional and bi-directional GFRP laminates milled with a solid carbide end mill. The experiments were designed using Taguchi methods and analyzed to determine the most significant parameters and draw conclusions about the machinability of different GFRP materials.
This document presents a dynamical model of a vibrocompactor used in railroad construction. The model represents the vibrocompactor as a two-mass mechanical system with two degrees of freedom. Differential equations are derived to describe the forced vibrations of the system excited by an inertial disturbance. Numerical analysis is performed to examine the system's behavior under different forcing frequencies, including its natural frequencies and frequency response. The results indicate the machine can operate steadily over a wide range of frequencies except near its first natural frequency, where non-stationary beating vibrations occur.
This document analyzes the video streaming throughput performance over LTE networks using the OPNET simulation tool. It simulates two scenarios: 1) downlink and uplink video conferencing with static users and 2) the same with users moving at 30m/s. The key metrics measured are packet delay variation and end-to-end delay. The results show that static users experience higher packet delay variation than mobile users, likely due to increased traffic accumulation. End-to-end delay is also higher for static users compared to those moving at 30m/s.
This document provides a review of structural evaluation of stainless steel as a strengthening material. It discusses the mechanical properties of stainless steel, including its stress-strain behavior, behavior at elevated temperatures, and corrosion resistance. It also discusses the costs associated with stainless steel and outlines previous research activities testing stainless steel specimens. Specifically, it summarizes that stainless steel exhibits a rounded stress-strain curve without a defined yield point, has good strength and stiffness retention at high temperatures, and has varying resistance to different types of corrosion depending on its alloy content.
Characteristics analysis of silicon carbide based 1-D Photonic crystal with 2... (IOSR Journals)
This document analyzes the characteristics of 1-D photonic crystals with a 2-layer unit cell, with one layer being silicon carbide. Transfer matrix modeling is used to calculate transmission coefficients for different material combinations. Key findings include:
- Silicon/silica photonic crystals have a stop band of 850nm and pass band of 700nm, with a center gap of 1680nm.
- Increasing the number of silicon carbide/silicon unit cells from 10 to 30 produces an actual stop band, with a width of 320nm.
- Silicon carbide/air crystals have the widest stop band of 950nm and a center gap of 1680nm.
- Small refractive
This document provides an overview of automotive diagnostic communication protocols, including KWP2000, CAN, and UDS. It begins with background on automotive electronic control units and the need for diagnostic systems in vehicle development, manufacturing, and service. It then describes the main characteristics of the KWP2000, CAN, and UDS protocols, including their frame formats, layers, and applications in automotive diagnostic communication. The document aims to present these protocols and how they enable diagnostic devices and ECUs in a vehicle network to communicate according to standards.
The document evaluates the performance of single and hybrid optical amplifiers for a 160x10 Gbps dense wavelength division multiplexing (DWDM) system over different transmission distances ranging from 40 to 240 km. It simulates and compares the gain, output power, bit error rate, eye height, and Q-factor of different amplifier configurations including erbium-doped fiber amplifier (EDFA), semiconductor optical amplifier (SOA), Raman amplifier, and their hybrid combinations. The results show that the SOA-Raman-EDFA hybrid amplifier configuration provides the highest gain (53.06 to 50.84 dB), highest output power (39.91 to 37.7 dBm), lowest bit error rate (-37.66
Speaker Recognition System using MFCC and Vector Quantization Approach (ijsrd.com)
This paper presents an approach to speaker recognition that uses Mel-frequency spectral information to improve speech feature representation in a Vector Quantization (VQ) codebook based recognition approach. The Mel-frequency front end extracts features of the speech signal to form the training and testing vectors, and the VQ codebook approach clusters the training vectors with the LBG algorithm to recognize speakers accurately.
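The LBG codebook training mentioned above can be sketched as follows; scalar features stand in for MFCC vectors to keep the example short, and the split factor and refinement count are illustrative:

```python
def lbg_codebook(vectors, size, epsilon=0.01, rounds=10):
    """Sketch of LBG codebook training on scalar features: start from
    the global mean, split each codeword into a +/-epsilon pair, then
    refine with nearest-neighbour assignment and centroid updates."""
    codebook = [sum(vectors) / len(vectors)]
    while len(codebook) < size:
        # Split every codeword, doubling the codebook size.
        codebook = ([c * (1 + epsilon) for c in codebook] +
                    [c * (1 - epsilon) for c in codebook])
        for _ in range(rounds):
            clusters = [[] for _ in codebook]
            for v in vectors:
                nearest = min(range(len(codebook)),
                              key=lambda i: abs(v - codebook[i]))
                clusters[nearest].append(v)
            # Move each codeword to the centroid of its cluster.
            codebook = [sum(c) / len(c) if c else codebook[i]
                        for i, c in enumerate(clusters)]
    return sorted(codebook)
```

At recognition time, the speaker whose codebook gives the smallest total quantization distortion for the test vectors is selected.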
Intelligent Arabic letters speech recognition system based on mel frequency c... (IJECEIAES)
Speech recognition is one of the important applications of artificial intelligence (AI); it aims to recognize spoken words regardless of who is speaking. The process involves extracting meaningful features from spoken words and then classifying those features into their classes. This paper presents a neural network classification system for Arabic letters and studies the effect of changing the properties of a multi-layer perceptron (MLP) artificial neural network (ANN) to obtain optimized performance. The proposed system consists of two main stages: first, the recorded spoken letters are transformed from the time domain into the frequency domain using the fast Fourier transform (FFT), and features are extracted as mel frequency cepstral coefficients (MFCC); second, the extracted features are classified using the MLP ANN with the back-propagation (BP) learning algorithm. The results show that the proposed system with the extracted features can classify Arabic spoken letters using two hidden layers with an accuracy of around 86%.
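Two pieces of the FFT/MFCC front end described above can be sketched directly: the mel-scale mapping used to place the filter bank, and the final DCT that turns log filter-bank energies into cepstral coefficients (the FFT and filter-bank stages are assumed to have already produced `log_energies`):

```python
import math

def hz_to_mel(hz):
    """Standard mel-scale mapping used when spacing MFCC filter banks."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mfcc_from_log_energies(log_energies, n_coeffs):
    """Final MFCC step: a type-II DCT of the log filter-bank energies
    decorrelates them into a small set of cepstral coefficients."""
    n_filters = len(log_energies)
    return [sum(e * math.cos(math.pi * k * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energies))
            for k in range(n_coeffs)]
```

The resulting coefficient vectors are what a classifier such as the MLP above is trained on.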
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T... (IJCSEA Journal)
This document analyzes speech recognition performance based on neural network layers and transfer functions. It describes a system setup that uses MFCC features and ANNs for acoustic modeling. Experiments were conducted by varying the number of neural network layers (1, 2, 3 layers) and transfer functions (linear, sigmoid, tangent sigmoid). Results showed that networks with linear transfer functions in all layers achieved the performance goal, while other configurations reached minimum gradient but not the goal. The best architecture was a 3-layer network with linear transfer functions.
A comparison of different support vector machine kernels for artificial speec... (TELKOMNIKA JOURNAL)
As the emergence of voice biometrics provides enhanced security and convenience, voice biometric applications such as speaker verification are gradually replacing less secure authentication techniques. However, automatic speaker verification (ASV) systems are exposed to spoofing attacks, especially artificial speech attacks that can be generated in large quantities in a short time using state-of-the-art speech synthesis and voice conversion algorithms. Despite the extensive use of support vector machines (SVM) in recent work, no study has investigated the performance of different SVM settings for artificial speech detection. In this paper, the performance of different SVM settings in artificial speech detection is investigated, with the objective of identifying the appropriate SVM kernels for this task. An experiment was conducted to find the appropriate combination of the proposed features and SVM kernels. Experimental results showed that the polynomial kernel detects artificial speech effectively, with an equal error rate (EER) of 1.42% when applied to the presented handcrafted features.
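The equal error rate (EER) used to report the result above can be computed from lists of genuine and impostor detection scores; a minimal sketch:

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """EER: the operating point where the false-accept rate (FAR) and
    false-reject rate (FRR) are equal. Scans candidate thresholds at
    every observed score; a higher score means 'more likely genuine'."""
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(genuine_scores + impostor_scores):
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:                 # closest crossing of FAR and FRR
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With discrete score lists the two rates rarely cross exactly, so the midpoint at the closest crossing is reported, a common convention.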
A Review On Speech Feature Techniques And Classification Techniques (Nicole Heredia)
This document discusses speech feature extraction and classification techniques for speech recognition systems. It provides an overview of common feature extraction methods such as MFCC and LPC, and classification algorithms such as ANN and SVM. MFCC mimics human auditory perception but yields a weak power-spectrum representation, while LPC is easy to calculate but does not capture information on a linear scale. ANN can learn from data but becomes complex for large datasets, while SVM is accurate and well suited to pattern recognition but requires fixed-length coefficient vectors. The document evaluates these techniques and concludes that MFCC is more efficient than LPC for feature extraction in speech recognition.
Intelligent hands free speech based smsKamal Spring
The document describes a hands-free speech-based SMS system developed for the Android platform. It uses speech recognition with Hidden Markov Models to allow users to enter phone numbers and messages verbally to send SMS texts. The system was tested and found to recognize speech accurately over 90% of the time with delays under 100ms. It also implements some validation to only accept numeric input for phone numbers and sends messages once the user confirms. The system aims to provide accessibility for users who have difficulty with manual text entry.
Over the years speech recognition has taken hold in the market. Speech input can be used in varied domains, such as automatic readers and data entry, and can minimize the use of text and other input types while also reducing the computation needed for processing. A decade ago speech recognition was difficult to use in any system, but advances in technology have brought new algorithms, techniques, and tools, and the desired recognition output can now be generated. One such method is the hidden Markov model, which is used in this paper. Voice input is captured through a device such as a microphone, processed, and converted to text so that an SMS can be sent; the phone number can be entered by voice or selected from the contact list. Voice input opens data entry to a variety of users, such as illiterate or handicapped people who cannot write, and leads to better use of the application. The application also restricts contact information to numeric input, i.e. the number is validated: the recognizer listens to the input, converts the spoken digits to text, and displays them in the contact field for verification. If the user speaks any other character, an error is displayed; for example, if the user speaks a name as a contact, it is shown as an invalid contact. The message box accepts any character. To use speech recognition, the user has to speak loudly and clearly so that the command is executed properly by the system.
Voice Recognition Based Automation System for Medical Applications and for Ph...IRJET Journal
This document describes a voice recognition-based automation system for medical applications and physically challenged patients. The system uses a voice recognition model, Arduino microcontroller, relays, LEDs, buzzers, and a motor to control an adjustable bed. Voice commands are recognized using techniques like MFCC and HMM and used to control devices via the Arduino. The system is intended to allow paralyzed patients to control devices like lights, alarms, and their bed using only voice commands for increased independence. Testing showed the system can accurately recognize commands and control devices with 99% accuracy under suitable conditions.
IRJET- Emotion recognition using Speech Signal: A ReviewIRJET Journal
This document provides a review of speech emotion recognition techniques. It discusses how speech emotion recognition systems work, including common features extracted from speech like MFCCs and LPC coefficients. Classification techniques used in these systems are also examined, such as DTW, ANN, GMM, and K-NN. The document concludes that speech emotion recognition could be useful for applications requiring natural human-computer interaction, like car systems that monitor driver emotion or educational tutorials that adapt based on student emotion.
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...IDES Editor
In this paper, improvement of an ASR system for the Hindi language, based on vector-quantized MFCC feature vectors and an HMM classifier, is discussed. MFCC features are usually pre-processed before being used for recognition; one such pre-processing step is to create delta and delta-delta coefficients and append them to the MFCC to form the feature vector. This paper focuses on all Hindi digits (zero to nine), based on an isolated-word structure. Performance of the system is evaluated by the Recognition Rate (RR). Combining the Delta MFCC (DMFCC) feature with the Delta-Delta MFCC (DDMFCC) feature yields approximately 2.5% further improvement in RR, with no additional computational cost. RR for speakers involved in the training phase is found to be better than for speakers who were not, and word-wise RR is observed to be good for digits with distinct phones.
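The delta and delta-delta coefficients appended to the MFCCs are typically computed with a regression formula over a few neighbouring frames. A hedged sketch of that step; the window size N=2 and the 13-coefficient MFCC input are common defaults assumed for illustration, not taken from the paper:

```python
import numpy as np

def delta(feats, N=2):
    """Regression-based delta over +/-N neighbouring frames (edge-padded)."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feats, dtype=float)
    for t in range(feats.shape[0]):
        out[t] = sum(n * (padded[t + N + n] - padded[t + N - n])
                     for n in range(1, N + 1)) / denom
    return out

mfcc = np.random.default_rng(1).normal(size=(100, 13))  # stand-in MFCC frames
d = delta(mfcc)                            # DMFCC
dd = delta(d)                              # DDMFCC (delta applied twice)
feature_vector = np.hstack([mfcc, d, dd])  # 13 + 13 + 13 = 39 dims per frame
```

Applying the same operator twice gives the delta-delta stream, so the combined DMFCC+DDMFCC vector costs only two extra passes over the already-computed MFCCs, which is consistent with the paper's claim of negligible added computation.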
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...IRJET Journal
The document presents a voice recognition system that uses the Mel Frequency Cepstral Coefficients (MFCC) algorithm to extract voice features from a user's speech and activate a device if the voice matches a stored sample. The system operates in two phases: 1) recording and extracting voice features to create a database, and 2) comparing input voice samples against that database. If a match is found, the system activates a testing device (an Arduino Uno). The MFCC algorithm frames the voice signal, applies windowing and the FFT, passes the spectrum through triangular mel filters, and uses the DCT to extract features. Vector quantization is used to match input features to the database for authentication and device activation. Simulation results demonstrate the system's effectiveness.
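The MFCC steps listed (framing, windowing, FFT, triangular mel filtering, DCT) can be sketched roughly as follows; all parameter values (16 kHz sampling, 25 ms frames, 26 filters, 13 coefficients) are common defaults assumed for illustration, not taken from the document:

```python
import numpy as np

def hz_to_mel(f):
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mfcc(signal, sr=16000, frame_len=400, hop=160, nfft=512, nfilt=26, ncep=13):
    # 1) frame the signal and apply a Hamming window
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame_len)[::hop]
    frames = frames * np.hamming(frame_len)
    # 2) power spectrum via FFT
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # 3) triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), nfilt + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((nfilt, nfft // 2 + 1))
    for i in range(nfilt):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # 4) DCT-II decorrelates the log filterbank energies; keep the first ncep
    n = np.arange(nfilt)
    dct = np.cos(np.pi * np.outer(np.arange(ncep), 2 * n + 1) / (2 * nfilt))
    return logmel @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s test tone
coeffs = mfcc(sig)  # one 13-coefficient vector per 10 ms hop
```

The per-frame coefficient vectors produced here are what the vector-quantization stage would then compare against the enrolled database.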
This paper describes the development of an efficient speech recognition system using different techniques such as Mel Frequency Cepstrum Coefficients (MFCC), Vector Quantization (VQ) and Hidden Markov Model (HMM).
This paper explains how speaker recognition followed by speech recognition is used to recognize speech faster, more efficiently and more accurately. MFCC is used to extract characteristics from the input speech signal for a particular word uttered by a particular speaker. HMM is then applied to the quantized feature vectors to identify the word by evaluating the maximum log-likelihood value for the spoken word.
This document summarizes a research paper on developing a speech-to-text conversion system for visually impaired people using μ-law companding. The system uses MATLAB to analyze input speech signals, extract features, filter noise, and match signals to samples stored in a database to convert speech to text. A graphical user interface was created to input speech and display recognition results. The system achieved real-time speech recognition and conversion to text with high accuracy using μ-law companding techniques for signal processing and correlation comparisons to the stored samples.
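μ-law companding itself is a simple pointwise transform, F(x) = sign(x)·ln(1 + μ|x|)/ln(1 + μ), that allocates more resolution to low amplitudes. A small sketch with the standard μ = 255 (the document does not state its parameter choice):

```python
import numpy as np

MU = 255  # standard value for 8-bit mu-law

def mu_compress(x, mu=MU):
    """Compand a signal in [-1, 1]; small amplitudes get more resolution."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_expand(y, mu=MU):
    """Inverse transform."""
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

x = np.linspace(-1, 1, 101)
# round-trip is exact before any quantization step
assert np.allclose(mu_expand(mu_compress(x)), x)
```

Quiet speech segments are boosted by the compression (e.g. an amplitude of 0.01 maps to about 0.23), which is why companding helps before quantization and correlation-based matching.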
ADAPTIVE WATERMARKING TECHNIQUE FOR SPEECH SIGNAL AUTHENTICATION ijcsit
Biometric data has recently come to play a major role in determining a person's identity. With such importance placed on biometric data, many attacks threaten its security and integrity, so it becomes necessary to protect the originality of biometric data against manipulation and fraud. This paper presents an authentication technique for establishing the authenticity of speech signals based on adaptive watermarking. The basic idea is to first extract features from the speech signal and then use these features as a watermark, embedding the watermark information into the same speech signal. The short-time energy technique is used to identify suitable positions for embedding the watermark, so as to avoid the regions used by the speech recognition system. After excluding those important areas, a Genetic Algorithm (GA) is used to generate random locations in which to hide the watermark information in an intelligent manner. The experimental results achieve high efficiency in establishing the authenticity of the speech signal and in the embedding process.
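Short-time energy, the measure used here to locate regions safe for embedding, is just the per-frame sum of squared samples. A rough sketch (frame and hop sizes, and the 10%-of-peak threshold, are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def short_time_energy(x, frame=256, hop=128):
    """Sum of squared samples per overlapping frame."""
    frames = np.lib.stride_tricks.sliding_window_view(x, frame)[::hop]
    return (frames ** 2).sum(axis=1)

# Toy signal: near-silence, a loud voiced burst, near-silence again.
rng = np.random.default_rng(0)
x = np.concatenate([0.01 * rng.normal(size=2048),
                    rng.normal(size=2048),
                    0.01 * rng.normal(size=2048)])
e = short_time_energy(x)
threshold = 0.1 * e.max()
safe_frames = np.where(e < threshold)[0]  # low-energy frames: candidate embed spots
```

High-energy frames (the voiced burst in the middle) are the ones speech recognition relies on, so the watermark positions would be drawn, via the GA, only from the low-energy frames found here.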
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueCSCJournals
Automatic speaker recognition systems recognize an unknown speaker among several reference speakers by making use of speaker-specific information in their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition technique. Our baseline speaker recognition system, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database, and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique that performs gender recognition before speaker recognition so as to improve the accuracy and timeliness of the baseline speaker recognition system. Based on experiments conducted on the MIT database, we demonstrate that the proposed system improves accuracy over the baseline by approximately 2% while reducing computation time by more than 30%.
Comparative Study of Different Techniques in Speaker Recognition: ReviewIJAEMSJORNAL
Speech is the most basic and essential method of communication used by people. On the basis of individual information included in speech signals, the speaker can be recognized. Speaker recognition (SR) identifies the person who is speaking, and in recent years it has been used in security systems. In this paper we discuss feature extraction techniques such as Mel frequency cepstral coefficients (MFCC), linear predictive coding (LPC), and dynamic time warping (DTW), and, for classification, Gaussian mixture models (GMM), artificial neural networks (ANN) and support vector machines (SVM).
Speech emotion recognition is a recent research topic in the human-computer interaction (HCI) field. As computers have become an integral part of our lives, the need has arisen for a more natural communication interface between humans and computers, and a lot of work is currently going on to improve that interaction. To achieve this goal, a computer would have to distinguish its present situation and respond differently depending on that observation; part of this process involves understanding a user's emotional state. To make human-computer interaction more natural, the objective is that the computer should recognize emotional states the same way a human does. The efficiency of an emotion recognition system depends on the type of features extracted and the classifier used to detect emotions. The proposed system aims at identifying basic emotional states such as anger, joy, neutral and sadness from human speech. To classify the different emotions, MFCC (Mel frequency cepstral coefficient) and energy features are used. A standard emotional database (in English) is used, which gives more satisfactory detection of emotions than recorded emotion samples. The methodology describes and compares the performance of a Learning Vector Quantization neural network (LVQ NN), a multiclass support vector machine (SVM), and their combination for emotion recognition.
This paper aims to implement a robust speaker identification system: a software architecture that identifies the current talker out of a set of speakers, with emphasis on text-dependent speaker identification. It contains three main modules: endpoint detection, feature extraction and feature matching. The endpoint detection module removes unwanted signal and background noise from the input speech before subsequent processing; in the proposed system, short-term energy analysis is used for this purpose. Mel-frequency cepstral coefficients (MFCC) are applied for feature extraction, extracting a small amount of data from the voice signal that can later represent each speaker. For feature matching, a Vector Quantization (VQ) approach using the Linde-Buzo-Gray (LBG) clustering algorithm is proposed because it reduces the amount of data and the complexity. The experimental study shows that the proposed system is more robust than the original system and faster in computation than the existing one. MATLAB is used to implement the system. Zaw Win Aung, "A Robust Speaker Identification System", International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN 2456-6470, Volume 2, Issue 5, August 2018. URL: http://www.ijtsrd.com/papers/ijtsrd18274.pdf http://www.ijtsrd.com/other-scientific-research-area/other/18274/a-robust-speaker-identification-system/zaw-win-aung
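The VQ/LBG matching stage described above can be sketched as follows: the codebook starts from the global mean of a speaker's feature vectors and is repeatedly split and refined, and at identification time the speaker whose codebook yields the lowest average distortion wins. Parameters (codebook size 8, split factor 0.01, 13-dimensional vectors) are illustrative assumptions:

```python
import numpy as np

def lbg(features, codebook_size=8, eps=0.01, iters=10):
    """Linde-Buzo-Gray: grow a VQ codebook by splitting and refining centroids."""
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < codebook_size:
        # split every centroid into a perturbed pair
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):  # k-means style refinement
            dist = np.linalg.norm(features[:, None] - codebook[None], axis=2)
            nearest = dist.argmin(axis=1)
            for k in range(len(codebook)):
                pts = features[nearest == k]
                if len(pts):
                    codebook[k] = pts.mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average distance to the nearest codeword: the matching score."""
    dist = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return dist.min(axis=1).mean()

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 13))  # stand-in for one speaker's MFCC vectors
cb = lbg(train)                     # this speaker's 8-entry codebook
```

At test time, `distortion(test_features, cb)` would be computed against every enrolled speaker's codebook and the minimum taken as the identity.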
This document summarizes a research paper on speaker recognition using vocal tract features. It discusses extracting pitch using FFT as a vocal source feature and Mel Frequency Cepstral Coefficients (MFCCs) as vocal tract features. These features are used as inputs to a Support Vector Machine (SVM) classifier for binary speaker classification. Experimental results show the SVM achieves up to 94.42% accuracy in classifying speakers based on their MFCC features, and accuracy increases with higher signal-to-noise ratios when noise is added to speech signals. The paper concludes vocal source and tract features can be used together with SVM for text-independent speaker recognition.
Similar to Speech Recognized Automation System Using Speaker Identification through Wireless Communication (20)
This document provides a technical review of secure banking using RSA and AES encryption methodologies. It discusses how RSA and AES are commonly used encryption standards for secure data transmission between ATMs and bank servers. The document first provides background on ATM security measures and risks of attacks. It then reviews related work analyzing encryption techniques. The document proposes using a one-time password in addition to a PIN for ATM authentication. It concludes that implementing encryption standards like RSA and AES can make transactions more secure and build trust in online banking.
This document analyzes the performance of various modulation schemes for achieving energy efficient communication over fading channels in wireless sensor networks. It finds that for long transmission distances, low-order modulations like BPSK are optimal due to their lower SNR requirements. However, as transmission distance decreases, higher-order modulations like 16-QAM and 64-QAM become more optimal since they can transmit more bits per symbol, outweighing their higher SNR needs. Simulations show lifetime extensions up to 550% are possible in short-range networks by using higher-order modulations instead of just BPSK. The optimal modulation depends on transmission distance and balancing the energy used by electronic components versus power amplifiers.
This document provides a review of mobility management techniques in vehicular ad hoc networks (VANETs). It discusses three modes of communication in VANETs: vehicle-to-infrastructure (V2I), vehicle-to-vehicle (V2V), and hybrid vehicle (HV) communication. For each communication mode, different mobility management schemes are required due to their unique characteristics. The document also discusses mobility management challenges in VANETs and outlines some open research issues in improving mobility management for seamless communication in these dynamic networks.
This document provides a review of different techniques for segmenting brain MRI images to detect tumors. It compares the K-means and Fuzzy C-means clustering algorithms. K-means is an exclusive clustering algorithm that groups data points into distinct clusters, while Fuzzy C-means is an overlapping clustering algorithm that allows data points to belong to multiple clusters. The document finds that Fuzzy C-means requires more time for brain tumor detection compared to other methods like hierarchical clustering or K-means. It also reviews related work applying these clustering algorithms to segment brain MRI images.
1) The document simulates and compares the performance of AODV and DSDV routing protocols in a mobile ad hoc network under three conditions: when users are fixed, when users move towards the base station, and when users move away from the base station.
2) The results show that both protocols have higher packet delivery and lower packet loss when users are either fixed or moving towards the base station, since signal strength is better in those scenarios. Performance degrades when users move away from the base station due to weaker signals.
3) AODV generally has better performance than DSDV, with higher throughput and packet delivery rates observed across the different user mobility conditions.
This document describes the design and implementation of 4-bit QPSK and 256-bit QAM modulation techniques using MATLAB. It compares the two techniques based on SNR, BER, and efficiency. The key steps of implementing each technique in MATLAB are outlined, including generating random bits, modulation, adding noise, and measuring BER. Simulation results show scatter plots and eye diagrams of the modulated signals. A table compares the results, showing that 256-bit QAM provides better performance than 4-bit QPSK. The document concludes that QAM modulation is more effective for digital transmission systems.
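The BER-measurement loop described (generate random bits, modulate, add noise, demodulate, count errors) can be sketched for QPSK over an AWGN channel; this is a generic Monte-Carlo sketch in Python rather than the document's MATLAB code, with the bit count and Gray mapping chosen for illustration:

```python
import numpy as np

def qpsk_ber(ebn0_db, nbits=200_000, seed=0):
    """Monte-Carlo bit error rate of Gray-coded QPSK over an AWGN channel."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, nbits)
    # each axis of the unit-energy constellation carries one bit (Gray coding)
    sym = ((2 * bits[0::2] - 1) + 1j * (2 * bits[1::2] - 1)) / np.sqrt(2)
    ebn0 = 10 ** (ebn0_db / 10)
    sigma = np.sqrt(1 / (4 * ebn0))  # per-dimension noise std (Es=1, 2 bits/symbol)
    rx = sym + sigma * (rng.normal(size=sym.size) + 1j * rng.normal(size=sym.size))
    bhat = np.empty(nbits, dtype=int)
    bhat[0::2] = rx.real > 0  # hard decisions per axis
    bhat[1::2] = rx.imag > 0
    return (bhat != bits).mean()
```

Sweeping `ebn0_db` and plotting the returned BER reproduces the kind of SNR-vs-BER comparison the document makes; a 256-QAM version would use a larger constellation with more bits per symbol at the cost of a higher SNR requirement.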
The document proposes a hybrid technique using Anisotropic Scale Invariant Feature Transform (A-SIFT) and Robust Ensemble Support Vector Machine (RESVM) to accurately identify faces in images. A-SIFT improves upon traditional SIFT by applying anisotropic scaling to extract richer directional keypoints. Keypoints are processed with RESVM and hypothesis testing to increase accuracy above 95% by repeatedly reprocessing images until the threshold is met. The technique was tested on similar and different facial images and achieved better results than SIFT in retrieval time and reduced keypoints.
This document studies the effects of dielectric superstrate thickness on microstrip patch antenna parameters. Three types of probe-fed patch antennas (rectangular, circular, and square) were designed to operate at 2.4 GHz using Arlondiclad 880 substrate. The antennas were tested with and without an Arlondiclad 880 superstrate of varying thicknesses. It was found that adding a superstrate slightly degraded performance by lowering the resonant frequency and increasing return loss and VSWR, while decreasing bandwidth and gain. Specifically, increasing the superstrate thickness or dielectric constant resulted in greater changes to the antenna parameters.
This document describes a wireless environment monitoring system that utilizes soil energy as a sustainable power source for wireless sensors. The system uses a microbial fuel cell to generate electricity from the microbial activity in soil. Two microbial fuel cells were created using different soil types and various additives to produce different current and voltage outputs. An electronic circuit was designed on a printed circuit board with components like a microcontroller and ZigBee transceiver. Sensors for temperature and humidity were connected to the circuit to monitor the environment wirelessly. The system provides a low-cost way to power remote sensors without needing battery replacement and avoids the high costs of wiring a power source.
1) The document proposes a model for a frequency tunable inverted-F antenna that uses ferrite material.
2) The resonant frequency of the antenna can be significantly shifted from 2.41GHz to 3.15GHz, a 31% shift, by increasing the static magnetic field placed on the ferrite material.
3) Altering the permeability of the ferrite allows tuning of the antenna's resonant frequency without changing the physical dimensions, providing flexibility to operate over a wide frequency range.
This document summarizes a research paper that presents a speech enhancement method using stationary wavelet transform. The method first classifies speech into voiced, unvoiced, and silence regions based on short-time energy. It then applies different thresholding techniques to the wavelet coefficients of each region - modified hard thresholding for voiced speech, semi-soft thresholding for unvoiced speech, and setting coefficients to zero for silence. Experimental results using speech from the TIMIT database corrupted with white Gaussian noise at various SNR levels show improved performance over other popular denoising methods.
This document reviews the design of an energy-optimized wireless sensor node that encrypts data for transmission. It discusses how sensing schemes that group nodes into clusters and transmit aggregated data can reduce energy consumption compared to individual node transmissions. The proposed node design calculates the minimum transmission power needed based on received signal strength and uses a periodic sleep/wake cycle to optimize energy when not sensing or transmitting. It aims to encrypt data at both the node and network level to further optimize energy usage for wireless communication.
This document discusses group consumption modes. It analyzes factors that impact group consumption, including external environmental factors like technological developments enabling new forms of online and offline interactions, as well as internal motivational factors at both the group and individual level. The document then proposes that group consumption modes can be divided into four types based on two dimensions: vertical (group relationship intensity) and horizontal (consumption action period). These four types are instrument-oriented, information-oriented, enjoyment-oriented, and relationship-oriented consumption modes. Finally, the document notes that consumption modes are dynamic and can evolve over time.
The document summarizes a study of different microstrip patch antenna configurations with slotted ground planes. Three antenna designs were proposed and their performance evaluated through simulation: a conventional square patch, an elliptical patch, and a star-shaped patch. All antennas were mounted on an FR4 substrate. The effects of adding different slot patterns to the ground plane on resonance frequency, bandwidth, gain and efficiency were analyzed parametrically. Key findings were that reshaping the patch and adding slots increased bandwidth and shifted resonance frequency. The elliptical and star patches in particular performed better than the conventional design. Three antenna configurations were selected for fabrication and measurement based on the simulations: a conventional patch with a slot under the patch, an elliptical patch with slots
1) The document describes a study conducted to improve call drop rates in a GSM network through RF optimization.
2) Drive testing was performed before and after optimization using TEMS software to record network parameters like RxLevel, RxQuality, and events.
3) Analysis found call drops were occurring due to issues like handover failures between sectors, interference from adjacent channels, and overshooting due to antenna tilt.
4) Corrective actions taken included defining neighbors between sectors, adjusting frequencies to reduce interference, and lowering the mechanical tilt of an antenna.
5) Post-optimization drive testing showed improvements in RxLevel, RxQuality, and a reduction in dropped calls.
This document describes the design of an intelligent autonomous wheeled robot that uses RF transmission for communication. The robot has two modes - automatic mode where it can make its own decisions, and user control mode where a user can control it remotely. It is designed using a microcontroller and can perform tasks like object recognition using computer vision and color detection in MATLAB, as well as wall painting using pneumatic systems. The robot's movement is controlled by DC motors and it uses sensors like ultrasonic sensors and gas sensors to navigate autonomously. RF transmission allows communication between the robot and a remote control unit. The overall aim is to develop a low-cost robotic system for industrial applications like material handling.
This document reviews cryptography techniques to secure the Ad-hoc On-Demand Distance Vector (AODV) routing protocol in mobile ad-hoc networks. It discusses various types of attacks on AODV like impersonation, denial of service, eavesdropping, black hole attacks, wormhole attacks, and Sybil attacks. It then proposes using the RC6 cryptography algorithm to secure AODV by encrypting data packets and detecting and removing malicious nodes launching black hole attacks. Simulation results show that after applying RC6, the packet delivery ratio and throughput of AODV increase while delay decreases, improving the security and performance of the network under attack.
The document describes a proposed modification to the conventional Booth multiplier that aims to increase its speed by applying concepts from Vedic mathematics. Specifically, it utilizes the Urdhva Tiryakbhyam formula to generate all partial products concurrently rather than sequentially. The proposed 8x8 bit multiplier was coded in VHDL, simulated, and found to have a path delay 44.35% lower than a conventional Booth multiplier, demonstrating its potential for higher speed.
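The Urdhva Tiryakbhyam (vertical-and-crosswise) formula forms every column of partial products independently, which is what allows concurrent generation in hardware; carries are then resolved afterwards. A behavioural software sketch (the 8-bit width matches the paper; the sequential carry loop here stands in for what would be an adder tree in hardware):

```python
def urdhva_multiply(a, b, width=8):
    """Urdhva Tiryakbhyam on bit vectors: every column of partial products
    is formed independently (concurrently in hardware); carries resolve after."""
    abits = [(a >> i) & 1 for i in range(width)]
    bbits = [(b >> i) & 1 for i in range(width)]
    # vertical-and-crosswise: column k collects every a_i AND b_j with i + j == k
    cols = [sum(abits[i] & bbits[k - i]
                for i in range(width) if 0 <= k - i < width)
            for k in range(2 * width - 1)]
    result, carry = 0, 0
    for k, c in enumerate(cols):  # sequential stand-in for the hardware adder tree
        total = c + carry
        result |= (total & 1) << k
        carry = total >> 1
    return result | (carry << (2 * width - 1))

assert urdhva_multiply(0xB7, 0x5C) == 0xB7 * 0x5C
```

The speed claim in the paper comes from the `cols` step: unlike Booth recoding, no column waits on another, so all partial products exist in parallel before the final carry-resolving addition.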
This document discusses image deblurring techniques. It begins by introducing image restoration and focusing on image deblurring. It then discusses challenges with image deblurring being an ill-posed problem. It reviews existing approaches to screen image deconvolution including estimating point spread functions and iteratively estimating blur kernels and sharp images. The document also discusses handling spatially variant blur and summarizes the relationship between the proposed method and previous work for different blur types. It proposes using color filters in the aperture to exploit parallax cues for segmentation and blur estimation. Finally, it proposes moving the image sensor circularly during exposure to prevent high frequency attenuation from motion blur.
This document describes modeling an adaptive controller for an aircraft roll control system using PID, fuzzy-PID, and genetic algorithm. It begins by introducing the aircraft roll control system and motivation for developing an adaptive controller to minimize errors from noisy analog sensor signals. It then provides the mathematical model of aircraft roll dynamics and describes modeling the real-time flight control system in MATLAB/Simulink. The document evaluates PID, fuzzy-PID, and PID-GA (genetic algorithm) controllers for aircraft roll control and finds that the PID-GA controller delivers the best performance.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
The CBC machine is a common diagnostic tool used by doctors to measure a patient's red blood cell count, white blood cell count and platelet count. The machine uses a small sample of the patient's blood, which is then placed into special tubes and analyzed. The results of the analysis are then displayed on a screen for the doctor to review. The CBC machine is an important tool for diagnosing various conditions, such as anemia, infection and leukemia. It can also help to monitor a patient's response to treatment.
artificial intelligence and data science contents.pptxGauravCar
What is artificial intelligence? Artificial intelligence is the ability of a computer or computer-controlled robot to perform tasks that are commonly associated with the intellectual processes characteristic of humans, such as the ability to reason.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to precisely delineate tumor boundaries from magnetic resonance imaging (MRI) scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The model is rigorously trained and evaluated, exhibiting remarkable performance metrics, including an impressive global accuracy of 99.286%, a high class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of our proposed model. These findings underscore the model's competence in precise brain tumor localization, underscoring its potential to revolutionize medical image analysis and enhance healthcare outcomes. This research paves the way for future exploration and optimization of advanced CNN models in medical imaging, emphasizing addressing false positives and resource efficiency.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
International Conference on NLP, Artificial Intelligence, Machine Learning an...
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE)
e-ISSN: 2278-2834, p-ISSN: 2278-8735. Volume 6, Issue 1 (May-Jun. 2013), PP 11-18
www.iosrjournals.org

Speech Recognized Automation System Using Speaker Identification through Wireless Communication

S. R. Suralkar, Amol C. Wani, Prabhakar V. Mhadse
(Electronics & Telecommunication Engg Dept, SSBT's COET, North Maharashtra University, Jalgaon, India)
Abstract: This paper discusses the methodology for a project named "Speech Recognized Automation System using Speaker Identification through Wireless Communication". The project presents the design of an automation system combining wireless communication with speaker recognition implemented in MATLAB; MATLAB's straightforward programming interface makes it an ideal tool for the speech analysis in this project. The automation system is useful for home appliances as well as in industry. This paper discusses the overall design of a wireless automation system which was built and implemented. The speech recognition centers on recognizing speech commands stored in a MATLAB database and matching them against the incoming voice command of the speaker. The Mel Frequency Cepstral Coefficient (MFCC) algorithm is used to extract speech features and recognize the speaker. The system uses low-power, relatively inexpensive ZigBee RF transceiver modules for wireless communication. It is intended to control lights, fans and other electrical appliances in a home or office using speech commands such as "Light" and "Fan". Further, if security is not a major concern, the speech processor can control the appliances directly, without speaker identification.
Keywords — Automation system, MATLAB code, MFCC, speaker identification, ZigBee transceiver.
I. Introduction
Development of automation systems using speaker identification began in the 1960s with exploration into speech analysis using text matching, where the characteristics of an individual's voice were thought to characterize that individual's uniqueness much like a fingerprint. The early systems had many flaws, and research ensued to derive a more reliable method of predicting the correlation between two sets of speech utterances. Homes themselves have evolved since the era in which man became sedentary, stopped living in caves and started building dwellings; current trends in home automation focus on several main issues such as security, culture, leisure, comfort, energy savings, management and economic activities. Over the years much work has been done in the domain of automatic speech recognition for automation systems, and the progress made is significant for small, medium and even large vocabulary systems. Speaker-identification-based automation is a major growing industry that can change the way people live by building security into everyday operation. The aim of such an automation system is to provide those with special needs with a system that responds to speech commands and wirelessly controls the on/off status of electrical devices, such as lamps, fans and televisions, in the home and office. The system should be reasonably cheap and easy to install in any environment.
Generally, the human voice conveys much information, such as the gender, emotion and identity of the speaker. The objective of voice recognition is to determine which speaker is present based on the individual's utterance [3]. Several techniques have been proposed for reducing the mismatch between the testing and training environments; many of these methods operate either in the spectral or in the cepstral domain. First, the voice is converted into digital form, producing data that represents the signal level at every discrete time step. The digitized speech samples are then processed using MFCC to produce voice features. The performance of the speech recognition system is given in terms of the error rate measured for a specified technology. Vocal communication between person and computer includes synthesis of the speech stored in the MATLAB database and matching of its features against incoming speech commands. Speaker recognition involves speaker identification, which outputs the identity of the speaker among the population and drives the switching on/off of the electrical equipment.
II. System Overview
Fig. 1 shows the overall system, comprising software and hardware with wireless units. The speaker-identification-based automation system is an integrated system consisting of separate transmitter and receiver modules connected wirelessly, together with MATLAB code that uses the MFCC algorithm to extract speech features for speaker identification.
Fig.1: Functional Block Diagram of Automation system for Speaker Identification
This project aims to develop an automated system capable of controlling the operation of various devices by voice command, so that the authorized user can operate such devices simply and with authentication. Fig. 2 and Fig. 3 show the architectures used for the transmitter and receiver modules. For speech interpretation the transmitter uses an environmental microphone and a communication interface, which capture and digitize the voice for the MATLAB database and the speech processor. A microcontroller unit controls every operation of the speech processor and handles the speech commands coming from the MATLAB simulator.
Fig. 2: Transmitter section.
The speaker-identification-based automation system is an integrated system that offers the user an easy-to-use home automation system fully operated by speech commands. Because the system responds only to the particular enrolled user, it is more secure. The system is constructed so that it is easy to install, configure, run and maintain. The functional blocks of the transmitter module are shown in Fig. 2.
Fig. 3 shows the receiver section, which converts the digital signals from the speech processor and the MATLAB simulator into analog signals that drive the appliances through the microcontroller unit.
Fig. 3: Receiver section.
We selected the most appropriate speech recognition method using the HM2007 processor, which provides the instructions that activate the system's actuators. In this case the actuators are switches or motors which control lighting, heating and other features inside the home and office. The speech signal is fed to the speech processor for analysis, and the same signal is applied to MATLAB, where speech-detect and speech-write commands save the speech in the MATLAB database. We then selected the signal bus and communication protocol between the computer and the actuators; the most common options are dedicated wiring, twisted pair, ZigBee, coaxial cable and the installed power-supply wiring.
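To make the control path concrete, here is a minimal sketch (in Python rather than the paper's MATLAB) of how a recognized command could be encoded for the serial ZigBee link. The one-byte appliance codes and the packet layout are illustrative assumptions; the paper does not specify its wire format.

```python
# Hypothetical encoding of a recognized speech command for the ZigBee serial
# link. The one-letter appliance codes and '0'/'1' state bytes are assumptions
# for illustration only.
COMMANDS = {"light": b"L", "fan": b"F"}

def build_packet(command: str, turn_on: bool) -> bytes:
    """Return a two-byte packet: appliance code followed by the desired state."""
    code = COMMANDS[command]
    state = b"1" if turn_on else b"0"
    return code + state

# Example: switching the light on would send b"L1" over the serial port.
```

A real deployment would add framing and a checksum, but the idea of mapping each matched command to a short packet for the microcontroller is the same.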
III. Speech Recognition
3.1 Speech Recognition Algorithm
Speech analysis is performed after taking input through a microphone from the user. The design of the system involves processing of the input audio signal: at different levels, different operations are performed on the input, such as pre-emphasis, framing, windowing, Mel cepstrum analysis and recognition (matching) of the spoken word [3]. The voice algorithm consists of two distinct phases. The first is the training session, in which the speech signal associated with the application or the speaker's identity is recorded; the second is the operation session or testing phase, in which a particular user's speech commands are matched, as described in Fig. 4.
Fig. 4: Speech Recognition Algorithm.
3.2 MFCC Algorithm
Extracting the best parametric representation of speech signals is an important task for producing good recognition performance. The efficiency of this phase is important because it affects the behavior of the next phase. MFCC is based on human hearing perception, which resolves frequencies above 1 kHz less finely, and on the known variation of the human ear's critical bandwidth with frequency. MFCC uses filters that are spaced linearly at low frequencies, below 1000 Hz, and logarithmically above 1000 Hz. A subjective pitch on the Mel frequency scale captures the important phonetic characteristics of speech [3]. Zero crossings in the speech signal are also checked.
The cepstrum is a common transform used to extract information from a person's speech signal. It can be used to separate the excitation signal (which contains the words and the pitch) from the transfer function (which contains the voice quality). It is the result of taking the Fourier transform of the decibel (log-magnitude) spectrum as if it were a signal. We use cepstral analysis in speaker identification because the speech signal has this source-filter form, and the cepstral transform makes the analysis considerably simpler. Mathematically,

cepstrum of signal = FT[ log |FT(windowed signal)| ].
MFCCs are commonly calculated by first taking the Fourier transform of a windowed segment of the signal and mapping the powers of the resulting spectrum onto the Mel scale using triangular overlapping windows. Next, the logs of the powers at each of the Mel frequencies are taken and the Discrete Cosine Transform is applied to them. The MFCCs are the amplitudes of the resulting spectrum.
Step-I: Pre-emphasis
This step passes the signal through a filter that emphasizes higher frequencies, increasing the energy of the signal at high frequency.
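As a sketch of this step (in Python for illustration; the paper's implementation is in MATLAB), a first-order pre-emphasis filter y[n] = x[n] - a·x[n-1] can be applied sample by sample. The coefficient a = 0.97 is a commonly used value assumed here, since the paper does not state one.

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    The first sample passes through unchanged. alpha = 0.97 is a common
    choice, assumed here because the paper does not give its coefficient.
    """
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]
```

Low-frequency (slowly varying) content is attenuated, boosting the relative energy of the higher formants before framing.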
Step-II: Framing
Short-time analysis is fundamental to speech analysis techniques. Over a long interval the speech waveform is not stationary, but over a sufficiently short interval, about 10-30 ms, it can be considered stationary. This is because the rate at which the spectrum of the speech signal changes depends directly on the rate of movement of the speech articulators. Most speech analysis systems
operate at uniformly spaced time intervals, or frames, of typical duration 10-30 ms. In frame blocking, the continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N). The first frame consists of the first N samples; the second frame begins M samples after the first and overlaps it by N - M samples. Similarly, the third frame begins 2M samples after the first frame (or M samples after the second) and overlaps it by N - 2M samples. This process continues until all the speech is accounted for within one or more frames. Typical values are N = 256 (equivalent to roughly 30 ms of windowing at an 8 kHz sampling rate, and convenient for the fast radix-2 FFT) and M = 100 [2].
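The frame-blocking scheme just described, with N-sample frames starting every M samples and hence overlapping by N - M samples, can be sketched as follows (Python is used here for illustration in place of the paper's MATLAB):

```python
def frame_signal(samples, N=256, M=100):
    """Split a sample sequence into overlapping frames.

    Each frame is N samples long; successive frames start M samples apart,
    so adjacent frames overlap by N - M samples. A trailing remainder
    shorter than N is simply dropped in this sketch.
    """
    frames = []
    start = 0
    while start + N <= len(samples):
        frames.append(samples[start:start + N])
        start += M
    return frames

# With N = 256 and M = 100, a 500-sample signal yields frames starting at
# samples 0, 100 and 200.
```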
Step-III: Windowing
The next step in speech processing is to window each individual frame so as to minimize the signal discontinuities at its beginning and end. The Fourier transform assumes that the signal is periodic, and the end of one frame generally does not connect smoothly with the beginning of the next, which introduces spectral artifacts at regular intervals. We therefore taper the ends of each frame so that they join smoothly, a process called windowing. Here this is done with the Hamming window.
Fig. 5: Hamming Window.
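The Hamming window of Fig. 5 is w[n] = 0.54 - 0.46·cos(2πn/(N-1)). A small Python illustration (the paper itself uses MATLAB):

```python
import math

def hamming(N):
    """Hamming window coefficients: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_frame(frame):
    """Multiply a frame point-wise by a Hamming window of the same length."""
    w = hamming(len(frame))
    return [s * c for s, c in zip(frame, w)]

# The window tapers to 0.08 at both ends and peaks at 1.0 in the middle,
# so overlapped frames join smoothly.
```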
Step-IV: Fast Fourier Transform
The Fast Fourier Transform is used to convert each frame of N samples from the time domain into the
frequency domain. The FFT is a fast algorithm to implement the Discrete Fourier Transform (DFT) which is
defined on the set of N samples {xn}, as follow:
Note that we use j here to denote the imaginary unit, i.e. j= √-1. In general Xn ‟s are complex numbers.
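The DFT above can also be evaluated directly from its defining sum; real systems use an FFT (such as the radix-2 algorithm mentioned earlier), but this O(N²) Python sketch mirrors the definition exactly:

```python
import cmath
import math

def dft(x):
    """Direct evaluation of X_k = sum_n x_n * exp(-j*2*pi*k*n/N).

    O(N^2); shown only to mirror the definition. An FFT computes the
    same values in O(N log N).
    """
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

# A constant frame transforms to a single non-zero bin at k = 0.
```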
Step- V: Mel Filter Bank Processing
The range of frequencies in the FFT spectrum is very wide, and the voice signal does not follow a linear scale. A bank of filters spaced according to the Mel scale, as shown in Fig. 6, is therefore used [3] to smooth the spectrum.
Fig. 6: Mel scale filter bank.
The Mel value for a given frequency f in Hz is:

Mel(f) = 2595 * log10(1 + f/700)
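The Mel mapping can be written directly from the formula above; a quick Python check (the constant 2595 and the 700 Hz corner frequency are as given in the text):

```python
import math

def hz_to_mel(f_hz):
    """Mel(f) = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# The scale is roughly linear below 1000 Hz and logarithmic above;
# 1000 Hz maps to approximately 1000 mel.
```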
Step-VI: Discrete Cosine Transform
In this step the log Mel spectrum is converted back into a time-like (quefrency) domain using the Discrete Cosine Transform (DCT), which here plays a role analogous to an inverse Fourier transform. The result of the conversion is called the Mel Frequency Cepstral Coefficients.
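The DCT of the log filter-bank energies can be sketched with the common DCT-II form (Python for illustration); in practice only the first dozen or so coefficients are kept as the MFCCs, though the paper does not specify the exact count:

```python
import math

def dct2(x):
    """Unnormalized DCT-II: C_k = sum_n x[n] * cos(pi*k*(n + 0.5)/N)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(N)]

# Applied to log Mel filter-bank energies, the low-order outputs of dct2
# are the Mel Frequency Cepstral Coefficients.
```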
Step-VII: Delta Energy and Delta Spectrum
The levels of the speech signal change from frame to frame, so features related to the change in cepstral features over time are added. The energy in a frame of a signal X(t), over a window from time sample t1 to time sample t2, is:

Energy = Σ_{t=t1}^{t2} X²(t)
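The frame energy formula translates directly; a Python sketch, where the delta features are then simply first differences of such per-frame values between successive frames:

```python
def frame_energy(x, t1, t2):
    """Energy of signal x over the window [t1, t2]: sum of X(t)^2."""
    return sum(x[t] ** 2 for t in range(t1, t2 + 1))

def delta(values):
    """Simple delta (frame-to-frame difference) of a per-frame feature track."""
    return [values[i] - values[i - 1] for i in range(1, len(values))]
```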
IV. Hardware Design
In this section we present the hardware description of the modules used: the HM2007 speech processor and the ZigBee wireless communication module.
4.1. Speech Processor Module
The main elements of the design are the speech synthesizer/recognition ASIC and the ZigBee transceiver module. The speech recognition IC must be programmed with the various speech commands. This is done using a microphone: the analog speech signals from the microphone are digitized by the IC's internal ADC blocks and stored in its internal memory. Once this learning/programming is completed, the IC is ready to accept commands. A command issued by the user through the microphone is digitized and compared with the digitized commands already stored in the internal memory of the IC. When a match occurs, the microcontroller status is updated accordingly, and the microcontroller in turn generates specific data that is transmitted over the RF channel by the ZigBee transmitter.
4.2 Microcontroller Module
Fig. 7: Microcontroller Module.
Fig. 8: Microcontroller-89c52 Board
This module receives the signal from the speech processor, and the decision to switch an appliance ON or OFF is made here, as shown in Fig. 7. The output associated with an appliance such as a light or fan, whether from the speech processor or from the MATLAB simulator, is given to the microcontroller, whose output toggles on each incoming signal from the processors. The wireless module is connected to one of the microcontroller's input ports, and the appliances are connected to its output port.
4.3 Wireless (Zigbee) Module
The ZigBee protocol is used in this system for wireless communication, as shown in Fig. 9. ZigBee offers a maximum data rate of 250 kbps. The nominal range of the wireless unit is 100 meters, but in practice we obtained approximately 30-40 meters.
Fig. 9: Zigbee Module.
V. Tests And Results
In this section we train the system using speech commands, which are saved in the database in .wav format. The following command records a speech command for an appliance (30 seconds of samples at 8000 Hz):

y = wavrecord(30*8000, 8000);

After recording, the command is checked for the expected sample format using:

Speechdetect(y);

If the recording is satisfactory, it is saved in the database using:

wavwrite(y, 8000, 'speaker.wav');

For example, the command for the light appliance is saved as:

wavwrite(y, 8000, 'light.wav');

After recording all commands for the various appliances, a single database called the MODEL file is built, which holds all commands for a specific speaker. This model file is built using the MATLAB command 'trainscript'.
The steps above train and record the speech commands. Once the commands are saved, the system is ready to work. A ZigBee transmitter module on the serial port sends the ON/OFF commands to the appliances wirelessly. Speech commands from users are received using the command 'digitrecgui'. If a match is found, the results are shown on screen, as in Fig. 10 and Fig. 11: Fig. 10 shows the match for the Light command for a particular speaker, whereas Fig. 11 shows the match for Fan.
Fig. 10: command Light matched results.
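The matching performed by 'digitrecgui' is not detailed in the paper; as an illustrative stand-in, a nearest-template matcher over feature vectors can be sketched in Python. The feature vectors and command names below are placeholders, not the paper's actual MODEL-file format.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_command(features, database):
    """Return the command whose stored template is closest to the input.

    `database` maps command names to stored feature vectors; this simple
    nearest-template rule stands in for the paper's unspecified matcher.
    """
    return min(database, key=lambda name: euclidean(features, database[name]))

# Toy 2-D "features" for two commands:
templates = {"light": [1.0, 0.0], "fan": [0.0, 1.0]}
```

In a real system the templates would be MFCC-derived statistics per command, and a distance threshold would reject non-matching speakers.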
Fig. 11: command Fan matched results.
The system was tested with 10 different speakers, with the following satisfactory speaker identification results; the appliances responded accordingly. The results are shown in Table 1.

Sr. No.   Speaker          Light (%)   Fan (%)
1         Actual Speaker   80          85
2         Speaker 1        10          10
3         Speaker 2        0           0
4         Speaker 3        0           20
5         Speaker 4        10          0
6         Speaker 5        0           10
7         Speaker 6        20          10
8         Speaker 7        0           10
9         Speaker 8        10          0
10        Speaker 9        0           0

Table 1: Results of the speech recognized automation system using speaker identification.

The speech-command match statistics are calculated for the actual speaker and for the other speakers.
VI. Conclusions And Future Work
A speech recognized automation system using speaker identification through wireless communication was built and implemented. The prototype developed can control electrical devices in a home or office wirelessly. The system implements automatic speech recognition using a speech processor, and speaker identification through MATLAB code using the MFCC algorithm. The wireless network is implemented with ZigBee RF modules, chosen for their efficiency and low power consumption.
After analyzing the results of the project, we found that the algorithm improves the speaker recognition task. Because MFCC uses the Mel scale, it approximates human auditory behavior well and therefore represents the voice better, improving the precision of the recognition task. We also found that as the number of speakers increases, the number of errors increases; this is because the distance to one speaker's centroid can be similar to that of another when the algorithm evaluates the maximum likelihood of the speech.
Future work will entail:
Adding confirmation commands from the appliance back to the voice recognition system.
Integrating additional appliance control functions to improve the system's versatility, such as commands other than ON/OFF (e.g., slow down, dim).
Acknowledgements
I, Prabhakar V. Mhadse, acknowledge with thanks the guidance received from Prof. A. C. Wani and Prof. S. R. Suralkar, Head of the Department, in the preparation of this project.
I also give my sincere thanks to the other staff of the college for their warm support. The material taken from various sources in preparing this report has been duly cited in the references.
References
[1] Abhishek Thakur, Neeru Singla, "Design of Matlab-Based Automatic Speaker Recognition and Control System", International Journal of Advanced Engineering Sciences and Technologies, Vol. 8, Issue 1.
[2] Ch. Srinivasa Kumar et al., "Design of an Automatic Speaker Recognition System Using MFCC, Vector Quantization and LBG Algorithm", International Journal on Computer Science and Engineering (IJCSE).
[3] Lindasalwa Muda, Mumtaj Begam and I. Elamvazuthi, "Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques", Journal of Computing, Volume 2, Issue 3, March 2010, ISSN 2151-9617.
[4] Rojas-Rodríguez Rafael, Aceves-Pérez Rita, Cortés-Aburto Obed, García-Meneses Carlos, Vela-Valdés Luis, "Development and integration of automated systems aimed to the comfort of seniors and people with disabilities", International Conference on Intelligent Environments, pp. 339-342, 2012.
[5] Rabiner L. R. and Schafer R. W., Digital Processing of Speech Signals, New Jersey, US: Prentice Hall Inc., 1978.
[6] Augusto J. C. and Nugent C. D., "Smart homes can be smarter", in Lecture Notes in Artificial Intelligence, Germany, vol. 4008, pp. 1-15, May 2006.
[7] "XBee 2.5 Manual", ZigBee RF communication protocol, 2008.
[8] Algimantas R., Ratkevicius K. and Rudzionis V., "Voice interactive systems", in The Engineering Handbook of Smart Technology for Aging, Disability and Independence, USA, pp. 281-296, 2008.
[9] Vovos A., Kladis B. and Fakotakis N. D., "Speech operated smart-home control system for users with special needs", in Proc. INTERSPEECH, pp. 193-196, 2005.
[10] Sixsmith A., "Modeling the well being of older people", in The Engineering Handbook of Smart Technology for Aging, Disability and Independence, USA, pp. 569-584, 2008.
[11] Muñoz C., Arellan D., Perales F. J. and Fontaner G., "Perceptual and intelligent domotics system for disabled people", in Proceedings of the Sixth IASTED International Conference, USA, pp. 70-75, September 2011.
[12] Jorge Martinez, Hector Perez, Enrique Escamilla, Masahisa Mabo Suzuki, "Speaker recognition using Mel Frequency Cepstral Coefficients (MFCC) and Vector Quantization (VQ) Techniques", IEEE, 2012.
[13] Shannon B. J., Paliwal K. K., "A comparative study of filter bank spacing for speech recognition", Proc. of Microelectronic Engineering Research Conference, Brisbane, Australia, Nov. 2003.
[14] T. Hill and P. Lewicki, Statistics: Methods and Applications, Tulsa, OK: StatSoft Inc., 2006.
[15] S. Furui, Digital Speech Processing, Synthesis, and Recognition, New York: Marcel Dekker, 2001.
[16] Fezari M., Khati A.-E., "New speech processor and ultrasonic sensors based embedded system to improve the control of a motorized wheelchair", Design and Test Workshop (IDT), 2008.
[17] A. Spanias, T. Painter and V. Atti, Audio Signal Processing and Coding, John Wiley & Sons, 2007.
[18] Mohd. Rihan, M. Salim Beg, Narayan Gehlot, "Developments in Home Automation and Networking", Proc. National Conference on "Computing for Nation Development", Bharati Vidyapeeth Institute of Computer and Management, New Delhi, 23-24 Feb. 2007, pp. 61-64.