This research proposes a simulation of a speech-recognition logic circuit on an FPGA, based on MFCC (Mel Frequency Cepstral Coefficients) and Euclidean distance, to control the motion of a robotic car. The recognized speech is used as a command to operate the robotic car. MFCC is used in the feature-extraction process, while Euclidean distance is applied to classify the features of each utterance, which are then forwarded to the decision stage that issues the control logic for the robot's motors. Tests measuring the Mel frequency warping and power cepstrum showed that the designed logic circuit is accurate. The logic design was validated by comparing the Matlab computation with the Xilinx simulation, which makes it straightforward for researchers to carry the implementation on to FPGA hardware.
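The Euclidean-distance classification step described above can be sketched as follows. The command names, template vectors, and dimensionality are illustrative placeholders, not values from the paper: an utterance's feature vector is assigned to the stored command template it is nearest to.

```python
import numpy as np

# Hypothetical command templates (names and values are illustrative):
# in practice each would be an averaged MFCC feature vector per command.
templates = {
    "forward": np.array([1.0, 0.2, 0.5]),
    "stop":    np.array([0.1, 0.9, 0.3]),
}

def classify(feature_vec):
    # Pick the command whose template is nearest in Euclidean distance.
    return min(templates, key=lambda cmd: np.linalg.norm(feature_vec - templates[cmd]))

print(classify(np.array([0.9, 0.25, 0.45])))  # nearest to the "forward" template
```

The decision stage would then map the winning command name onto motor-control logic.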
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition... (TELKOMNIKA JOURNAL)
Speech recognition can be defined as the process of converting voice signals into a sequence of words by applying a specific algorithm implemented in a computer program. Research on speech recognition in Indonesia is relatively limited. This paper studies which feature-extraction method, Linear Predictive Coding (LPC) or Mel Frequency Cepstral Coefficients (MFCC), is best for speech recognition in the Indonesian language. This matters because a method that produces high accuracy for one language does not necessarily produce the same accuracy for another, since every language has different characteristics. This research therefore aims to help accelerate the adoption of automatic speech recognition for Indonesian. There are two main processes in speech recognition: feature extraction and recognition. The feature-extraction methods compared in this study are LPC and MFCC, while recognition uses a Hidden Markov Model (HMM). The test results showed that the MFCC method is better than LPC for Indonesian-language speech recognition.
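Since MFCC features recur throughout these abstracts, a minimal NumPy sketch of the standard pipeline (framing, windowing, power spectrum, triangular mel filterbank, log, DCT) may help; the frame sizes and filter counts below are common textbook defaults, not values taken from any of the papers listed here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    # Frame the signal into overlapping windows (25 ms frames, 10 ms hop).
    frame_len, hop = int(0.025 * sr), int(0.010 * sr)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies; keep the first n_ceps.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_energy @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
feats = mfcc(sig)
print(feats.shape)  # (frames, cepstral coefficients) -> (98, 13)
```

The resulting per-frame vectors are what the abstracts above feed into HMM, GMM, VQ, or k-NN back ends.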
Speaker Identification From YouTube Obtained Data (sipij)
An efficient and intuitive algorithm is presented for identifying speakers in long recordings (such as a long YouTube discussion or cocktail-party audio or video). The goal of automatic speaker identification is to determine the number of different speakers and build a model for each one by extracting and characterizing the speaker-specific information contained in the speech signal. It has many diverse applications, especially in surveillance, airport immigration, cyber security, and transcription of recordings with multiple similar-sounding sources, where assigning transcripts to speakers is otherwise difficult. The most common speech parameterizations used in speaker verification, K-means and cepstral analysis, are detailed, and Gaussian mixture modeling, the speaker-modeling technique, is then explained. Gaussian mixture models (GMMs), among the most robust machine-learning approaches, are introduced to examine text-independent speaker identification. Their use for modeling speaker identity is motivated by the observation that a mixture of Gaussians can depict the characteristic spectral pattern of a speaker, and by the remarkable ability of GMMs to model arbitrary densities. We then describe Expectation Maximization (EM), an iterative algorithm that starts from an arbitrary initial estimate and iterates until the values converge. Aiming for 85~95% accuracy with speaker models based on vector quantization and Gaussian mixture models, our experiments achieved identification rates of 79~82% using vector quantization and 85~92.6% using GMMs trained by EM parameter estimation, depending on the parameter settings.
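The GMM/EM identification scheme described above can be sketched compactly: fit one diagonal-covariance GMM per speaker with EM, then assign a test utterance to the speaker whose model gives it the highest log-likelihood. The synthetic "speakers" and all parameters below are illustrative, not the paper's data or settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gmm(X, k=2, n_iter=50):
    # EM for a diagonal-covariance GMM: alternate the E-step (posteriors)
    # and the M-step (weighted means, variances, mixing weights).
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]
    var = np.ones((k, d))
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log w_k + log N(x | mu_k, diag(var_k)) per component.
        log_p = (-0.5 * (((X[:, None] - mu) ** 2) / var
                         + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step.
        nk = r.sum(axis=0)
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
        w = nk / n
    return w, mu, var

def log_likelihood(X, params):
    w, mu, var = params
    log_p = (-0.5 * (((X[:, None] - mu) ** 2) / var
                     + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
    m = log_p.max(axis=1, keepdims=True)  # log-sum-exp over components
    return float((m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))).sum())

# Two illustrative "speakers" with different feature distributions.
spk_a = rng.normal(0.0, 1.0, (300, 4))
spk_b = rng.normal(3.0, 1.0, (300, 4))
models = {"A": fit_gmm(spk_a), "B": fit_gmm(spk_b)}

test_utt = rng.normal(3.0, 1.0, (50, 4))  # drawn from speaker B's distribution
scores = {s: log_likelihood(test_utt, m) for s, m in models.items()}
print(max(scores, key=scores.get))
```

In a real system the rows of each training matrix would be MFCC frames rather than synthetic Gaussian samples.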
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-... (CSCJournals)
This paper proposes a new voicing-detection and pitch-estimation method that is particularly robust for noisy speech. The method is based on spectral analysis of the speech multi-scale product. The multi-scale product (MP) is formed by multiplying wavelet-transform coefficients; the wavelet used is the quadratic spline function. We argue that the spectrum of the multi-scale product can reveal an estimate of the pitch harmonic accurately even in a heavily noisy scenario. We evaluate our approach on the Keele database. The experimental results show the robustness of our method for noisy speech and its good performance for clean speech in comparison with state-of-the-art algorithms.
SEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITION (cscpconf)
This paper reports a word-modeling algorithm for Malayalam isolated-digit recognition that reduces the search time in the classification process. A recognition experiment is carried out for the 10 Malayalam digits using Mel Frequency Cepstral Coefficient (MFCC) feature parameters and the k-Nearest Neighbor (k-NN) classification algorithm. A word-modeling scheme using the Hidden Markov Model (HMM) algorithm is developed. The experimental results show that the proposed algorithm can reduce the search time of the classification process in telephony applications by 80% for first-digit recognition.
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco... (TELKOMNIKA JOURNAL)
Sundanese is one of the popular languages in Indonesia, so research on Sundanese is essential; that is the motivation for this study. The vital parts for achieving high recognition accuracy are feature extraction and the classifier, and the main goal of this study was to analyze the former. Three feature-extraction types were tested: Linear Predictive Coding (LPC), Mel Frequency Cepstral Coefficients (MFCC), and Human Factor Cepstral Coefficients (HFCC). The output of each became the input of the classifier, for which the study applied Hidden Markov Models. Before classification, however, vector quantization was needed; in this study it was based on clustering. Each result was compared across the number of clusters and hidden states used. The dataset came from four people who each spoke the digits zero through nine 60 times. In the end, all three feature-extraction methods produced the same performance on the corpus used.
The task of speaker identification is to determine the identity of a speaker by machine. To recognize a voice, the voice must be familiar, for machines just as for human beings.
The objective of speaker identification is to determine the identity of a speaker by machine on the basis of his/her voice. No identity is claimed by the user.
GitHub Link: https://github.com/TrilokiDA/Speaker-Identification-from-Voice
Wavelet Based Noise Robust Features for Speaker Recognition (CSCJournals)
Extracting and selecting the best parametric representation of the acoustic signal is the most important task in designing any speaker-recognition system. A wide range of possibilities exists for parametrically representing the speech signal, such as Linear Prediction Coding (LPC), Mel Frequency Cepstrum Coefficients (MFCC), and others. MFCC are currently the most popular choice for any speaker-recognition system, though one shortcoming of MFCC is that the signal is assumed to be stationary within the given time frame; it therefore cannot analyze non-stationary signals and is not well suited to noisy speech. To overcome this problem, several researchers have used various AM-FM modulation/demodulation techniques for extracting features from the speech signal, while other approaches use wavelet filterbanks for feature extraction. In this paper a technique combining the above approaches is proposed: features are extracted from the envelope of the signal and then passed through a wavelet filterbank. The proposed method is found to outperform existing feature-extraction techniques.
Limited Data Speaker Verification: Fusion of Features (IJECEIAES)
The present work demonstrates an experimental evaluation of speaker verification for different speech feature-extraction techniques under the constraint of limited data (less than 15 seconds). State-of-the-art speaker-verification techniques perform well given sufficient data (more than 1 minute); it is a challenging task to develop techniques that perform well under the limited-data condition. In this work different features are considered: Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC), Delta, Delta-Delta, Linear Prediction Residual (LPR), and Linear Prediction Residual Phase (LPRP). The performance of the individual features is studied, and for better verification performance a combination of these features is attempted. A comparative study is made between the Gaussian mixture model (GMM) and the GMM-universal background model (GMM-UBM) through experimental evaluation. The experiments are conducted on the NIST-2003 database. The results show that the combination of features performs better than the individual features, and that GMM-UBM modeling gives a lower equal error rate (EER) than GMM.
Speaker and Speech Recognition for Secured Smart Home Applications (Roger Gomes)
The paper discusses the implementation of a robust text-independent speaker-recognition system using MFCC feature-vector extraction, matching with VQ, and optimization with the LBG algorithm; the implementation of a text-dependent speech-recognition system using the DTW algorithm is also discussed, in the context of home automation.
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S... (IDES Editor)
In this paper, the improvement of an ASR system for the Hindi language, based on vector-quantized MFCC feature vectors and an HMM classifier, is discussed. MFCC features are usually pre-processed before being used for recognition; one such pre-processing step is to create delta and delta-delta coefficients and append them to the MFCC to form the feature vector. This paper covers all the digits in Hindi (zero to nine), based on an isolated-word structure. Performance of the system is evaluated by the recognition rate (RR). Combining the Delta MFCC (DMFCC) feature with the Delta-Delta MFCC (DDMFCC) feature yields approximately 2.5% further improvement in the RR, with no additional computational cost. The RR for speakers involved in the training phase is found to be better than for speakers who were not involved in the training phase. Word-wise RR is observed to be good for digits with distinct phones.
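The delta and delta-delta (acceleration) coefficients mentioned above are standard regression-based time derivatives of the MFCC matrix. A minimal NumPy sketch, where the window half-width N and the random stand-in feature matrix are illustrative choices:

```python
import numpy as np

def deltas(feats, N=2):
    # Regression-based time derivative over a +/-N frame window:
    # d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2), edges padded.
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    return sum(n * (padded[N + n: N + n + len(feats)]
                    - padded[N - n: N - n + len(feats)])
               for n in range(1, N + 1)) / denom

mfcc = np.random.default_rng(1).normal(size=(100, 13))  # stand-in MFCC matrix
d1 = deltas(mfcc)   # delta coefficients
d2 = deltas(d1)     # delta-delta (acceleration) coefficients
full = np.hstack([mfcc, d1, d2])
print(full.shape)   # 13 static + 13 delta + 13 delta-delta = 39 per frame
```

Appending both derivative blocks to the static coefficients is what produces the familiar 39-dimensional vectors cited in several abstracts here.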
The accuracy of a speaker-identification system degrades severely in the presence of noise. The auditory-based features called gammatone frequency cepstral coefficients (GFCC) resemble the human ear's ability to identify a speaker in a noisy environment. GFCC is based on a gammatone filter bank, which models the basilar membrane as a sequence of overlapping band-pass filters. The system uses GFCC features and a Gaussian Mixture Model - Universal Background Model (GMM-UBM) to model the speaker's features. The experiments are conducted on the English Language Speech Database for Speaker Recognition (ELSDSR). To assess the performance of the system, the test utterances are mixed with noise at various SNR levels to simulate channel changes. The results show that GFCC features achieve good identification accuracy not only on clean samples but also under noisy conditions.
Automatic speech emotion and speaker recognition based on hybrid GMM and FFBNN (ijcsa)
In this paper we present text-dependent speaker recognition enhanced by first detecting the speaker's emotion, using hybrid FFBNN and GMM methods. The emotional state of the speaker influences the recognition system. The Mel-frequency Cepstral Coefficient (MFCC) feature set is used for experimentation. To recognize the emotional state of a speaker, a Gaussian Mixture Model (GMM) is used in the training phase and a Feed-Forward Back-Propagation Neural Network (FFBNN) in the testing phase. A speech database of 25 speakers recorded in five emotional states (happy, angry, sad, surprised, and neutral) is used for experimentation. The results reveal that the emotional state of the speaker has a significant impact on the accuracy of speaker recognition.
Comparative study of voice print based acoustic features MFCC and LPCC (IJAEMSJORNAL)
Voice is a strong biometric feature for investigation and authentication, with both biological and behavioural aspects; the acoustic features are related to the voice. The Speaker Recognition System is designed for the automatic authentication of a speaker's identity based purely on the human voice. The Mel Frequency Cepstrum Coefficient (MFCC) and the Linear Prediction Cepstrum Coefficient (LPCC) are used for feature extraction from the provided voice sample. This paper provides a comparative study of MFCC and LPCC based on the accuracy of results and their working methodology. The results are better when MFCC is used for feature extraction.
FEATURE SELECTION USING FISHER'S RATIO TECHNIQUE FOR AUTOMATIC ... (IJCI JOURNAL)
Automatic Speech Recognition (ASR) involves mainly two steps: feature extraction and classification (pattern recognition). The Mel Frequency Cepstral Coefficient (MFCC) is used as one of the prominent feature-extraction techniques in ASR. Usually, the set of all 12 MFCC coefficients is used as the feature vector in the classification step. But the question is whether the same or improved classification accuracy can be achieved by using a subset of the 12 MFCCs as the feature vector. In this paper, Fisher's ratio technique is used for selecting a subset of the 12 MFCC coefficients that contribute more to discriminating a pattern. The selected coefficients are used in classification with the Hidden Markov Model (HMM) algorithm. The classification accuracies obtained using all 12 coefficients and using the selected coefficients are compared.
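Fisher's ratio for ranking coefficients by class discriminability can be sketched as follows; the two synthetic classes and the two deliberately shifted coefficients are illustrative stand-ins, not the paper's data.

```python
import numpy as np

def fisher_ratio(X1, X2):
    # Per-coefficient discriminability between two classes:
    # F_i = (mu1_i - mu2_i)^2 / (var1_i + var2_i).
    return (X1.mean(0) - X2.mean(0)) ** 2 / (X1.var(0) + X2.var(0) + 1e-12)

rng = np.random.default_rng(0)
# Two illustrative classes of 12 "MFCC" coefficients, differing only in
# coefficients 0 and 5.
class_a = rng.normal(0.0, 1.0, (200, 12))
class_b = rng.normal(0.0, 1.0, (200, 12))
class_b[:, 0] += 3.0
class_b[:, 5] += 2.0

ratios = fisher_ratio(class_a, class_b)
top2 = np.argsort(ratios)[::-1][:2]   # keep the 2 most discriminative
print(sorted(top2.tolist()))          # the shifted coefficients rank highest
```

The selected subset would then replace the full 12-coefficient vector in the classification step.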
Speech Recognition Using HMM with MFCC - An Analysis Using Frequency Spectral De... (sipij)
This paper presents an approach to speech-signal recognition using frequency-spectral information with Mel frequency to improve the speech-feature representation in an HMM-based recognition approach. Frequency-spectral information is incorporated into the conventional Mel-spectrum-based speech-recognition approach. The Mel-frequency approach observes the speech signal at a given resolution, which causes feature overlap between resolutions and limits recognition. Resolution decomposition with frequency separation is the mapping approach used for the HMM-based speech-recognition system. Simulation results show an improvement in the quality metrics of speech recognition with respect to computation time and learning accuracy.
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas... (ijceronline)
General Kalman Filter & Speech Enhancement for Speaker Identification (ijcisjournal)
The presence of noise increases the dimensionality of the information. A noise-suppression algorithm is developed by combining a General Kalman Filter with the Expectation Maximization (EM) framework. This combination is helpful and effective in identifying the noise characteristics of an acoustic environment, and recursion between the Estimation step and the Maximization step enables the algorithm to handle any noise model. The same speech-enhancement procedure is applied in the pre-processing stage of a conventional speaker-identification method. Because of the non-stationary nature of noise and speech, adaptive algorithms are required. The algorithm is first applied to the speech-enhancement problem and then extended to the pre-processing step of speaker identification. The present work is compared on significant metrics with existing, popular algorithms, and the results show that the developed algorithm dominates them.
A Text-Independent Speaker Identification System based on The Zak Transform (CSCJournals)
A novel text-independent speaker-identification system based on the Zak transform is implemented. The data used in this paper are drawn from the ELSDSR database. The identification efficiency approaches 91.3% using a single test file and 100% using two test files. The method shows efficiency comparable to the well-known MFCC method, with the advantage of being faster in both modeling and identification.
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK (ijistjournal)
This paper describes a Hidden Markov Model-based Punjabi text-to-speech synthesis system (HTS), in which the speech waveform is generated from the Hidden Markov Models themselves, and applies it to Punjabi speech synthesis using the general speech-synthesis architecture of HTK (HMM Tool Kit). This Hidden Markov Model-based TTS can be used in mobile phones for the stored phone directory or messages: text messages and the caller's identity in English are mapped to tokens in Punjabi, which are then concatenated to form speech according to certain rules and procedures.
To build the synthesizer we recorded the speech database and phonetically segmented it, first extracting context-independent monophones and then context-dependent triphones. For example, for the word bharat the monophones are a, bh, t, etc., and a triphone is bh-a+r. These speech utterances and their phone-level transcriptions (monophones and triphones) are the inputs to the speech-synthesis system. The system outputs the sequence of phonemes after resolving ambiguities in phoneme selection using word-network files; for example, for the word Tapas the output phoneme sequence is ਤ,ਪ,ਸ instead of ਟ,ਪ,ਸ.
Speaker Recognition System using MFCC and Vector Quantization Approach (ijsrd.com)
This paper presents an approach to speaker recognition using frequency-spectral information with Mel frequency to improve the speech-feature representation in a Vector Quantization codebook-based recognition approach. The Mel-frequency approach extracts the features of the speech signal to produce the training and testing vectors. The VQ codebook approach uses the training vectors to form clusters and recognizes accurately with the help of the LBG algorithm.
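The LBG codebook construction mentioned above can be sketched as a split-and-refine loop: start from the global centroid, perturb each codeword into two, and refine with nearest-neighbour centroid updates. The codebook size, perturbation factor, and random stand-in training vectors are illustrative.

```python
import numpy as np

def lbg(X, codebook_size=4, eps=0.01, n_iter=20):
    # LBG: start from the global centroid, repeatedly split each codeword
    # by a +/- eps perturbation, then refine with nearest-neighbour updates.
    codebook = X.mean(axis=0, keepdims=True)
    while len(codebook) < codebook_size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):
            # Assign each training vector to its nearest codeword (Euclidean).
            d = ((X[:, None] - codebook) ** 2).sum(-1)
            labels = d.argmin(axis=1)
            for k in range(len(codebook)):
                if (labels == k).any():
                    codebook[k] = X[labels == k].mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 13))       # stand-in MFCC training vectors
cb = lbg(feats, codebook_size=8)
print(cb.shape)                          # one codebook row per cluster
```

At recognition time, an utterance is scored against each enrolled speaker's codebook by its average quantization distortion.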
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn... (gt_ebuddy)
Joint speech and speaker recognition using a Hidden Markov Model / Vector Quantization scheme for speaker-independent speech recognition and a Gaussian Mixture Model for text-independent speaker recognition. MFCC (Mel-Frequency Cepstral Coefficients) are used for feature extraction (delta, delta-delta, and energy: 39 coefficients).
Developed in JAVA with client/server Architecture, web interface developed in Adobe Flex.
This project was done at TU, IOE - Pulchowk Campus, Nepal.
For more details visit http://ganeshtiwaridotcomdotnp.blogspot.com
ABSTRACT OF PROJECT>>>
A biometric is a physical characteristic unique to each individual. It has very useful applications in authentication and access control.
The designed system is a text-prompted voice biometric which combines a text-independent speaker-verification system and a speaker-independent speech-verification system, implemented independently. The foundation for this joint system is that the speech signal conveys both the speech content and the speaker identity. Such systems are more secure against playback attacks, since the word to speak during authentication is not set in advance.
During the course of the project, various digital-signal-processing and pattern-classification algorithms were studied. Short-time spectral analysis was performed to obtain MFCC, energy, and their deltas as features. The feature-extraction module is the same for both systems. Speaker modeling was done with a GMM, and a left-to-right discrete HMM with VQ was used for isolated-word modeling. The results of both systems were combined to authenticate the user.
The speech model for each word was pre-trained using utterances of 45 English words. The speaker model was trained with about 2 minutes of speech from each of 15 speakers. For individual words, the recognition rate of the speech-recognition system is 92% and that of the speaker-recognition system is 66%. For longer utterances (>5 sec) the recognition rate of the speaker-recognition system improves to 78%.
Design and implementation of a java based virtual laboratory for data communi... (IJECEIAES)
Students in this modern age find engineering courses taught at university very abstract and difficult, and cannot relate theoretical calculations to real-life scenarios; they consequently lose interest in their coursework and perform poorly in their grades. Simulations of classroom concepts, using software such as MATLAB, were developed to improve the learning experience. This paper covers the development of a virtual-laboratory simulation package for teaching data-communication concepts such as coding schemes, modulation, and filtering. Unlike other simulation packages, no prior knowledge of computer programming is required for students to grasp these concepts.
Intelligent Arabic letters speech recognition system based on mel frequency c... (IJECEIAES)
Speech recognition is one of the important applications of artificial intelligence (AI). Speech recognition aims to recognize spoken words regardless of who is speaking. The process involves extracting meaningful features from the spoken words and then classifying those features into their classes. This paper presents a neural-network classification system for Arabic letters and studies the effect of changing the properties of a multi-layer perceptron (MLP) artificial neural network (ANN) to obtain optimized performance. The proposed system consists of two main stages: first, the recorded spoken letters are transformed from the time domain into the frequency domain using the fast Fourier transform (FFT), and features are extracted using mel frequency cepstral coefficients (MFCC); second, the extracted features are classified using an MLP ANN with the back-propagation (BP) learning algorithm. The results show that the proposed system, with the extracted features, can classify Arabic spoken letters using two neural-network hidden layers with an accuracy of around 86%.
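A toy sketch of the classification stage described above: a two-hidden-layer MLP trained by plain back-propagation on stand-in feature vectors. The layer sizes, learning rate, and synthetic data are illustrative assumptions, not the paper's implementation or results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "letter" feature vectors: 3 classes, 20-dim features (illustrative).
n_cls, dim = 3, 20
X = rng.normal(size=(300, dim))
y = rng.integers(0, n_cls, size=300)
X[np.arange(300), y] += 4.0          # make one dimension per class informative

# Two hidden layers with tanh activations and a softmax output.
sizes = [dim, 32, 16, n_cls]
W = [rng.normal(0, 0.3, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(s) for s in sizes[1:]]

def forward(X):
    acts = [X]
    for i in range(len(W)):
        z = acts[-1] @ W[i] + b[i]
        acts.append(np.tanh(z) if i < len(W) - 1 else z)  # linear output layer
    return acts

lr = 0.3
for _ in range(1500):                 # back-propagation training loop
    acts = forward(X)
    logits = acts[-1]
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = (p - np.eye(n_cls)[y]) / len(X)   # softmax cross-entropy gradient
    for i in reversed(range(len(W))):
        gW, gb = acts[i].T @ grad, grad.sum(axis=0)
        if i:
            grad = (grad @ W[i].T) * (1 - acts[i] ** 2)   # tanh derivative
        W[i] -= lr * gW
        b[i] -= lr * gb

acc = (forward(X)[-1].argmax(axis=1) == y).mean()
print(float(acc))
```

In the paper's pipeline the input vectors would be the MFCC features extracted after the FFT stage, rather than synthetic data.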
Comparative study to realize an automatic speaker recognition system (IJECEIAES)
In this research, we present an automatic speaker recognition system based on adaptive orthogonal transformations. To obtain the informative features with a minimum dimension from the input signals, we created an adaptive operator, which helped to identify the speaker’s voice in a fast and efficient manner. We test the efficiency and the performance of our method by comparing it with another approach, mel-frequency cepstral coefficients (MFCCs), which is widely used by researchers as their feature extraction method. The experimental results show the importance of creating the adaptive operator, which gives added value to the proposed approach. The performance of the system achieved 96.8% accuracy using Fourier transform as a compression method and 98.1% using Correlation as a compression method.
Wavelet Based Noise Robust Features for Speaker RecognitionCSCJournals
Extraction and selection of the best parametric representation of acoustic signal is the most important task in designing any speaker recognition system. A wide range of possibilities exists for parametrically representing the speech signal such as Linear Prediction Coding (LPC) ,Mel frequency Cepstrum coefficients (MFCC) and others. MFCC are currently the most popular choice for any speaker recognition system, though one of the shortcomings of MFCC is that the signal is assumed to be stationary within the given time frame and is therefore unable to analyze the non-stationary signal. Therefore it is not suitable for noisy speech signals. To overcome this problem several researchers used different types of AM-FM modulation/demodulation techniques for extracting features from speech signal. In some approaches it is proposed to use the wavelet filterbanks for extracting the features. In this paper a technique for extracting the features by combining the above mentioned approaches is proposed. Features are extracted from the envelope of the signal and then passed through wavelet filterbank. It is found that the proposed method outperforms the existing feature extraction techniques.
Limited Data Speaker Verification: Fusion of FeaturesIJECEIAES
The present work demonstrates an experimental evaluation of speaker verification for different speech feature extraction techniques under the constraint of limited data (less than 15 seconds). State-of-the-art speaker verification techniques provide good performance for sufficient data (greater than 1 minute). It is a challenging task to develop techniques which perform well for speaker verification under the limited data condition. In this work different features like Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC), Delta (Δ), Delta-Delta (ΔΔ), Linear Prediction Residual (LPR) and Linear Prediction Residual Phase (LPRP) are considered. The performance of individual features is studied and, for better verification performance, a combination of these features is attempted. A comparative study is made between the Gaussian mixture model (GMM) and the GMM-universal background model (GMM-UBM) through experimental evaluation. The experiments are conducted using the NIST-2003 database. The experimental results show that the combination of features provides better performance compared to the individual features. Further, GMM-UBM modeling gives a reduced equal error rate (EER) compared to GMM.
Speaker and Speech Recognition for Secured Smart Home ApplicationsRoger Gomes
The paper discusses the implementation of a robust text-independent speaker recognition system using MFCC extraction of feature vectors, their matching using VQ, and optimization using LBG; further, the implementation of a text-dependent speech recognition system using the DTW algorithm is discussed in the context of home automation.
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...IDES Editor
In this paper, improvement of an ASR system for the Hindi language, based on vector-quantized MFCC as feature vectors and HMM as classifier, is discussed. MFCC features are usually pre-processed before being used for recognition. One such pre-processing step is to create delta and delta-delta coefficients and append them to MFCC to create the feature vector. This paper focuses on all digits in Hindi (zero to nine), based on an isolated word structure. Performance of the system is evaluated by the recognition rate (RR). The combination of the Delta MFCC (DMFCC) feature with the Delta-Delta MFCC (DDMFCC) feature shows approximately 2.5% further improvement in the RR, with no additional computational costs involved. The RR of the system for the speakers involved in the training phase is found to be better than that for the speakers who were not involved in the training phase. Word-wise RR is observed to be good for some digits with distinct phones.
The accuracy of a speaker identification system reduces severely in the presence of noise. The auditory-based features called gammatone frequency cepstral coefficients (GFCC) resemble the human ear's ability to identify a speaker in a noisy environment. GFCC is based on the gammatone filter bank, which models the basilar membrane as a sequence of overlapping band-pass filters. The system uses GFCC features and a Gaussian Mixture Model - Universal Background Model (GMM-UBM) for modeling the speaker's features. The experiments are conducted on the English Language Speech Database for Speaker Recognition (ELSDSR). In order to find out the performance of the system, the test utterances are mixed with noise at various SNR levels to simulate channel change. The results show that GFCC features give good identification accuracy not only on clean samples but also under noisy conditions.
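The GMM-UBM decision described above boils down to comparing the log-likelihood of the test frames under a speaker model against a universal background model. Below is a minimal numpy sketch of that scoring step, not the paper's implementation; the single-component toy models and the synthetic frames are illustrative assumptions.

```python
import numpy as np

def diag_gmm_loglik(X, weights, means, variances):
    """Average log-likelihood of frames X under a diagonal-covariance GMM.
    X: (T, D) frames; weights: (K,); means, variances: (K, D)."""
    T, D = X.shape
    # (T, K) per-component log-densities
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.log(variances).sum(axis=1))
    comp = np.log(weights)[None, :] + log_norm[None, :] - 0.5 * diff2.sum(axis=2)
    # log-sum-exp over components, averaged over frames
    m = comp.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))))

# toy example: two single-component "models" at different means
rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, size=(200, 2))                       # frames near mean 1
spk = (np.array([1.0]), np.ones((1, 2)), np.ones((1, 2)))    # speaker model at 1
ubm = (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2)))   # background at 0
llr = diag_gmm_loglik(X, *spk) - diag_gmm_loglik(X, *ubm)
print(llr > 0)  # True: frames match the speaker model better
```

In a real system the models would be trained with EM (the speaker model typically MAP-adapted from the UBM) and the log-likelihood ratio compared against a threshold.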
Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnnijcsa
In this paper we present text-dependent speaker recognition with an enhancement of detecting the emotion of the speaker beforehand, using the hybrid FFBNN and GMM methods. The emotional state of the speaker influences the recognition system. The Mel-frequency Cepstral Coefficient (MFCC) feature set is used for experimentation. To recognize the emotional state of a speaker, a Gaussian Mixture Model (GMM) is used in the training phase and a Feed Forward Back Propagation Neural Network (FFBNN) in the testing phase. A speech database consisting of 25 speakers recorded in five different emotional states (happy, angry, sad, surprise and neutral) is used for experimentation. The results reveal that the emotional state of the speaker has a significant impact on the accuracy of speaker recognition.
05 comparative study of voice print based acoustic features mfcc and lpccIJAEMSJORNAL
Voice is the best biometric feature for investigation and authentication. It has both biological and behavioural features. The acoustic features are related to the voice. The Speaker Recognition System is designed for the automatic authentication of a speaker's identity based on the human voice. Mel Frequency Cepstrum Coefficients (MFCC) and Linear Prediction Cepstrum Coefficients (LPCC) are used for feature extraction from the provided voice sample. This paper provides a comparative study of MFCC and LPCC based on the accuracy of results and their working methodology. The results are better when MFCC is used for feature extraction.
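For context on the LPC side of such comparisons, the classic autocorrelation method with Levinson-Durbin recursion can be sketched in a few lines of numpy. This is a generic textbook implementation, not the paper's code, and the AR(1) test signal is an illustrative assumption:

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients via the autocorrelation method and Levinson-Durbin.
    Returns a with a[0] = 1 such that x[n] ~= -sum_k a[k] * x[n-k]."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])
        k = -acc / err                      # reflection coefficient
        a[1:i] = a[1:i] + k * a[1:i][::-1]  # update previous coefficients
        a[i] = k
        err *= (1.0 - k * k)                # prediction error energy
    return a

# synthesize an AR(1) process x[n] = 0.9 x[n-1] + e[n] and recover the pole
rng = np.random.default_rng(0)
e = rng.normal(size=5000)
x = np.empty(5000)
prev = 0.0
for n in range(5000):
    prev = 0.9 * prev + e[n]
    x[n] = prev
a = lpc(x, 1)
print(a)  # approximately [1, -0.9]
```

LPCC features would then be obtained by converting these predictor coefficients to cepstral coefficients via the standard recursion.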
FEATURE SELECTION USING FISHER'S RATIO TECHNIQUE FOR AUTOMATIC ...IJCI JOURNAL
Automatic Speech Recognition (ASR) involves mainly two steps: feature extraction and classification (pattern recognition). Mel Frequency Cepstral Coefficients (MFCC) are used as one of the prominent feature extraction techniques in ASR. Usually, the set of all 12 MFCC coefficients is used as the feature vector in the classification step. But the question is whether the same or improved classification accuracy can be achieved by using a subset of the 12 MFCC coefficients as the feature vector. In this paper, Fisher's ratio technique is used for selecting a subset of the 12 MFCC coefficients that contribute more in discriminating a pattern. The selected coefficients are used in classification with the Hidden Markov Model (HMM) algorithm. The classification accuracies obtained by using all 12 coefficients and by using the selected coefficients are compared.
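The Fisher's-ratio selection step described above is straightforward to sketch: score each coefficient by the variance of the class means over the mean within-class variance, then keep the top-k. A minimal numpy illustration on toy data (not the paper's MFCC features):

```python
import numpy as np

def fisher_ratio(X, y):
    """Fisher's ratio per feature: variance of the class means (between-class)
    over the mean within-class variance. Higher = more discriminative."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    within = np.array([X[y == c].var(axis=0) for c in classes]).mean(axis=0)
    between = means.var(axis=0)
    return between / (within + 1e-12)

def select_top_k(X, y, k):
    """Indices of the k features with the highest Fisher's ratio."""
    return np.argsort(fisher_ratio(X, y))[::-1][:k]

# toy data: feature 0 separates the classes, feature 1 is pure noise
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.array([0] * 50 + [1] * 50)
X[y == 1, 0] += 5.0           # shift class 1 along feature 0 only
print(select_top_k(X, y, 1))  # [0]
```

In the paper's setting, X would hold the 12 MFCC coefficients per frame and y the pattern classes, and the selected subset would feed the HMM classifier.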
Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...sipij
This paper presents an approach to the recognition of speech signals using frequency spectral information with Mel frequency to improve the speech feature representation in an HMM-based recognition approach. Frequency spectral information is incorporated into the conventional Mel-spectrum-based speech recognition approach. The Mel frequency approach exploits the frequency observation of the speech signal at a given resolution, which results in overlapping resolution features that limit recognition. Resolution decomposition with separating-frequency mapping is the approach used for the HMM-based speech recognition system. The simulation results show an improvement in the quality metrics of speech recognition with respect to computational time and learning accuracy.
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...ijceronline
General Kalman Filter & Speech Enhancement for Speaker Identificationijcisjournal
The presence of noise increases the dimension of the information. A noise suppression algorithm is developed with the idea of combining the General Kalman Filter and the Estimate Maximization (EM) framework. This combination is helpful and effective in identifying the noise characteristics of an acoustic environment. Recursion between the Estimate step and the Maximization step enables the algorithm to deal with any model of noise. The same speech enhancement procedure is applied in the pre-processing stage of a conventional speaker identification method. Due to the non-stationary nature of noise and speech, adaptive algorithms are required. The algorithm is first applied to the speech enhancement problem and then extended to use in the pre-processing step of speaker identification. The present work is compared in terms of significant metrics with existing and popular algorithms, and the results show that the developed algorithm is dominant over them.
A Text-Independent Speaker Identification System based on The Zak TransformCSCJournals
A novel text-independent speaker identification system based on the Zak transform is implemented. The data used in this paper are drawn from the ELSDSR database. The efficiency of identification approaches 91.3% using single test file and 100% using two test files. The method shows comparable efficiency results with the well known MFCC method with an advantage of being faster in both modeling and identification.
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKijistjournal
This paper describes a Hidden Markov Model-based Punjabi text-to-speech synthesis system (HTS), in which the speech waveform is generated from Hidden Markov Models themselves, and applies it to Punjabi speech synthesis using the general speech synthesis architecture of HTK (HMM Tool Kit). This Hidden Markov Model-based TTS can be used in mobile phones for stored phone directories or messages. Text messages and the caller's identity in the English language are mapped to tokens in the Punjabi language, which are further concatenated to form speech with certain rules and procedures.
To build the synthesizer we recorded the speech database and phonetically segmented it, thus first extracting context-independent monophones and then context-dependent triphones. For example, for the word bharat, monophones are a, bh, t, etc., and a triphone is bh-a+r. These speech utterances and their phone-level transcriptions (monophones and triphones) are the inputs to the speech synthesis system. The system outputs the sequence of phonemes after resolving various ambiguities regarding the selection of phonemes using word network files; e.g., for the word Tapas the output phoneme sequence is ਤ,ਪ,ਸ instead of the phoneme sequence ਟ,ਪ,ਸ.
Speaker Recognition System using MFCC and Vector Quantization Approachijsrd.com
This paper presents an approach to speaker recognition using frequency spectral information with Mel frequency for the improvement of speech feature representation in a Vector Quantization codebook based recognition approach. The Mel frequency approach extracts the features of the speech signal to get the training and testing vectors. The VQ Codebook approach uses training vectors to form clusters and recognize accurately with the help of LBG algorithm.
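The LBG codebook training mentioned above can be sketched as repeated codeword splitting followed by k-means-style refinement. A minimal numpy version under illustrative assumptions (the split factor `eps`, iteration count, and toy 2-D data stand in for the paper's MFCC training vectors):

```python
import numpy as np

def lbg(X, codebook_size, eps=0.01, iters=20):
    """LBG (Linde-Buzo-Gray): grow a VQ codebook by repeatedly splitting
    each codeword, then refining assignments with k-means-style updates."""
    codebook = X.mean(axis=0, keepdims=True)        # start from the global mean
    while len(codebook) < codebook_size:
        # split every codeword into a +eps / -eps pair
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            # assign each vector to its nearest codeword (squared Euclidean)
            d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            assign = d.argmin(axis=1)
            for j in range(len(codebook)):
                if np.any(assign == j):
                    codebook[j] = X[assign == j].mean(axis=0)
    return codebook

# toy data: two well-separated clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.2, (100, 2)), rng.normal(5.0, 0.2, (100, 2))])
cb = lbg(X, 2)
print(np.round(np.sort(cb[:, 0])))  # codeword centers near 1 and 5
```

For speaker recognition, one codebook is trained per speaker, and a test utterance is scored by its average quantization distortion against each codebook.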
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...gt_ebuddy
Joint Speech and Speaker Recognition using Hidden Markov Model/Vector Quantization for speaker-independent speech recognition and Gaussian Mixture Model for speech-independent speaker recognition; used MFCC (Mel-Frequency Cepstral Coefficients) for feature extraction (delta, delta-delta and energy - 39 coefficients).
Developed in JAVA with client/server Architecture, web interface developed in Adobe Flex.
This project was done at TU, IOE - Pulchowk Campus, Nepal.
For more details visit http://ganeshtiwaridotcomdotnp.blogspot.com
ABSTRACT OF PROJECT>>>
A biometric is a physical characteristic unique to each individual. It has very useful applications in authentication and access control.
The designed system is a text-prompted version of voice biometrics which incorporates a text-independent speaker verification system and a speaker-independent speech verification system, implemented independently. The foundation for this joint system is that the speech signal conveys both the speech content and the speaker identity. Such systems are more secure against playback attacks, since the word to speak during authentication is not previously set.
During the course of the project various digital signal processing and pattern classification algorithms were studied. Short-time spectral analysis was performed to obtain MFCC, energy and their deltas as features. The feature extraction module is the same for both systems. Speaker modeling was done by GMM, and a left-to-right discrete HMM with VQ was used for isolated word modeling. The results of both systems were combined to authenticate the user.
The speech model for each word was pre-trained by using utterance of 45 English words. The speaker model was trained by utterance of about 2 minutes each by 15 speakers. While uttering the individual words, the recognition rate of the speech recognition system is 92 % and speaker recognition system is 66%. For longer duration of utterance (>5sec) the recognition rate of speaker recognition system improves to 78%.
Design and implementation of a java based virtual laboratory for data communi...IJECEIAES
Students in this modern age find engineering courses taught in the university very abstract and difficult, and cannot relate theoretical calculations to real-life scenarios. They consequently lose interest in their coursework and perform poorly in their grades. Simulations of classroom concepts with software like MATLAB were developed to facilitate the learning experience. This paper involves the development of a virtual laboratory simulation package for teaching data communication concepts such as coding schemes, modulation and filtering. Unlike other simulation packages, no prior knowledge of computer programming is required for students to grasp these concepts.
Intelligent Arabic letters speech recognition system based on mel frequency c...IJECEIAES
Speech recognition is one of the important applications of artificial intelligence (AI). Speech recognition aims to recognize spoken words regardless of who is speaking. The process of voice recognition involves extracting meaningful features from spoken words and then classifying these features into their classes. This paper presents a neural network classification system for Arabic letters. The paper studies the effect of changing the multi-layer perceptron (MLP) artificial neural network (ANN) properties to obtain optimized performance. The proposed system consists of two main stages: first, the recorded spoken letters are transformed from the time domain into the frequency domain using the fast Fourier transform (FFT), and features are extracted using mel frequency cepstral coefficients (MFCC). Second, the extracted features are classified using the MLP ANN with the back-propagation (BP) learning algorithm. The obtained results show that the proposed system along with the extracted features can classify Arabic spoken letters using two neural network hidden layers with an accuracy of around 86%.
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...IJCSEA Journal
Speech is the most natural way of information exchange. It provides an efficient means of man-machine communication using speech interfacing. Speech interfacing involves speech synthesis and speech recognition. Speech recognition allows a computer to identify the words that a person speaks into a microphone or telephone. The two main components normally used in speech recognition are a signal processing component at the front end and a pattern matching component at the back end. In this paper, a setup that uses Mel frequency cepstral coefficients at the front end and artificial neural networks at the back end has been developed to perform the experiments for analyzing speech recognition performance. Various experiments have been performed by varying the number of layers and the type of network transfer function, which helps in deciding the network architecture to be used for acoustic modelling at the back end.
The performance of speaker identification systems decreases significantly under noisy conditions, especially when there is a mismatch between the recognition and the learning sessions. To improve robustness, we proposed in a previous study auditory features and a robust speaker recognition system using a front end based on the combination of the MFCC and RASTA-PLP methods. In this paper, we further study the auditory features by exploring the combination of GFCC and RASTA-PLP. We find that the method performs substantially better than all previously studied methods. Furthermore, our current identification system achieves significant performance improvements of 5.92% over a wide range of signal-to-noise conditions compared with the last studied front end based on MFCC combined with RASTA-PLP. Experimental results show an average accuracy improvement of 10.11% for GFCC combined with RASTA-PLP over the baseline MFCC technique across various SNRs. This fusion approach allows an appreciable enhancement under highly noisy conditions.
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONcscpconf
Emotional state recognition through speech is a very interesting research topic nowadays. Using subliminal information in speech, it is possible to recognize the emotional state of a person. One of the main problems in the design of automatic emotion recognition systems is the small number of available patterns. This fact makes the learning process more difficult, due to the generalization problems that arise under these conditions.
In this work we propose a solution to this problem consisting in enlarging the training set through the creation of new virtual patterns. In the case of emotional speech, most of the emotional information is carried in speed and pitch variations. So, a change in the average pitch that modifies neither the speed nor the pitch variations does not affect the expressed emotion. Thus, we use this prior information to create new patterns by applying a pitch-shift modification in the feature extraction process of the classification system. For this purpose, we propose a frequency-scaling modification of the Mel Frequency Cepstral Coefficients used to classify the emotion. This process allows us to synthetically increase the number of available patterns in the training set, thus increasing the generalization capability of the system and reducing the test error.
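The frequency-scaling idea above can be illustrated by warping the center frequencies of a mel filterbank with a scale factor. The sketch below is one hypothetical way to realize it in numpy; the warp factor `alpha` and the filterbank parameters are illustrative assumptions, not the paper's values:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(num_filters, fmin, fmax, alpha=1.0):
    """Center frequencies (Hz) of a mel filterbank; alpha != 1 scales the
    analysis frequencies, emulating a pitch-shifted version of the same
    utterance without re-recording it (hypothetical warp factor, one way
    to realize the paper's frequency-scaling idea)."""
    m = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), num_filters + 2)
    return mel_to_hz(m[1:-1]) * alpha

base = mel_centers(20, 0, 8000)
shifted = mel_centers(20, 0, 8000, alpha=1.05)  # +5% frequency scale
print(np.allclose(shifted, base * 1.05))        # True
```

Extracting MFCCs with several such warped filterbanks from each original recording yields additional "virtual" training patterns carrying the same emotional content.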
Low complexity DCO-FBMC visible light communication system IJECEIAES
Filter Bank Multicarrier (FBMC) is a new waveform candidate for the visible light communication (VLC) system. FBMC is a distinctive kind of multi-carrier modulation that can be regarded as an alternative to Orthogonal Frequency Division Multiplexing (OFDM) with a cyclic prefix (CP). DCO-FBMC (DC-biased optical FBMC) has recently been used in VLC because it overcomes the defects of the optical-OFDM system and has high spectral efficiency. At the same time, traditional DCO-FBMC suffers from high complexity due to the use of Hermitian symmetry to obtain a real signal, using a 2N-point subcarrier IFFT (Inverse Fast Fourier Transform) in the modulator and an N-point subcarrier FFT (Fast Fourier Transform) in the demodulator. In this paper, for the first time, the possibility of minimizing complexity and generating a real signal without the use of Hermitian symmetry or any other technique has been verified. The proposed technique halves the size of the IFFT/FFT, which results in a significant reduction in power consumption and occupied chip area.
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
Speaker identification is one of the most important technologies nowadays. Many fields, such as bioinformatics and security, use speaker identification, and almost all electronic devices use this technology too. Based on the text used, speaker identification is divided into text-dependent and text-independent. In many fields, text-independent identification is mostly used because the text is unconstrained, which also makes it generally more challenging than text-dependent identification. In this research, text-independent speaker identification with Indonesian speaker data was modelled with Vector Quantization (VQ). VQ with K-Means initialization was used: K-Means clustering was used to initialize the means, and Hierarchical Agglomerative Clustering was used to identify the K value for VQ. The best VQ accuracy was 59.67% when K was 5. According to the results, the Indonesian language can be modelled by VQ. This research can be developed using optimization methods for the VQ parameters, such as the Genetic Algorithm or Particle Swarm Optimization.
A comparison of different support vector machine kernels for artificial speec...TELKOMNIKA JOURNAL
As the emergence of the voice biometric provides enhanced security and convenience, voice biometric-based applications such as speaker verification are gradually replacing less secure authentication techniques. However, automatic speaker verification (ASV) systems are exposed to spoofing attacks, especially artificial speech attacks that can be generated in large amounts in a short period of time using state-of-the-art speech synthesis and voice conversion algorithms. Despite the extensive use of the support vector machine (SVM) in recent works, no study has investigated the performance of different SVM settings against artificial speech detection. In this paper, the performance of different SVM settings in artificial speech detection is investigated. The objective is to identify the appropriate SVM kernels for artificial speech detection. An experiment was conducted to find the appropriate combination of the proposed features and SVM kernels. Experimental results showed that the polynomial kernel was able to detect artificial speech effectively, with an equal error rate (EER) of 1.42% when applied to the presented handcrafted features.
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...IJERA Editor
This paper presents a compressive sensing technique used for speech reconstruction with linear predictive coding, because speech is sparser in the LPC domain. The DCT of the speech is taken and DCT points of the sparse speech are thrown away arbitrarily. This is achieved by setting some points in the DCT domain to zero by multiplying with mask functions. From the incomplete points in the DCT domain, the original speech is reconstructed using compressive sensing; the tool used is Gradient Projection for Sparse Reconstruction. The performance of the result is compared subjectively with the direct IDCT. The experiment shows that the performance is better for compressive sensing than for the DCT.
We applied various architectures of deep neural networks for sound event detection and compared their performance using two different datasets. Feed forward neural network (FNN), convolutional neural network (CNN), recurrent neural network (RNN) and convolutional recurrent neural network (CRNN) were implemented using hyper-parameters optimized for each architecture and dataset. The results show that the performance of deep neural networks varied significantly depending on the learning rate, which can be optimized by conducting a series of experiments on the validation data over predetermined ranges. Among the implemented architectures, the CRNN performed best under all testing conditions, followed by the CNN. Although the RNN was effective in tracking the time-correlation information in audio signals, it exhibited inferior performance compared to the CNN and the CRNN. Accordingly, it is necessary to develop more optimization strategies for implementing the RNN in sound event detection.
Amazon products reviews classification based on machine learning, deep learni...TELKOMNIKA JOURNAL
In recent times, the trend of online shopping through e-commerce stores and websites has grown to a huge extent. Whenever a product is purchased on an e-commerce platform, people leave their reviews about the product. These reviews are very helpful for the store owners and the product's manufacturers for the betterment of their work process as well as product quality. An automated system is proposed in this work that operates on two datasets D1 and D2 obtained from Amazon. After certain preprocessing steps, N-gram and word embedding-based features are extracted using term frequency-inverse document frequency (TF-IDF), bag of words (BoW) and global vectors (GloVe), and Word2vec, respectively. Four machine learning (ML) models support vector machines (SVM), random forest (RF), logistic regression (LR), multinomial Naïve Bayes (MNB), two deep learning (DL) models convolutional neural network (CNN), long short-term memory (LSTM), and standalone bidirectional encoder representations from transformers (BERT) are used to classify reviews as either positive or negative. The results obtained by the standard ML, DL models and BERT are evaluated using certain performance evaluation measures. BERT turns out to be the best-performing model in the case of D1 with an accuracy of 90% on features derived by word embedding models while the CNN provides the best accuracy of 97% upon word embedding features in the case of D2. The proposed model shows better overall performance on D2 as compared to D1.
Design, simulation, and analysis of microstrip patch antenna for wireless app...TELKOMNIKA JOURNAL
In this study, a microstrip patch antenna that works at 3.6 GHz was built and tested to see how well it works. In this work, Rogers RT/Duroid 5880 has been used as the substrate material, with a dielectric permittivity of 2.2 and a thickness of 0.3451 mm; it serves as the base for the examined antenna. The computer simulation technology (CST) studio suite is utilized to model the recommended antenna design. The goal of this study was to obtain wider bandwidth, a lower voltage standing wave ratio (VSWR), and a lower return loss, but the main goal was to obtain higher gain, directivity, and efficiency. After simulation, the return loss, gain, directivity, bandwidth, and efficiency of the presented antenna are found to be -17.626 dB, 9.671 dBi, 9.924 dBi, 0.2 GHz, and 97.45%, respectively. Besides, the simulation revealed that the side-lobe level at phi, at -28.8 dB, was much better than those of the earlier works. Thus, the antenna is a solid contender for wireless technology and more robust communication.
Design and simulation an optimal enhanced PI controller for congestion avoida...TELKOMNIKA JOURNAL
In this paper, the snake optimization algorithm (SOA) is used to find the optimal gains of an enhanced controller for controlling the congestion problem in computer networks. M-file and Simulink platforms are adopted to evaluate the response of the active queue management (AQM) system, and a comparison with two classical controllers is done; all tuned controller gains are obtained using the SOA method, and the fitness function chosen to monitor the system performance is the integral time absolute error (ITAE). Transient analysis and robustness analysis are used to show the proposed controller's performance. Two robustness tests are applied to the AQM system: one is done by varying the queue size value over different periods, and the other is done by changing the number of transmission control protocol (TCP) sessions by ±20% from the original value. The simulation results reflect stable and robust behavior, and the best performance clearly appears in achieving the desired queue size without any noise or transmission problems.
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...TELKOMNIKA JOURNAL
Vehicular ad-hoc networks (VANETs) are wireless-equipped vehicles that form networks along the road. The security of this network has been a major challenge. The identity-based cryptosystem (IBC) previously used to secure these networks suffers from weak membership authentication security features. This paper focuses on improving the detection of intruders in VANETs with a modified identity-based cryptosystem (MIBC). The MIBC is developed using a non-singular elliptic curve with Lagrange interpolation. The public keys of vehicles and roadside units on the network are derived from number plates and location identification numbers, respectively. Pseudo-identities are used to mask the real identity of users to preserve their privacy. The membership authentication mechanism ensures that only valid and authenticated members of the network are allowed to join the network. The performance of the MIBC is evaluated using intrusion detection ratio (IDR) and computation time (CT) and then validated against the existing IBC. The result obtained shows that the MIBC recorded an IDR of 99.3% against 94.3% obtained for the existing identity-based cryptosystem (EIBC) for 140 unregistered vehicles attempting to intrude on the network. The MIBC shows lower CT values of 1.17 ms against 1.70 ms for EIBC. The MIBC can be used to improve the security of VANETs.
Conceptual model of internet banking adoption with perceived risk and trust f...TELKOMNIKA JOURNAL
Understanding the primary factors of internet banking (IB) acceptance is critical for both banks and users; nevertheless, our knowledge of the role of users’ perceived risk and trust in IB adoption is limited. As a result, we develop a conceptual model by incorporating perceived risk and trust into the technology acceptance model (TAM) theory toward the IB. The proper research emphasized that the most essential component in explaining IB adoption behavior is behavioral intention to use IB adoption. TAM is helpful for figuring out how elements that affect IB adoption are connected to one another. According to previous literature on IB and the use of such technology in Iraq, one has to choose a theoretical foundation that may justify the acceptance of IB from the customer’s perspective. The conceptual model was therefore constructed using the TAM as a foundation. Furthermore, perceived risk and trust were added to the TAM dimensions as external factors. The key objective of this work was to extend the TAM to construct a conceptual model for IB adoption and to get sufficient theoretical support from the existing literature for the essential elements and their relationships in order to unearth new insights about factors responsible for IB adoption.
Efficient combined fuzzy logic and LMS algorithm for smart antennaTELKOMNIKA JOURNAL
The smart antennas are broadly used in wireless communication. The least mean square (LMS) algorithm is a procedure that is concerned in controlling the smart antenna pattern to accommodate specified requirements such as steering the beam toward the desired signal, in addition to placing the deep nulls in the direction of unwanted signals. The conventional LMS (C-LMS) has some drawbacks like slow convergence speed besides high steady state fluctuation error. To overcome these shortcomings, the present paper adopts an adaptive fuzzy control step size least mean square (FC-LMS) algorithm to adjust its step size. Computer simulation outcomes illustrate that the given model has fast convergence rate as well as low mean square error steady state.
Design and implementation of a LoRa-based system for warning of forest fireTELKOMNIKA JOURNAL
This paper presents the design and implementation of a forest fire monitoring and warning system based on long range (LoRa) technology, a novel ultra-low power consumption and long-range wireless communication technology for remote sensing applications. The proposed system includes a wireless sensor network that records environmental parameters such as temperature, humidity, wind speed, and carbon dioxide (CO2) concentration in the air, as well as taking infrared photos.The data collected at each sensor node will be transmitted to the gateway via LoRa wireless transmission. Data will be collected, processed, and uploaded to a cloud database at the gateway. An Android smartphone application that allows anyone to easily view the recorded data has been developed. When a fire is detected, the system will sound a siren and send a warning message to the responsible personnel, instructing them to take appropriate action. Experiments in Tram Chim Park, Vietnam, have been conducted to verify and evaluate the operation of the system.
Wavelet-based sensing technique in cognitive radio networkTELKOMNIKA JOURNAL
Cognitive radio is a smart radio that can change its transmitter parameter based on interaction with the environment in which it operates. The demand for frequency spectrum is growing due to a big data issue as many Internet of Things (IoT) devices are in the network. Based on previous research, most frequency spectrum was used, but some spectrums were not used, called spectrum hole. Energy detection is one of the spectrum sensing methods that has been frequently used since it is easy to use and does not require license users to have any prior signal understanding. But this technique is incapable of detecting at low signal-to-noise ratio (SNR) levels. Therefore, the wavelet-based sensing is proposed to overcome this issue and detect spectrum holes. The main objective of this work is to evaluate the performance of wavelet-based sensing and compare it with the energy detection technique. The findings show that the percentage of detection in wavelet-based sensing is 83% higher than energy detection performance. This result indicates that the wavelet-based sensing has higher precision in detection and the interference towards primary user can be decreased.
A novel compact dual-band bandstop filter with enhanced rejection bandsTELKOMNIKA JOURNAL
In this paper, we present the design of a new wide dual-band bandstop filter (DBBSF) using nonuniform transmission lines. The method used to design this filter is to replace conventional uniform transmission lines with nonuniform lines governed by a truncated Fourier series. Based on how impedances are profiled in the proposed DBBSF structure, the fractional bandwidths of the two 10 dB-down rejection bands are widened to 39.72% and 52.63%, respectively, and the physical size has been reduced compared to that of the filter with the uniform transmission lines. The results of the electromagnetic (EM) simulation support the obtained analytical response and show an improved frequency behavior.
Deep learning approach to DDoS attack with imbalanced data at the application...TELKOMNIKA JOURNAL
A distributed denial of service (DDoS) attack is where one or more computers attack or target a server computer, by flooding internet traffic to the server. As a result, the server cannot be accessed by legitimate users. A result of this attack causes enormous losses for a company because it can reduce the level of user trust, and reduce the company’s reputation to lose customers due to downtime. One of the services at the application layer that can be accessed by users is a web-based lightweight directory access protocol (LDAP) service that can provide safe and easy services to access directory applications. We used a deep learning approach to detect DDoS attacks on the CICDDoS 2019 dataset on a complex computer network at the application layer to get fast and accurate results for dealing with unbalanced data. Based on the results obtained, it is observed that DDoS attack detection using a deep learning approach on imbalanced data performs better when implemented using synthetic minority oversampling technique (SMOTE) method for binary classes. On the other hand, the proposed deep learning approach performs better for detecting DDoS attacks in multiclass when implemented using the adaptive synthetic (ADASYN) method.
The appearance of uncertainties and disturbances often effects the characteristics of either linear or nonlinear systems. Plus, the stabilization process may be deteriorated thus incurring a catastrophic effect to the system performance. As such, this manuscript addresses the concept of matching condition for the systems that are suffering from miss-match uncertainties and exogeneous disturbances. The perturbation towards the system at hand is assumed to be known and unbounded. To reach this outcome, uncertainties and their classifications are reviewed thoroughly. The structural matching condition is proposed and tabulated in the proposition 1. Two types of mathematical expressions are presented to distinguish the system with matched uncertainty and the system with miss-matched uncertainty. Lastly, two-dimensional numerical expressions are provided to practice the proposed proposition. The outcome shows that matching condition has the ability to change the system to a design-friendly model for asymptotic stabilization.
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...TELKOMNIKA JOURNAL
Many systems, including digital signal processors, finite impulse response (FIR) filters, application-specific integrated circuits, and microprocessors, use multipliers. The demand for low power multipliers is gradually rising day by day in the current technological trend. In this study, we describe a 4×4 Wallace multiplier based on a carry select adder (CSA) that uses less power and has a better power delay product than existing multipliers. HSPICE tool at 16 nm technology is used to simulate the results. In comparison to the traditional CSA-based multiplier, which has a power consumption of 1.7 µW and power delay product (PDP) of 57.3 fJ, the results demonstrate that the Wallace multiplier design employing CSA with first zero finding logic (FZF) logic has the lowest power consumption of 1.4 µW and PDP of 27.5 fJ.
Evaluation of the weighted-overlap add model with massive MIMO in a 5G systemTELKOMNIKA JOURNAL
The flaw in 5G orthogonal frequency division multiplexing (OFDM) becomes apparent in high-speed situations. Because the doppler effect causes frequency shifts, the orthogonality of OFDM subcarriers is broken, lowering both their bit error rate (BER) and throughput output. As part of this research, we use a novel design that combines massive multiple input multiple output (MIMO) and weighted overlap and add (WOLA) to improve the performance of 5G systems. To determine which design is superior, throughput and BER are calculated for both the proposed design and OFDM. The results of the improved system show a massive improvement in performance ver the conventional system and significant improvements with massive MIMO, including the best throughput and BER. When compared to conventional systems, the improved system has a throughput that is around 22% higher and the best performance in terms of BER, but it still has around 25% less error than OFDM.
Reflector antenna design in different frequencies using frequency selective s...TELKOMNIKA JOURNAL
In this study, it is aimed to obtain two different asymmetric radiation patterns obtained from antennas in the shape of the cross-section of a parabolic reflector (fan blade type antennas) and antennas with cosecant-square radiation characteristics at two different frequencies from a single antenna. For this purpose, firstly, a fan blade type antenna design will be made, and then the reflective surface of this antenna will be completed to the shape of the reflective surface of the antenna with the cosecant-square radiation characteristic with the frequency selective surface designed to provide the characteristics suitable for the purpose. The frequency selective surface designed and it provides the perfect transmission as possible at 4 GHz operating frequency, while it will act as a band-quenching filter for electromagnetic waves at 5 GHz operating frequency and will be a reflective surface. Thanks to this frequency selective surface to be used as a reflective surface in the antenna, a fan blade type radiation characteristic at 4 GHz operating frequency will be obtained, while a cosecant-square radiation characteristic at 5 GHz operating frequency will be obtained.
Reagentless iron detection in water based on unclad fiber optical sensorTELKOMNIKA JOURNAL
A simple and low-cost fiber based optical sensor for iron detection is demonstrated in this paper. The sensor head consist of an unclad optical fiber with the unclad length of 1 cm and it has a straight structure. Results obtained shows a linear relationship between the output light intensity and iron concentration, illustrating the functionality of this iron optical sensor. Based on the experimental results, the sensitivity and linearity are achieved at 0.0328/ppm and 0.9824 respectively at the wavelength of 690 nm. With the same wavelength, other performance parameters are also studied. Resolution and limit of detection (LOD) are found to be 0.3049 ppm and 0.0755 ppm correspondingly. This iron sensor is advantageous in that it does not require any reagent for detection, enabling it to be simpler and cost-effective in the implementation of the iron sensing.
Impact of CuS counter electrode calcination temperature on quantum dot sensit...TELKOMNIKA JOURNAL
In place of the commercial Pt electrode used in quantum sensitized solar cells, the low-cost CuS cathode is created using electrophoresis. High resolution scanning electron microscopy and X-ray diffraction were used to analyze the structure and morphology of structural cubic samples with diameters ranging from 40 nm to 200 nm. The conversion efficiency of solar cells is significantly impacted by the calcination temperatures of cathodes at 100 °C, 120 °C, 150 °C, and 180 °C under vacuum. The fluorine doped tin oxide (FTO)/CuS cathode electrode reached a maximum efficiency of 3.89% when it was calcined at 120 °C. Compared to other temperature combinations, CuS nanoparticles crystallize at 120 °C, which lowers resistance while increasing electron lifetime.
In place of the commercial Pt electrode used in quantum sensitized solar cells, the low-cost CuS cathode is created using electrophoresis. High resolution scanning electron microscopy and X-ray diffraction were used to analyze the structure and morphology of structural cubic samples with diameters ranging from 40 nm to 200 nm. The conversion efficiency of solar cells is significantly impacted by the calcination temperatures of cathodes at 100 °C, 120 °C, 150 °C, and 180 °C under vacuum. The fluorine doped tin oxide (FTO)/CuS cathode electrode reached a maximum efficiency of 3.89% when it was calcined at 120 °C. Compared to other temperature combinations, CuS nanoparticles crystallize at 120 °C, which lowers resistance while increasing electron lifetime.
A progressive learning for structural tolerance online sequential extreme lea...TELKOMNIKA JOURNAL
This article discusses the progressive learning for structural tolerance online sequential extreme learning machine (PSTOS-ELM). PSTOS-ELM can save robust accuracy while updating the new data and the new class data on the online training situation. The robustness accuracy arises from using the householder block exact QR decomposition recursive least squares (HBQRD-RLS) of the PSTOS-ELM. This method is suitable for applications that have data streaming and often have new class data. Our experiment compares the PSTOS-ELM accuracy and accuracy robustness while data is updating with the batch-extreme learning machine (ELM) and structural tolerance online sequential extreme learning machine (STOS-ELM) that both must retrain the data in a new class data case. The experimental results show that PSTOS-ELM has accuracy and robustness comparable to ELM and STOS-ELM while also can update new class data immediately.
Electroencephalography-based brain-computer interface using neural networksTELKOMNIKA JOURNAL
This study aimed to develop a brain-computer interface that can control an electric wheelchair using electroencephalography (EEG) signals. First, we used the Mind Wave Mobile 2 device to capture raw EEG signals from the surface of the scalp. The signals were transformed into the frequency domain using fast Fourier transform (FFT) and filtered to monitor changes in attention and relaxation. Next, we performed time and frequency domain analyses to identify features for five eye gestures: opened, closed, blink per second, double blink, and lookup. The base state was the opened-eyes gesture, and we compared the features of the remaining four action gestures to the base state to identify potential gestures. We then built a multilayer neural network to classify these features into five signals that control the wheelchair’s movement. Finally, we designed an experimental wheelchair system to test the effectiveness of the proposed approach. The results demonstrate that the EEG classification was highly accurate and computationally efficient. Moreover, the average performance of the brain-controlled wheelchair system was over 75% across different individuals, which suggests the feasibility of this approach.
Adaptive segmentation algorithm based on level set model in medical imagingTELKOMNIKA JOURNAL
For image segmentation, level set models are frequently employed. It offer best solution to overcome the main limitations of deformable parametric models. However, the challenge when applying those models in medical images stills deal with removing blurs in image edges which directly affects the edge indicator function, leads to not adaptively segmenting images and causes a wrong analysis of pathologies wich prevents to conclude a correct diagnosis. To overcome such issues, an effective process is suggested by simultaneously modelling and solving systems’ two-dimensional partial differential equations (PDE). The first PDE equation allows restoration using Euler’s equation similar to an anisotropic smoothing based on a regularized Perona and Malik filter that eliminates noise while preserving edge information in accordance with detected contours in the second equation that segments the image based on the first equation solutions. This approach allows developing a new algorithm which overcome the studied model drawbacks. Results of the proposed method give clear segments that can be applied to any application. Experiments on many medical images in particular blurry images with high information losses, demonstrate that the developed approach produces superior segmentation results in terms of quantity and quality compared to other models already presented in previeous works.
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...TELKOMNIKA JOURNAL
Drug addiction is a complex neurobiological disorder that necessitates comprehensive treatment of both the body and mind. It is categorized as a brain disorder due to its impact on the brain. Various methods such as electroencephalography (EEG), functional magnetic resonance imaging (FMRI), and magnetoencephalography (MEG) can capture brain activities and structures. EEG signals provide valuable insights into neurological disorders, including drug addiction. Accurate classification of drug addiction from EEG signals relies on appropriate features and channel selection. Choosing the right EEG channels is essential to reduce computational costs and mitigate the risk of overfitting associated with using all available channels. To address the challenge of optimal channel selection in addiction detection from EEG signals, this work employs the shuffled frog leaping algorithm (SFLA). SFLA facilitates the selection of appropriate channels, leading to improved accuracy. Wavelet features extracted from the selected input channel signals are then analyzed using various machine learning classifiers to detect addiction. Experimental results indicate that after selecting features from the appropriate channels, classification accuracy significantly increased across all classifiers. Particularly, the multi-layer perceptron (MLP) classifier combined with SFLA demonstrated a remarkable accuracy improvement of 15.78% while reducing time complexity.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
TELKOMNIKA ISSN: 1693-6930
FPGA-based implementation of speech recognition for robocar control... (Bayuaji Kurniadhani)
1915
the speech production with the approaches of HMM and MFCC [27]. Another SR study used a hidden
Markov model, in the experiment conducted by Farsi [28]; the results showed that the HMM
process still needed to be combined with a genetic algorithm (GA) to obtain good results. In
comparison to [27], another MFCC study conducted by Mohan [29] combined MFCC with the DTW
algorithm and showed good SR results with ten features for each word in the training phase.
Based on the comparison of the studies above (MFCC, LPC, PLP, GFCC), MFCC was chosen in
this study for the feature extraction process on the speech signal, as it generally offers
a high recognition rate with low complexity [30]. In this research, a prototype of an SR
system was built on an FPGA that will later be used as an input for a robotic car (further
research). One example of a word recognition system application is a simple robot car.
The robot is able to differentiate the words pronounced by the users. By applying the word
recognition system in the robotic car, the users can control the direction of robot motion
without needing to touch a button or be close to the robot.
2. Mel Frequency Cepstrum Coefficients (MFCC)
Feature extraction is a process to determine a value or vector that can be used as an
object or individual identifier and that will subsequently be used in the classification
process [31, 32]. MFCC analysis is a standard method [33] used to represent the parameters of
a sound signal. MFCC is based on the frequency resolution of human hearing, commonly stated
in the mel scale (from "melody"): the sound signal is filtered linearly on the mel frequency
scale for low frequencies below 1 kHz and logarithmically for high frequencies above
1 kHz [34]. The block diagram for the process of feature extraction using MFCC is presented
in Figure 1.
Figure 1. MFCC block diagram
As shown in Figure 1, MFCC consists of the following computational steps:
Step 1: Preprocessing
As the initial step, the signal is passed through a high-pass emphasis filter, known as
pre-emphasis [35]. Its purpose is to increase the high-frequency energy of the signal [36];
its output is given in (1):

y(n) = x(n) − α x(n − 1), where 0.9 ≤ α ≤ 1 (1)
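The pre-emphasis step in (1) can be sketched in Python; α = 0.95 is an illustrative value within the stated range, not a figure taken from the paper:

```python
def pre_emphasis(x, alpha=0.95):
    """High-pass pre-emphasis of (1): y(n) = x(n) - alpha * x(n - 1)."""
    # The first sample has no predecessor, so it passes through unchanged.
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

# Illustrative input samples (not taken from the paper's test data).
emphasized = pre_emphasis([400, 210, 330, 25, -430])
```

The filter amplifies sample-to-sample changes, which boosts the high-frequency content attenuated during sound production.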
ISSN: 1693-6930
TELKOMNIKA Vol. 17, No. 4, August 2019: 1914-1922
1916
Step 2: Frame Blocking and Windowing
The speech signal enters a short-frame (frame blocking) process with a duration of
10-30 ms per frame. In this process, however, spectral leakage from the discontinuities at
the frame boundaries frequently occurs. To cope with this problem, a windowing process must
be applied before the signal continues to the FFT phase. The window is defined as a function
w(n), 0 ≤ n ≤ (N − 1), dependent on the value of N, where N refers to the number of samples
in the frame. The input signal is multiplied sample-by-sample by the window. The window
function used in this study is the Hamming window, as shown in (2) [37, 38]:

w(n) = 0.54 − 0.46 cos(2πn / (N − 1)), where 0 ≤ n ≤ (N − 1) (2)

The frame length should be a power of two (2^N) to facilitate the FFT process in the
next block.
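Frame blocking and Hamming windowing per (2) can be sketched as follows; the frame length of 256 samples and 50% overlap are illustrative assumptions, not values taken from the paper:

```python
import math

def hamming(N):
    """Hamming window of (2): w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def frame_and_window(signal, frame_len=256, hop=128):
    """Split the signal into overlapping frames and multiply each frame by the
    Hamming window. frame_len is a power of two so the next FFT block can
    process it directly."""
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * wn for s, wn in zip(frame, w)])
    return frames
```

Tapering each frame toward zero at its edges suppresses the leakage that a raw rectangular cut would introduce.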
Step 3: DFT (Discrete Fourier Transform)
The output signal from the Hamming window is in the time domain. To facilitate the
measurement in the subsequent mel filter bank process, the signal is transformed from
the discrete time domain to the frequency domain using the DFT [39]. The fast algorithm
used to perform this transformation is the FFT. Mathematically, the DFT is formulated
as (3) [40, 41]:

X[k] = Σ_{n=0}^{N−1} x[n] · e^{−j2πnk/N}, k = 0, 1, 2, …, (N − 1) (3)

The FFT process yields the values X[k], the transform of the input samples x(n); each value
represents a basic frequency component of the input signal. X[k] is commonly called the
spectrum or periodogram.
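The DFT of (3) can be sketched directly in Python; real implementations, including the FPGA design here, use the radix-2 FFT for speed, but the direct form makes the definition explicit:

```python
import cmath

def dft(x):
    """Direct evaluation of (3): X[k] = sum_{n=0}^{N-1} x[n] * exp(-j*2*pi*n*k/N).
    A radix-2 FFT computes the same values in O(N log N) instead of O(N^2)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

# A constant signal concentrates all its energy in bin k = 0.
spectrum = dft([1.0, 1.0, 1.0, 1.0])
```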
Step 4: Mel Frequency Filter Bank
This phase filters the frequency spectrum X[k] of each frame using a bank of M filters.
The filters follow the perception of the mel frequency scale, represented as triangular
filter functions, and the mel-scale frequency is obtained by converting the linear
frequency. For linear frequencies f below 1 kHz the mapping is approximately linear, while
above 1 kHz it is logarithmic, following (4) [29], [35], [42]:

Mel(f) = 2595 log10(1 + f / 700) (4)

The warping process applied to the signal in the frequency domain yields the mel spectrum
coefficients through the process shown in (5):

X_i = log10(Σ_{k=0}^{N−1} |X(k)| H_i(k)) (5)

where X_i is the i-th mel spectrum value, N is the number of FFT coefficients, and H_i(f) is
the value of filter i at frequency spot f.
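Equations (4) and (5) can be sketched as a mel filter-bank construction. The 20 filters match the count reported later in the results; the uniform mel spacing of the filter edges and the 8 kHz sample rate are common conventions assumed here, since the paper does not specify them:

```python
import math

def hz_to_mel(f):
    """Equation (4): Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of (4), used to place filter edges back on the linear axis."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(num_filters=20, n_fft=256, sample_rate=8000):
    """Build M triangular filters H_i(k) spaced uniformly on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (num_filters + 1)
            for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    banks = []
    for i in range(1, num_filters + 1):
        f = [0.0] * (n_fft // 2 + 1)
        for k in range(bins[i - 1], bins[i]):          # rising edge
            f[k] = (k - bins[i - 1]) / max(1, bins[i] - bins[i - 1])
        for k in range(bins[i], bins[i + 1]):          # falling edge
            f[k] = (bins[i + 1] - k) / max(1, bins[i + 1] - bins[i])
        banks.append(f)
    return banks
```

Each X_i in (5) is then the log10 of the spectrum magnitudes weighted by one of these triangles.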
Step 5: Cepstrum
In this phase, the mel spectrum is converted back into the time domain using the Discrete
Cosine Transform (DCT). The result is called the Mel Frequency Cepstrum Coefficients (MFCC),
as shown in (6) [41, 42], where n is the coefficient index and M is the number of filter
banks. The word cepstrum originates from the word spectrum with its first syllable reversed,
from "spec" to "ceps" [43]. The cepstrum is the power spectrum obtained mathematically
through a logarithmic computation [44, 45].
C_n = Σ_{i=1}^{M} (log X_i) cos[n (i − 1/2) π / M] (6)
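The cosine transform of (6) can be sketched as follows. The input is taken to be the log mel values X_i from (5), so no further logarithm is applied here; the 12-coefficient count follows the decision block described in the results:

```python
import math

def cepstrum(X, num_coeffs=12):
    """DCT of the log mel values, following (6):
    C_n = sum_{i=1..M} X_i * cos(n * (i - 0.5) * pi / M).
    X already holds the log10 filter-bank outputs of (5)."""
    M = len(X)
    return [sum(X[i - 1] * math.cos(n * (i - 0.5) * math.pi / M)
                for i in range(1, M + 1))
            for n in range(num_coeffs)]
```

For a flat log mel spectrum all the energy lands in C_0, which is why the higher coefficients capture the spectral envelope shape rather than its overall level.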
3. Digital Design and Speech Recognition Simulation
This section discusses in detail the design of the speech recognition (SR) system blocks.
The main parts consist of the audio codec interface, feature extraction using MFCC, and
classification using Euclidean distance. It also discusses the simulation of the digital
logic design for FPGA using the Xilinx tools. In the initial condition, the system waits
for an input from a switch to select training or recognition mode. In training mode, the
incoming sound goes through feature extraction using MFCC. The result of the feature
extraction is a set of cepstral coefficients stored in the database. In recognition mode,
the sound coming into the system also goes through feature extraction. The cepstral
coefficients of the input sound are then compared with the cepstral coefficients in the
database using the Euclidean distance method. The lowest Euclidean distance value is
classified as the corresponding word. Further, the logic on the output pins is set in
accordance with the recognized word. The output pin values of the FPGA are used as the
logic input of the motor driver to control the motion of the robotic car, as shown in
Table 1.
Table 1. Logic Output FPGA
Instruction   First Motor (En Dir1 Dir2)   Second Motor (En Dir1 Dir2)   FPGA Output Logic
"Right"       1   1   1                    1   1   1                     111111
"Left"        1   0   1                    1   1   0                     101110
"Forward"     0   0   0                    1   1   0                     000110
"Backward"    0   0   0                    1   0   1                     000101
"Stop"        0   0   0                    0   0   0                     000000
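The recognition flow and the command mapping of Table 1 can be sketched as a minimal Euclidean-distance classifier. The stored feature vectors below are hypothetical three-element placeholders; real entries would be the cepstral coefficients produced by the MFCC block:

```python
import math

# Hypothetical feature database: one stored sample per command word.
DATABASE = {
    "right":    [0.9, 0.1, 0.4],
    "left":     [0.2, 0.8, 0.5],
    "forward":  [0.5, 0.5, 0.1],
    "backward": [0.1, 0.2, 0.9],
    "stop":     [0.0, 0.0, 0.0],
}

# FPGA output logic per command word, taken from Table 1.
OUTPUT_LOGIC = {
    "right": "111111", "left": "101110", "forward": "000110",
    "backward": "000101", "stop": "000000",
}

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(features):
    """Return the command word whose stored features are closest to the input."""
    return min(DATABASE, key=lambda word: euclidean(features, DATABASE[word]))

def control_bits(features):
    """Map the recognized word to the motor-driver logic of Table 1."""
    return OUTPUT_LOGIC[recognize(features)]
```

Selecting the minimum distance over a fixed five-word database is what makes the classifier cheap enough to fit alongside the MFCC pipeline on the FPGA.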
The simulation output is then compared with the MATLAB results. The system architecture of
the speech recognition block is shown in Figure 2. The architecture consists of the main
blocks: microphone, audio codec, FPGA, motor driver, and DC motors. Inside the FPGA, eight
digital circuits act as the cores of the system: codec interface, preprocessing, control
unit, clock divider, MFCC, database, Euclidean distance, and output logic.
Figure 2. Architecture of proposed speech recognition
4. Results and Analysis
The simulation of the logic circuit design was conducted in Xilinx ISE Project Navigator.
A computational simulation was also done in MATLAB in order to obtain comparison data for
the designed system. The test was conducted by observing and comparing the data in each
sub-system of the speech recognition. The synthesis of the logic circuit of the proposed
system can be seen in Figure 3.
Figure 3. RTL Design schematic
4.1. Pre-emphasis Filter
This circuit functions to improve the signal-to-noise ratio (SNR). The filter preserves
the high frequencies of the spectrum that are attenuated in the process of sound production.
Comparing the simulation results against manual calculation, a similar value was found for
each signal sample. The results are shown in Table 2.
Table 2. Comparison of the Manual Calculation of the Pre-emphasis Filter and the
Simulation Result
No   Manual Calculation   Xilinx Simulation
1    380                  380
2    -166.25              -167
3    282.875              282
4    -25.1875             -26
5    -436.563             -437
6    431.5625             431
7    23.75                23
8    -519.25              -520
9    -92.1875             -93
10   -42.3125             -43
4.2. FFT
The FFT computation on the FPGA is designed separately, producing its output in two parts.
As seen in Figure 4, the result is not much different from the MATLAB computation, since the
calculation in this design does not use floating-point arithmetic. The comparison of the FFT
results in graphic form can be seen in Figure 4.
4.3. Mel Frequency Warping
Mel Frequency Warping functions as a filter applied to the frequency spectrum of the FFT
output. The multiplication was done in parallel across 20 filter banks to make this process
faster. The results from the 20 filter banks are magnitude values, as shown in Figure 5.
It can be seen that features 5-10 from the Xilinx simulation closely approach the MATLAB
computation, which shows that the logic design is precise.
Figure 4. The graph of the calculation result of the 256-point FFT on MATLAB
and Xilinx simulation
Figure 5. Graph of Mel Frequency Warping result on MATLAB and simulation on Xilinx
4.4. Cepstrum
This block converts the Mel spectrum from the previous block into the time domain using the Discrete Cosine Transform. The coefficient values were converted into a 16-bit fixed-point representation and stored in ROM. The block also had a RAM to hold the output of the previous process. Compared with the MATLAB calculation, as shown in Figure 6, the results of this block differ slightly. This is because the calculation in this block involved numbers with digits after the decimal point, while the number representation used had limited precision, so each value was rounded to the nearest representable number.
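The rounding effect described above can be modeled as follows (our sketch; the paper says only "fixed point 16 bits", so Q1.15 scaling, 20 bank inputs, 12 output coefficients, and the input values are all assumptions):

```python
# Sketch: DCT coefficients quantized to 16-bit integers (as stored in ROM)
# vs. floating-point DCT, showing the small rounding error seen in Figure 6.
import math

N = 20          # assumed mel bank outputs in
M = 12          # cepstral coefficients out

def dct_coeff(k, n):
    return math.cos(math.pi * k * (n + 0.5) / N)

Q = 1 << 15     # assumed Q1.15 fixed-point scaling
rom = [[round(dct_coeff(k, n) * Q) for n in range(N)] for k in range(M)]

mel_log = [float(v) for v in range(10, 30)]        # hypothetical inputs
float_c = [sum(dct_coeff(k, n) * mel_log[n] for n in range(N))
           for k in range(M)]
fixed_c = [sum(rom[k][n] * mel_log[n] for n in range(N)) / Q
           for k in range(M)]

max_err = max(abs(a - b) for a, b in zip(float_c, fixed_c))
print(max_err)  # small rounding error from the quantized ROM coefficients
```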
4.5. Decision
The 12 coefficients from the feature extraction result, stored in the database during the testing phase, are compared with the coefficients of the input sound feature and then used to carry out
◼ ISSN: 1693-6930
TELKOMNIKA Vol. 17, No. 4, August 2019: 1914-1922
the instruction as determined. The decision was made by finding the feature coefficients with the closest values using Euclidean distance. The database block was used to store one sample of each instruction word; five feature datasets were saved in the database. Once the closest match was obtained, a decision was issued to drive the motor control in the form of 4-bit data, as shown in Figure 7.
Figure 6. Graph of the Calculation Result on MATLAB and Xilinx Simulation
Figure 7. Result of the simulation on the output logic block on Xilinx
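The decision step above amounts to nearest-neighbor matching. A minimal sketch follows (ours; the feature values and the 4-bit code assignments are made up, since the paper does not list them):

```python
# Sketch: match the input MFCC vector to the stored command template with
# the minimum Euclidean distance, then emit a 4-bit motor control word.
import math

database = {                        # five command templates, one per word
    "forward":  [0.1] * 12,
    "backward": [0.2] * 12,
    "left":     [0.3] * 12,
    "right":    [0.4] * 12,
    "stop":     [0.5] * 12,
}
motor_code = {"forward": 0b0001, "backward": 0b0010,   # hypothetical codes
              "left": 0b0100, "right": 0b1000, "stop": 0b0000}

def classify(features):
    def dist(template):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(features, template)))
    word = min(database, key=lambda w: dist(database[w]))
    return word, motor_code[word]

word, code = classify([0.32] * 12)
print(word, format(code, "04b"))    # prints "left 0100"
```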
5. Conclusion
This research has successfully designed and simulated the logic circuits of a speech recognition system using the MFCC method and Euclidean distance to control the motion of a robotic car. The MFCC method was used to extract features from the voice command input, consisting of right, left, forward, backward, and stop. The signal features from MFCC were then compared for similarity with the signal features in the database using Euclidean distance to give the control logic to the motor.
Process simplification was also done to keep FPGA memory usage as low as possible while maintaining good performance. Validation was conducted by comparing the output of the logic design for each speech recognition component with the corresponding calculation in MATLAB. There was a difference between the results of the logic circuits and those of MATLAB because the calculations involved numbers with digits after the decimal point, while the number representation used had limited precision, so values were rounded to the nearest representable number. However, this difference was relatively insignificant, and the system still performed well in distinguishing the signal features of one class from another, as the results show. Future work will implement the design on an FPGA board and analyze the synthesis of the logic circuits to determine the performance parameters of the IC design.