Speech is the most basic and essential method of human communication, and a speaker can be recognized from the individual information embedded in the speech signal. Speaker recognition (SR) identifies the person who is speaking, and in recent years it has been widely used in security systems. In this paper we discuss feature extraction techniques such as Mel frequency cepstral coefficients (MFCC), linear predictive coding (LPC), and dynamic time warping (DTW), and classification techniques such as Gaussian mixture models (GMM), artificial neural networks (ANN), and support vector machines (SVM).
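As background for the MFCC technique that recurs throughout the surveyed papers, the sketch below shows only the mel-scale frequency mapping and filter-bank edge placement that MFCC extraction rests on; the constants follow the common O'Shaughnessy formula, and the 0–8000 Hz, 26-filter configuration is an illustrative assumption, not any surveyed system's exact setup.

```python
import math

def hz_to_mel(f_hz):
    # Common mel-scale formula: perceptually spaced frequency axis.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse mapping back to linear frequency in Hz.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_edges(f_low, f_high, n_filters):
    """Edge frequencies (Hz) for a triangular mel filter bank:
    n_filters filters need n_filters + 2 equally mel-spaced edges."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]

# Hypothetical configuration: 26 filters spanning 0-8000 Hz.
edges = mel_filter_edges(0.0, 8000.0, 26)
```

A full MFCC pipeline would apply these filters to each frame's power spectrum, take logs, and apply a DCT to get the cepstral coefficients.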
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique (CSC Journals)
An automatic speaker recognition system recognizes an unknown speaker among several reference speakers by making use of speaker-specific information in their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition technique. Our baseline speaker recognition system, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database, and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique, performing gender recognition before speaker recognition, to improve the accuracy and timeliness of the baseline speaker recognition system. Based on experiments conducted on the MIT database, we demonstrate that the proposed system improves accuracy over the baseline by approximately 2% while reducing computational time by more than 30%.
On the use of voice activity detection in speech emotion recognition (BEEI Journal)
Emotion recognition through speech has many potential applications; however, the challenge lies in achieving high recognition accuracy under limited resources or interference such as noise. In this paper we explore the possibility of improving speech emotion recognition by utilizing the voice activity detection (VAD) concept. Emotional voice data from the Berlin Emotion Database (EMO-DB) and a custom-made database, the LQ Audio Dataset, are first preprocessed by VAD before feature extraction. The features are then passed to a deep neural network for classification; MFCC was chosen as the sole determinant feature. Comparing the results obtained with and without VAD, we found that VAD improved the recognition rate of five emotions (happy, angry, sad, fear, and neutral) by 3.7% when recognizing clean signals, while using VAD when training a network with both clean and noisy signals improved our previous results by 50%.
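The paper above treats VAD as a preprocessing step that drops non-speech frames before feature extraction. A minimal energy-threshold VAD illustrates the idea; the frame representation and the 10%-of-peak threshold are illustrative assumptions, as the paper does not specify its VAD front-end here.

```python
import numpy as np

def energy_vad(frames, threshold_ratio=0.1):
    """Keep frames whose energy exceeds a fraction of the peak frame energy.

    frames: iterable of equal-length sample windows.
    Returns a boolean mask: True for frames treated as speech.
    """
    energies = (np.asarray(frames, dtype=float) ** 2).sum(axis=1)
    threshold = threshold_ratio * energies.max()
    return energies >= threshold

# A loud frame passes; a near-silent frame is discarded.
mask = energy_vad([[1.0, 1.0], [0.01, 0.01]])
```

Only the frames surviving the mask would then be fed to MFCC extraction and the classifier.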
FEATURE SELECTION USING FISHER'S RATIO TECHNIQUE FOR AUTOMATIC ... (IJCI Journal)
Automatic Speech Recognition (ASR) involves mainly two steps: feature extraction and classification (pattern recognition). The Mel frequency cepstral coefficient (MFCC) is one of the most prominent feature extraction techniques in ASR. Usually, the set of all 12 MFCC coefficients is used as the feature vector in the classification step, but the question is whether the same or improved classification accuracy can be achieved using a subset of the 12 MFCCs as the feature vector. In this paper, Fisher's ratio technique is used to select the subset of the 12 MFCC coefficients that contributes most to discriminating a pattern. The selected coefficients are used in classification with the Hidden Markov Model (HMM) algorithm, and the classification accuracies obtained with all 12 coefficients and with the selected coefficients are compared.
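Fisher's ratio scores each feature dimension by between-class variance over within-class variance, so the most discriminative MFCCs can be kept. A sketch on toy two-class data (not the paper's MFCC frames):

```python
import numpy as np

def fisher_ratio(features, labels):
    """Per-dimension Fisher ratio: between-class over within-class variance.

    features: (n_samples, n_dims) array, e.g. 12 MFCCs per frame.
    labels:   (n_samples,) class ids.
    """
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        x = features[labels == c]
        between += len(x) * (x.mean(axis=0) - overall_mean) ** 2
        within += ((x - x.mean(axis=0)) ** 2).sum(axis=0)
    return between / within

# Toy data: dimension 0 separates the classes, dimension 1 is pure noise.
rng = np.random.default_rng(0)
x0 = np.column_stack([rng.normal(-3, 0.5, 50), rng.normal(0, 1, 50)])
x1 = np.column_stack([rng.normal(+3, 0.5, 50), rng.normal(0, 1, 50)])
X = np.vstack([x0, x1])
y = np.array([0] * 50 + [1] * 50)
ratios = fisher_ratio(X, y)
top = np.argsort(ratios)[::-1]  # dimensions ranked by discriminative power
```

Keeping only the top-ranked dimensions yields the reduced feature vector that the paper then feeds to the HMM classifier.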
We propose a model for deep learning based multimodal sentiment analysis, using the MOUD dataset for experimentation. We developed two parallel models, one text based and one audio based, and fused the heterogeneous feature maps taken from their intermediate layers to complete the architecture. The performance measures (accuracy, precision, recall, and F1-score) are observed to outperform existing models.
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition... (TELKOMNIKA Journal)
Speech recognition can be defined as the process of converting voice signals into a sequence of words by applying a specific algorithm implemented in a computer program. Research on speech recognition in Indonesia is relatively limited. This paper studies which feature extraction method, Linear Predictive Coding (LPC) or Mel Frequency Cepstral Coefficients (MFCC), is better for speech recognition in the Indonesian language. This is important because a method that produces high accuracy for one particular language does not necessarily produce the same accuracy for other languages, since every language has different characteristics; this research can therefore help accelerate the adoption of automatic speech recognition for the Indonesian language. There are two main processes in speech recognition: feature extraction and recognition. The feature extraction methods compared in this study are LPC and MFCC, while recognition uses a Hidden Markov Model (HMM). The test results showed that MFCC performs better than LPC for Indonesian-language speech recognition.
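LPC, the second extraction method compared above, fits each frame with an all-pole predictor. A textbook autocorrelation-method (Levinson-Durbin) sketch, verified on a synthetic decaying signal rather than on real speech:

```python
import numpy as np

def lpc_coeffs(frame, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin).

    Returns (a, err) with a[0] == 1, modelling s[n] ~ -sum_{k>=1} a[k]*s[n-k],
    and err the final prediction-error energy.
    """
    n = len(frame)
    r = [float(np.dot(frame[:n - k], frame[k:])) for k in range(order + 1)]
    a = [1.0]
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                                   # reflection coefficient
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= (1.0 - k * k)
    return np.array(a), err

# A decaying signal s[n] = 0.9**n is modelled almost exactly by a
# first-order predictor with a[1] close to -0.9.
signal = np.array([0.9 ** n for n in range(400)])
coeffs, residual = lpc_coeffs(signal, 1)
```

In an ASR front-end, `coeffs` (or cepstra derived from them) would be the per-frame feature vector handed to the HMM.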
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION (IJMA)
The performance of various acoustic feature extraction methods is compared in this work using a Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic features are a series of vectors that represent the speech signals; they can be classified into either words or sub-word units such as phonemes. In this work, linear predictive coding (LPC) is first used as the acoustic vector extraction technique, chosen for its widespread popularity. Then other vector extraction techniques, Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), both of which closely resemble the human auditory system, are also used. These feature vectors are trained using the LSTM neural network, and the obtained models of different phonemes are compared using two statistical tools, the Bhattacharyya distance and the Mahalanobis distance, to investigate the nature of the acoustic features.
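The two comparison tools named above have standard closed forms for Gaussian models. A sketch of both, assuming the phoneme models are summarized by a mean vector and covariance matrix (the paper's exact model parameterization is not given here):

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of a feature vector from a Gaussian model."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov = (np.asarray(cov1, float) + np.asarray(cov2, float)) / 2.0
    diff = mu1 - mu2
    term_mean = 0.125 * diff @ np.linalg.inv(cov) @ diff
    term_cov = 0.5 * np.log(np.linalg.det(cov) /
                            np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return float(term_mean + term_cov)
```

With identity covariance the Mahalanobis distance reduces to the Euclidean distance, and the Bhattacharyya distance between identical Gaussians is zero, which makes both easy to sanity-check.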
Speech emotion recognition is a recent research topic in the field of human-computer interaction (HCI). As computers have become an integral part of our lives, the need has arisen for a more natural communication interface between humans and computers, and a lot of work is currently going on to improve this interaction. To achieve this goal, a computer would have to be able to assess its present situation and respond differently depending on that observation; part of this process involves understanding a user's emotional state. To make human-computer interaction more natural, the objective is for the computer to recognize emotional states in the same way a human does. The efficiency of an emotion recognition system depends on the type of features extracted and the classifier used for detecting emotions. The proposed system aims at identifying basic emotional states such as anger, joy, neutrality, and sadness from human speech. For classifying the different emotions, features such as MFCC (Mel frequency cepstral coefficients) and energy are used. In this paper, a standard emotional database (an English database) is used, which gives more satisfactory detection of emotions than recorded emotion samples. The methodology describes and compares the performance of a learning vector quantization neural network (LVQ NN), a multiclass support vector machine (SVM), and their combination for emotion recognition.
MIM (Mobile Instant Messaging) Classification using Term Frequency-Inverse Do... (IJMREM Journal)
The focus of this study is binary sentiment classification at the aspect level, to develop a hybrid sentiment classification framework for WhatsApp MIMs (Mobile Instant Messages). The work is carried out in two phases, a training phase and a testing phase. In the training phase, 75% of the data is used as the training dataset. Pre-processing techniques such as tokenization, stop-word removal, case normalization, punctuation removal, and stemming are applied to obtain a cleaner dataset as input, and the output is sent to the classifier after TF-IDF is applied for feature weighting. In the second phase, the classifier is tested with the remaining 25% of the data. A Bernoulli naïve Bayesian classifier, an improved form of the traditional naïve Bayesian classifier, is used to classify sentiments. There are 417 messages in total, of which 244 are classified as positive and 173 as negative. The proposed model achieves a satisfactory accuracy of 81.73%, 12 points higher than the baseline classification model's 69.23%.
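The TF-IDF weighting step above can be sketched in a few lines of plain Python. The smoothing scheme (add-one in the IDF, plus one on the log) is an assumption borrowed from common practice, not necessarily the paper's exact formula, and the two-message corpus is a toy example:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document TF-IDF weights for a list of tokenised messages.

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))           # document frequency: one count per doc
    # Smoothed IDF so terms appearing in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return out

# "phone" occurs in both messages, so it is down-weighted relative to
# the class-bearing words "good" and "bad".
weights = tf_idf([["good", "phone"], ["bad", "phone"]])
```

The resulting per-message weight dictionaries are what a naïve Bayes (or any other) classifier would consume as features.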
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION (CS & IT)
Emotional state recognition through speech is a very active research topic nowadays. Using subliminal information in speech, it is possible to recognize the emotional state of the speaker. One of the main problems in the design of automatic emotion recognition systems is the small number of available patterns, which makes the learning process more difficult due to the generalization problems that arise under these conditions. In this work we propose a solution to this problem that consists in enlarging the training set through the creation of new virtual patterns. In the case of emotional speech, most of the emotional information is carried by speed and pitch variations, so a change in the average pitch that modifies neither the speed nor the pitch variations does not affect the expressed emotion. We use this prior information to create new patterns by applying a pitch-shift modification in the feature extraction process of the classification system. For this purpose, we propose a frequency-scaling modification of the Mel frequency cepstral coefficients used to classify the emotion. This process allows us to synthetically increase the number of available patterns in the training set, increasing the generalization capability of the system and reducing the test error.
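The virtual-pattern idea can be illustrated by rescaling the frequency axis of each spectral frame while leaving frame timing untouched, so speed and pitch variations are preserved. This is a sketch of the concept with hypothetical parameters (linear interpolation over a toy frequency grid), not the paper's exact frequency-scaling of the MFCC filter bank:

```python
import numpy as np

def virtual_patterns(frames, freqs_hz, alphas):
    """Create pitch-shifted 'virtual' spectra from magnitude-spectrum frames.

    frames:   list of per-frame magnitude spectra sampled at freqs_hz.
    alphas:   frequency scale factors (e.g. 0.9 shifts the spectrum down).
    Returns one rescaled copy of the frame sequence per alpha.
    """
    out = []
    for alpha in alphas:
        shifted = np.array([np.interp(freqs_hz, np.asarray(freqs_hz) * alpha, f)
                            for f in frames])
        out.append(shifted)
    return out

freqs = np.linspace(0.0, 4000.0, 5)
flat = [np.ones(5)]                       # a spectrally flat frame
virtual = virtual_patterns(flat, freqs, [0.9, 1.1])
```

Each rescaled copy is then run through the normal MFCC extraction, multiplying the number of training patterns per utterance.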
Identification of frequency domain using quantum based optimization neural ne... (eSAT Publishing House)
An effective evaluation study of objective measures using spectral subtractiv... (eSAT Journals)
Unwanted noise has a negative influence on communication because it disturbs the conversation and can make communication impossible. Speech enhancement algorithms are used to improve quality and intelligibility and to reduce listener fatigue. Speech quality can be assessed using either subjective listening tests or objective quality measures. Several objective measures have been evaluated on speech processed by enhancement algorithms, but these have limitations in assessing the original speech signal. This paper presents a study of speech quality measures and computes the values used for regression analysis in an evaluation of objective measures on speech enhanced with a spectral subtraction algorithm.
Keywords: MOS, ITU-T P.835, SNRseg, log-likelihood ratio, Itakura-Saito.
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS (IJNLC)
The fundamental techniques used for man-machine communication include speech synthesis, speech recognition, and speech transformation. Feature extraction techniques provide a compressed representation of the speech signal, and HNM analysis and synthesis provides high-quality speech with a small number of parameters. Dynamic time warping (DTW) is a well-known technique for aligning two given multidimensional sequences: it locates an optimal match between them, and the improvement in alignment is estimated from the corresponding distances. The objective of this research is to investigate the effect of dynamic time warping on phrase-, word-, and phoneme-based alignments. Speech signals in the form of twenty-five phrases were recorded; the recorded material was segmented manually and aligned at the sentence, word, and phoneme level, and the Mahalanobis distance (MD) was computed between the aligned frames. The investigation showed better alignment in the HNM parametric domain, and it was seen that effective speech alignment can be carried out even at the phrase level.
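The DTW alignment described above can be sketched with the classic dynamic-programming recurrence. This is the textbook scalar-sequence form with an absolute-difference local cost, not the paper's HNM-domain feature pipeline:

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences.

    D[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j];
    each cell extends the cheapest of the three admissible predecessor moves.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],       # step in a only
                                 D[i][j - 1],       # step in b only
                                 D[i - 1][j - 1])   # step in both
    return D[n][m]
```

For multidimensional MFCC frames, `abs(a[i-1] - b[j-1])` would be replaced by a vector distance (Euclidean or, as in the paper's evaluation, Mahalanobis).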
This paper reports on an audio-visual client recognition system, implemented in Matlab, which identifies five clients and can be extended to identify as many clients as it is trained on. The visual recognition part was implemented using principal component analysis, linear discriminant analysis, and a nearest-neighbour classifier. The audio recognition part was successfully implemented using Mel-frequency cepstral coefficients, linear discriminant analysis, and a nearest-neighbour classifier. The system was tested with images and sounds it had not been trained on, to see whether it could detect an intruder, and it achieved very successful results with precise responses to intruders.
Bayesian distance metric learning and its application in automatic speaker re... (IJECE, IAES)
This paper proposes a state-of-the-art automatic speaker recognition (ASR) system based on a Bayesian distance learning metric as a feature extractor. In this modeling, I explored the constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. An approximation of the distance metric is used as a weighted covariance matrix from the higher eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric distance. Given a speaker tag, I select the data pairs of different speakers with the highest cosine scores to form a set of speaker constraints; this collection captures the most discriminating variability between the speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods, is insensitive to normalization compared with cosine scoring, and is very effective when training data is limited. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best combined cosine score performance, an EER of 1.767%, was obtained using LDA200 + NCA200 + LDA200, and the best Bayes_dml performance, an EER of 1.775%, was obtained using LDA200 + NCA200 + LDA100. Bayes_dml outperforms the combined norm of cosine scores and is the best reported result for the short2-short3 condition on NIST SRE 2008 data.
Speech Recognition Using HMM with MFCC - An Analysis Using Frequency Spectral De... (SIPIJ)
This paper presents an approach to speech recognition that incorporates frequency spectral information into Mel-frequency-based feature representation for an HMM-based recognition system. Frequency spectral information is added to the conventional Mel-spectrum-based approach. The Mel frequency approach observes the speech signal at a fixed resolution, which causes resolution features to overlap and limits recognition; resolution decomposition with frequency separation is therefore used as a mapping approach for the HMM-based recognizer. Simulation results show an improvement in the quality metrics of speech recognition with respect to computational time and learning accuracy.
Development of Quranic Reciter Identification System using MFCC and GMM Clas... (IJECE, IAES)
Nowadays, many beautiful recitations of the Al-Quran are available. Quranic recitation has its own characteristics, and the problem of identifying the reciter is similar to the speaker recognition/identification problem. The objective of this paper is to develop a Quran reciter identification system using Mel-frequency cepstral coefficients (MFCC) and a Gaussian mixture model (GMM). A database of five Quranic reciters is developed and used in the training and testing phases. We carefully randomized the database across various surahs of the Quran so that the proposed system is sensitive only to the reciter, not to the recited verses. Around 15 Quranic audio samples from the 5 reciters were collected and randomized, of which 10 were used for training the GMM and 5 for testing. Results showed that the proposed system achieves a 100% recognition rate for the five reciters tested, and even when tested with unknown samples, it is able to reject them.
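At test time, a GMM-based identifier scores each frame's features under every reciter's model and picks the model with the highest total log-likelihood. A sketch of the diagonal-covariance GMM scoring step (the model parameters here are placeholders, not the paper's trained mixtures):

```python
import math

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature frame x under a diagonal-covariance GMM.

    weights:   mixture weights (sum to 1).
    means:     per-component mean vectors.
    variances: per-component per-dimension variances.
    """
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp_logs.append(log_p)
    m = max(comp_logs)  # log-sum-exp trick for numerical stability
    return m + math.log(sum(math.exp(l - m) for l in comp_logs))
```

Identification sums this quantity over all frames of an utterance for each reciter's GMM and returns the argmax; rejecting unknowns would additionally require a threshold on the winning score.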
A COOPERATIVE LOCALIZATION METHOD BASED ON V2I COMMUNICATION AND DISTANCE INF... (IJCNC Journal)
Relative positioning is a recent solution to the limited accuracy of GPS in urban environments. Vehicle positions obtained using V2I communication are more accurate because the known roadside unit (RSU) locations help predict measurement errors over time. The accuracy of vehicle positions depends largely on the number of RSUs, but the high installation cost limits this approach. It also depends on the nonlinear nature of localization, which several previous studies neglected; in those studies, accumulated errors grew over time due to the linearized localization problem. In the present study, a cooperative localization method based on V2I communication and distance information in vehicular networks is proposed to improve the estimates of vehicles' initial positions. The method assumes virtual RSUs based on mobility measurements, which help reduce installation costs and handle faulty environments. The extended Kalman filter is a well-known estimator for nonlinear problems, but it requires a good initial vehicle position vector and adaptive measurement noise; using the proposed method, vehicles' initial positions can be estimated accurately. The experimental results confirm that the proposed method is more accurate than existing methods, giving a root mean square error of approximately 1 m, and show that virtual RSUs can assist in estimating initial positions in faulty environments.
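The core of distance-based initial positioning is recovering a position from ranges to known anchors (RSUs). A linearized least-squares trilateration sketch illustrates that step; it is a toy stand-in for the paper's EKF-based pipeline, with made-up anchor coordinates:

```python
import numpy as np

def trilaterate(anchors, dists):
    """Least-squares 2-D position from distances to known anchor points.

    Linearises ||p - x_i||^2 = d_i^2 by subtracting the first anchor's
    range equation, leaving a linear system in p.
    """
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(dists, dtype=float)
    x0, d0 = anchors[0], d[0]
    A = 2.0 * (anchors[1:] - x0)
    b = (d0 ** 2 - d[1:] ** 2) + ((anchors[1:] ** 2).sum(axis=1)
                                  - (x0 ** 2).sum())
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Three hypothetical RSUs and exact ranges to the true position (3, 4).
pos = trilaterate([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]],
                  [5.0, 65.0 ** 0.5, 45.0 ** 0.5])
```

In the paper's setting, such an estimate would serve as the initial position vector handed to the extended Kalman filter, which then tracks it under noisy measurements.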
Influences of Buffer Size and Eb/No on Very Small Aperture Terminal (VSAT) Co... (TELKOMNIKA Journal)
In data communication, the quality of the signal transmitted from the transmitter (Tx) to the receiver (Rx) station is very influential. Buffer size and Eb/No are two parameters that influence signal quality; this research measures these parameters and the relationship between them. The data were collected on the STM-1 link between Makassar and Timika, operated by PT. Telkom Metra Bogor, over a period of 56 days using the Simple Network Management Protocol (SNMP). To analyze the relationship between the two parameters, we use the product moment correlation (PMC) method at a significance level of 0.05, with the buffer in the CDM 700 modem set to 50%, an Eb/No threshold of 12.1 dB, and 64-QAM modulation. The resulting correlation for the Makassar side is 0.648 with a p-value of 0.000, and for the Timika side 0.722 with a p-value of 0.000. These results suggest that the two parameters are strongly and significantly correlated.
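The product moment correlation used above is the standard Pearson coefficient. A minimal implementation, shown on synthetic series rather than the study's buffer and Eb/No measurements:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Values near +1 or -1 (like the study's 0.648 and 0.722) indicate strong linear association; significance would additionally require a t-test or p-value computation.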
Optimization of Quality of Service Parameters for Dynamic Channel Allocation ... (IJNGN Journal)
As the spectrum for wireless transmission gets crowded due to the increase in users and applications, efficient use of the spectrum is a major challenge today. A major contributing factor is inefficient usage of the frequency bands, since interference in neighboring cells limits frequency reuse. In this paper, quality of service parameters such as residual bandwidth, number of users, duration of calls, frequency of calls, and priority are considered. The paper presents work on the optimization of dynamic channel allocation using a genetic algorithm (GA), which attempts to allocate channels to users such that overall network congestion is minimized by reusing already allocated frequencies. The working of the genetic algorithm used in the optimization procedure is also explained, and the optimized channel allocation is compared with a non-optimized one to check the efficiency of the genetic algorithm.
FPGA-based implementation of speech recognition for robocar control using MFCC (TELKOMNIKA Journal)
This research proposes a simulation of the speech recognition logic chain on an FPGA, based on MFCC (Mel frequency cepstral coefficients) and Euclidean distance, to control the motion of a robotic car. The recognized speech is used as a command to operate the car. MFCC is used in the feature extraction process, while Euclidean distance is applied in the feature classification process for each utterance, which is then forwarded to the decision stage that issues the control logic to the robot's motors. Testing showed that the designed logic chain is precise, as measured on the Mel frequency warping and power cepstrum stages. The logic design was validated by comparing Matlab computation with Xilinx simulation, making it easier for researchers to carry the implementation on to FPGA hardware.
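The Euclidean-distance decision rule used by this robocar design amounts to nearest-template classification over MFCC features. A sketch with hypothetical command templates (the real system stores per-command MFCC references learned from recordings):

```python
import numpy as np

def classify_command(feature_vec, templates):
    """Return the command label whose stored template is nearest (Euclidean)."""
    x = np.asarray(feature_vec, dtype=float)
    best, best_d = None, float("inf")
    for label, tmpl in templates.items():
        d = float(np.linalg.norm(x - np.asarray(tmpl, dtype=float)))
        if d < best_d:
            best, best_d = label, d
    return best, best_d

# Toy 2-D feature space; real MFCC templates would have many dimensions.
label, dist = classify_command([0.9, 0.1],
                               {"forward": [1.0, 0.0], "stop": [0.0, 1.0]})
```

The winning label is what the decision stage would translate into motor control logic; a distance threshold can also reject utterances far from every template.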
Intelligent Arabic letters speech recognition system based on mel frequency c... (IJECE, IAES)
Speech recognition is one of the important applications of artificial intelligence (AI). It aims to recognize spoken words regardless of who is speaking. The process involves extracting meaningful features from spoken words and then classifying these features into their classes. This paper presents a neural network classification system for Arabic letters and studies the effect of changing the properties of a multi-layer perceptron (MLP) artificial neural network (ANN) to obtain optimized performance. The proposed system consists of two main stages: first, the recorded spoken letters are transformed from the time domain into the frequency domain using the fast Fourier transform (FFT), and features are extracted using mel frequency cepstral coefficients (MFCC); second, the extracted features are classified using the MLP ANN with the back-propagation (BP) learning algorithm. The obtained results show that the proposed system, along with the extracted features, can classify Arabic spoken letters using two neural network hidden layers with an accuracy of around 86%.
This paper describes the development of an efficient speech recognition system using techniques such as Mel frequency cepstral coefficients (MFCC), vector quantization (VQ), and hidden Markov models (HMM). It explains how speaker recognition followed by speech recognition is used to recognize speech faster, more efficiently, and more accurately. MFCC is used to extract the characteristics of the input speech signal for a particular word uttered by a particular speaker; an HMM is then applied to the quantized feature vectors to identify the word by evaluating the maximum log-likelihood value for the spoken word.
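The vector quantization step in this MFCC-VQ-HMM chain maps each continuous MFCC frame to its nearest codeword, producing the discrete symbol sequence a discrete HMM consumes. A sketch of the assignment step, with a hypothetical two-entry codebook (a real codebook would be trained, e.g. with k-means/LBG):

```python
import numpy as np

def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codeword.

    frames:   (n_frames, n_dims) MFCC vectors.
    codebook: (n_codewords, n_dims) trained codeword vectors.
    Returns the discrete observation sequence for the HMM.
    """
    frames = np.asarray(frames, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every frame to every codeword.
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

codes = quantize([[1.0, 1.0], [9.0, 9.0]],
                 [[0.0, 0.0], [10.0, 10.0]])
```

During recognition, each word's HMM scores this symbol sequence and the word with the maximum log-likelihood wins.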
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
The performance of various acoustic feature extraction methods has been compared in this work using
Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic
features are a series of vectors that represents the speech signals. They can be classified in either words or
sub word units such as phonemes. In this work, at first linear predictive coding (LPC) is used as acoustic
vector extraction technique. LPC has been chosen due to its widespread popularity. Then other vector
extraction techniques like Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction
(PLP) have also been used. These two methods closely resemble the human auditory system. These feature
vectors are then trained using the LSTM neural network. Then the obtained models of different phonemes
are compared with different statistical tools namely Bhattacharyya Distance and Mahalanobis Distance to
investigate the nature of those acoustic features.
Speech Emotion Recognition is a recent research topic in the Human Computer Interaction (HCI) field. The need has risen for a more natural communication interface between humans and computer, as computers have become an integral part of our lives. A lot of work currently going on to improve the interaction between humans and computers. To achieve this goal, a computer would have to be able to distinguish its present situation and respond differently depending on that observation. Part of this process involves understanding a user‟s emotional state. To make the human computer interaction more natural, the objective is that computer should be able to recognize emotional states in the same as human does. The efficiency of emotion recognition system depends on type of features extracted and classifier used for detection of emotions. The proposed system aims at identification of basic emotional states such as anger, joy, neutral and sadness from human speech. While classifying different emotions, features like MFCC (Mel Frequency Cepstral Coefficient) and Energy is used. In this paper, Standard Emotional Database i.e. English Database is used which gives the satisfactory detection of emotions than recorded samples of emotions. This methodology describes and compares the performances of Learning Vector Quantization Neural Network (LVQ NN), Multiclass Support Vector Machine (SVM) and their combination for emotion recognition.
MIM (Mobile Instant Messaging) Classification using Term Frequency-Inverse Do...IJMREMJournal
The focus of the study is binary sentiment classification at the aspect level, to develop a hybrid sentiment
classification framework for WhatsApp MIMs (Mobile Instant Messages). The work is carried out in two phases,
i.e. a training phase and a testing phase. In the training phase, 75% of the data is used as the training dataset.
Pre-processing techniques like tokenization, stop-word removal, case normalization, punctuation removal and
stemming are applied to acquire a cleaner dataset to be used as input. The output is sent to the classifier after
applying TF-IDF for feature weighting. In the second phase, the classifier is tested with the remaining 25% of
the data. A Bernoulli Naïve Bayesian classifier, an improved form of the traditional Naïve Bayesian classifier,
is used to classify sentiments. There are 417 messages in total, of which 244 and 173 are classified as positive
and negative respectively. The proposed model has achieved satisfactory results of up to 81.73% accuracy,
12 points higher than the base-line classification model's 69.23%.
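The TF-IDF weighting step described above can be sketched directly from its definition. This is a minimal illustration with hypothetical toy messages, not the paper's WhatsApp data:

```python
import math

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf(t, d) = count of t in d / total tokens in d
    idf(t)   = log(N / df(t)), where df(t) = number of docs containing t
    """
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        counts = {}
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in counts.items()})
    return weights

# Toy example: three already-preprocessed (tokenized, stemmed) messages.
docs = [["great", "product"], ["bad", "service"], ["great", "service"]]
w = tf_idf(docs)
# A term appearing in fewer documents ("product") receives a higher
# weight than a term shared across documents ("great").
```

In the paper's pipeline these weights would then feed the Bernoulli Naïve Bayes classifier in place of raw term counts.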
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONcsandit
Emotional state recognition through speech is a very interesting research topic nowadays.
Using subliminal information in speech, it is possible to recognize the emotional state of the
person. One of the main problems in the design of automatic emotion recognition systems is the
small number of available patterns. This fact makes the learning process more difficult, due to
the generalization problems that arise under these conditions.
In this work we propose a solution to this problem consisting in enlarging the training set
through the creation of new virtual patterns. In the case of emotional speech, most of the
emotional information is carried by speed and pitch variations. So, a change in the average
pitch that modifies neither the speed nor the pitch variations does not affect the
expressed emotion. Thus, we use this prior information to create new patterns by applying
a pitch-shift modification in the feature extraction process of the classification system. For this
purpose, we propose a frequency-scaling modification of the Mel Frequency Cepstral
Coefficients used to classify the emotion. This proposed process allows us to synthetically
increase the number of available patterns in the training set, thus increasing the generalization
capability of the system and reducing the test error.
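The frequency-scaling idea can be sketched on the mel filterbank underlying MFCC extraction: scaling each filter centre frequency by a factor alpha emulates an average pitch shift while leaving the temporal structure untouched. The scaling factor of 1.05 here is purely illustrative, not a value from the paper:

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel scale used in MFCC filterbanks.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_filters=26, f_low=0.0, f_high=8000.0):
    # Filter centre frequencies equally spaced on the mel scale.
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    return mel_to_hz(mels[1:-1])

def scaled_centers(alpha, **kw):
    # Frequency scaling f -> alpha * f emulates an average pitch shift
    # without changing speed or pitch variation, yielding a "virtual" pattern.
    return alpha * mel_centers(**kw)

centers = mel_centers()
virtual = scaled_centers(1.05)  # 5% upward shift (illustrative factor)
```

Each scaled filterbank produces a new MFCC vector from the same utterance, which is how the training set is enlarged synthetically.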
Identification of frequency domain using quantum based optimization neural ne...eSAT Publishing House
An effective evaluation study of objective measures using spectral subtractiv...eSAT Journals
Abstract
Unwanted noise has a negative influence on communication because it disturbs the conversation and can make communication impossible. Speech enhancement algorithms are used to improve the quality and intelligibility of speech and to reduce listener fatigue. Assessment of speech quality can be done using either subjective listening tests or objective quality measures. Evaluation of several objective measures on speech processed by enhancement algorithms has been performed, but these have limitations in assessing the original speech signal. This paper presents a study of speech quality measures and computes the values used for regression analyses in the evaluation of objective measures on speech enhanced by a spectral subtraction algorithm.
Keywords: MOS, ITU-T (P.835), SNRseg, log-likelihood ratio and Itakura-Saito.
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSijnlc
The fundamental techniques used for man-machine communication include speech synthesis, speech
recognition, and speech transformation. Feature extraction techniques provide a compressed
representation of the speech signals. HNM analysis and synthesis provides high-quality speech with
a small number of parameters. Dynamic time warping is a well-known technique used for aligning two given
multidimensional sequences; it locates an optimal match between the given sequences. The improvement in
the alignment is estimated from the corresponding distances. The objective of this research is to investigate
the effect of dynamic time warping on phrase-, word-, and phoneme-based alignments. The speech signals,
in the form of twenty-five phrases, were recorded. The recorded material was segmented manually and
aligned at sentence, word, and phoneme level. The Mahalanobis distance (MD) was computed between the
aligned frames. The investigation has shown better alignment in the HNM parametric domain. It has
been seen that effective speech alignment can be carried out even at phrase level.
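The dynamic time warping alignment the abstract relies on can be sketched as the classic dynamic-programming recurrence; this is a minimal reference implementation on toy 1-D "features", not the paper's HNM parameters:

```python
import numpy as np

def dtw(x, y):
    """Dynamic time warping cost between two feature sequences.

    x, y: arrays of shape (n, d) and (m, d); returns the minimal
    accumulated frame-to-frame Euclidean distance along an optimal alignment.
    """
    x, y = np.atleast_2d(x), np.atleast_2d(y)
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            # Each cell extends the cheapest of the three allowed moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [1.0], [1.0], [2.0]])
# b is a time-stretched version of a, so the DTW cost is 0.
```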
This paper reports on an Audio-Visual Client Recognition System, successfully implemented in Matlab, which identifies five clients and can be extended to identify as many clients as it is trained on. The visual recognition part was implemented using Principal Component Analysis, Linear Discriminant Analysis and a Nearest Neighbour Classifier. The audio recognition part was successfully implemented using Mel-Frequency Cepstrum Coefficients, Linear Discriminant Analysis and a Nearest Neighbour Classifier. The system was tested using images and sounds on which it had not been trained, to see whether it could detect an intruder, and it responded to intruders precisely and successfully.
Bayesian distance metric learning and its application in automatic speaker re...IJECEIAES
This paper proposes a state-of-the-art Automatic Speaker Recognition (ASR) system based on a Bayesian distance learning metric as a feature extractor. In this modeling, I explored the constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. An approximation of the distance metric is used as a weighted covariance matrix formed from the leading eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric distance. Given a speaker tag, I select the data pairs of different speakers with the highest cosine score to form a set of speaker constraints. This collection captures the most discriminating variability between the speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, this method is insensitive to normalization compared to cosine scores, and is very effective in the case of limited training data. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best performance of the combined cosine score, EER 1.767%, was obtained using LDA200 + NCA200 + LDA200, and the best performance of Bayes_dml, EER 1.775%, was obtained using LDA200 + NCA200 + LDA100. Bayesian_dml outperforms the combined norm of cosine scores and is the best reported result for the short2-short3 condition of the NIST SRE 2008 data.
Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...sipij
This paper presents an approach to speech recognition using frequency spectral information with the Mel frequency scale, to improve speech feature representation in an HMM based recognition approach. Frequency spectral information is incorporated into the conventional Mel spectrum based speech recognition approach. The Mel frequency approach observes the speech signal at a given resolution, which results in overlapping resolution features that limit recognition. Resolution decomposition with frequency separation is used as the mapping approach for the HMM based speech recognition system. The simulation results show an improvement in the quality metrics of speech recognition with respect to computational time and learning accuracy for a speech recognition system.
Development of Quranic Reciter Identification System using MFCC and GMM Clas...IJECEIAES
Nowadays, many beautiful recitations of Al-Quran are available. Quranic recitation has its own characteristics, and the problem of identifying the reciter is similar to the speaker recognition/identification problem. The objective of this paper is to develop a Quran reciter identification system using Mel-frequency Cepstral Coefficients (MFCC) and a Gaussian Mixture Model (GMM). In this paper, a database of five Quranic reciters is developed and used in the training and testing phases. We carefully randomized the database over various surahs of the Quran so that the proposed system is sensitive only to the reciter, not to the recited verses. Around 15 Quranic audio samples from 5 reciters were collected and randomized, of which 10 samples were used for training the GMM and 5 samples were used for testing. Results showed that our proposed system has a 100% recognition rate for the five reciters tested. Even when tested with unknown samples, the proposed system is able to reject them.
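The GMM identification step amounts to scoring MFCC frames under each reciter's model and choosing the best-scoring one. A minimal sketch with diagonal-covariance Gaussians and made-up single-component "models" (the real system would fit multi-component GMMs with EM):

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Average log-likelihood of feature frames under a diagonal-covariance GMM.

    frames: (T, d) feature matrix; weights: (K,); means, variances: (K, d).
    """
    T, d = frames.shape
    diff = frames[:, None, :] - means[None, :, :]                 # (T, K, d)
    exponent = -0.5 * np.sum(diff ** 2 / variances, axis=2)
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_comp = np.log(weights) + log_norm + exponent              # (T, K)
    m = log_comp.max(axis=1, keepdims=True)                       # log-sum-exp
    return float(np.mean(m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1))))

def identify(frames, models):
    # Pick the reciter whose GMM explains the frames best.
    scores = {name: gmm_loglik(frames, *params) for name, params in models.items()}
    return max(scores, key=scores.get)

# Two toy single-component "reciter" models (illustrative parameters only).
models = {
    "reciter_a": (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]])),
    "reciter_b": (np.array([1.0]), np.array([[5.0, 5.0]]), np.array([[1.0, 1.0]])),
}
test_frames = np.array([[4.8, 5.1], [5.2, 4.9]])
```

Rejection of unknown samples, as reported in the abstract, would add a threshold on the winning score rather than always returning the argmax.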
A COOPERATIVE LOCALIZATION METHOD BASED ON V2I COMMUNICATION AND DISTANCE INF...IJCNCJournal
Relative positioning is a recent solution to overcome the limited accuracy of GPS in urban environments.
Vehicle positions obtained using V2I communication are more accurate because the known roadside unit
(RSU) locations help predict errors in measurements over time. The accuracy of vehicle positions depends
strongly on the number of RSUs; however, the high installation cost limits the use of this approach. It also
depends on the nonlinear nature of the localization problem, which several research papers have neglected;
in those studies, the accumulated errors increased with time because the localization problem was treated
as linear. In the present study,
a cooperative localization method based on V2I communication and distance information in vehicular
networks is proposed for improving the estimates of vehicles’ initial positions. This method assumes that
the virtual RSUs based on mobility measurements help reduce installation costs and facilitate in handling
fault environments. The extended Kalman filter algorithm is a well-known estimator for nonlinear problems,
but it requires a good initial vehicle position vector and adaptive measurement noise. Using the proposed
method, vehicles' initial positions can be estimated accurately. The experimental results confirm that the
proposed method achieves higher accuracy than existing methods, giving a root mean square error of
approximately 1 m. In addition, it is shown that virtual RSUs can assist in estimating initial positions in
fault environments.
Influences of Buffer Size and Eb/No on Very Small Aperture Terminal (VSAT) Co...TELKOMNIKA JOURNAL
In data communication, the quality of the signal transmitted from the transmitter (Tx) to the receiver (Rx)
station is very important. Buffer size and Eb/No are two parameters that influence signal quality. This research
measures those parameters and the relationship between them. It employs data collected on
the Link STM-1 side in Makassar and Timika operated by PT. Telkom Metra Bogor. Data were collected over a
period of 56 days using the Simple Network Management Protocol (SNMP). To analyze the
relationship between the two parameters, we use the product moment correlation (PMC) method. The
correlation of buffer and Eb/No data was computed at a significance level of 0.05, with the buffer in the
CDM 700 modem set to 50%, a threshold Eb/No of 12.1 dB, and 64-QAM modulation. The resulting correlation
for the Makassar side is 0.648 with a p-value of 0.000; the correlation for the Timika side is 0.722 with a
p-value of 0.000. These results suggest that the two parameters are strongly and significantly correlated.
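The product moment (Pearson) correlation used above is straightforward to compute; this sketch uses illustrative values, not the measured Makassar or Timika data:

```python
import numpy as np

def pearson_r(x, y):
    """Product moment correlation coefficient between two series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

buffer_use = [40, 45, 50, 55, 60]            # illustrative daily buffer values
ebno = [12.0, 12.2, 12.5, 12.8, 13.1]        # illustrative Eb/No readings (dB)
r = pearson_r(buffer_use, ebno)              # close to +1 for this toy data
```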
Optimization of Quality of Service Parameters for Dynamic Channel Allocation ...ijngnjournal
As the spectrum for wireless transmission gets crowded due to the increase in users and applications, efficient use of the spectrum is a major challenge in today's world. A major contributing factor is the inefficient usage of the frequency bands: interference in neighboring cells limits the reuse of frequency bands. In this paper, some quality of service parameters such as residual bandwidth, number of users, duration of calls, frequency of calls and priority are considered. This paper presents work on the optimization of dynamic channel allocation using a genetic algorithm (GA), which attempts to allocate channels to users such that overall congestion in the network is minimized by reusing already allocated frequencies. The working of the genetic algorithm used in the optimization procedure is also explained. The optimized channel allocation is then compared with a non-optimized allocation to check the efficiency of the genetic algorithm.
FPGA-based implementation of speech recognition for robocar control using MFCCTELKOMNIKA JOURNAL
This research proposes a simulation of the speech recognition logic chain on an FPGA, based on MFCC (Mel Frequency Cepstral Coefficients) and Euclidean distance, to control the motion of a robotic car. The recognized speech is used as a command to operate the robotic car. MFCC in this study was used in the feature extraction process, while Euclidean distance was applied in the feature classification process for each utterance, which is then forwarded to the decision stage to produce the control logic for the robot's motors. The tests conducted showed that the designed logic chain was precise, as verified by measuring the Mel frequency warping and power cepstrum. The logic design achieved in this research, validated by a comparison between Matlab computation and Xilinx simulation, should make it easier for researchers to carry the implementation on to FPGA hardware.
Intelligent Arabic letters speech recognition system based on mel frequency c...IJECEIAES
Speech recognition is one of the important applications of artificial intelligence (AI). It aims to recognize spoken words regardless of who is speaking. The process of voice recognition involves extracting meaningful features from spoken words and then classifying these features into their classes. This paper presents a neural network classification system for Arabic letters, and studies the effect of changing the multi-layer perceptron (MLP) artificial neural network (ANN) properties to obtain optimized performance. The proposed system consists of two main stages: first, the recorded spoken letters are transformed from the time domain into the frequency domain using the fast Fourier transform (FFT), and features are extracted using mel frequency cepstral coefficients (MFCC). Second, the extracted features are classified using the MLP ANN with the back-propagation (BP) learning algorithm. The obtained results show that the proposed system, along with the extracted features, can classify Arabic spoken letters using two neural network hidden layers with an accuracy of around 86%.
This paper describes the development of an efficient speech recognition system using different techniques such as Mel Frequency Cepstrum Coefficients (MFCC), Vector Quantization (VQ) and Hidden Markov Model (HMM).
This paper explains how speaker recognition followed by speech recognition is used to recognize speech faster, more efficiently and more accurately. MFCC is used to extract the characteristics from the input speech signal with respect to a particular word uttered by a particular speaker. Then HMM is applied to the quantized feature vectors to identify the word by evaluating the maximum log-likelihood values for the spoken word.
05 comparative study of voice print based acoustic features mfcc and lpccIJAEMSJORNAL
Voice is one of the best biometric features for investigation and authentication; it has both biological and behavioural characteristics. The acoustic features are related to the voice. The Speaker Recognition System is designed for the automatic authentication of a speaker's identity based on the human voice. Mel Frequency Cepstrum Coefficients (MFCC) and Linear Prediction Cepstrum Coefficients (LPCC) are used for feature extraction from the provided voice sample. This paper provides a comparative study of MFCC and LPCC based on the accuracy of results and their working methodology. The results are better when MFCC is used for feature extraction.
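The linear prediction analysis behind LPCC can be sketched with the standard autocorrelation method and the Levinson-Durbin recursion; cepstral coefficients would then be derived from the resulting LPC coefficients. This is a minimal sketch on a synthetic signal, not the paper's data:

```python
import numpy as np

def lpc(signal, order):
    """Linear prediction coefficients via autocorrelation + Levinson-Durbin.

    Returns a with a[0] = 1 such that the prediction error
    e[n] = sum_k a[k] * s[n-k] is minimized.
    """
    s = np.asarray(signal, float)
    r = np.array([np.dot(s[: len(s) - k], s[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

# A pure decaying resonance s[n] = 0.9 * s[n-1] is perfectly predicted
# by a first-order model, so the LPC coefficients approach [1, -0.9].
sig = 0.9 ** np.arange(200)
coeffs = lpc(sig, 1)
```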
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
The performance of various acoustic feature extraction methods has been compared in this work using a
Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic
features are a series of vectors that represent the speech signals; they can be classified into either words or
sub-word units such as phonemes. In this work, linear predictive coding (LPC) is first used as the acoustic
vector extraction technique, chosen due to its widespread popularity. Then other vector
extraction techniques, Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction
(PLP), which closely resemble the human auditory system, have also been used. These feature
vectors are used to train the LSTM neural network, and the obtained models of different phonemes
are compared using statistical tools, namely the Bhattacharyya distance and the Mahalanobis distance, to
investigate the nature of those acoustic features.
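The two distances used to compare the phoneme models have closed forms for Gaussian statistics; a minimal numpy sketch with illustrative 2-D Gaussians:

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of a point x from a Gaussian (mean, cov)."""
    d = np.asarray(x, float) - np.asarray(mean, float)
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def bhattacharyya(mean1, cov1, mean2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = (np.asarray(cov1, float) + np.asarray(cov2, float)) / 2.0
    d = np.asarray(mean1, float) - np.asarray(mean2, float)
    term1 = 0.125 * d @ np.linalg.solve(cov, d)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return float(term1 + term2)

# Two toy phoneme models: unit-covariance Gaussians 3 units apart.
m1, c1 = np.zeros(2), np.eye(2)
m2, c2 = np.array([3.0, 0.0]), np.eye(2)
```

With identical covariances the second Bhattacharyya term vanishes and the distance reduces to one eighth of the squared Mahalanobis separation of the means.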
Speech Recognized Automation System Using Speaker Identification through Wire...IOSR Journals
Abstract: This paper discusses the methodology for a project named “Speech Recognized Automation System using Speaker Identification through Wireless Communication”. The project presents the design of an automation system using wireless communication and speaker recognition implemented in Matlab code; Matlab's straightforward programming interface makes it an ideal tool for the speech analysis in the project. This automation system is useful for home appliances as well as in industry. The paper discusses the overall design of a wireless automation system which was built and implemented. The speech recognition centers on recognizing speech commands stored in a Matlab database and matching them against the speaker's incoming voice command. The Mel Frequency Cepstral Coefficient (MFCC) algorithm is used to recognize the speaker's speech and to extract speech features. The system uses low-power RF ZigBee transceiver wireless communication modules, which are relatively cheap. It is intended to control lights, fans and other electrical appliances in a home or office using speech commands like “Light” and “Fan”. Further, if security is not a big issue, the speech processor can be used to control the appliances without speaker identification. Keywords — Automation system, MATLAB code, MFCC, speaker identification, ZigBee transceiver.
A novel automatic voice recognition system based on text-independent in a noi...IJECEIAES
An automatic voice recognition system aims to limit fraudulent access to sensitive areas such as labs. The primary objective of this paper is to increase the accuracy of voice recognition in noisy environments using the Microsoft Research (MSR) Identity Toolbox. The proposed system lets the user speak into the microphone, then matches the unknown voice with the other human voices existing in the database using a statistical model, in order to grant or deny access to the system. The voice recognition is done in two steps: training and testing. During training, a Universal Background Model and a Gaussian Mixture Model (GMM-UBM) are calculated based on different sentences pronounced by the human voice(s) used to record the training data. Then the testing of the voice signal in a noisy environment calculates the log-likelihood ratio of the GMM-UBM models in order to classify the user's voice. Before testing, noise and de-noising methods were applied; we investigated different MFCC features of the voice to determine the best possible feature, as well as the noise filter algorithm, which subsequently improved the performance of the automatic voice recognition system.
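The GMM-UBM decision described above compares the log-likelihood of the test frames under the claimed speaker's model against the universal background model. A minimal sketch using single diagonal Gaussians as stand-ins for full GMMs, with illustrative 1-D "features":

```python
import numpy as np

def avg_loglik(frames, mean, var):
    # Average log-likelihood under a single diagonal Gaussian
    # (a stand-in for a full GMM to keep the sketch short).
    frames = np.atleast_2d(frames)
    d = frames.shape[1]
    ll = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(var))
                 + np.sum((frames - mean) ** 2 / var, axis=1))
    return float(np.mean(ll))

def llr_accept(frames, spk, ubm, threshold=0.0):
    """Accept the claimed identity when
    log p(X | speaker model) - log p(X | UBM) exceeds the threshold."""
    llr = avg_loglik(frames, *spk) - avg_loglik(frames, *ubm)
    return llr, llr > threshold

# Illustrative models: a claimed speaker centred at 2.0, a UBM at 0.0.
spk = (np.array([2.0]), np.array([1.0]))
ubm = (np.array([0.0]), np.array([1.0]))
genuine = np.array([[1.9], [2.1], [2.0]])
impostor = np.array([[0.1], [-0.2], [0.0]])
```

In the full system the speaker model would itself be MAP-adapted from the UBM, and the threshold tuned on development data.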
Comparative study to realize an automatic speaker recognition system IJECEIAES
In this research, we present an automatic speaker recognition system based on adaptive orthogonal transformations. To obtain the informative features with a minimum dimension from the input signals, we created an adaptive operator, which helped to identify the speaker’s voice in a fast and efficient manner. We test the efficiency and the performance of our method by comparing it with another approach, mel-frequency cepstral coefficients (MFCCs), which is widely used by researchers as their feature extraction method. The experimental results show the importance of creating the adaptive operator, which gives added value to the proposed approach. The performance of the system achieved 96.8% accuracy using Fourier transform as a compression method and 98.1% using Correlation as a compression method.
A comparison of different support vector machine kernels for artificial speec...TELKOMNIKA JOURNAL
As the emergence of voice biometrics provides enhanced security and convenience, voice biometric-based applications such as speaker verification are gradually replacing less secure authentication techniques. However, automatic speaker verification (ASV) systems are exposed to spoofing attacks, especially artificial speech attacks that can be generated in large amounts in a short period of time using state-of-the-art speech synthesis and voice conversion algorithms. Despite the extensive use of the support vector machine (SVM) in recent works, no study has investigated the performance of different SVM settings for artificial speech detection. In this paper, the performance of different SVM settings in artificial speech detection is investigated, with the objective of identifying the appropriate SVM kernels for artificial speech detection. An experiment was conducted to find the appropriate combination of the proposed features and SVM kernels. Experimental results showed that the polynomial kernel was able to detect artificial speech effectively, with an equal error rate (EER) of 1.42% when applied to the presented handcrafted features.
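The kernels being compared differ only in how they map a pair of feature vectors to a similarity score. A sketch of the three common choices, with hyperparameter defaults that are illustrative rather than the paper's settings:

```python
import numpy as np

def linear_kernel(x, y):
    return x @ y

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    # The kernel family the paper found effective for artificial speech.
    return (gamma * (x @ y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 2.0])
y = np.array([0.5, 1.0])
```

An SVM trained with any of these computes decision values from kernel evaluations against its support vectors, so swapping the kernel changes only the similarity function, not the rest of the pipeline.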
Efficient feature descriptor selection for improved Arabic handwritten words ...IJECEIAES
Arabic handwritten text recognition has long been a difficult subject, owing to the similarity of its characters and the wide range of writing styles. Due to the intricacy of Arabic handwriting morphology, the challenge of cursive handwriting recognition remains difficult. In this paper, we propose a new, efficient image-processing-based approach that combines three image descriptors in the feature extraction phase. To prepare the training and testing datasets, we applied a series of preprocessing techniques to 100 classes selected from the handwritten Arabic database of the Institut Für Nachrichtentechnik/Ecole Nationale d'Ingénieurs de Tunis (IFN/ENIT). Then, we trained the k-nearest neighbors (k-NN) algorithm to generate the best model for each feature extraction descriptor. The best k-NN model, according to common performance evaluation metrics, is used to classify Arabic handwritten images according to their classes. Based on the performance evaluation results of the three generated k-NN models, a majority-voting algorithm is used to combine the prediction results. A high recognition rate of up to 99.88% is achieved, far exceeding the state-of-the-art results on the IFN/ENIT dataset. The obtained results highlight the reliability of the proposed system for the recognition of handwritten Arabic words.
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...IDES Editor
In this paper, the improvement of an ASR system for the
Hindi language, based on vector quantized MFCC as feature
vectors and an HMM as classifier, is discussed. MFCC features
are usually pre-processed before being used for recognition.
One of these pre-processing steps is to create delta and delta-delta
coefficients and append them to the MFCCs to create the feature vector.
This paper focuses on all digits in Hindi (zero to nine),
based on an isolated word structure. Performance of the system
is evaluated by the Recognition Rate (RR). The effect of
combining the Delta MFCC (DMFCC) feature along
with the Delta-Delta MFCC (DDMFCC) feature shows
approximately 2.5% further improvement in the RR, with no
additional computational cost involved. The RR of the system for
speakers involved in the training phase is found to give
better recognition accuracy than that for speakers who
were not involved in the training phase. Word-wise RR is
observed to be good for some digits with distinct phones.
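The delta coefficients appended to the MFCCs are typically computed with the standard regression formula over a small window; a numpy sketch on a toy linearly increasing "MFCC" track (the window size n=2 is a common default, not necessarily the paper's):

```python
import numpy as np

def delta(features, n=2):
    """Delta coefficients over a (T, d) feature matrix using the standard
    regression formula d_t = sum_k k*(c[t+k] - c[t-k]) / (2*sum_k k^2)."""
    features = np.atleast_2d(features)
    padded = np.pad(features, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    T = features.shape[0]
    out = np.zeros_like(features, dtype=float)
    for k in range(1, n + 1):
        out += k * (padded[n + k: n + k + T] - padded[n - k: n - k + T])
    return out / denom

mfcc = np.arange(10, dtype=float).reshape(-1, 1)  # linearly rising "MFCCs"
d1 = delta(mfcc)   # delta: slope, constant 1 away from the edges
d2 = delta(d1)     # delta-delta: ~0 for a linear ramp
```

Applying `delta` once gives the DMFCC stream and applying it again gives DDMFCC, which are concatenated with the static MFCCs to form the final feature vector.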
Limited Data Speaker Verification: Fusion of FeaturesIJECEIAES
The present work demonstrates an experimental evaluation of speaker verification for different speech feature extraction techniques under the constraint of limited data (less than 15 seconds). State-of-the-art speaker verification techniques provide good performance for sufficient data (greater than 1 minute), so it is a challenging task to develop techniques which perform well for speaker verification under the limited data condition. In this work different features, namely Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC), Delta (Δ), Delta-Delta (ΔΔ), Linear Prediction Residual (LPR) and Linear Prediction Residual Phase (LPRP), are considered. The performance of individual features is studied and, for better verification performance, combinations of these features are attempted. A comparative study is made between the Gaussian mixture model (GMM) and the GMM-universal background model (GMM-UBM) through experimental evaluation. The experiments are conducted using the NIST-2003 database. The experimental results show that the combination of features provides better performance than the individual features. Further, GMM-UBM modeling gives a reduced equal error rate (EER) compared to GMM.
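The equal error rate used to compare these systems is the operating point where the false accept and false reject rates coincide. A minimal sketch that sweeps thresholds over illustrative score lists (not the NIST-2003 results):

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """EER: the operating point where false accept rate == false reject rate.

    Sweeps candidate thresholds over all observed scores and returns the
    average of the two error rates at the threshold where they are closest.
    """
    g = np.asarray(genuine_scores, float)
    i = np.asarray(impostor_scores, float)
    best = None
    for t in np.unique(np.concatenate([g, i])):
        far = np.mean(i >= t)   # impostor trials accepted
        frr = np.mean(g < t)    # genuine trials rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0)
    return float(best[1])

# Perfectly separated scores give an EER of 0.
eer = equal_error_rate([0.9, 0.8, 0.95], [0.1, 0.2, 0.05])
```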
Channel Estimation in MIMO OFDM Systems with Tapped Delay Line ModelIJCNCJournal
The continuous increase in user demands for new-generation communication systems is making the wireless channel more complex and more challenging to estimate, to model in simulation, and to use for evaluating the performance of different MIMO systems. In this work, a simulation model for multipath fading channels in wireless communication is developed. The model includes a selection of typical tapped-delay-line channel models that can be implemented to reproduce the effects of representative channel distortion and interference. Based on the simulation results, the proposed method exhibits accurate channel estimation performance for frequency-selective fading channels. The proposed work employs LS, MMSE, and ML methods for channel estimation, using 16 and 32 pilots and fixed pilot locations in each frame. Results are obtained for 4x4, 8x8, 16x16, 16x8, and 16x4 MIMO systems with tapped delay line models.
Similar to Comparative Study of Different Techniques in Speaker Recognition: Review (20)
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
The Internet of Things (IoT) is a revolutionary concept that connects everyday objects and devices to the internet, enabling them to communicate, collect, and exchange data. Imagine a world where your refrigerator notifies you when you’re running low on groceries, or streetlights adjust their brightness based on traffic patterns – that’s the power of IoT. In essence, IoT transforms ordinary objects into smart, interconnected devices, creating a network of endless possibilities.
Here is a blog on the role of electrical and electronics engineers in IOT. Let's dig in!!!!
For more such content visit: https://nttftrg.com/
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on Binary Heap data structure. It is similar to the selection sort where we first find the minimum element and place the minimum element at the beginning. Repeat the same process for the remaining elements.
Water billing management system project report.pdfKamal Acharya
Our project entitled “Water Billing Management System” aims is to generate Water bill with all the charges and penalty. Manual system that is employed is extremely laborious and quite inadequate. It only makes the process more difficult and hard.
The aim of our project is to develop a system that is meant to partially computerize the work performed in the Water Board like generating monthly Water bill, record of consuming unit of water, store record of the customer and previous unpaid record.
We used HTML/PHP as front end and MYSQL as back end for developing our project. HTML is primarily a visual design environment. We can create a android application by designing the form and that make up the user interface. Adding android application code to the form and the objects such as buttons and text boxes on them and adding any required support code in additional modular.
MySQL is free open source database that facilitates the effective management of the databases by connecting them to the software. It is a stable ,reliable and the powerful solution with the advanced features and advantages which are as follows: Data Security.MySQL is free open source database that facilitates the effective management of the databases by connecting them to the software.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Comparative Study of Different Techniques in Speaker Recognition: Review

International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-3, Issue-3, Mar-2017]
https://dx.doi.org/10.24001/ijaems.3.3.25 ISSN: 2454-1311
www.ijaems.com Page | 284

Sonali T. Saste¹, Prof. S. M. Jagdale²
¹,²Department of Electronics and Tele-Communication Engineering, Bharati Vidyapeeth College of Engg. for Women, Pune, Maharashtra, India
Abstract— Speech is the most basic and essential method of communication used by humans. A speaker can be recognized on the basis of the individual information embedded in the speech signal. Speaker recognition (SR) identifies the person who is speaking, and in recent years it has been applied in security systems. In this paper we discuss feature extraction techniques such as Mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC), and dynamic time warping (DTW), and, for classification, Gaussian mixture models (GMM), artificial neural networks (ANN), and support vector machines (SVM).
Keywords— ANN, DTW, GMM, LPC, MFCC.
I. INTRODUCTION
Speech is one of the most important means of communication. It carries several levels of information for the listener: besides the message itself, it conveys the gender, emotion, and identity of the speaker. There are many situations in which correct recognition of the speaker is required. Biometrics offers many ways to identify a person, such as fingerprint, palm, iris, face, and voice recognition. The objective of speaker recognition is to extract and characterize the speaker-specific information contained in the speech signal. Speaker recognition may be either text-independent or text-dependent. [2] The behavioral aspect of the human voice is used for identification by converting a spoken utterance from analog to digital form and extracting distinctive vocal characteristics, such as pitch, frequency, tone, and cadence, to build a speaker model or voiceprint. Voice recognition involves an enrollment procedure and a verification procedure. Enrollment registers a speaker by training the system on his or her voice features. [3]
II. RELATED WORK
Many methodologies have been proposed for speaker recognition, i.e., for systems that can recognize a person based on his or her voice. This is accomplished by implementing complex signal processing algorithms that run on a digital computer or a processor. Speaker recognition systems can be classified into speaker identification and speaker verification. [7]
1. Mel-frequency cepstral coefficients (MFCC): This is the most powerful feature extraction technique used in speaker recognition; it is modeled on the human auditory system. Seiichi Nakagawa used MFCC for speaker identification and verification [1], and Abdelmajid H. Mansour also used MFCC for voice recognition [3].
2. Linear predictive coding (LPC): LPC is a simplified model of speech production. It is a technique for determining the basic parameters of speech, providing precise estimates of those parameters together with a computational model of speech. The fundamental idea behind LPC is that a speech sample can be approximated as a linear combination of past speech samples. [11] Kinnal Dhameliya used LPC for feature extraction in speaker recognition. [2]
3. Dynamic time warping (DTW): DTW matches two sequences of feature vectors by computing the distance between two time series that vary in time. [3] Abdelmajid H. Mansour used DTW in a voice recognition system.
III. FEATURE EXTRACTION
Feature extraction is the process of identifying distinctive attributes of the input signal; it is performed after pre-processing. For speech, many features such as MFCC, pitch, energy, and formants are extracted.
Fig.1: Basic speech recognition system
1. MFCC: MFCC is the classical approach to analyzing the speech signal. It represents the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear Mel scale of frequency. It is the most popular technique because it approximates the response of the human auditory system more closely than other methods. [1][8]
Fig.2: Steps involved in MFCC algorithm.
The Mel scale can be computed using the formula
F(Mel) = 2595 * log10(1 + F/700) ………….(1)
The recognition accuracy of MFCC is high, i.e., its performance rate is high. MFCC captures the main characteristics of the phones in speech, and its computational complexity is low. [18]
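Equation (1) and its inverse can be sketched in a few lines (a minimal illustration only; the full MFCC pipeline in Fig. 2 additionally involves framing, windowing, the FFT, Mel filter banks, log compression, and a DCT):

```python
import numpy as np

def hz_to_mel(f_hz):
    """Map frequency in Hz to the Mel scale, as in Eq. (1)."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used when placing Mel filter-bank center frequencies."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# The scale is calibrated so that 1000 Hz is approximately 1000 mel.
print(hz_to_mel(1000.0))
```

The nonlinearity is the point: equal steps on the Mel scale correspond to increasingly wide steps in Hz at high frequencies, mimicking human pitch perception.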
2. LPC: LPC is used to determine the basic parameters of speech, providing precise estimates of those parameters together with a computational model of speech. The fundamental idea behind LPC is that a speech sample can be approximated as a linear combination of past speech samples. [11] The following figure shows the steps involved in LPC feature extraction.
Fig.3: Steps involved in LPC feature extraction
LPC is a reliable, accurate, and robust technique for providing the parameters of the time-varying linear system that represents the vocal tract. Its computation speed is good, and it provides accurate speech parameters. LPC is also useful for encoding speech at low bit rates. [18]
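The "linear combination of past samples" idea can be sketched as follows (a minimal illustration using the standard autocorrelation method with the Levinson-Durbin recursion; real LPC front ends also pre-emphasize and window each frame first):

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate the LPC error-filter coefficients A(z) = 1 + a1*z^-1 + ... + ap*z^-p
    for one frame, so that s[n] is predicted as -sum_k a[k] * s[n-k]."""
    n = len(frame)
    # Autocorrelation lags r[0..order]
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction-error energy
    for i in range(1, order + 1):
        # Reflection coefficient for this step of the recursion
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a
```

For a first-order autoregressive signal s[n] = 0.9*s[n-1] + e[n], an order-1 fit recovers a[1] close to -0.9, i.e., the predictor learns the generating model.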
3. DTW: This algorithm relies on dynamic programming to measure the similarity between two time series that may vary in time or speed. The warping between the two time series can be used to find corresponding regions between them or to determine their similarity. [6] The sequences are "warped" non-linearly in the time dimension so that the similarity measure is independent of certain non-linear variations in time. This sequence-alignment technique is frequently used in time-series classification. Although DTW measures a distance-like quantity between two given sequences, it does not guarantee that the triangle inequality holds. In addition to the similarity measure, a so-called "warping path" is produced; by warping according to this path the two signals may be aligned in time.
DTW belongs to the feature-based pattern recognition techniques and does not need to build (or train) an alignment model in advance; it is therefore generally regarded as a conceptually simple and direct recognition method. Conventional DTW is typically used for speech recognition, and a few studies have used DTW to implement speaker recognition. [19]
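The dynamic-programming recurrence described above can be sketched as follows (a minimal version for 1-D sequences with an absolute-difference local cost; speech systems apply the same recurrence to distances between feature vectors):

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance between two sequences via dynamic programming.
    D[i][j] = local cost + min of the three admissible predecessor cells."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

# Two sequences that differ only in speed have DTW distance 0.
print(dtw_distance([1, 1, 2, 3], [1, 2, 3, 3]))
```

This is exactly why DTW is not a metric: stretching a sequence costs nothing, so the triangle inequality noted above can fail.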
4. Other feature extraction techniques: Various other feature extraction techniques can be obtained by simply modifying the above ones, including the following:
Mean and variance of the residual phase
Delta and delta-delta of MFCC features
Many such feature extraction strategies are available. One can use any method suitable for the application to obtain better recognition accuracy. These features are then stored in a database for N different speakers, and this database is used for classification. [2]
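The delta features mentioned above are commonly computed by a regression over a few neighboring frames; a minimal sketch follows (rows are frames, columns are MFCC coefficients; the window width and regression formula are the usual textbook choice, not prescribed by this paper, and delta-delta is simply the same operation applied twice):

```python
import numpy as np

def delta(features, width=2):
    """Regression-based delta: d[t] = sum_k k*(c[t+k] - c[t-k]) / (2 * sum_k k^2),
    with edge frames replicated so the output has the same shape as the input."""
    n = len(features)
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (padded[width + k:n + width + k] - padded[width - k:n + width - k])
              for k in range(1, width + 1))
    den = 2 * sum(k * k for k in range(1, width + 1))
    return num / den
```

For linearly increasing coefficients the interior deltas come out as the slope, and for constant coefficients they are zero, which is the sanity check to run before wiring this into a front end.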
IV. CLASSIFICATION
1. GMM: A Gaussian mixture model is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. Mixture models can be thought of as generalizing k-means clustering to incorporate information about the covariance structure of the data as well as the centers of the latent Gaussians. [10]
The Gaussian mixture model is fitted using the expectation-maximization (EM) algorithm. It can also draw confidence ellipsoids for multivariate models and compute the Bayesian information criterion to assess the number of clusters in the data. A GaussianMixture.fit method is provided that learns a Gaussian mixture model from training data. Given test data, it can assign to each sample the Gaussian it most probably belongs to, using the GaussianMixture.predict method. [1][10]
GMMs require little training and test data. They perform well because less data is needed to train the classifier, and hence the memory requirement is low. [19]
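Since the description above mirrors scikit-learn's GaussianMixture API (its fit and predict methods), a minimal speaker-identification sketch using that library might look like the following; the 13-dimensional Gaussian blobs are synthetic stand-ins for real MFCC frames, and one GMM is trained per enrolled speaker:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic "MFCC frames" for two enrolled speakers (assumed toy data)
rng = np.random.default_rng(0)
speaker_a = rng.normal(loc=0.0, scale=1.0, size=(200, 13))
speaker_b = rng.normal(loc=5.0, scale=1.0, size=(200, 13))

# One GMM per speaker, fitted with EM
gmm_a = GaussianMixture(n_components=4, covariance_type="diag",
                        random_state=0).fit(speaker_a)
gmm_b = GaussianMixture(n_components=4, covariance_type="diag",
                        random_state=0).fit(speaker_b)

def identify(frames, models, labels):
    """Pick the speaker whose GMM gives the test frames the
    highest average log-likelihood (GaussianMixture.score)."""
    scores = [m.score(frames) for m in models]
    return labels[int(np.argmax(scores))]

test_frames = rng.normal(loc=5.0, scale=1.0, size=(50, 13))
print(identify(test_frames, [gmm_a, gmm_b], ["A", "B"]))
```

Identification reduces to a maximum-likelihood decision over the per-speaker models, which is the classical GMM speaker-recognition recipe the cited works build on.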
[10] Nidhi Desai, Kinnal Dhameliya and Vijayendra Desai, "Feature Extraction and Classification Techniques for Speech Recognition: A Review," International Journal of Emerging Technology and Advanced Engineering, Vol. 3, Issue 12, December 2013, pp. 367-371.
[11] Seiichi Nakagawa, Longbiao Wang and Shinji Ohtsuka, "Speaker Identification and Verification by Combining MFCC and Phase Information," IEEE Transactions on Audio, Speech and Language Processing, Vol. 20, No. 4, May 2012, pp. 1085-1095.
[12] Jianglin Wang, An Ji and Michael T. Johnson, "Features for Phoneme-Independent Speaker Identification," IEEE International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, July 2012, pp. 1141-1145.
[13] Srikanth R. Madikeri and Hema A. Murthy, "Mel Filter Bank Energy-Based Slope Feature and Its Application to Speaker Recognition," IEEE National Conference on Communication (NCC), Bangalore, January 2011, pp. 1-4.
[14] Hemant A. Patil, Purushotam G. Radadia and T. K. Basu, "Combining Evidences from Mel Cepstral Features and Cepstral Mean Subtracted Features for Singer Identification," IEEE International Conference on Asian Language Processing, Hanoi, November 2012, pp. 145-148.
[15] S. Rajasekaran and G. A. Vijayalakshmi Pai, "Neural Networks, Fuzzy Logic, and Genetic Algorithms: Synthesis and Applications," PHI, 2003.
[16] Shahzadi Farah and Azra Shamim, "Speaker Recognition System Using Mel-Frequency Cepstrum Coefficients, Linear Prediction Coding and Vector Quantization," 3rd IEEE International Conference on Computer, Control & Communication (IC4), Karachi, September 2013, pp. 1-5.
[17] Nidhi Desai, Kinnal Dhameliya and Vijayendra Desai, "Recognizing Voice Commands for Robot using MFCC and DTW," International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, Issue 5, May 2014.
[18] Shreya Narang and Divya Gupta, "Speech Feature Extraction Techniques: A Review," IJCSMC, Vol. 4, Issue 3, March 2015, pp. 107-114.
[19] Ing-Jr Ding, Chih-Ta Yen and Da-Cheng Ou, "A Method to Integrate GMM, SVM and DTW for Speaker Recognition," International Journal of Engineering and Technology Innovation, Vol. 4, No. 1, 2014, pp. 38-47.
[20] Tomi Kinnunen and Haizhou Li, "An Overview of Text-Independent Speaker Recognition: From Features to Supervectors," Speech Communication, Vol. 52, 2010, pp. 12-40.