Automatic voice recognition systems aim to limit fraudulent access to sensitive areas such as labs. The primary objective of this paper is to increase the accuracy of voice recognition in noisy environments using the Microsoft Research (MSR) Identity Toolbox. The proposed system lets the user speak into a microphone, then matches the unknown voice against the human voices stored in the database using a statistical model, in order to grant or deny access to the system. Voice recognition is done in two steps: training and testing. During training, a Universal Background Model and Gaussian Mixture Models (GMM-UBM) are computed from different sentences pronounced by the human voice(s) used to record the training data. Testing of the voice signal in a noisy environment then computes the Log-Likelihood Ratio of the GMM-UBM models in order to classify the user's voice. Before testing, noise and de-noising methods were applied; we investigated different MFCC features of the voice to determine the best possible feature, as well as a noise-filter algorithm that subsequently improved the performance of the automatic voice recognition system.
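The GMM-UBM decision rule described above can be sketched in a few lines: score each test frame under the speaker model and under the UBM, and accept when the average log-likelihood ratio exceeds a threshold. The toy 1-D mixtures and the threshold below are illustrative assumptions, not the MSR Identity Toolbox's actual models.

```python
import math

def gmm_logpdf(x, weights, means, variances):
    # Log-density of a scalar feature x under a 1-D Gaussian mixture.
    p = sum(w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances))
    return math.log(p)

def llr_score(frames, speaker_gmm, ubm):
    # Average per-frame Log-Likelihood Ratio: log p(x|speaker) - log p(x|UBM).
    return sum(gmm_logpdf(x, *speaker_gmm) - gmm_logpdf(x, *ubm)
               for x in frames) / len(frames)

# Hypothetical toy models: (weights, means, variances) per mixture component.
speaker_gmm = ([0.5, 0.5], [0.9, 1.1], [0.1, 0.1])   # enrolled speaker
ubm         = ([0.5, 0.5], [-0.1, 0.1], [0.5, 0.5])  # background model

genuine  = [1.0, 0.95, 1.05, 0.9]   # frames resembling the enrolled speaker
impostor = [-0.2, 0.1, 0.0, -0.1]   # frames resembling the background

THRESHOLD = 0.0
accept = llr_score(genuine, speaker_gmm, ubm) > THRESHOLD  # grant access
```

In the real system each frame would be a multi-dimensional MFCC vector and the mixtures would have dozens of components, but the accept/reject logic is the same.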
We propose a model for carrying out deep learning based multimodal sentiment analysis. The MOUD dataset is taken for experimentation purposes. We developed two parallel text-based and audio-based models and further fused these heterogeneous feature maps, taken from intermediate layers, to complete the architecture. Performance measures (accuracy, precision, recall and F1-score) are observed to outperform the existing models.
Mixed Language Based Offline Handwritten Character Recognition Using First St... (CSCJournals)
An Artificial Neural Network is an artificial representation of the human brain that tries to simulate its learning process. To train a network and measure how well it performs, an objective function must be defined. A commonly used performance criterion is the sum-of-squares error function. Full end-to-end text recognition in natural images is a challenging problem that has recently received much attention in computer vision and machine learning. Traditional systems in this area have relied on elaborate models that incorporate carefully hand-engineered features or large amounts of prior knowledge. Language identification and interpretation of handwritten characters is one of the challenges faced in various industries. For example, it is always a big challenge in data interpretation from cheques in banks, language identification, and translation of messages from ancient scripts in the form of manuscripts, palm scripts and stone carvings, to name a few. Handwritten character recognition using soft computing methods like neural networks has long been a major area of research, and multiple theories and algorithms have been developed in the area of neural networks for handwritten character recognition.
Design and implementation of a java based virtual laboratory for data communi... (IJECEIAES)
Students in this modern age find engineering courses taught in the university very abstract and difficult, and cannot relate theoretical calculations to real-life scenarios. They consequently lose interest in their coursework and perform poorly in their grades. Simulations of classroom concepts with software like MATLAB were developed to facilitate the learning experience. This paper involves the development of a virtual laboratory simulation package for teaching data communication concepts such as coding schemes, modulation and filtering. Unlike other simulation packages, no prior knowledge of computer programming is required for students to grasp these concepts.
Designing an Efficient Multimodal Biometric System using Palmprint and Speech... (IDES Editor)
This paper proposes a multimodal biometric system using palmprint and speech signal. We propose a novel approach for both modalities: we extract features using Subband Cepstral Coefficients for the speech signal and the Modified Canonical method for the palmprint. The individual feature scores are passed to the fusion level, and we also propose a new fusion method called weighted score. The system is tested on clean and degraded databases collected by the authors for more than 300 subjects. The results show significant improvement in the recognition rate.
Speaker and Speech Recognition for Secured Smart Home Applications (Roger Gomes)
The paper discusses the implementation of a robust text-independent speaker recognition system using MFCC feature-vector extraction, matching via Vector Quantization (VQ), and optimization using the LBG algorithm; it further discusses the implementation of a text-dependent speech recognition system using the DTW algorithm in the context of home automation.
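The DTW matching used for the text-dependent recognizer can be sketched as the classic dynamic-programming recurrence; the scalar sequences below stand in for the MFCC frame sequences the paper would actually compare.

```python
def dtw_distance(a, b):
    # Dynamic Time Warping: minimal cumulative frame distance over all
    # monotonic alignments of sequence a against sequence b.
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

template  = [1, 2, 3, 2, 1]          # stored utterance of a command word
stretched = [1, 2, 2, 3, 2, 1, 1]    # same word spoken more slowly
other     = [5, 5, 5]                # a different word
```

Because DTW warps the time axis, the slowly spoken variant matches its template perfectly while a different word scores a large distance; a command recognizer picks the template with the smallest distance.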
Speech Emotion Recognition is a recent research topic in the Human-Computer Interaction (HCI) field. The need has risen for a more natural communication interface between humans and computers, as computers have become an integral part of our lives. A lot of work is currently going on to improve the interaction between humans and computers. To achieve this goal, a computer would have to be able to distinguish its present situation and respond differently depending on that observation. Part of this process involves understanding a user's emotional state. To make human-computer interaction more natural, the objective is that the computer should be able to recognize emotional states in the same way a human does. The efficiency of an emotion recognition system depends on the type of features extracted and the classifier used for detection of emotions. The proposed system aims at identification of basic emotional states such as anger, joy, neutral and sadness from human speech. While classifying different emotions, features like MFCC (Mel Frequency Cepstral Coefficients) and energy are used. In this paper, a standard emotional database (an English database) is used, which gives more satisfactory detection of emotions than recorded samples of emotions. This methodology describes and compares the performances of a Learning Vector Quantization Neural Network (LVQ NN), a multiclass Support Vector Machine (SVM), and their combination for emotion recognition.
On the use of voice activity detection in speech emotion recognition (journalBEEI)
Emotion recognition through speech has many potential applications; however, the challenge is to achieve high recognition accuracy while using limited resources or in the presence of interference such as noise. In this paper we explore the possibility of improving speech emotion recognition by utilizing the voice activity detection (VAD) concept. The emotional voice data from the Berlin Emotion Database (EMO-DB) and a custom-made database, the LQ Audio Dataset, are first preprocessed by VAD before feature extraction. The features are then passed to a deep neural network for classification. In this paper, we chose MFCC as the sole determinant feature. Comparing the results obtained with and without VAD, we found that VAD improved the recognition rate of 5 emotions (happy, angry, sad, fear, and neutral) by 3.7% when recognizing clean signals, while using VAD when training a network with both clean and noisy signals improved our previous results by 50%.
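A minimal sketch of the VAD preprocessing step, assuming a simple short-time-energy detector rather than whatever detector the paper used: frames whose energy falls below a threshold are dropped before feature extraction.

```python
def vad_trim(signal, frame_len=4, threshold=0.01):
    # Keep only frames whose short-time energy exceeds the threshold,
    # discarding silence before MFCC extraction.
    kept = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold:
            kept.extend(frame)
    return kept

# Silence, a burst of "speech", then silence again.
signal = [0.0] * 8 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 8
trimmed = vad_trim(signal)
```

The frame length and threshold here are illustrative; in practice they would be tuned to the sample rate and noise floor of the recordings.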
Audio Features Based Steganography Detection in WAV File (ijtsrd)
Whether audio signals contain secret information or not is a security issue addressed in the context of steganalysis. The conceptual idea lies in the difference in the distribution of various statistical distance measures between cover audio signals and stego audio signals. The aim of the proposed system is to analyze whether an audio signal exhibits information-hiding behavior or not. Mel-frequency cepstral coefficient, zero crossing rate, spectral flux and short-time energy features of the audio signal are extracted and combined with the features extracted from a modified version generated by randomly modifying the significant bits. The extracted features are then classified with a support vector machine. Experimental results show that the proposed method performs well in steganalysis of audio steganograms produced using S-Tools 4. Khin Myo Kyi, "Audio Features Based Steganography Detection in WAV File", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26807.pdf Paper URL: https://www.ijtsrd.com/computer-science/other/26807/audio-features-based-steganography-detection-in-wav-file/khin-myo-kyi
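The feature-pairing idea in the abstract, extracting statistics from the signal and from a randomly bit-modified copy and comparing the two, can be sketched as follows. The zero-crossing-rate and short-time-energy features are two of the four the paper names; the `lsb_randomize` helper is a hypothetical stand-in for the paper's modification step, not its exact procedure.

```python
import random

def zero_crossing_rate(x):
    # Fraction of adjacent sample pairs whose signs differ.
    return sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0)) / (len(x) - 1)

def short_time_energy(x):
    return sum(s * s for s in x) / len(x)

def lsb_randomize(samples, seed=0):
    # Randomly flip least significant bits of integer PCM samples.
    rng = random.Random(seed)
    return [s ^ rng.getrandbits(1) for s in samples]

def steg_features(samples):
    # Features of the signal paired with features of its randomized copy;
    # an SVM would be trained on such vectors to separate cover from stego.
    mod = lsb_randomize(samples)
    return [zero_crossing_rate(samples), short_time_energy(samples),
            zero_crossing_rate(mod), short_time_energy(mod)]

pcm = [12, -7, 3, -2, 9, -11, 4, -1]
features = steg_features(pcm)
```

The intuition: re-embedding random bits changes the statistics of a clean cover signal more than those of a signal that already carries a payload.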
For years, software projects have mostly exceeded budget, been delivered late, and failed to meet customer satisfaction. In the past, many traditional development models like waterfall, spiral, iterative, and prototyping methods were used to build software systems. In recent years, agile models have been widely used in developing software products. The major reasons are simplicity, incorporating requirement changes at any time, a light-weight approach, and delivering a working product early and in short iterations. Whatever development model is used, it still remains a challenge for software engineers to accurately estimate the size, effort and time required for developing a software system. This survey focuses on the existing estimation models used in traditional as well as agile software development.
Text independent speaker identification system using average pitch and forman... (ijitjournal)
The aim of this paper is to design a closed-set text-independent speaker identification system using average pitch and speech features from formant analysis. The speech features represented by the speech signal are potentially characterized by formant analysis (Power Spectral Density). In this paper we have designed two methods: one for average pitch estimation based on autocorrelation, and the other for formant analysis. The average pitches of the speech signals are calculated and employed together with formant analysis. From the performance comparison of the proposed method with some of the existing methods, it is evident that the designed speaker identification system with the proposed method is superior to the others.
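The autocorrelation-based average-pitch estimator can be sketched as below: within a plausible pitch range, pick the lag that maximizes the autocorrelation and convert it back to Hz. The sample rate and search range are illustrative assumptions.

```python
import math

def estimate_pitch(signal, sample_rate, fmin=50.0, fmax=500.0):
    # Autocorrelation pitch estimation: the lag with the highest
    # autocorrelation inside [1/fmax, 1/fmin] gives the pitch period.
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(signal) - 1)
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag

SR = 8000
tone = [math.sin(2 * math.pi * 200 * n / SR) for n in range(800)]  # 200 Hz "voice"
```

An average-pitch feature would apply this per frame and average the per-frame estimates over the voiced frames of an utterance.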
Dynamic thresholding on speech segmentation (eSAT Journals)
Abstract: Word is the preferred and natural unit of speech, because word units have well-defined acoustic representations. This paper presents several dynamic thresholding approaches for segmenting continuous Bangla speech sentences into words/sub-words. We have proposed three efficient methods for speech segmentation: two of them are usually used in pattern classification (k-means and FCM clustering) and one of them is used in image segmentation (Otsu's thresholding method). We also used new approaches, blocking black area and boundary detection techniques, to properly detect word boundaries in continuous speech and label the entire speech sentence into a sequence of words/sub-words. K-means and FCM clustering methods produce better segmentation results than Otsu's method. All the algorithms and methods used in this research were implemented in MATLAB, and the proposed system achieved an average segmentation accuracy of approximately 94%. Keywords: Blocking Black Area, Clustering, Dynamic Thresholding, Fuzzy Logic, Speech Segmentation.
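A minimal sketch of the dynamic-thresholding idea with two-class k-means, assuming frame energies as the clustered feature (the paper clusters comparable short-time statistics): the two cluster centres define a data-driven silence/speech threshold instead of a fixed one.

```python
def two_means(values, iters=20):
    # 1-D k-means with k=2; returns the (low, high) cluster centres.
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low_grp = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_grp = [v for v in values if abs(v - lo) > abs(v - hi)]
        if low_grp:
            lo = sum(low_grp) / len(low_grp)
        if high_grp:
            hi = sum(high_grp) / len(high_grp)
    return lo, hi

def label_frames(energies):
    # 1 = speech (closer to the high-energy centre), 0 = silence.
    lo, hi = two_means(energies)
    return [1 if abs(e - hi) < abs(e - lo) else 0 for e in energies]

energies = [0.01, 0.02, 0.90, 1.00, 0.80, 0.01]  # silence-speech-silence frames
labels = label_frames(energies)
```

Word boundaries then fall wherever the label sequence switches between 0 and 1.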
Automatic recognition of speech using computers is a challenging issue. This paper describes a technique that uses an Auto-associative Neural Network (AANN) to recognize speech based on sonogram features. Modeling techniques such as AANN were used to model each individual word trained into the system. Each isolated word segment, obtained using Voice Activity Detection (VAD) from the test sentence, is matched against these models to find the semantic representation of the test input speech. Experimental results for the AANN show good performance in recognition rate.
FUZZY INTERFACE SCHEME FOR COLOR SENSOR SYSTEM (ijfls)
The shade of a color in the visible range depends not only on the wavelength but is also a function of intensity. This is vital in many applications where color matching is highly desirable. In view of this, we have made an attempt to assign degrees to basic colors and the shades around them, since color identification by shade is a matter of degree and involves uncertainty. To circumvent this problem, we explored Fuzzy Logic Inference, where uncertainty in color identification can be handled via Fuzzy Sets and Fuzzy Reasoning.
The accuracy of a speaker identification system degrades severely in the presence of noise. The auditory-based features called gammatone frequency cepstral coefficients (GFCC) resemble the ability of the human ear to identify a speaker's identity in a noisy environment. GFCC is based on a gammatone filter bank, which models the basilar membrane as a sequence of overlapping band-pass filters. The system uses GFCC features and a Gaussian Mixture Model - Universal Background Model (GMM-UBM) for modeling the features of the speaker. The experiments are conducted on the English Language Speech Database for Speaker Recognition (ELSDSR). To evaluate the performance of the system, the test utterances are mixed with noises at various SNR levels to simulate channel change. The results show that GFCC features give good identification accuracy not only on clean samples, but also under noisy conditions.
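The "mixed with noises at various SNR levels" evaluation step can be sketched exactly: scale the noise so the mixture has the requested signal-to-noise ratio in dB, then add it to the clean utterance.

```python
import math, random

def mix_at_snr(clean, noise, snr_db):
    # Scale the noise so that 10*log10(P_clean / P_noise_scaled) == snr_db,
    # then add it to the clean signal.
    p_clean = sum(s * s for s in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(clean, noise)]

def measured_snr(clean, noisy):
    # Recover the achieved SNR by comparing the mixture to the clean signal.
    residual = [m - s for m, s in zip(noisy, clean)]
    p_c = sum(s * s for s in clean) / len(clean)
    p_r = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(p_c / p_r)

rng = random.Random(0)
clean = [math.sin(0.1 * n) for n in range(1000)]     # stand-in utterance
noise = [rng.uniform(-1, 1) for _ in range(1000)]    # stand-in channel noise
noisy = mix_at_snr(clean, noise, 10)                 # 10 dB SNR test signal
```

Sweeping `snr_db` from, say, 20 dB down to 0 dB produces the degradation curve such papers report.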
ICETE content-based filtering with applications on TV viewing data (Elaine Cecília Gatto)
Recommendation systems provide recommendations based on information about users' preferences. Information filtering is used by recommendation systems so that information can be processed and suggested to users, and content-based filtering is an information-filtering approach widely used in recommendation systems. Content-based filtering analyzes the correlation of item content with the user's profile, suggesting relevant items and putting away irrelevant ones. Recommendation systems, which are much used on the Internet, have been studied for use in the Digital TV context, and there are already several works in this direction. As on the Internet, recommendation systems can be used in Digital TV to recommend TV programs, publicity and advertisement, and also for electronic commerce. Thus, within the Digital TV context, the items can be programs, advertisements and products to be sold; using content-based filtering to recommend programs, for instance, the programs' contents can be correlated with the user's preferences, which in this scenario are the type of program one wants to watch. This paper presents the studies accomplished with content-based filtering applied to Digital TV data. The survey aims at observing and evaluating how some content-based filtering techniques can be used in recommendation systems in the Digital TV context.
T Silva, D D Karunaratna, G N Wikramanayake, K P Hewagamage, G K A Dias (2004) "Speaker Search and Indexing for Multimedia Databases" In:6th International Information Technology Conference, Edited by:V.K. Samaranayake et al. pp. 157-162. Infotel Lanka Society, Colombo, Sri Lanka: IITC Nov 29-Dec 1, ISBN: 955-8974-01-3
Deep Learning Based Voice Activity Detection and Speech Enhancement (NAVER Engineering)
Presenter: Juntae Kim (PhD candidate, KAIST)
Date: October 2018
Voice activity detection (VAD) and speech enhancement (SE) are important front-end technologies for noise-robust speech recognition systems.
From the incoming noisy signal, VAD detects only the speech segments, and SE removes the noise while preserving the speech signal.
For VAD and SE, this presentation will cover traditional methods, deep learning based methods, and our papers as follows:
1. J. Kim and M. Hahn, "Voice Activity Detection Using an Adaptive Context Attention Model," in IEEE Signal Processing Letters, vol. 25, no. 8, pp. 1181-1185, Aug. 2018.
2. J. Kim and M. Hahn, "Speech Enhancement Using a Two Step Network," submitted to IEEE Signal Processing Letters, 2018.
Also, this presentation will briefly introduce some experimental results in real-world environment (far-field, noisy environment), conducted on the embedded board.
For VAD,
Traditional VAD methods.
Deep learning based VAD methods.
Paper presentation: J. Kim and M. Hahn, "Voice Activity Detection Using an Adaptive Context Attention Model," in IEEE Signal Processing Letters, vol. 25, no. 8, pp. 1181-1185, Aug. 2018.
End point detection based on VAD.
Experimental results of DNN-EPD on embedded board in real-world environment.
For SE,
Traditional SE methods.
Deep learning based SE methods.
Paper presentation: J. Kim and M. Hahn, "Speech Enhancement Using a Two Step Network," submitted to IEEE Signal Processing Letters, 2018.
Experimental results in real-world environment.
The paper presents a k-means based semi-supervised clustering approach for recognizing and classifying P300 signals for a BCI speller system. P300 signals have proved to be the most suitable Event Related Potential (ERP) signals used to develop BCI systems. Due to the non-stationary nature of ERP signals, the wavelet transform is the best analysis tool for extracting informative features from P300 signals. The focus of the research is on semi-supervised clustering, as supervised approaches need a large amount of labeled data for training, which is tedious to obtain, and hence only work for small labeled datasets. On the other hand, unsupervised clustering works when no prior information is available, i.e., totally unlabeled data, which leads to a low level of performance. The in-between solution is semi-supervised clustering, which uses a few labeled samples together with a large amount of unlabeled data, costing less effort and time. Previous authors have selected and defined ad hoc features and assumed the clusters for small datasets. This motivates us to propose a novel approach that discovers the features embedded in P300 (EEG) signals, using k-means based semi-supervised cluster classification with an ensemble SVM.
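A minimal sketch of the semi-supervised idea on 1-D toy features: a few labeled samples seed the centroids and keep their labels fixed, while the unlabeled mass refines the clusters. The scalar "features" stand in for the paper's wavelet coefficients, and this seeded-constrained variant is one common formulation, not necessarily the authors' exact algorithm.

```python
def seeded_kmeans(points, labeled, k=2, iters=10):
    # labeled: dict {point index: class id}; these assignments never change.
    centers = []
    for c in range(k):
        seeds = [points[i] for i, lab in labeled.items() if lab == c]
        centers.append(sum(seeds) / len(seeds))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: labeled points keep their labels (the constraint).
        for i, p in enumerate(points):
            assign[i] = labeled.get(
                i, min(range(k), key=lambda j: abs(p - centers[j])))
        # Update step: ordinary centroid recomputation.
        for c in range(k):
            members = [points[i] for i, a in enumerate(assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign

# Two scalar "ERP feature" clusters; only one labeled example per class.
points = [0.10, 0.20, 0.15, 5.00, 5.20, 4.90]
labels = seeded_kmeans(points, labeled={0: 0, 3: 1})
```

With just two labeled points, the remaining four unlabeled points are pulled into the correct classes, which is exactly the economy the abstract argues for.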
Bayesian distance metric learning and its application in automatic speaker re... (IJECEIAES)
This paper proposes a state-of-the-art Automatic Speaker Recognition (ASR) system based on a Bayesian distance learning metric as a feature extractor. In this modeling, I explored the constraints of the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. An approximation of the distance metric is used as a weighted covariance matrix from the higher eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric distance. Given a speaker tag, I select the data pairs of different speakers with the highest cosine score to form a set of speaker constraints. This collection captures the most discriminating variability between the speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, this method is insensitive to normalization compared to cosine scores, and it is very effective in the case of limited training data. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best performance of the combined cosine score, EER 1.767%, was obtained using LDA200 + NCA200 + LDA200, and the best performance of Bayes_dml, EER 1.775%, was obtained using LDA200 + NCA200 + LDA100. Bayesian_dml outperforms the combined norm of cosine scores and is the best result reported for the short2-short3 condition of the NIST SRE 2008 data.
This paper aims to implement a robust speaker identification system. It is a software architecture that identifies the current talker out of a set of speakers. The system is focused on text-dependent speaker identification. It contains three main modules: endpoint detection, feature extraction and feature matching. The endpoint detection module removes unwanted signal and background noise from the input speech signal before subsequent processing. In the proposed system, short-term energy analysis is used for endpoint detection. Mel-frequency Cepstrum Coefficients (MFCC) are applied for feature extraction, to extract a small amount of data from the voice signal that can later be used to represent each speaker. For feature matching, a Vector Quantization (VQ) approach using the Linde-Buzo-Gray (LBG) clustering algorithm is proposed because it can reduce the amount of data and complexity. The experimental study shows that the proposed system is more robust and faster in computation than the existing one. MATLAB is used to implement the system. Zaw Win Aung, "A Robust Speaker Identification System", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-5, August 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18274.pdf http://www.ijtsrd.com/other-scientific-research-area/other/18274/a-robust-speaker-identification-system/zaw-win-aung
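The LBG clustering used to build the VQ codebook can be sketched as the classic split-and-refine procedure: start from the global centroid, split every centroid in two, refine with k-means, and repeat until the codebook reaches the requested (power-of-two) size. The 2-D points below stand in for MFCC vectors.

```python
def lbg_codebook(vectors, size, eps=0.01, iters=20):
    # Linde-Buzo-Gray: split-and-refine codebook construction.
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    book = [centroid]
    while len(book) < size:
        # Split every centroid into a perturbed +/- pair.
        book = [[x * (1 + s) for x in c] for c in book for s in (eps, -eps)]
        for _ in range(iters):  # plain k-means refinement
            groups = [[] for _ in book]
            for v in vectors:
                j = min(range(len(book)),
                        key=lambda i: sum((a - b) ** 2 for a, b in zip(v, book[i])))
                groups[j].append(v)
            for i, g in enumerate(groups):
                if g:
                    book[i] = [sum(v[d] for v in g) / len(g) for d in range(dim)]
    return book

# Two clouds of stand-in "MFCC vectors"; a size-2 codebook should find both.
book = lbg_codebook([[0, 0], [0.2, 0.1], [10, 10], [9.8, 10.2]], 2)
```

At identification time, each speaker gets their own codebook, and a test utterance is scored by its total quantization distortion against each codebook.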
Speaker identification under noisy conditions using hybrid convolutional neur... (IAESIJAI)
Speaker identification is a biometric task that classifies or identifies a person among other speakers based on speech characteristics. Recently, deep learning models have outperformed conventional machine learning models in speaker identification. Spectrograms of speech have been used as input in deep learning-based speaker identification with clean speech. However, the performance of speaker identification systems degrades under noisy conditions. Cochleograms have shown better results than spectrograms in deep learning-based speaker recognition under noisy and mismatched conditions. Moreover, hybrid convolutional neural network (CNN) and recurrent neural network (RNN) variants have shown better performance than CNN or RNN variants alone in recent studies. However, no attempt has been made to use a hybrid CNN with enhanced RNN variants for speaker identification with cochleogram input to improve performance under noisy and mismatched conditions. In this study, speaker identification using a hybrid CNN and gated recurrent unit (GRU) is proposed for noisy conditions using cochleogram input. The VoxCeleb1 audio dataset with real-world noises, white Gaussian noise (WGN), and without additive noise was employed for the experiments. The experimental results and the comparison with existing works show that the proposed model performs better than the other models in this study and existing works.
Software projects mostly exceeds budget, delivered late and does not meet with the customer’s satisfaction for years. In the past, many traditional development models like waterfall, spiral, iterative, and prototyping methods are used to build the software systems. In recent years, agile models are widely used in developing the software products. The major reasons are – simplicity, incorporating the requirement changes at any time, light-weight approach and delivering the working product early and in short duration. Whatever the development model used, it still remains a challenge for software engineer’s to accurately estimate the size, effort and the time required for developing the software system. This survey focuses on the existing estimation models used in traditional as well in agile software development.
Text independent speaker identification system using average pitch and forman... (ijitjournal)
The aim of this paper is to design a closed-set, text-independent speaker identification system using average pitch and speech features from formant analysis. The speech features represented by the speech signal are characterized by formant analysis (power spectral density). We have designed two methods: one for average pitch estimation based on autocorrelation, and the other for formant analysis. The average pitches of the speech signals are calculated and employed together with the formant analysis. A performance comparison of the proposed method with some existing methods shows that the speaker identification system designed with the proposed method is superior to the others.
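The autocorrelation-based pitch estimator described above can be sketched in a few lines: find the lag with the strongest self-similarity inside a plausible pitch range. The 200 Hz synthetic tone, 8 kHz sample rate, and 60-400 Hz search band are illustrative choices, not the paper's data.

```python
import numpy as np

# Average-pitch estimation by autocorrelation: the strongest peak of the
# autocorrelation inside the admissible lag range gives the pitch period.

def pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lags corresponding to fmax..fmin
    lag = lo + np.argmax(ac[lo:hi])           # strongest periodicity
    return fs / lag

fs = 8000
t = np.arange(0, 0.05, 1 / fs)
f0 = pitch_autocorr(np.sin(2 * np.pi * 200 * t), fs)   # close to 200 Hz
```

Averaging this estimate over voiced frames yields the "average pitch" feature the abstract pairs with formant analysis.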
Dynamic thresholding on speech segmentation (eSAT Journals)
Abstract: Word is the preferred and natural unit of speech, because word units have a well-defined acoustic representation. This paper presents several dynamic thresholding approaches for segmenting continuous Bangla speech sentences into words/sub-words. We have proposed three efficient methods for speech segmentation: two of them are usually used in pattern classification (k-means and FCM clustering) and one is used in image segmentation (Otsu's thresholding method). We also used new approaches, blocking black area and boundary detection techniques, to properly detect word boundaries in continuous speech and label the entire speech sentence as a sequence of words/sub-words. The k-means and FCM clustering methods produce better segmentation results than Otsu's method. All the algorithms and methods used in this research are implemented in MATLAB, and the proposed system achieved an average segmentation accuracy of approximately 94%. Keywords: Blocking Black Area, Clustering, Dynamic Thresholding, Fuzzy Logic, Speech Segmentation.
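Of the three thresholding methods named above, Otsu's can be sketched compactly: applied to short-time energy, it picks the threshold that maximizes between-class variance, separating speech from silence. The two-level synthetic energy sequence below stands in for a real Bangla utterance; this is an illustration of the technique, not the paper's pipeline.

```python
import numpy as np

# Otsu's thresholding on a sequence of frame energies: exhaustively test
# candidate thresholds and keep the one maximizing between-class variance.

def otsu_threshold(values, bins=64):
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = p[:i].sum(), p[i:].sum()          # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        m0 = (p[:i] * centers[:i]).sum() / w0      # class means
        m1 = (p[i:] * centers[i:]).sum() / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, centers[i]
    return best_t

# synthetic energies: 50 silence frames (~0.1) then 50 speech frames (~1.0)
energy = np.concatenate([np.full(50, 0.1), np.full(50, 1.0)]) \
    + 0.01 * np.random.default_rng(0).standard_normal(100)
thr = otsu_threshold(energy)
speech = energy > thr    # True where a word is present
```

Runs of `True` in `speech` then give candidate word boundaries, which the paper's blocking black area and boundary detection steps refine.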
Automatic recognition of speech using computers is a challenging problem. This paper describes a technique that uses an Auto-Associative Neural Network (AANN) to recognize speech based on sonogram features. Modeling techniques such as AANN are used to model each individual word trained into the system. Each isolated word, segmented from the test sentence using Voice Activity Detection (VAD), is matched against these models to find the semantic representation of the test input speech. Experimental results show that the AANN achieves a good recognition rate.
FUZZY INTERFACE SCHEME FOR COLOR SENSOR SYSTEM (ijfls)
The shade of a color in the visible range depends not only on the wavelength but also on the intensity. This is vital in many applications where color matching is highly desirable. In view of this, we have made an attempt to assign degrees to the basic colors and the shades around them. Identifying a color by its shade is a matter of degree and involves uncertainty. To handle this problem, we explore fuzzy logic inference, where the uncertainty in color identification can be handled via fuzzy sets and fuzzy reasoning.
The accuracy of a speaker identification system degrades severely in the presence of noise. The auditory-based features called gammatone frequency cepstral coefficients (GFCC) resemble the ability of the human ear to identify a speaker in a noisy environment. GFCC is based on a gammatone filter bank, which models the basilar membrane as a sequence of overlapping band-pass filters. The system uses GFCC features and a Gaussian Mixture Model - Universal Background Model (GMM-UBM) to model each speaker's features. The experiments are conducted on the English Language Speech Database for Speaker Recognition (ELSDSR). To evaluate the performance of the system, the test utterances are mixed with noise at various SNR levels to simulate channel changes. The results show that GFCC features give good identification accuracy not only on clean samples but also under noisy conditions.
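The GMM-UBM scoring step described above can be sketched as a log-likelihood ratio: average frame log-likelihood under the claimed speaker's GMM minus that under the universal background model. The two-component diagonal GMMs below are toy parameters, not models trained on ELSDSR.

```python
import numpy as np

# Per-frame log-likelihood under a diagonal-covariance GMM, computed with a
# numerically stable log-sum-exp over mixture components.

def diag_gmm_logpdf(x, weights, means, variances):
    # x: (T, D) feature frames; means/variances: (K, D); weights: (K,)
    d = x[:, None, :] - means[None, :, :]                       # (T, K, D)
    log_comp = (-0.5 * (d ** 2 / variances).sum(-1)
                - 0.5 * np.log(2 * np.pi * variances).sum(-1)
                + np.log(weights))                              # (T, K)
    m = log_comp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))   # (T,)

def llr_score(x, speaker, ubm):
    # positive score: frames fit the speaker model better than the UBM
    return (diag_gmm_logpdf(x, *speaker).mean()
            - diag_gmm_logpdf(x, *ubm).mean())

# toy 1-D models: the speaker's components sit slightly off the UBM's
ubm = (np.array([0.5, 0.5]), np.array([[0.0], [4.0]]), np.ones((2, 1)))
spk = (np.array([0.5, 0.5]), np.array([[1.0], [5.0]]), np.ones((2, 1)))
frames = np.full((10, 1), 1.0)      # frames near the speaker model
score = llr_score(frames, spk, ubm)
```

In a real system the speaker model is MAP-adapted from the UBM and the score is compared to a threshold to accept or reject the identity claim.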
Icete content-based filtering with applications on tv viewing data (Elaine Cecília Gatto)
Recommendation systems provide recommendations based on information about users' preferences. They use information filtering so that information can be processed and suggested to users, and content-based filtering is an information filtering approach widely used in recommendation systems. Content-based filtering analyzes the correlation between item content and the user's profile, suggesting relevant items and filtering out irrelevant ones. Recommendation systems, which are widely used on the Internet, have been studied for use in the Digital TV context, and several works already exist in this direction. As on the Internet, recommendation systems can be used in Digital TV to recommend TV programs, publicity and advertisement, and electronic commerce. Within the Digital TV context, the items can be programs, advertisements, and products to be sold; using content-based filtering for program recommendation, for instance, program contents can be correlated with the user's preferences, which in this scenario are the types of programs the user wants to watch. This paper presents studies on content-based filtering applied to Digital TV data. The survey aims at observing and evaluating how some content-based filtering techniques can be used in recommendation systems in the Digital TV context.
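The core correlation step of content-based filtering can be sketched in one line: rank programs by cosine similarity between each item's content vector and the user profile. The genre vocabulary and all vectors below are invented for illustration.

```python
import numpy as np

# Rank TV programs by cosine similarity between item content vectors and
# the user's preference profile (content-based filtering in miniature).

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# feature columns: [news, sports, movies, comedy]
items = {
    "evening news":  np.array([1.0, 0.0, 0.0, 0.1]),
    "football show": np.array([0.1, 1.0, 0.0, 0.2]),
    "sitcom":        np.array([0.0, 0.0, 0.3, 1.0]),
}
profile = np.array([0.0, 0.2, 0.4, 1.0])   # this user mostly watches comedy

ranking = sorted(items, key=lambda k: cosine(items[k], profile), reverse=True)
```

The top of `ranking` is what gets recommended; in the Digital TV setting the vectors would come from program metadata such as genre tags.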
T Silva, D D Karunaratna, G N Wikramanayake, K P Hewagamage, G K A Dias (2004) "Speaker Search and Indexing for Multimedia Databases" In: 6th International Information Technology Conference, Edited by: V.K. Samaranayake et al., pp. 157-162. Infotel Lanka Society, Colombo, Sri Lanka: IITC Nov 29-Dec 1, ISBN: 955-8974-01-3
Deep Learning Based Voice Activity Detection and Speech Enhancement (NAVER Engineering)
Presenter: Juntae Kim (Ph.D. candidate, KAIST)
Date: October 2018
Voice activity detection (VAD) and speech enhancement (SE) are important front-end technologies for noise robust speech recognition system.
From the incoming noisy signal, VAD detects only the speech portions, and SE removes the noise while preserving the speech.
For VAD and SE, this presentation will cover the traditional methods, deep learning based methods, and our papers as follows:
1. J. Kim and M. Hahn, "Voice Activity Detection Using an Adaptive Context Attention Model," in IEEE Signal Processing Letters, vol. 25, no. 8, pp. 1181-1185, Aug. 2018.
2. J. Kim and M. Hahn, "Speech Enhancement Using a Two Step Network," submitted to IEEE Signal Processing Letters, 2018.
Also, this presentation will briefly introduce some experimental results in real-world environment (far-field, noisy environment), conducted on the embedded board.
For VAD,
Traditional VAD methods.
Deep learning based VAD methods.
Paper presentation: J. Kim and M. Hahn, "Voice Activity Detection Using an Adaptive Context Attention Model," in IEEE Signal Processing Letters, vol. 25, no. 8, pp. 1181-1185, Aug. 2018.
End point detection based on VAD.
Experimental results of DNN-EPD on embedded board in real-world environment.
For SE,
Traditional SE methods.
Deep learning based SE methods.
Paper presentation: J. Kim and M. Hahn, "Speech Enhancement Using a Two Step Network," submitted to IEEE Signal Processing Letters, 2018.
Experimental results in real-world environment.
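A minimal example of the "traditional VAD methods" listed above is an energy detector: frames whose short-time energy exceeds a noise-floor-based threshold are flagged as speech. The frame length, threshold factor, and synthetic signal below are arbitrary illustrative choices, not settings from the cited papers.

```python
import numpy as np

# Energy-based VAD: flag frames whose mean energy exceeds a multiple of the
# median frame energy (the median serves as a crude noise-floor estimate).

def energy_vad(signal, frame_len=160, factor=2.0):
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)
    threshold = factor * np.median(energy)
    return energy > threshold            # True = speech frame

rng = np.random.default_rng(1)
signal = 0.01 * rng.standard_normal(1600)                 # background noise
signal[1280:] += np.sin(2 * np.pi * 300 * np.arange(320) / 8000)  # "speech"
flags = energy_vad(signal)
```

Deep-learning VADs such as the adaptive context attention model replace this hand-set threshold with a learned per-frame classifier, which is what makes them robust in the far-field, noisy conditions mentioned above.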
This paper presents a k-means based semi-supervised clustering approach for recognizing and classifying P300 signals for a BCI speller system. P300 signals have proved to be the most suitable event-related potential (ERP) signals for developing BCI systems. Due to the non-stationary nature of ERP signals, the wavelet transform is a well-suited analysis tool for extracting informative features from P300 signals. The focus of the research is on semi-supervised clustering: supervised approaches need large amounts of labeled data for training, which is tedious to obtain, and therefore work only with small labeled datasets, while unsupervised clustering works with no prior information (totally unlabeled data) and thus leads to a low level of performance. The in-between solution is semi-supervised clustering, which uses a few labeled samples together with a large amount of unlabeled data, reducing both effort and time. Previous authors have selected ad hoc features and assumed the clusters for small datasets. This motivates us to propose a novel approach that discovers the features embedded in P300 (EEG) signals, using k-means based semi-supervised cluster classification with an ensemble SVM.
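One common form of the semi-supervised k-means idea described above is seeded k-means: the few labelled trials fix the initial cluster centres, and the unlabelled trials then refine them. The 2-D synthetic points below stand in for wavelet features of target/non-target P300 epochs; this is a sketch of the general technique, not the paper's exact algorithm.

```python
import numpy as np

# Seeded k-means: initialize centres from the labelled seeds of each class,
# then run standard k-means updates over all (mostly unlabelled) data.

def seeded_kmeans(X, seeds_by_class, iters=20):
    centers = np.array([s.mean(axis=0) for s in seeds_by_class])
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)                 # assign to nearest centre
        for k in range(len(centers)):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(0)
target = rng.normal([2.0, 2.0], 0.3, (50, 2))      # "P300 present" epochs
nontarget = rng.normal([-2.0, -2.0], 0.3, (50, 2)) # "P300 absent" epochs
X = np.vstack([target, nontarget])
centers, labels = seeded_kmeans(X, [target[:3], nontarget[:3]])  # 3 labels each
```

Because the seeds pin each cluster to a known class, the resulting labels can feed a downstream classifier such as the ensemble SVM mentioned in the abstract.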
Bayesian distance metric learning and its application in automatic speaker re... (IJECEIAES)
This paper proposes a state-of-the-art automatic speaker recognition (ASR) system based on a Bayesian distance learning metric as a feature extractor. In this modeling, I explored the constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. An approximation of the distance metric is used as a weighted covariance matrix from the higher eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric distance. Given a speaker tag, I select the data pairs of different speakers with the highest cosine scores to form a set of speaker constraints. This collection captures the most discriminating variability between the speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods and, compared to cosine scoring, is insensitive to normalization. The method is very effective when training data is limited. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best performance with combined cosine scoring, an EER of 1.767%, was obtained using LDA200 + NCA200 + LDA200, and the best performance of Bayes_dml, an EER of 1.775%, was obtained using LDA200 + NCA200 + LDA100. Bayes_dml outperforms the combined normalized cosine scoring and gives the best reported result for the short2-short3 condition on the NIST SRE 2008 data.
This paper aims to implement a robust speaker identification system: a software architecture that identifies the current talker out of a set of speakers. The system focuses on text-dependent speaker identification and contains three main modules: endpoint detection, feature extraction, and feature matching. The endpoint detection module removes unwanted signal and background noise from the input speech signal before subsequent processing; in the proposed system, short-term energy analysis is used for this purpose. Mel-frequency cepstral coefficients (MFCC) are applied for feature extraction, extracting a small amount of data from the voice signal that can later be used to represent each speaker. For feature matching, a vector quantization (VQ) approach using the Linde-Buzo-Gray (LBG) clustering algorithm is proposed because it reduces the amount of data and the complexity. The experimental study shows that the proposed system is more robust than the original system and faster in computation than the existing one. The system is implemented in MATLAB. Zaw Win Aung, "A Robust Speaker Identification System", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-5, August 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18274.pdf http://www.ijtsrd.com/other-scientific-research-area/other/18274/a-robust-speaker-identification-system/zaw-win-aung
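The LBG codebook training named above can be sketched compactly: start from the global centroid, split every codeword by a small perturbation, and refine with k-means until the requested codebook size is reached. The 2-D training vectors below are synthetic stand-ins for a speaker's MFCC frames; this illustrates the generic algorithm, not the paper's MATLAB implementation.

```python
import numpy as np

# LBG vector quantization: binary codebook splitting with k-means refinement.

def lbg(X, size, eps=0.01, iters=10):
    codebook = X.mean(axis=0, keepdims=True)        # start: global centroid
    while len(codebook) < size:
        # split every codeword into a (1+eps)/(1-eps) pair
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):                      # k-means refinement
            d = ((X[:, None] - codebook[None]) ** 2).sum(-1)
            nearest = d.argmin(axis=1)
            for k in range(len(codebook)):
                if (nearest == k).any():
                    codebook[k] = X[nearest == k].mean(axis=0)
    return codebook

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(m, 0.2, (40, 2))
               for m in ([0, 0], [4, 0], [0, 3], [5, 4])])
book = lbg(X, 4)
distortion = ((X[:, None] - book[None]) ** 2).sum(-1).min(1).mean()
```

At identification time, the speaker whose codebook gives the lowest average quantization distortion for the test utterance's features is selected.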
Speaker identification under noisy conditions using hybrid convolutional neur... (IAESIJAI)
Speaker identification is biometrics that classifies or identifies a person from other speakers based on speech characteristics. Recently, deep learning models outperformed conventional machine learning models in speaker identification. Spectrograms of the speech have been used as input in deep learning-based speaker identification using clean speech. However, the performance of speaker identification systems gets degraded under noisy conditions. Cochleograms have shown better results than spectrograms in deep learning-based speaker recognition under noisy and mismatched conditions. Moreover, hybrid convolutional neural network (CNN) and recurrent neural network (RNN) variants have shown better performance than CNN or RNN variants in recent studies. However, there is no attempt conducted to use a hybrid CNN and enhanced RNN variants in speaker identification using cochleogram input to enhance the performance under noisy and mismatched conditions. In this study, a speaker identification using hybrid CNN and the gated recurrent unit (GRU) is proposed for noisy conditions using cochleogram input. VoxCeleb1 audio dataset with real-world noises, white Gaussian noises (WGN) and without additive noises were employed for experiments. The experiment results and the comparison with existing works show that the proposed model performs better than other models in this study and existing works.
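The gated recurrent unit (GRU) at the heart of the proposed hybrid model can be written out in a few lines of numpy. In the paper the GRU runs on CNN feature maps of cochleograms; here it just processes a random feature sequence with random weights, so this shows the cell's equations, not a trained identifier.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# One GRU time step. W, U, b each stack the parameters of the update (z),
# reset (r), and candidate gates along their first axis.

def gru_step(x, h, W, U, b):
    z = sigmoid(x @ W[0] + h @ U[0] + b[0])          # update gate
    r = sigmoid(x @ W[1] + h @ U[1] + b[1])          # reset gate
    h_tilde = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])  # candidate state
    return (1 - z) * h + z * h_tilde                 # gated interpolation

rng = np.random.default_rng(0)
d_in, d_h, T = 8, 4, 20
W = rng.normal(0, 0.1, (3, d_in, d_h))
U = rng.normal(0, 0.1, (3, d_h, d_h))
b = np.zeros((3, d_h))

h = np.zeros(d_h)
for x in rng.normal(0, 1, (T, d_in)):   # a sequence of feature frames
    h = gru_step(x, h, W, U, b)         # final h summarizes the sequence
```

In the hybrid architecture, a pooled recurrent state like `h` is what feeds the final speaker-classification layer.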
Comparative Study of Different Techniques in Speaker Recognition: Review (IJAEMSJORNAL)
Speech is the most basic and essential method of human communication. A speaker is recognized on the basis of the individual information included in speech signals. Speaker recognition (SR) identifies the person who is speaking and has been used for security systems in recent years. In this paper we discuss feature extraction techniques such as mel frequency cepstral coefficients (MFCC), linear predictive coding (LPC), and dynamic time warping (DTW), and, for classification, Gaussian mixture models (GMM), artificial neural networks (ANN), and support vector machines (SVM).
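Of the techniques surveyed above, DTW has the shortest complete implementation: a dynamic program that finds the minimum-cost alignment between two feature sequences of different lengths. The two short 1-D sequences are toy stand-ins for MFCC trajectories.

```python
import numpy as np

# Dynamic time warping: D[i, j] is the cheapest alignment cost of the first
# i elements of `a` with the first j elements of `b`.

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

same = dtw_distance([1, 2, 3, 3, 4], [1, 2, 3, 4])   # time-warped copy
diff = dtw_distance([1, 2, 3, 3, 4], [4, 4, 1, 0])   # unrelated sequence
```

Because the warped copy aligns perfectly, its DTW cost is zero while the unrelated sequence scores much higher, which is exactly the property speaker/speech matchers exploit.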
This paper describes the development of an efficient speech recognition system using different techniques such as Mel Frequency Cepstrum Coefficients (MFCC), Vector Quantization (VQ), and Hidden Markov Models (HMM).
It explains how speaker recognition followed by speech recognition is used to recognize speech faster, more efficiently, and more accurately. MFCC is used to extract the characteristics from the input speech signal with respect to a particular word uttered by a particular speaker. HMM is then applied to the quantized feature vectors to identify the word by evaluating the maximum log-likelihood values for the spoken word.
A comparison of different support vector machine kernels for artificial speec... (TELKOMNIKA JOURNAL)
As the emergence of voice biometrics provides enhanced security and convenience, voice biometric-based applications such as speaker verification have gradually been replacing less secure authentication techniques. However, automatic speaker verification (ASV) systems are exposed to spoofing attacks, especially artificial speech attacks that can be generated in large quantities in a short period of time using state-of-the-art speech synthesis and voice conversion algorithms. Despite the extensive use of the support vector machine (SVM) in recent works, no study has investigated the performance of different SVM settings for artificial speech detection. In this paper, the performance of different SVM settings in artificial speech detection is investigated, with the objective of identifying the appropriate SVM kernels for the task. An experiment was conducted to find the appropriate combination of the proposed features and SVM kernels. Experimental results showed that the polynomial kernel detects artificial speech effectively, with an equal error rate (EER) of 1.42% when applied to the presented handcrafted features.
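The kernels such a comparison typically covers can be written out directly; the paper found a polynomial kernel best for artificial-speech detection. The functions below just evaluate kernel values on toy vectors (no SVM training), and the degree/gamma/coef0 defaults are common illustrative settings, not the paper's tuned values.

```python
import numpy as np

# Three standard SVM kernels evaluated as plain functions of two vectors.

def linear_kernel(x, y):
    return x @ y

def poly_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * (x @ y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 0.0])
y = np.array([0.5, 0.5])
values = {"linear": linear_kernel(x, y),
          "poly": poly_kernel(x, y),
          "rbf": rbf_kernel(x, y)}
self_sim = rbf_kernel(x, x)   # RBF of a vector with itself is always 1
```

Swapping one kernel for another changes only how the SVM measures similarity between feature vectors, which is why a kernel comparison like the paper's is cheap to run on fixed handcrafted features.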
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique (CSCJournals)
An automatic speaker recognition system recognizes an unknown speaker among several reference speakers by making use of speaker-specific information in their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition technique. Our baseline speaker recognition system, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database, and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique that performs gender recognition before speaker recognition so as to improve the accuracy and timeliness of the baseline speaker recognition system. Based on experiments conducted on the MIT database, we demonstrate that the proposed system improves accuracy over the baseline by approximately 2% while reducing the computational time by more than 30%.
A Study of Digital Media Based Voice Activity Detection Protocols (ijtsrd)
Speaker identification is critical for a variety of voice-based applications in safety and surveillance systems, and such methods are now employed in household appliances for user-controlled device toggling. A comprehensive and language-agnostic Voice Activity Detection (VAD) system is critical for Digital Media Content (DMC). VAD systems are utilised in DMC generation in a variety of ways, including supplementing subtitle creation, detecting and correcting subtitle drift, and handling sound distortion. The goal of this article is to provide a comprehensive overview of the numerous strategies utilised for voice recognition in the entertainment industry. Several speaker recognition strategies, both earlier ones and those used in current studies, are analysed, and a clear understanding of the superior methodology is established through a survey of the literature spanning more than two decades. We give a comprehensive survey of DNN-based VADs on DMC data concerning accuracy, noise sensitivity, and language-agnostic performance. Nikhil Kumar | Sumit Dalal, "A Study of Digital Media-Based Voice Activity Detection Protocols", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7, Issue-3, June 2023, URL: https://www.ijtsrd.com.com/papers/ijtsrd56376.pdf Paper URL: https://www.ijtsrd.com.com/engineering/electronics-and-communication-engineering/56376/a-study-of-digital-mediabased-voice-activity-detection-protocols/nikhil-kumar
A Text-Independent Speaker Identification System based on The Zak Transform (CSCJournals)
A novel text-independent speaker identification system based on the Zak transform is implemented. The data used in this paper are drawn from the ELSDSR database. The identification accuracy approaches 91.3% using a single test file and 100% using two test files. The method shows efficiency comparable to the well-known MFCC method, with the advantage of being faster in both modeling and identification.
Speech emotion recognition with light gradient boosting decision trees machine (IJECEIAES)
Speech emotion recognition aims to identify the emotion expressed in the speech by analyzing the audio signals. In this work, data augmentation is first performed on the audio samples to increase the number of samples for better model learning. The audio samples are comprehensively encoded as the frequency and temporal domain features. In the classification, a light gradient boosting machine is leveraged. The hyperparameter tuning of the light gradient boosting machine is performed to determine the optimal hyperparameter settings. As the speech emotion recognition datasets are imbalanced, the class weights are regulated to be inversely proportional to the sample distribution where minority classes are assigned higher class weights. The experimental results demonstrate that the proposed method outshines the state-of-the-art methods with 84.91% accuracy on the Berlin database of emotional speech (emo-DB) dataset, 67.72% on the Ryerson audio-visual database of emotional speech and song (RAVDESS) dataset, and 62.94% on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
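The class-weight rule described above (weights inversely proportional to class frequency, so minority emotions count more) is a one-liner. The emotion label counts below are made up; real counts would come from emo-DB, RAVDESS, or IEMOCAP.

```python
import numpy as np

# Balanced class weights: n_samples / (n_classes * count_per_class),
# i.e. inversely proportional to each class's sample frequency.

def inverse_frequency_weights(labels):
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

labels = np.array(["neutral"] * 60 + ["anger"] * 30 + ["sadness"] * 10)
w = inverse_frequency_weights(labels)   # sadness gets the largest weight
```

A gradient-boosting classifier then scales each sample's loss by its class weight, so errors on rare emotions are penalized more during training.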
An overview of speaker recognition by Bhusan Chettri.pdf (Bhusan Chettri)
In this article, Bhusan Chettri provides an overview of voice authentication system that is based on automatic speaker verification technology. He provides background on both the traditional approaches of modelling speakers and current deep learning based approaches. A brief introduction to how these systems can be manipulated is also provided.
Intelligent Arabic letters speech recognition system based on mel frequency c... (IJECEIAES)
Speech recognition is one of the important applications of artificial intelligence (AI). It aims to recognize spoken words regardless of who is speaking. The process involves extracting meaningful features from spoken words and then classifying these features into their classes. This paper presents a neural network classification system for Arabic letters and studies the effect of changing the multi-layer perceptron (MLP) artificial neural network (ANN) properties to obtain optimized performance. The proposed system consists of two main stages: first, the recorded spoken letters are transformed from the time domain into the frequency domain using the fast Fourier transform (FFT), and features are extracted using mel frequency cepstral coefficients (MFCC); second, the extracted features are classified using an MLP ANN with the back-propagation (BP) learning algorithm. The results show that the proposed system along with the extracted features can classify Arabic spoken letters using two hidden layers with an accuracy of around 86%.
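The first stage described above can be sketched: frame the recorded letter, apply a window, and take FFT magnitudes as the spectral representation on which the MFCC step would then operate. The frame length, hop, and 1 kHz test tone are common illustrative defaults, not the paper's exact settings.

```python
import numpy as np

# Short-time spectral analysis: split the signal into overlapping Hann-
# windowed frames and take the magnitude of the real FFT of each frame.

def spectral_frames(signal, frame_len=256, hop=128):
    n = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n)])
    return np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, frame_len//2 + 1)

fs = 8000
t = np.arange(0, 0.2, 1 / fs)
spec = spectral_frames(np.sin(2 * np.pi * 1000 * t))
peak_bin = spec.mean(axis=0).argmax()   # 1000 Hz -> bin 1000*256/8000 = 32
```

Mel filterbank averaging, a log, and a DCT applied to `spec` would complete the MFCC features fed to the MLP classifier.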
Speech Recognized Automation System Using Speaker Identification through Wire... (IOSR Journals)
Abstract: This paper discusses the methodology for a project named "Speech Recognized Automation System using Speaker Identification through Wireless Communication". The project gives the design of an automation system using wireless communication and speaker recognition implemented in Matlab. Matlab's straightforward programming interface makes it an ideal tool for the speech analysis in this project. The automation system is useful for home appliances as well as in industry. This paper discusses the overall design of the wireless automation system, which is built and implemented. The speech recognition centers on recognizing speech commands stored in a Matlab database and matching them against the incoming voice command of the speaker. The Mel Frequency Cepstral Coefficient (MFCC) algorithm is used to recognize the speaker's speech and to extract speech features. The system uses low-power RF ZigBee transceiver wireless communication modules, which are relatively cheap. It is intended to control lights, fans, and other electrical appliances in a home or office using speech commands like "Light" and "Fan". Further, if security is not a big issue, the speech processor can be used to control the appliances without speaker identification. Keywords — Automation system, MATLAB code, MFCC, speaker identification, ZigBee transceiver.
Similar to "A novel automatic voice recognition system based on text-independent in a noisy environment" (20)
Bibliometric analysis highlighting the role of women in addressing climate ch... (IJECEIAES)
Fossil fuel consumption has increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have succeeded in playing a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results consider contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. The results highlight how women have influenced policies and actions related to climate change, point out areas of research deficiency, and give recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter... (IJECEIAES)
The active and reactive load changes have a significant impact on voltage and frequency. In this paper, in order to stabilize the microgrid (MG) against load variations in islanding mode, the active and reactive power of all distributed generators (DGs), including energy storage (battery), diesel generator, and micro-turbine, are controlled. The micro-turbine generator is connected to the MG through a three-phase to three-phase matrix converter, and the droop control method is applied for controlling the voltage and frequency of the MG. In addition, a method is introduced for voltage and frequency control of micro-turbines in the transition from grid-connected mode to islanding mode. A novel switching strategy of the matrix converter is used for converting the high-frequency output voltage of the micro-turbine to the grid-side frequency of the utility system. Moreover, using this switching strategy, low-order harmonics in the output current and voltage are not produced, and consequently the size of the output filter would be reduced. In fact, the suggested control strategy is load-independent and has no frequency conversion restrictions. The proposed approach for voltage and frequency regulation demonstrates exceptional performance and a favorable response across various load alteration scenarios. The suggested strategy is examined in several scenarios in the MG test systems, and the simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo... (IJECEIAES)
Precisely characterizing Li-ion batteries is essential for optimizing their performance, enhancing safety, and prolonging their lifespan across various applications, such as electric vehicles and renewable energy systems. This article introduces an innovative nonlinear methodology for system identification of a Li-ion battery, employing a nonlinear autoregressive with exogenous inputs (NARX) model. The proposed approach integrates the benefits of nonlinear modeling with the adaptability of the NARX structure, facilitating a more comprehensive representation of the intricate electrochemical processes within the battery. Experimental data collected from a Li-ion battery operating under diverse scenarios are employed to validate the effectiveness of the proposed methodology. The identified NARX model exhibits superior accuracy in predicting the battery's behavior compared to traditional linear models. This study underscores the importance of accounting for nonlinearities in battery modeling, providing insights into the intricate relationships between state-of-charge, voltage, and current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a survey (IJECEIAES)
Smart grids are one of the last decades' innovations in electrical energy. They bring relevant advantages compared to the traditional grid and attract significant interest from the research community. Assessing the field's evolution is essential to propose guidelines for facing new and future smart grid challenges. In addition, knowing the main technologies involved in the deployment of smart grids (SGs) is important to highlight possible shortcomings that can be mitigated by developing new tools. This paper contributes to the research trends mentioned above by focusing on two objectives. First, a bibliometric analysis is presented to give an overview of the current research level on smart grid deployment. Second, a survey of the main technological approaches used for smart grid implementation and their contributions is presented. To that effect, we searched the Web of Science (WoS) and Scopus databases. We obtained 5,663 documents from WoS and 7,215 from Scopus on smart grid implementation or deployment. Given the extraction limitation in the Scopus database, 5,872 of the 7,215 documents were extracted using a multi-step process. These two datasets have been analyzed using a bibliometric tool called bibliometrix. The main outputs are presented with some recommendations for future research.
Use of analytical hierarchy process for selecting and prioritizing islanding ... (IJECEIAES)
One of the problems associated with power systems is the islanding condition, which must be rapidly and properly detected to prevent any negative consequences on the system's protection, stability, and security. This paper offers a thorough overview of several islanding detection strategies, which are divided into two categories: classic approaches, including local and remote approaches, and modern techniques, including techniques based on signal processing and computational intelligence. Additionally, each approach is compared and assessed based on several factors, including implementation costs, non-detected zones, declining power quality, and response times, using the analytical hierarchy process (AHP). When all criteria are compared together, the multi-criteria decision-making analysis gives overall weights of 24.7% for passive methods, 7.8% for active methods, 5.6% for hybrid methods, 14.5% for remote methods, 26.6% for signal processing-based methods, and 20.8% for computational intelligence-based methods. Thus, it can be seen from the total weights that hybrid approaches are the least suitable choice, while signal processing-based methods are the most appropriate islanding detection methods to select and implement in a power system with respect to the aforementioned factors. The proposed hierarchy model is studied and examined using Expert Choice software.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
This research is developing an incubator system that integrates the internet of things and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. Then, the data is sent in real-time to the internet of things (IoT) broker eclipse mosquito using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory network (LSTM) method and displayed in a web application using an application programming interface (API) service. Furthermore, the experimental results produce as many as 2,880 rows of data stored in the database. The correlation coefficient between the target attribute and other attributes ranges from 0.23 to 0.48. Next, several experiments were conducted to evaluate the model-predicted value on the test data. The best results are obtained using a two-layer LSTM configuration model, each with 60 neurons and a lookback setting 6. This model produces an R 2 value of 0.934, with a root mean square error (RMSE) value of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R 2 value was also evaluated for each attribute used as input, with a result of values between 0.590 and 0.845.
A review on internet of things-based stingless bee's honey production with im...IJECEIAES
Honey is produced exclusively by honeybees and stingless bees which both are well adapted to tropical and subtropical regions such as Malaysia. Stingless bees are known for producing small amounts of honey and are known for having a unique flavor profile. Problem identified that many stingless bees collapsed due to weather, temperature and environment. It is critical to understand the relationship between the production of stingless bee honey and environmental conditions to improve honey production. Thus, this paper presents a review on stingless bee's honey production and prediction modeling. About 54 previous research has been analyzed and compared in identifying the research gaps. A framework on modeling the prediction of stingless bee honey is derived. The result presents the comparison and analysis on the internet of things (IoT) monitoring systems, honey production estimation, convolution neural networks (CNNs), and automatic identification methods on bee species. It is identified based on image detection method the top best three efficiency presents CNN is at 98.67%, densely connected convolutional networks with YOLO v3 is 97.7%, and DenseNet201 convolutional networks 99.81%. This study is significant to assist the researcher in developing a model for predicting stingless honey produced by bee's output, which is important for a stable economy and food security.
A trust based secure access control using authentication mechanism for intero...IJECEIAES
The internet of things (IoT) is a revolutionary innovation in many aspects of our society including interactions, financial activity, and global security such as the military and battlefield internet. Due to the limited energy and processing capacity of network devices, security, energy consumption, compatibility, and device heterogeneity are the long-term IoT problems. As a result, energy and security are critical for data transmission across edge and IoT networks. Existing IoT interoperability techniques need more computation time, have unreliable authentication mechanisms that break easily, lose data easily, and have low confidentiality. In this paper, a key agreement protocol-based authentication mechanism for IoT devices is offered as a solution to this issue. This system makes use of information exchange, which must be secured to prevent access by unauthorized users. Using a compact contiki/cooja simulator, the performance and design of the suggested framework are validated. The simulation findings are evaluated based on detection of malicious nodes after 60 minutes of simulation. The suggested trust method, which is based on privacy access control, reduced packet loss ratio to 0.32%, consumed 0.39% power, and had the greatest average residual energy of 0.99 mJoules at 10 nodes.
Fuzzy linear programming with the intuitionistic polygonal fuzzy numbersIJECEIAES
In real world applications, data are subject to ambiguity due to several factors; fuzzy sets and fuzzy numbers propose a great tool to model such ambiguity. In case of hesitation, the complement of a membership value in fuzzy numbers can be different from the non-membership value, in which case we can model using intuitionistic fuzzy numbers as they provide flexibility by defining both a membership and a non-membership functions. In this article, we consider the intuitionistic fuzzy linear programming problem with intuitionistic polygonal fuzzy numbers, which is a generalization of the previous polygonal fuzzy numbers found in the literature. We present a modification of the simplex method that can be used to solve any general intuitionistic fuzzy linear programming problem after approximating the problem by an intuitionistic polygonal fuzzy number with n edges. This method is given in a simple tableau formulation, and then applied on numerical examples for clarity.
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
According to the World Health Organization (WHO), seventy million individuals worldwide suffer from epilepsy, a neurological disorder. While electroencephalography (EEG) is crucial for diagnosing epilepsy and monitoring the brain activity of epilepsy patients, it requires a specialist to examine all EEG recordings to find epileptic behavior. This procedure needs an experienced doctor, and a precise epilepsy diagnosis is crucial for appropriate treatment. To identify epileptic seizures, this study employed a convolutional neural network (CNN) based on raw scalp EEG signals to discriminate between preictal, ictal, postictal, and interictal segments. The possibility of these characteristics is explored by examining how well timedomain signals work in the detection of epileptic signals using intracranial Freiburg Hospital (FH), scalp Children's Hospital Boston-Massachusetts Institute of Technology (CHB-MIT) databases, and Temple University Hospital (TUH) EEG. To test the viability of this approach, two types of experiments were carried out. Firstly, binary class classification (preictal, ictal, postictal each versus interictal) and four-class classification (interictal versus preictal versus ictal versus postictal). The average accuracy for stage detection using CHB-MIT database was 84.4%, while the Freiburg database's time-domain signals had an accuracy of 79.7% and the highest accuracy of 94.02% for classification in the TUH EEG database when comparing interictal stage to preictal stage.
Analysis of driving style using self-organizing maps to analyze driver behaviorIJECEIAES
Modern life is strongly associated with the use of cars, but the increase in acceleration speeds and their maneuverability leads to a dangerous driving style for some drivers. In these conditions, the development of a method that allows you to track the behavior of the driver is relevant. The article provides an overview of existing methods and models for assessing the functioning of motor vehicles and driver behavior. Based on this, a combined algorithm for recognizing driving style is proposed. To do this, a set of input data was formed, including 20 descriptive features: About the environment, the driver's behavior and the characteristics of the functioning of the car, collected using OBD II. The generated data set is sent to the Kohonen network, where clustering is performed according to driving style and degree of danger. Getting the driving characteristics into a particular cluster allows you to switch to the private indicators of an individual driver and considering individual driving characteristics. The application of the method allows you to identify potentially dangerous driving styles that can prevent accidents.
Hyperspectral object classification using hybrid spectral-spatial fusion and ...IJECEIAES
Because of its spectral-spatial and temporal resolution of greater areas, hyperspectral imaging (HSI) has found widespread application in the field of object classification. The HSI is typically used to accurately determine an object's physical characteristics as well as to locate related objects with appropriate spectral fingerprints. As a result, the HSI has been extensively applied to object identification in several fields, including surveillance, agricultural monitoring, environmental research, and precision agriculture. However, because of their enormous size, objects require a lot of time to classify; for this reason, both spectral and spatial feature fusion have been completed. The existing classification strategy leads to increased misclassification, and the feature fusion method is unable to preserve semantic object inherent features; This study addresses the research difficulties by introducing a hybrid spectral-spatial fusion (HSSF) technique to minimize feature size while maintaining object intrinsic qualities; Lastly, a soft-margins kernel is proposed for multi-layer deep support vector machine (MLDSVM) to reduce misclassification. The standard Indian pines dataset is used for the experiment, and the outcome demonstrates that the HSSF-MLDSVM model performs substantially better in terms of accuracy and Kappa coefficient.
Water billing management system project report.pdfKamal Acharya
Our project entitled “Water Billing Management System” aims is to generate Water bill with all the charges and penalty. Manual system that is employed is extremely laborious and quite inadequate. It only makes the process more difficult and hard.
The aim of our project is to develop a system that is meant to partially computerize the work performed in the Water Board like generating monthly Water bill, record of consuming unit of water, store record of the customer and previous unpaid record.
We used HTML/PHP as front end and MYSQL as back end for developing our project. HTML is primarily a visual design environment. We can create a android application by designing the form and that make up the user interface. Adding android application code to the form and the objects such as buttons and text boxes on them and adding any required support code in additional modular.
MySQL is free open source database that facilitates the effective management of the databases by connecting them to the software. It is a stable ,reliable and the powerful solution with the advanced features and advantages which are as follows: Data Security.MySQL is free open source database that facilitates the effective management of the databases by connecting them to the software.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 4, August 2020: 3643-3650
recognition and adverse acoustic conditions. The researcher established that the accuracy of the MSR identity
toolbox is significantly reduced when it is used in a noisy environment, and documented the implications of
noise and reverberation in the research. In particular, a regression formula was established to predict
the recognition accuracy of a typical speech recognition system. On the other hand, Dirceu presents an
evaluation of the Microsoft Research Identity Toolbox under conditions reflecting those of the real
world, i.e., a noisy environment, and he established that the accuracy of the MSR identity toolbox
drops when it is tested in such an environment [6-9]. However, neither researcher provided a solution to this
problem. Hence, this research is conducted to improve the accuracy of the MSR identity toolbox in a noisy
environment. It attempts to address the issues in the domain of voice recognition in a noisy
environment by:
- using several noise removal (de-noising) methods to improve the MSR identity toolbox;
- determining the best MFCC feature that suits both voice recognition and noise removal, since multiple
types of MFCC voice features may be used.
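One of the de-noising methods considered later in this paper is the least mean squares (LMS) adaptive filter, which estimates the noise component of the corrupted signal from a correlated noise reference and subtracts it. A minimal pure-Python sketch (the step size `mu` and filter order are illustrative assumptions, not the values tuned in this work):

```python
def lms_denoise(noisy, noise_ref, mu=0.01, order=4):
    """Adaptive LMS filter: estimate the noise component of `noisy`
    from a correlated reference `noise_ref` and subtract it.
    Returns the cleaned signal (the sequence of errors e[n])."""
    w = [0.0] * order  # adaptive filter weights, initialized to zero
    out = []
    for n in range(len(noisy)):
        # taps: the most recent `order` reference samples (zero-padded at the start)
        x = [noise_ref[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        y = sum(wi * xi for wi, xi in zip(w, x))  # current noise estimate
        e = noisy[n] - y                          # error = de-noised sample
        # steepest-descent weight update: w <- w + 2*mu*e*x
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, x)]
        out.append(e)
    return out
```

In practice the reference input would come from a second microphone or a silence segment; as the weights converge, the error signal approaches the clean speech.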
2. RESEARCH OBJECTIVES
The objectives of this research are: (a) To improve the Microsoft research team's MSR identity toolbox and
make it robust in a noisy environment. A method based on noise removal filtering and the choice of
the MFCC speech signal feature that gives the best accuracy for voice recognition in a noisy
environment is proposed. Firstly, we implemented noise removal filters such as the filter bank, least mean
squares (LMS) filter, wavelet filter, and hearing aid filter in order to improve the accuracy of the MSR identity
toolbox in real-life conditions (a noisy environment). Subsequently, we investigated several types of
MFCC voice features to show their effects on the accuracy of the voice recognition system, and to
determine the best MFCC feature that suits both the GMM-UBM voice recognition model and the noise
removal methods. (b) To develop the modules of the proposed voice recognition system using
the MSR identity toolbox, which implements the GMM-UBM model. Several modules will be implemented in
MATLAB: user enrolment and management, users' voice recording, training and testing,
noise application and removal, MFCC feature extraction, and WAV file conversion when
the TIMIT voice database is used. (c) To evaluate the feasibility of the proposed
method in improving the voice detection accuracy of the MSR identity toolbox in a noisy environment.
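In the testing step of the GMM-UBM scheme described above, a test utterance is scored with the log-likelihood ratio between the claimed speaker's GMM and the universal background model, and access is granted when the ratio exceeds a threshold. A minimal diagonal-covariance sketch in pure Python (the toy one-dimensional mixture parameters in the usage below are illustrative assumptions, not the models trained in this work):

```python
import math

def gmm_loglik(frames, weights, means, variances):
    """Average log-likelihood of feature frames under a
    diagonal-covariance Gaussian mixture model."""
    total = 0.0
    for x in frames:
        comp = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            # diagonal covariance: sum per-dimension Gaussian log-densities
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp.append(ll)
        # log-sum-exp over mixture components for numerical stability
        m = max(comp)
        total += m + math.log(sum(math.exp(c - m) for c in comp))
    return total / len(frames)

def llr_score(frames, speaker_gmm, ubm):
    """Log-likelihood ratio: positive scores favor the claimed speaker."""
    return gmm_loglik(frames, *speaker_gmm) - gmm_loglik(frames, *ubm)
```

For example, with a speaker model centered near 0 and a UBM centered near 5, frames close to 0 yield a positive ratio (accept) and frames close to 5 yield a negative one (reject).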
3. RELATED WORKS OF VOICE RECOGNITION
The first research work for AVR was done in 1985, it uses: Vector Quantization (VQ) [10-12] from
the enlistment data of a given voice, a partitioning of the acoustic space into a finite number of regions and
each represents a centroid vector. The set of these vectors forms what are called a quantization dictionary.
The proximity measurement between this dictionary and the frames of a test segment, calculated as
the average of the minimum distances between each frame and the centroids, allows the comparison of voice
segments. This method can be described as deterministic, but also vectorial and non-parametric. Its results
depend strongly on the fixed size of the dictionary. Core approaches, such as Support Vector Machine
SVM [13-16] apply to frame a nonlinear transformation to a large superspace. This method is to maximize
the linear margin between classes. The performance of SVM-based systems now competes with those of
the state of the art Gaussian mixture that we will present later. This method is qualified as semi-deterministic
(the transformation may or may not derive from theoretical laws), vector and discriminant. Anchor
models [17] represent a voice relative to a model space, in the form of a vector of similarities with respect to
them. A similarity can be calculated by a comparison score (anchor model VS voice learning data).
This method is strongly dependent on the robustness of the anchoring models. It depends on their
ability to synthesize the structural information about the acoustic space by a discrete overlap. Other methods
have been presented for about twenty years, as well as variants of the previous methods. They are generative
or discriminative "oriented": Dynamic Bayesian Networks (DBN) [18-19], Segment Mixture Models
(SMM) [20] combining GMM approaches (which we propose below) and Dynamic Time Warping
(DTW) [21] to take into account the temporal structure of the voice signal. The GDW (Gaussian Dynamic
Warping) approach of [22], which is a hybrid exploiting the generalization of generative approaches and
the structural modeling of DTW. VQ codebook model is developed based on K-means clustering procedure
for these perceptual features. In this algorithm there is a mapping from L training vectors into M clusters.
Each block is normalized to unit magnitude before being given as input to the model. One model is created for
each human voice [23]. In the family of discriminative linear classifiers, [24] introduced a logistic regression
classifier, which has the advantage of relying on probabilistic and Bayesian hypotheses.
However, its performance does not match that of the SVM in the experiments of [24]. Also noteworthy is
a recent discriminative linear classifier approach based on SVM, Pairwise SVMs [25], which goes beyond
the limits of supervector-based SVMs. Modeling based on GMM is part of the more general Hidden
Markov Model (HMM) framework. Hidden Markov models apply perfectly in text-dependent mode, obtaining
excellent results. On the other hand, the use of HMM models in text-independent mode does not improve
the performance achieved by simpler models based on GMM [26]. These use the notion of state introduced
by the Russian mathematician Markov; their succession, governed by transition probabilities, makes it possible to
elaborate a stochastic model of the voice. This approach, initially used for speech recognition, is very effective
for voice recognition. According to the literature review by [2], GMM-based AVR is
the best method because it is excellent for the voice pattern recognition problem. In addition, [27] developed
a GMM-based automatic voice verification system for forensics in Bahasa Indonesia.
Furthermore, [28, 29] have shown in their research that GMM performs best for voice recognition,
followed by DTW and finally by the least-performing algorithm, namely SVM. As shown in Figure 1,
the researchers compared GMM, DTW and SVM using the TIMIT voice corpus. They increased the number
of voices from 10 to 50.
Figure 1. Performance comparisons of GMM, DTW, and SVM
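The VQ-codebook comparison described above (a K-means dictionary, scored by the average of the minimum frame-to-centroid distances) can be sketched as follows; this is an illustrative NumPy version, not the implementation of [23]:

```python
import numpy as np

def train_codebook(frames, M=4, iters=20, seed=0):
    """Toy K-means: map L training frame vectors into M centroids."""
    rng = np.random.default_rng(seed)
    centroids = frames[rng.choice(len(frames), M, replace=False)].astype(float)
    for _ in range(iters):
        # assign each frame to its nearest centroid
        d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for m in range(M):
            if np.any(labels == m):
                centroids[m] = frames[labels == m].mean(axis=0)
    return centroids

def vq_score(frames, centroids):
    """Average of the minimum distances between each test frame and the centroids."""
    d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

A lower score means the test segment is closer to the dictionary, so the claimed speaker's codebook should score lower on genuine frames than an impostor's would.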
4. RESEARCH METHOD
Voice recognition systems aim to verify or identify an unknown voice signal against specific voice
signals in the database. Voice recognition systems are classified into two types of service [30]:
a. Voice verification confirms an identity claim
b. Voice identification determines the identity of an unknown voice among the registered voices
Our proposed system is a Voice Verification System (VVS) and it is divided into three key components:
feature extraction, voice modeling, and decision.
4.1. Feature extraction of voice signals
Mel Frequency Cepstral Coefficients (MFCCs) are a type of feature that describes the characteristics
of the voice signal; their processing simulates the human auditory system, which has proved
effective in noisy environments, and they operate more accurately in the frequency domain than in the time domain.
The MFCC computation consists of six stages, as shown in Figure 2. The first stage is pre-emphasis, which
boosts the high-frequency components of the voice signal that are weak relative to low levels of noise. The second stage
is frame blocking, which divides the voice signal into many short frames over which the signal can be treated
as stationary. The third stage is the Hamming window, by which each frame is multiplied to preserve continuity and
reduce truncation of the voice signal at the first and last points of the frame. The fourth stage is the Fourier
transform, used to convert each frame from the time domain to the frequency domain. The fifth stage is the Mel-scale
filter bank, used to emphasize perceptually relevant frequencies and attenuate unwanted ones. The last stage is the log
energy computation, followed by the discrete cosine transform, which is applied to return to the time
domain [1].
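As a rough NumPy sketch of these six stages (with common default parameters such as a 0.97 pre-emphasis coefficient, 25 ms frames, and a 10 ms hop at 16 kHz; these defaults are assumptions, not the exact HCopy configuration used later):

```python
import numpy as np

def mfcc(signal, fs=16000, n_mfcc=13, frame_len=400, hop=160,
         n_fft=512, n_mels=26, preemph=0.97):
    """Simplified MFCC pipeline: pre-emphasis, frame blocking, Hamming
    window, FFT, mel filter bank, log energy + DCT."""
    # 1) pre-emphasis boosts high frequencies
    sig = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # 2) frame blocking
    n_frames = 1 + max(0, (len(sig) - frame_len) // hop)
    frames = np.stack([sig[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # 3) Hamming window reduces truncation effects at frame edges
    frames = frames * np.hamming(frame_len)
    # 4) power spectrum via FFT (time domain -> frequency domain)
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # 5) triangular mel-scale filter bank
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = mel2hz(np.linspace(hz2mel(0), hz2mel(fs / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 6) log energy, then DCT back to a cepstral (time-like) domain
    feat = np.log(spec @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_mels)))
    return feat @ dct.T
```

The result is one row of `n_mfcc` cepstral coefficients per frame; delta and acceleration coefficients (the _D and _A qualifiers below) would be appended on top of these.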
4.2. Voice modeling
The voice modeling step exploits the data provided by the feature extraction step to create
the representation of an individual that will subsequently be used to authenticate them. The model is
usually a statistical representation of the acquired data. The voice model also depends on the quality of
the microphones, the expected channels of the voice signal, and the amount of voice data available for enrollment and detection.
Int J Elec & Comp Eng, ISSN: 2088-8708, Vol. 10, No. 4, August 2020, pp. 3643-3650
In this paper, we have chosen a stochastic model, the GMM-UBM, which is simple, easy to
evaluate, and fast to compute, and which is used for text-independent human voice verification
in noisy environments. As shown in Figures 3 and 4, the UBM model is first created using the Expectation
Maximization (EM) algorithm; it is then used as a prior to create each GMM model through the Maximum A
Posteriori (MAP) algorithm. The GMM-UBM models are then used to recognize the unknown voice
during the testing process, using the log-likelihood ratio and the equal error rate.
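A minimal NumPy sketch of this pipeline (EM training of a diagonal-covariance UBM, MAP adaptation of its means, LLR scoring); the MSR identity toolbox's actual routines differ in detail, and the relevance factor `rel` is an assumed default:

```python
import numpy as np

def _log_gauss(X, means, var):
    """Per-frame, per-component diagonal-Gaussian log density, shape (N, K)."""
    const = -0.5 * (X.shape[1] * np.log(2 * np.pi) + np.log(var).sum(axis=1))
    sq = ((X[:, None, :] - means[None, :, :]) ** 2 / var[None, :, :]).sum(axis=2)
    return const[None, :] - 0.5 * sq

def _frame_loglik(X, w, means, var):
    """log p(x) per frame under the mixture (logsumexp over components)."""
    lp = _log_gauss(X, means, var) + np.log(w)[None, :]
    m = lp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(lp - m).sum(axis=1, keepdims=True))).ravel()

def train_ubm(X, K=4, iters=50, seed=0):
    """UBM: diagonal-covariance GMM fitted with Expectation Maximization."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), K, replace=False)].astype(float)
    var = np.tile(X.var(axis=0) + 1e-3, (K, 1))
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        lp = _log_gauss(X, means, var) + np.log(w)[None, :]   # E-step
        lp -= lp.max(axis=1, keepdims=True)
        r = np.exp(lp); r /= r.sum(axis=1, keepdims=True)     # responsibilities
        nk = r.sum(axis=0) + 1e-10                            # M-step
        w = nk / len(X)
        means = (r.T @ X) / nk[:, None]
        var = (r.T @ X**2) / nk[:, None] - means**2 + 1e-6
    return w, means, var

def map_adapt(ubm, X, rel=16.0):
    """MAP adaptation of the UBM means to one speaker's enrollment frames;
    weights and covariances are kept from the UBM (the prior)."""
    w, means, var = ubm
    lp = _log_gauss(X, means, var) + np.log(w)[None, :]
    lp -= lp.max(axis=1, keepdims=True)
    r = np.exp(lp); r /= r.sum(axis=1, keepdims=True)
    nk = r.sum(axis=0)
    alpha = (nk / (nk + rel))[:, None]
    new_means = alpha * (r.T @ X) / np.maximum(nk[:, None], 1e-10) + (1 - alpha) * means
    return w, new_means, var

def llr(X, gmm, ubm):
    """Average log-likelihood ratio: speaker model vs. UBM."""
    return _frame_loglik(X, *gmm).mean() - _frame_loglik(X, *ubm).mean()
```

A positive LLR indicates the test frames fit the claimed speaker's adapted model better than the background model, which is the basis of the accept/reject decision below.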
Figure 2. Extraction of parameters with MFCC
Figure 3. General classifier structure for VVS
Figure 4. Voice verification system framework
4.3. Decision and performance measures
The decision consists of checking the adequacy of the vocal message against the acoustic reference of the voice it
claims to be; it is an all-or-nothing decision. The performance of voice verification is given in terms of
the accuracy rate, defined as the percentage of true voices accepted during the system test.
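For example, the accuracy rate and the equal error rate (EER) can be computed from genuine and impostor LLR scores with a simple threshold-sweep sketch:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep a threshold over the scores and return the Equal Error Rate:
    the operating point where false rejection and false acceptance meet."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best = (1.0, 0.0)  # (|FRR - FAR|, EER estimate)
    for t in thresholds:
        frr = np.mean(genuine < t)     # true voices rejected
        far = np.mean(impostor >= t)   # impostors accepted
        if abs(frr - far) < best[0]:
            best = (abs(frr - far), (frr + far) / 2)
    return best[1]

def accuracy_rate(genuine, threshold):
    """Percentage of true voices accepted during the system test."""
    return 100.0 * np.mean(genuine >= threshold)
```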
5. RESULTS AND ANALYSIS
5.1. Data collection
In the data collection stage, we used the TIMIT database, which contains 630 (192 female and 438
male) human voice records with a sampling frequency of 16 kHz. Each user recorded 10 sentences, with the voice
record duration of each sentence ranging between two and five seconds. We selected 530 human voices randomly as
background data and the remaining 100 human voices as users' enrollment data. We also selected
randomly 9 out of 10 sentences per human voice, keeping the remaining sentence as target human voice
data. We then collected data for 50 human voices chosen randomly from the users' enrollment data for
system testing.
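The split above can be sketched as follows (illustrative only; the speaker IDs and random seed are hypothetical):

```python
import random

def split_timit(speakers, sentences_per_speaker=10, seed=0):
    """Sketch of the split used here: 530 background speakers, 100 enrollment
    speakers (9 sentences each, 1 held out), 50 of them drawn for testing."""
    rng = random.Random(seed)
    pool = list(speakers)
    rng.shuffle(pool)
    background, enrollment = pool[:530], pool[530:630]
    enroll_sents, target_sents = {}, {}
    for spk in enrollment:
        sents = list(range(sentences_per_speaker))
        rng.shuffle(sents)
        enroll_sents[spk] = sents[:9]   # training sentences
        target_sents[spk] = sents[9]    # held-out target sentence
    test_speakers = rng.sample(enrollment, 50)
    return background, enrollment, enroll_sents, target_sents, test_speakers
```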
5.2. Voice verification system consists of two phases: training and testing
a. Train phase: After data collection, we created MFCCs for each human voice using HCopy,
an HTK tool that converts the voice signal into different types of MFCCs. To calculate the MFCCs, we
used a configuration file taken from the HTK-MFCC-MATLAB toolbox together
with the HCopy tool. We then trained on the background data to calculate the UBM model using an algorithm of
the MSR identity toolbox, namely Expectation Maximization (EM). We then applied the training user
enrollment data to calculate each GMM model using another algorithm of the MSR identity toolbox, namely
Maximum A Posteriori (MAP).
b. Test phase: To verify the voice, we added background noise, then reduced it using a noise
filter in order to obtain an enhanced signal. After that we extracted the MFCC features of the enhanced
signal in the noisy environment using the HCopy tool. Finally, we matched the MFCC features of
the enhanced signal against the features of the other human voices in the users' enrollment data, using
the trained GMM-UBM models with algorithms of the MSR identity toolbox, namely the Log-Likelihood
Ratio (LLR) and the Equal Error Rate (EER). All of our experiments were carried out in MATLAB v8.3.
The results showed that under perfect conditions (no noise) the system accuracy was nearly 100% when
tested with 50 human voices. Noise and de-noise methods were then applied in order to determine
the configuration of MFCC features that best suits the MSR identity toolbox and achieves
the best accuracy possible. To do this we changed the MFCC configurations of HCopy, which automatically
converts its wave-file input into other types of MFCC. Subsequently, we applied two
types of noise, namely Background noise and White Gaussian noise, at different signal-to-noise ratios
(SNR) from -30 to 60 dB, followed by four different de-noising methods: Filter bank, Least mean
squares (LMS) filter, Wavelet filter, and Digital hearing aid filter. The implementation of the Digital
hearing aid filter was the most complicated, with many filter types and different steps; it also appears
unsuitable for our system, as all of its recognition results were zero. From the results listed in
Tables 1 and 2 we notice that for the Filter bank with MFCC_0_D_A, the average accuracy over
SNR (-30 to 60 dB) was 54.875% for Background noise and 14.750% for White Gaussian noise.
The results of the other filters are listed in the following tables, followed by a comparison of our
proposed system with the results of the work of [23] in the clean and noisy environments.
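For instance, white Gaussian noise at a target SNR (as swept here from -30 to 60 dB) can be added with a short routine like this sketch:

```python
import numpy as np

def add_white_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB,
    using SNR = 10 * log10(P_signal / P_noise)."""
    rng = np.random.default_rng(seed)
    p_sig = np.mean(signal ** 2)
    p_noise = p_sig / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise
```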
Table 1. Using background noise on 50 voices at SNR (-30 to 60 dB)
Feature type/Filters No de-noising Filter bank Wavelet filter LMS filter
MFCC_0 42.500% 52.750% 20.750% 11.125%
MFCC_0_D 51.000% 54.250% 38.750% 35.250%
MFCC_0_D_A 51.310% 54.875% 41.500% 46.250%
MFCC_0_ D_A_Z 47.875% 48.000% 38.125% 48.750%
MFCC_0_D_Z 46.125% 46.375% 33.250% 48.375%
MFCC_0_Z 42.75% 43.000% 7.1875% 44.25%
MFCC_E 51.625% 52.375% 37.000% 10.250%
MFCC_E_D 53.125% 53.250% 41.375% 36.500%
MFCC_E_D_A 52.750% 52.875% 42.625% 46.875%
MFCC_E_D_A_Z 48.375% 48.625% 40.375% 49.125%
MFCC_E_D_Z 47.625% 48.125% 39.875% 48.625%
MFCC_E_Z 45.250% 45.375% 33.375% 46.500%
MFCC_Z 50.000% 50.375% 32.625% 49.875%
Table 2. Using White Gaussian noise on 50 voices at SNR (-30 to 60 dB)
Feature type/Filters No de-noising Filter bank Wavelet filter LMS filter
MFCC_0 5.000% 9.750% 2.870% 3.870%
MFCC_0_D 8.750% 13.000% 3.000% 8.125%
MFCC_0_D_A 10.500% 14.750% 2.620% 11.000%
MFCC_0_ D_A_Z 10.250% 10.375% 3.250% 11.500%
MFCC_0_D_Z 10.000% 10.000% 2.750% 11.625%
MFCC_0_Z 9.000% 9.000% 0.375% 9.125%
MFCC_E 11.250% 11.375% 3.125% 3.500%
MFCC_E_D 13.000% 13.125% 5.000% 10.810%
MFCC_E_D_A 14.125% 14.250% 6.750% 13.250%
MFCC_E_D_A_Z 10.250% 10.375% 5.125% 12.000%
MFCC_E_D_Z 10.250% 10.250% 5.250% 12.000%
MFCC_E_Z 9.375% 9.750% 3.375% 10.625%
MFCC_Z 50.000% 50.375% 32.625% 49.875%
Figures 5 and 6 show the performance of the de-noising algorithms under different MFCC
configurations using background noise and White Gaussian noise, respectively. From Table 3 we notice
that the result of our proposed GMM-UBM system in the clean environment was 100% for 50 voices, which
is better than the VQ codebook system. Noise deteriorates the information in the voice signal; without
noise removal, the system accuracy automatically goes down. Accordingly, the accuracy of the VQ system [23]
drops in a noisy environment at SNR 15 dB, as shown in Table 4.
Figure 5. De-noising algorithms performance under different MFCC using background noise
Figure 6. De-noising algorithms performance under different MFCC using White Gaussian noise
Table 3. Comparison of our proposed system with the system of [23] in the clean environment
Reference Modeling method Voices database Number of voices to test Condition Accuracy rate (%)
[23] VQ codebook TIMIT 50 Clean 91%
Our proposed system GMM-UBM TIMIT 50 Clean 100%
Table 4. Comparison of our proposed system with the system of [23] in the noisy environment at SNR 15 dB
Reference Modeling method Voices database Number of voices to test Condition Accuracy rate (%)
[23] VQ codebook TIMIT 50 Background noise at 15 dB < 91%
Our proposed system GMM-UBM TIMIT 50 Background noise at 15 dB 100%
From Table 5, the achieved objectives of our proposed system are as follows.
To achieve our first objective, to design a voice recognition system using the MSR identity toolbox
with GMM-UBM models and enhanced modules in MATLAB such as user voice recording, we used MATLAB,
implemented our methods, and followed a rapid application development methodology.
To achieve our second objective, we used the HTK MFCC configuration to investigate the feature extraction
process and to figure out which configuration of MFCC features best suits the MSR identity
toolbox.
To achieve the third objective, we implemented our voice recognition system in a noisy environment based
on the MSR identity toolbox and HTK, and implemented our noise removal method (filter bank) in order to
achieve the best possible accuracy compared to the VQ codebook system.
Table 5. Results achieved
Reference Model type Method Features extraction Recognition rate (%) Performance Environments Noise filter
[23] VQ codebook Iterative Clustering Approach MF-PLP 91% Slow Clean None
Our proposed system GMM-UBM MSR identity toolbox, HTK, and noise filters MFCC_0_D_A 100% Fast Clean & Noisy Filter bank
6. CONCLUSION
In this paper, we presented a robust voice recognition system by implementing the code
necessary to run the Microsoft Research identity toolbox. Most importantly, we improved the accuracy
of the Microsoft Research identity toolbox: the result of GMM-UBM was better than that of the VQ codebook
by 9% in the clean environment. We also found that applying the Filter bank with MFCC_0_D_A increased
the accuracy of the MSR identity toolbox by 3% for the voice recognition system under background noise as
well as white Gaussian noise. In conclusion, our proposed system will contribute to making biometric
access through voice a real success in real-world applications, as well as playing an important role in
research involving voice recognition technology.
REFERENCES
[1] P. M. Chauhan and N. P. Desai, "Mel Frequency Cepstral Coefficients (MFCC) based Speaker Identification in
Noisy Environment using Wiener Filter," presented at Int. Conf. of Green Computing Communication and
Electrical Engineering, 2014.
[2] J. H. Hansen and T. Hasan, "Speaker Recognition by Machines and Humans: A Tutorial Review," IEEE Signal
Processing Magazine, vol. 32, no. 6, pp. 74-99, 2015.
[3] S. Memon, et al., "Effect of Clinical Depression on Automatic Speaker Identification," Presented at The 3rd Int.
Conf. On Bioinformatics And Biomedical Engineering, 2009.
[4] S. O. Sadjadi, M. Slaney and L. Heck, "MSR Identity Toolbox v1.0: A Matlab Toolbox for Speaker Recognition
Research," Speech Language Process Tech. Committee News, vol. 1, no 4. 2013.
[5] K. A. Al-Karawi, A. H. Al-Noori, F. F. Li and T. Ritchings, “Automatic Speaker Recognition System in Adverse
Conditions-Implication of Noise and Reverberation on System Performance,” International Journal of Information
and Electronics Engineering, pp. 5-6, 2015.
[6] D. G. da Silva, et al., "Evaluation of MSR Identity Toolbox under Conditions Reflecting Those of A Real Forensic
Case (forensic_eval_01)," Speech Communication, pp. 42-49, 2017.
[7] T. Khodadadi, et al., "Evaluation of Recognition-Based Graphical Password Schemes in Terms of Usability and
Security Attributes," International Journal of Electrical and Computer Engineering, vol. 6, no. 6, p. 2939, 2016.
[8] T. Khodadadi, et al., "Privacy Issues and Protection in Secure Data Outsourcing." Jurnal Teknologi,
vol. 69, no. 6, 2014.
[9] F. F. Moghaddam, et al., "VDCI: Variable Data Classification Index to Ensure Data Protection in Cloud Computing
Environments," IEEE Conference on Systems, Process and Control, 2014.
[10] F. K. Soong, et al., "Report: A Vector Quantization Approach to Speaker Recognition," AT&T Technical Journal,
vol. 66, no. 2, pp. 14-26, 1987.
[11] J. S. Mason, J. Oglesby and L. Xu, "Codebooks to Optimise Speaker Recognition," 1st European Conf. on Speech
Communication and Technology, pp. 1267-1270, 1989.
[12] P. Mishra, "A Vector Quantization Approach to Speaker Recognition," Proc. of the Int. Conf. on Innovation &
Research in Technology for Sustainable Development, pp. 152-155, 2012.
[13] V. N. Vapnik, "An Overview of Statistical Learning Theory," IEEE Transactions On Neural Networks, vol. 10,
no. 5, pp. 988-999, 1999.
[14] V. N. Vapnik, “The Nature of Statistical Learning Theory,” Springer Science & Business Media New York, 2000.
[15] V. Wan, and W. M. Campbell, "Support Vector Machines for Speaker Verification and Identification," Neural
Networks for Signal Processing X, Proceedings of the 2000 IEEE Signal Processing Society Workshop, 2000.
[16] S. Fine, J. Navratil, and R. A. Gopinath, "A Hybrid GMM/SVM Approach to Speaker Identification," Int. J. of
Electrical and Computer Engineering, vol. 1, no. 4, pp. 721-727, 2007.
[17] T. Merlin, J. F. Bonastre, and C. Fredouille, "Non Directly Acoustic Process for Costless Speaker Recognition and
Indexation," Service Central d’Authentification du CCSD, 1999.
[18] G. F. Cooper and E. Herskovits, "A Bayesian Method for the Induction of Probabilistic Networks from Data,"
Machine Learning, vol. 9, no. 4, pp. 309-347, 1992.
[19] D. Heckerman, "A Tutorial on Learning with Bayesian Networks," Learning in Graphical Models,
pp. 301-354, 1999.
[20] R. P. Stapert and J. S. Mason, "A Segmental Mixture Model for Speaker Recognition," 7th European Conf. on
Speech Communication and Technology, 2001.
[21] S. Furui, "Cepstral Analysis Technique for Automatic Speaker Verification," IEEE Transactions on Acoustics,
Speech, and Signal Processing, vol. 29, no. 2, pp. 254-272, 1981.
[22] J. F. Bonastre, P. Morin and J.C. Junqua, "Gaussian Dynamic Warping (GDW) Method Applied to Text-Dependent
Speaker Detection and Verification," 8th European Conf. on Speech Commu. and Technology, pp. 1-4. 2003.
[23] A. Revathi, R. Ganapathy, and Y. Venkataramani. "Text Independent Speaker Recognition and Speaker
Independent Speech Recognition using Iterative Clustering Approach," Int. Journal of Computer Science &
Information Technology, vol. 1, no. 2, pp. 30-42, 2009.
[24] L. Burget, et al., "Discriminatively Trained Probabilistic Linear Discriminant Analysis for Speaker Verification,"
Int. Conf. Acoustics, Speech and Signal Processing, pp. 22-27. 2011.
[25] S. Cumani, et al., "Pairwise Discriminative Speaker Verification in the I- Vector Space," IEEE Transactions on
Audio, Speech, and Language Processing, vol. 21, no. 6, pp. 1217-1227, 2013.
[26] H. Zeinali, H. Sameti, and L. Burget, "Text-Dependent Speaker Verification based on i-vectors, Neural Networks
and Hidden Markov Models," Computer Speech & Language, vol. 46, pp. 53-71, 2017.
[27] I. Stefanus, R. J. Sarwono, and M. I. Mandasari, "GMM based Automatic Speaker Verification System
Development for Forensics in Bahasa Indonesia," 5th Int. Conf. On Instrumentation, Control And Auto, 2017.
[28] L. M. Yee and A. M. Ahmad, "Comparative Study of Speaker Recognition Methods: DTW, GMM and SVM," 2007.
[29] M. N. Kakade, and D. B. Salunke. "An Automatic Real Time Speech-Speaker Recognition System: A Real Time
Approach." ICCCE 2019, pp. 151-158, 2020.
[30] C. Yu, G. Liu, S. Hahm and J. H. Hansen, "Uncertainty Propagation in Front end Factor Analysis for Noise Robust
Speaker Recognition," 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, 2014.
BIOGRAPHIES OF AUTHORS
Motaz Hamza is currently doing his M.Sc. in information technology at Malaysia University of
Science and Technology (MUST), Malaysia. He received his B.Sc. in Electrical Engineering at
Al-Refaq University and High Diploma of Information Technology in Al-Shumoukh Institute in
Tripoli, Libya. His main research interests include authentication systems, networks security,
cryptography and cybersecurity.
Dr. Touraj Khodadadi is a faculty member of the school of science and engineering in Malaysia
University of Science and Technology (MUST). He received his Ph.D. and M.Sc. in Information
Security in Universiti Teknologi Malaysia (UTM), Malaysia and B.Sc. in Software Engineering
from AIU, Iran. Touraj’s teaching interests include cyber and network security, authentication
systems, cloud computing, big data and machine learning. In addition, he authored and co-authored
30 international journals and conference papers concerning various aspects of computer,
information and network security. His main research interests include authentication systems,
wireless and wired network security, cryptography, graphical passwords, network authentication,
data mining and cloud computing. Apart from teaching and research activities at the university,
he has served as an editor and reviewer for several international journals and conferences. He is
also a member of several review panels for master's and doctoral research defenses.
Dr. P. Sellappan is Professor of Information Technology and Dean of School of Science and
Engineering and Provost of the Malaysia University of Science and Technology. He holds a PhD
in Interdisciplinary Information Science from the University of Pittsburgh (USA), MSc in
Computer Science from the University of London (UK), and BEc in Statistics from the University
of Malaya. He has published numerous research papers in reputable international journals and
conference papers. He has also authored several IT textbooks for college and university students.
His current research interests include Data Mining, Machine Learning, Health Informatics,
Blockchain, Cloud Computing, IoT, Cybersecurity and Web Services.