This document proposes an automatic speaker recognition system that uses Bayesian distance metric learning as a feature extractor. It explores constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. The distance metric is approximated by a weighted covariance matrix built from the top eigenvectors of the covariance matrix, and this approximation is used to estimate the posterior distribution of the distance metric. The Bayesian distance learning approach performs better than advanced methods and, unlike cosine scoring, is insensitive to normalization. It is also effective with limited training data.
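For reference, the cosine-score baseline mentioned above can be sketched as follows. This is a minimal illustration, not the document's implementation: the function name `cosine_score` and the short toy vectors standing in for i-vectors are assumptions.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two vectors (e.g. two i-vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional stand-ins for i-vectors (real ones have hundreds of dimensions).
enrol = [1.0, 2.0, 0.5]
test_same = [2.0, 4.0, 1.0]    # same direction as enrol, different length
test_other = [-2.0, 1.0, 0.0]  # orthogonal to enrol
```

The score depends only on the angle between the two vectors, so a trial is accepted when the score exceeds a threshold chosen on development data.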
We propose a deep-learning model for multimodal sentiment analysis, evaluated on the MOUD dataset. We developed two parallel models, one text-based and one audio-based, and fused the heterogeneous feature maps taken from their intermediate layers to complete the architecture. On accuracy, precision, recall and F1-score, the model is observed to outperform the existing models.
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION (ijma)
This work compares the performance of several acoustic feature extraction methods using a Long
Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic features
are a series of vectors that represent the speech signal. They can be classified into either words or
sub-word units such as phonemes. First, linear predictive coding (LPC) is used as the acoustic vector
extraction technique; LPC was chosen for its widespread popularity. Then two other vector extraction
techniques, Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), are
used. These two methods closely resemble the human auditory system. The LSTM neural network is then
trained on these feature vectors. Finally, the resulting models of the different phonemes are compared
using two statistical tools, the Bhattacharyya distance and the Mahalanobis distance, to investigate
the nature of those acoustic features.
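The two distance measures named in this abstract can be sketched as follows. This is a minimal illustration that assumes each phoneme model is summarized by a Gaussian with a diagonal covariance; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance from vector x to a Gaussian with diagonal covariance."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances."""
    mean_term = 0.0
    cov_term = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)                 # averaged variance in this dimension
        mean_term += (m1 - m2) ** 2 / v     # separation of the means
        cov_term += math.log(v / math.sqrt(v1 * v2))  # mismatch of the variances
    return 0.125 * mean_term + 0.5 * cov_term

# Hypothetical two-dimensional phoneme models: (mean vector, variance vector).
phoneme_a = ([0.0, 0.0], [1.0, 1.0])
phoneme_b = ([2.0, 0.0], [1.0, 1.0])
```

The Mahalanobis distance compares one feature vector with one model, while the Bhattacharyya distance compares two models with each other; a larger value means the two phoneme models overlap less.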
A novel automatic voice recognition system based on text-independent in a noi... (IJECEIAES)
An automatic voice recognition system aims to limit fraudulent access to sensitive areas such as labs. The primary objective of this paper is to increase the accuracy of voice recognition in noisy environments with the Microsoft Research (MSR) identity toolbox. The proposed system lets the user speak into a microphone and then matches the unknown voice against the other voices stored in the database using a statistical model, in order to grant or deny access to the system. Voice recognition is done in two steps: training and testing. During training, a Universal Background Model and a Gaussian Mixture Model (together, the GMM-UBM models) are computed from different sentences pronounced by the speaker(s) recorded for the training data. Testing a voice signal in a noisy environment then computes the log-likelihood ratio of the GMM-UBM models in order to classify the user's voice. Before testing, noise and de-noising methods were applied, and we investigated different MFCC features of the voice to determine the best possible feature as well as the noise-filter algorithm, which subsequently improved the performance of the automatic voice recognition system.
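The log-likelihood-ratio test described above can be sketched in a few lines. This is an illustrative sketch only: it does not use the MSR Identity Toolbox, the one-dimensional "frames" stand in for MFCC vectors, and every model parameter below is a made-up assumption.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as a list of (weight, mean, var)."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    total = sum(gmm_log_likelihood(f, speaker_gmm) - gmm_log_likelihood(f, ubm)
                for f in frames)
    return total / len(frames)

# Hypothetical models and frames (illustrative values only).
ubm = [(0.5, [0.0], [4.0]), (0.5, [3.0], [4.0])]
speaker_gmm = [(0.5, [2.9], [0.5]), (0.5, [3.3], [0.5])]
target_frames = [[3.0], [3.2], [2.8]]
impostor_frames = [[-1.0], [0.0], [-0.5]]
```

A positive score means the frames fit the claimed speaker's model better than the background model, so access would be granted when the score exceeds a tuned threshold (0.0 is only a placeholder choice).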
Speaker Identification From Youtube Obtained Data (sipij)
An efficient and intuitive algorithm is presented for identifying speakers in a long recording (such as
a long YouTube discussion or cocktail-party audio or video). The goal of automatic speaker
identification is to identify the number of different speakers and to prepare a model for each speaker by
extracting and characterizing the speaker-specific information contained in the speech signal. It has many
diverse applications, especially in surveillance, airport immigration, cyber security, and the
transcription of recordings with multiple similar-sounding sources, where it is difficult to assign the transcription.
The speech parameterizations most commonly used in speaker verification, K-means and cepstral analysis, are
detailed. Gaussian mixture modelling, the speaker modelling technique, is then explained. Gaussian
mixture models (GMM), perhaps the most robust machine learning algorithm, are introduced to
examine text-independent speaker identification carefully. The use of
Gaussian mixture models for analysing speaker identity is motivated by the experience
that Gaussian components depict the characteristics
of a speaker's spectral shape, and by the remarkable ability of GMMs to model arbitrary
densities. We then illustrate expectation maximization, an iterative algorithm that starts from an
arbitrary initial estimate and continues iterating until the values are observed to
converge. We aimed for 85 ~ 95% accuracy using speaker modelling with vector quantization
and Gaussian mixture models; across a number of experiments we obtained a 79 ~ 82%
identification rate using vector quantization and an 85 ~ 92.6% identification rate using GMM modelling
with expectation-maximization parameter estimation, depending on the parameter settings.
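The expectation-maximization loop described above can be sketched for the simplest case, a two-component mixture over one-dimensional data. This is a minimal sketch under stated assumptions, not the paper's system: the function name, the initialisation at the data extremes, the fixed iteration count and the toy data are all hypothetical, and real speaker models use multi-dimensional cepstral features with many more components.

```python
import math

def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def fit_gmm_1d(data, n_iter=50, var_floor=1e-3):
    """Fit a two-component 1-D GMM with EM; returns (weights, means, variances)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, var_floor)
    weights = [0.5, 0.5]
    means = [min(data), max(data)]          # arbitrary initial estimate
    variances = [overall_var, overall_var]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            logs = [math.log(weights[k]) + log_normal(x, means[k], variances[k])
                    for k in range(2)]
            top = max(logs)
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, var_floor)  # floor avoids collapsing components
    return weights, means, variances

# Toy data with two well-separated clusters (illustrative only).
data = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
weights, means, variances = fit_gmm_1d(data)
```

A fixed number of iterations is used here for brevity; a fuller version would stop when the change in total log-likelihood falls below a tolerance, which is the convergence check the abstract refers to.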
A Mobile Ad hoc Network (MANET) is a self-configuring, infrastructure-less network that consists of mobile devices such as mobile phones, laptops and PDAs. Because of its lack of infrastructure, wireless mobile communication and dynamic topology, a MANET is vulnerable to various security attacks. This survey paper presents an overview of developments in voting-based and non-voting-based certificate revocation mechanisms over the past few years. Certificate revocation is an important method for securing a MANET: it isolates attacker nodes from participating in network activities by revoking their certificates. Over the last few years, different schemes have been explored for certificate revocation. In the concluding section we present the limitations of the current cluster-based certificate revocation scheme.
Text independent speaker identification system using average pitch and forman... (ijitjournal)
The aim of this paper is to design a closed-set, text-independent speaker identification system using average
pitch and speech features from formant analysis. The speech features carried by the speech signal can be
characterized by formant analysis (power spectral density). In this paper we have designed two
methods: one for average pitch estimation based on autocorrelation and the other for formant analysis. The
average pitches of the speech signals are calculated and used together with formant analysis. A performance
comparison of the proposed method with some existing methods shows that the designed
speaker identification system is superior to the others.
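Autocorrelation-based average pitch estimation, as named in this abstract, can be sketched as follows. This is a rough sketch under stated assumptions rather than the paper's method: the 50 to 400 Hz search range, the 400-sample frame length, the function names and the synthetic 200 Hz tone are all hypothetical choices, and no voiced/unvoiced decision is made.

```python
import math

def autocorr_pitch(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation of the frame at this lag.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

def average_pitch(signal, sample_rate, frame_len=400):
    """Average the per-frame pitch estimates over non-overlapping frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    pitches = [autocorr_pitch(f, sample_rate) for f in frames]
    return sum(pitches) / len(pitches)

# Synthetic test signal: a pure 200 Hz tone sampled at 8 kHz (illustrative only).
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(1600)]
estimated = average_pitch(tone, sample_rate)
```

The lag with the strongest self-similarity corresponds to one pitch period, so the pitch is the sampling rate divided by that lag; on real speech, unvoiced and silent frames would need to be excluded before averaging.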
Comparative Study of Different Techniques in Speaker Recognition: Review (IJAEMSJORNAL)
Speech is the most basic and essential method of communication used by people. A speaker is recognized on the basis of the individual information contained in the speech signal. Speaker recognition (SR) is used to identify the person who is speaking, and in recent years it has been used in security systems. In this paper we discuss feature extraction techniques such as Mel frequency cepstral coefficients (MFCC), linear predictive coding (LPC) and dynamic time warping (DTW), and, for classification, Gaussian mixture models (GMM), artificial neural networks (ANN) and support vector machines (SVM).
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (... (ijnlc)
This study investigates the effectiveness of Knowledge Named Entity Recognition in Online Judges (OJs). OJs lack topic classification and are limited to IDs only. As a result, a lot of time is consumed in finding programming problems, and more specifically knowledge entities. A Bidirectional Long Short-Term Memory (BiLSTM) model with Conditional Random Fields (CRF) is applied to recognize the knowledge named entities in the solution reports. For the test run, more than 2000 solution reports were crawled from the Online Judges and processed for the model output. The stability of the model is also assessed with the higher F1 value. The results obtained through the proposed BiLSTM-CRF model are more effective (F1: 98.96%) and efficient in lead time.
Support Recovery with Sparsely Sampled Free Random Matrices for Wideband Cogn... (IJMTST Journal)
Blind sparsity order estimation (SOE) is an open research issue. To address it, this project designs an eigenvalue-based compressive SOE technique using asymptotic random matrix theory. The proposed technique estimates the sparsity order of the wideband spectrum from compressive measurements using the maximum eigenvalue of the measured signal's covariance matrix.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Design and implementation of a java based virtual laboratory for data communi... (IJECEIAES)
Students today find the engineering courses taught at university very abstract and difficult, and cannot relate theoretical calculations to real-life scenarios. They consequently lose interest in their coursework and earn poor grades. Simulations of classroom concepts with software such as MATLAB were developed to facilitate the learning experience. This paper describes the development of a virtual laboratory simulation package for teaching data communication concepts such as coding schemes, modulation and filtering. Unlike other simulation packages, it requires no prior knowledge of computer programming for students to grasp these concepts.
On the use of voice activity detection in speech emotion recognition (journalBEEI)
Emotion recognition through speech has many potential applications; the challenge lies in achieving a high emotion recognition rate with limited resources or under interference such as noise. In this paper we explore the possibility of improving speech emotion recognition by using voice activity detection (VAD). Emotional voice data from the Berlin Emotion Database (EMO-DB) and a custom-made database, LQ Audio Dataset, are first preprocessed by VAD before feature extraction. The features are then passed to a deep neural network for classification. We chose MFCC as the sole determinant feature. Comparing the results with and without VAD, we found that VAD improved the recognition rate of 5 emotions (happy, angry, sad, fear, and neutral) by 3.7% when recognizing clean signals, while using VAD when training a network with both clean and noisy signals improved our previous results by 50%.
Parallel and Distributed System IEEE 2014 Projects (Vijay Karan)
List of Parallel and Distributed System IEEE 2014 Projects. It Contains the IEEE Projects in the Domain Parallel and Distributed System for the year 2014
Collaborative spectrum sensing (CSS) was envisioned to improve the reliability of spectrum sensing in centralized cognitive radio networks (CRNs). A common attack on CSS is the spectrum sensing data falsification (SSDF) attack. A punishment strategy is presented together with the reputation method, in which an honour factor and a retribution factor are introduced to encourage secondary users (SUs) to engage in positive and honest sensing activities. Harvesting energy from ubiquitous radio frequency (RF) signals in urban areas is environmentally friendly and self-sustaining. A threshold-based framework is proposed for an optimal spectral access strategy, and the threshold is shown to be optimal and traffic-dependent. The proposed threshold-based strategy takes into account both the spectral access and the energy harvesting opportunities provided by a particular traffic application. An iterative algorithm selects the threshold that maximizes the SU transmission opportunity subject to the overall harvested energy budget. Further, we illustrate the effects of different levels of harvested energy for the primary users, again using the iterative algorithm.
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Deep neural networks have shown recent promise in many language-related tasks such as the modelling of
conversations. We extend RNN-based sequence to sequence models to capture the long-range discourse
across many turns of conversation. We perform a sensitivity analysis on how much additional context
affects performance, and provide quantitative and qualitative evidence that these models can capture
discourse relationships across multiple utterances. Our results show how adding an additional RNN layer
for modelling discourse improves the quality of output utterances and providing more of the previous
conversation as input also improves performance. By searching the generated outputs for specific
discourse markers, we show how neural discourse models can exhibit increased coherence and cohesion in
conversations.
COMPARISON PROCESS LONG EXECUTION BETWEEN PQ ALGORTHM AND NEW FUZZY LOGIC ALG... (IJNSA Journal)
Transmitting voice over IP networks can cause network congestion because of weak supervision of incoming packet traffic, queuing and scheduling. This congestion degrades Quality of Service (QoS) metrics such as delay, packet drop and packet loss. Packet delay in turn affects other aspects of QoS, causing unstable voice packet delivery, packet jitter, packet loss and echo. The Priority Queuing (PQ) algorithm is a popular technique used in VoIP networks to reduce delay. In operation, PQ uses sorting, search and route-planning methods to classify packets on the router. This way of classifying packets can cause the process to repeat, and the resulting recursive loop leaves the next queue starved. To solve these problems, this paper uses three phases: a queuing phase, a classifying phase and a scheduling phase. The PQ technique is based on priority; here it is combined with a fuzzy inference system to classify the queued incoming packets (voice, video and text), which can reduce the recursive loop and starvation. After an incoming packet is classified, it is sent to the packet buffer. To support the research objective, the improved PQ algorithm is compared against the existing PQ algorithm found in the literature, using metrics such as delay, packet drop and packet loss. The paper describes the differences in long process execution between PQ and our algorithm, which simplifies the process execution that can cause starvation in the PQ algorithm.
A Real Time Framework of Multiobjective Genetic Algorithm for Routing in Mobi... (IDES Editor)
Routing in mobile networks is a multiobjective
optimization problem. The problem needs to consider multiple
objectives simultaneously, such as Quality of Service
parameters, delay and cost. This paper uses the NSGA-II
multiobjective genetic algorithm to solve the dynamic shortest
path routing problem in mobile networks and proposes a
framework for real-time software implementation.
Simulations confirm a good quality of solution (route
optimality) and a high rate of convergence.
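The multiobjective view of routing can be illustrated with the Pareto-dominance test that NSGA-II's non-dominated sorting is built on. This sketch covers only that first step, not the full algorithm (no crowding distance, selection, crossover or mutation), and the route names and their (delay, cost) values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b in every objective and
    strictly better in at least one (all objectives are minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def first_front(routes):
    """Return the names of routes that no other route dominates."""
    return [name for name, objectives in routes.items()
            if not any(dominates(other, objectives)
                       for other_name, other in routes.items()
                       if other_name != name)]

# Hypothetical candidate routes with (delay, cost) objectives.
routes = {
    "A-B-D": (10.0, 5.0),
    "A-C-D": (7.0, 8.0),
    "A-B-C-D": (12.0, 9.0),
    "A-E-D": (7.0, 6.0),
}
pareto_routes = first_front(routes)
```

Here "A-B-D" and "A-E-D" survive because neither is better than the other on both delay and cost, which is why a multiobjective method returns a set of trade-off routes rather than a single shortest path.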
Neural Network Algorithm for Radar Signal Recognition (IJERA Editor)
Traditional recognition methods can no longer keep pace with the development of radar signals. In this paper, a new radar signal recognition algorithm based on fractal theory and neural networks is presented. The relevant point is extracted as the input of the neural network, which then recognizes and classifies the signals. Simulation results show that the algorithm has a marked effect on classification under low-SNR conditions.
High level speaker specific features modeling in automatic speaker recognitio... (IJECEIAES)
Spoken words convey several levels of information. At the primary level, speech conveys words or spoken messages, but at the secondary level it also reveals information about the speaker. This work applies statistical speaker modeling techniques to high-level speaker-specific features that express the characteristic sound of the human voice. Using Hidden Markov model (HMM), Gaussian mixture model (GMM), and Linear Discriminant Analysis (LDA) models, we build Automatic Speaker Recognition (ASR) systems that are computationally inexpensive and can recognize speakers regardless of what is said. The performance of the ASR system is evaluated from clear speech to a wide range of speech quality using the standard TIMIT speech corpus. The ASR efficiency of the HMM, GMM, and LDA based modeling techniques is 98.8%, 99.1%, and 98.6%, and the Equal Error Rate (EER) is 4.5%, 4.4% and 4.55% respectively. The EER improvement of the GMM based ASR system compared with HMM and LDA is 4.25% and 8.51% respectively.
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique (CSCJournals)
An automatic speaker recognition system is used to recognize an unknown speaker among several reference speakers by making use of speaker-specific information from their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition technique. Our baseline speaker recognition system, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database, and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique that performs gender recognition before speaker recognition, so as to improve the accuracy and timeliness of our baseline speaker recognition system. Based on experiments conducted on the MIT database, we demonstrate that our proposed system improves the accuracy over the baseline system by approximately 2%, while reducing the computational time by more than 30%.
Speech recognition is the next big step that technology needs to take for general users. Automatic Speech Recognition (ASR) will play a major role in bringing new technology to users. Applications of ASR include speech-to-text conversion, voice input in aircraft, data entry, and voice user interfaces such as voice dialing. Speech recognition involves extracting features from the input signal with a feature extraction method and classifying them into classes using a pattern-matching model. This paper presents a general study of automatic speech recognition and of various methods for building an ASR system. General techniques that can be used to implement ASR include artificial neural networks, Hidden Markov models, and the acoustic-phonetic approach.
A Study of Digital Media Based Voice Activity Detection Protocols (ijtsrd)
Speaker identification is critical for a variety of voice-based applications in safety and surveillance systems, and such methods are now employed in household appliances for user-controlled device toggling. A comprehensive and language-agnostic Voice Activity Detection (VAD) system is critical for Digital Media Content (DMC). VAD systems are used in DMC generation in a variety of ways, including supplementing subtitle creation and detecting and correcting subtitle drift and sound distortion. The goal of this article is to provide a comprehensive overview of the strategies used for voice recognition in the entertainment industry. Speaker recognition strategies used earlier and those used in current studies are analysed, and the superior methodology is identified through a survey of more than two decades of literature. We give a comprehensive survey of DNN-based VADs using DMC data with respect to accuracy, noise sensitivity, and language-agnostic performance. Nikhil Kumar | Sumit Dalal "A Study of Digital Media-Based Voice Activity Detection Protocols" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-3, June 2023, URL: https://www.ijtsrd.com.com/papers/ijtsrd56376.pdf Paper URL: https://www.ijtsrd.com.com/engineering/electronics-and-communication-engineering/56376/a-study-of-digital-mediabased-voice-activity-detection-protocols/nikhil-kumar
Text independent speaker identification system using average pitch and forman...ijitjournal
The aim of this paper is to design a closed-set text-independent Speaker Identification system using average
pitch and speech features from formant analysis. The speech features represented by the speech signal are
potentially characterized by formant analysis (Power Spectral Density). In this paper we have designed two
methods: one for average pitch estimation based on Autocorrelation and other for formant analysis. The
average pitches of speech signals are calculated and employed with formant analysis. From the performance
comparison of the proposed method with some of the existing methods, it is evident that the designed
speaker identification system with the proposed method is superior to others.
Comparative Study of Different Techniques in Speaker Recognition: ReviewIJAEMSJORNAL
The speech is most basic and essential method of communication used by person.On the basis of individual information included in speech signals the speaker is recognized. Speaker recognition (SR) is useful to identify the person who is speaking. In recent years speaker recognition is used for security system. In this paper we have discussed the feature extraction techniques like Mel frequency cepstral coefficient (MFCC), Linear predictive coding (LPC), Dynamic time wrapping (DTW), and for classification Gaussian Mixture Models (GMM), Artificial neural network (ANN)& Support vector machine (SVM).
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...ijnlc
This study investigates the effectiveness of Knowledge Named Entity Recognition in Online Judges (OJs). OJs are lacking in the classification of topics and limited to the IDs only. Therefore a lot of time is consumed in finding programming problems more specifically in knowledge entities.A Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Fields (CRF) model is applied for the recognition of knowledge named entities existing in the solution reports.For the test run, more than 2000 solution reports are crawled from the Online Judges and processed for the model output. The stability of the model is
also assessed with the higher F1 value. The results obtained through the proposed BiLSTM-CRF model are more effectual (F1: 98.96%) and efficient in lead-time.
Support Recovery with Sparsely Sampled Free Random Matrices for Wideband Cogn...IJMTST Journal
The main objective of this project is to design an eigenvalue-based compressive SOE technique using asymptotic random matrix theory. In this project, investigating blind sparsity order estimation (SOE) techniques is an open research issue. To address this, this project presents an eigenvalue-based compressive SOE technique using asymptotic random matrix theory. Finally, this project propose a technique to estimate the sparsity order of the wideband spectrum with compressive measurements using the maximum eigenvalue of the measured signal's covariance matrix. .
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Design and implementation of a java based virtual laboratory for data communi...IJECEIAES
Many students today find university engineering courses abstract and difficult, and cannot relate theoretical calculations to real-life scenarios. They consequently lose interest in their coursework and earn poor grades. Simulations of classroom concepts using software such as MATLAB were developed to facilitate the learning experience. This paper presents the development of a virtual laboratory simulation package for teaching data communication concepts such as coding schemes, modulation and filtering. Unlike other simulation packages, it requires no prior knowledge of computer programming for students to grasp these concepts.
On the use of voice activity detection in speech emotion recognitionjournalBEEI
Emotion recognition through speech has many potential applications; the challenge lies in achieving a high recognition rate with limited resources or in the presence of interference such as noise. In this paper, we explore the possibility of improving speech emotion recognition by using voice activity detection (VAD). Emotional voice data from the Berlin Emotion Database (EMO-DB) and a custom-made database, the LQ Audio Dataset, are first preprocessed by VAD before feature extraction. The features are then passed to a deep neural network for classification. We chose MFCC as the sole feature. Comparing the results with and without VAD, we found that VAD improved the recognition rate of 5 emotions (happy, angry, sad, fear, and neutral) by 3.7% on clean signals, while using VAD when training a network with both clean and noisy signals improved our previous results by 50%.
Parallel and Distributed System IEEE 2014 ProjectsVijay Karan
List of Parallel and Distributed System IEEE 2014 Projects. It Contains the IEEE Projects in the Domain Parallel and Distributed System for the year 2014
Collaborative spectrum sensing (CSS) was envisioned to improve the reliability of spectrum sensing in centralized cognitive radio networks (CRNs). A popular attack on collaborative spectrum sensing is the so-called spectrum sensing data falsification (SSDF) attack. A punishment strategy is built on the reputation method, in which an honour factor and a retribution factor are introduced to encourage secondary users (SUs) to engage in positive and honest sensing activities. Harvesting energy from ubiquitous radio frequency (RF) signals in urban areas is environmentally friendly and self-sustaining. We propose a threshold-based framework for an optimal spectral access strategy and show that the threshold is optimal and traffic-dependent. The proposed threshold-based strategy takes into account both the spectral access and the energy harvesting opportunities provided by a particular traffic application. An iterative algorithm selects the threshold that maximizes the SU transmission opportunity subject to the overall harvested energy budget. Further, we illustrate the effects of different energy harvesting levels for the primary users when the iterative algorithm is used.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Deep neural networks have shown recent promise in many language-related tasks such as the modelling of
conversations. We extend RNN-based sequence to sequence models to capture the long-range discourse
across many turns of conversation. We perform a sensitivity analysis on how much additional context
affects performance, and provide quantitative and qualitative evidence that these models can capture
discourse relationships across multiple utterances. Our results show how adding an additional RNN layer
for modelling discourse improves the quality of output utterances and providing more of the previous
conversation as input also improves performance. By searching the generated outputs for specific
discourse markers, we show how neural discourse models can exhibit increased coherence and cohesion in
conversations.
COMPARISON PROCESS LONG EXECUTION BETWEEN PQ ALGORTHM AND NEW FUZZY LOGIC ALG...IJNSA Journal
The transmission of voice over IP networks can cause network congestion due to weak supervision of incoming packet traffic, queuing and scheduling. This congestion degrades Quality of Service (QoS) metrics such as delay, packet drop and packet loss. Packet delay in turn affects other QoS aspects, such as unstable voice packet delivery, packet jitter, packet loss and echo. Priority Queuing (PQ) is a popular technique for reducing delay in VoIP networks. In operation, PQ uses sorting, searching and route planning to classify packets on the router. This classification method can cause processing to repeat, and the resulting recursive loop starves the next queue. The approach in this paper has three phases: queuing, classifying and scheduling. The priority-based PQ technique is applied to a fuzzy inference system that classifies the queued incoming packets (voice, video and text), which can reduce the recursive loop and starvation. After an incoming packet is classified, it is sent to the packet buffer. To evaluate the research objective, the improved PQ algorithm is compared against the existing PQ algorithm from the literature using metrics such as delay, packet drop and packet loss. This paper describes the difference in long-process execution between PQ and our algorithm, which simplifies the execution steps that cause starvation in the PQ algorithm.
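The baseline behaviour described in this abstract can be illustrated with a minimal sketch of strict priority queuing. It shows only the conventional PQ discipline, not the fuzzy classifier the paper proposes; the priority levels and the class name `StrictPriorityQueue` are assumptions made for illustration.

```python
import heapq
from itertools import count

# Hypothetical priority levels (lower number is served first); not taken from the paper.
PRIORITY = {"voice": 0, "video": 1, "text": 2}


class StrictPriorityQueue:
    """Always serves the highest-priority packet that is waiting."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: keeps FIFO order within one traffic class

    def enqueue(self, kind, payload):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._order), kind, payload))

    def dequeue(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


def drain(queue):
    """Dequeue everything and return the service order."""
    served = []
    while len(queue):
        served.append(queue.dequeue())
    return served
```

Under this discipline a text packet is served only when no voice or video packet is waiting, which is the starvation effect the abstract refers to.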
A Real Time Framework of Multiobjective Genetic Algorithm for Routing in Mobi...IDES Editor
Routing in mobile networks is a multiobjective
optimization problem. The problem needs to consider multiple
objectives simultaneously such as Quality of Service
parameters, delay and cost. This paper uses the NSGA-II
multiobjective genetic algorithm to solve the dynamic shortest
path routing problem in mobile networks and proposes a
framework for real-time software implementation.
Simulations confirm a good quality of solution (route
optimality) and a high rate of convergence.
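NSGA-II ranks candidate solutions by Pareto dominance. The sketch below shows only that building block, assuming each route is scored by a tuple of objectives to be minimised, such as (delay, cost); it is not the paper's framework, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates (the first non-dominated front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among routes scored (10, 5), (8, 7), (12, 6) and (9, 9), only the first two survive: (12, 6) is dominated by (10, 5) and (9, 9) by (8, 7).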
Neural Network Algorithm for Radar Signal RecognitionIJERA Editor
Traditional recognition methods can no longer keep pace with the development of radar signals. In this paper, a new radar signal recognition algorithm based on fractal theory and neural networks is presented. The relevant points are extracted as the input to the neural network, which then recognizes and classifies the signals. Simulation results show that this algorithm classifies signals effectively under low-SNR conditions.
High level speaker specific features modeling in automatic speaker recognitio...IJECEIAES
Spoken words convey several levels of information. At the primary level, speech conveys words or spoken messages; at the secondary level, it also reveals information about the speaker. This work applies statistical speaker modeling techniques to high-level speaker-specific features that express the characteristic sound of the human voice. Hidden Markov model (HMM), Gaussian mixture model (GMM), and Linear Discriminant Analysis (LDA) models are used to build Automatic Speaker Recognition (ASR) systems that are computationally inexpensive and can recognize speakers regardless of what is said. The performance of the ASR systems is evaluated over a wide range of speech quality, starting from clear speech, using the standard TIMIT speech corpus. The ASR efficiency of the HMM-, GMM-, and LDA-based modeling techniques is 98.8%, 99.1%, and 98.6%, and the Equal Error Rate (EER) is 4.5%, 4.4% and 4.55%, respectively. The EER improvement of the GMM-based ASR system compared with HMM and LDA is 4.25% and 8.51%, respectively.
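Since the abstract reports results as Equal Error Rate, a rough sketch of how an EER can be estimated from verification scores may help. It assumes higher scores mean "same speaker" and sweeps the observed scores as thresholds; the function name and this simple sweep are illustrative choices, not the paper's evaluation code.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER: the operating point where false accepts and false rejects balance.

    genuine  -- scores of trials where the claimed speaker is the true speaker
    impostor -- scores of trials where the claimed speaker is someone else
    """
    best_gap, eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < threshold for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With few trials the two error rates rarely cross exactly, so the sketch reports the midpoint at the closest threshold; evaluation toolkits often interpolate instead.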
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueCSCJournals
An automatic speaker recognition system recognizes an unknown speaker among several reference speakers by making use of speaker-specific information in their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition technique. Our baseline speaker recognition system, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database, and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique that performs gender recognition before speaker recognition, so as to improve the accuracy and timeliness of our baseline speaker recognition system. Based on experiments conducted on the MIT database, we demonstrate that our proposed system improves accuracy over the baseline system by approximately 2%, while reducing computation time by more than 30%.
Speech recognition is the next big step that technology needs to take for general users. Automatic Speech Recognition (ASR) will play a major role in bringing new technology to users. Applications of ASR include speech-to-text conversion, voice input in aircraft, data entry, and voice user interfaces such as voice dialing. Speech recognition involves extracting features from the input signal with a feature extraction method and classifying them into classes using a pattern matching model. This paper presents a general study of automatic speech recognition and of various methods for building an ASR system. General techniques that can be used to implement ASR include artificial neural networks, the Hidden Markov model, and the acoustic-phonetic approach.
A Study of Digital Media Based Voice Activity Detection Protocolsijtsrd
Speaker identification is critical for a variety of voice-based applications in safety and surveillance systems, and such methods are now employed in household appliances for user-controlled device toggling. A comprehensive and language-agnostic Voice Activity Detection (VAD) system is critical for Digital Media Content (DMC). VAD systems are used in DMC generation in a variety of ways, including supplementing subtitle creation, detecting and correcting subtitle drift, and sound distortion. The goal of this article is to provide a comprehensive overview of the strategies used for voice recognition in the entertainment industry. Speaker recognition strategies used earlier and those used in current studies were analysed, and a clear understanding of the superior methodology was reached through a survey of more than two decades of literature. We give a comprehensive survey of DNN-based VADs using DMC data with respect to accuracy, noise sensitivity, and language-agnostic performance. Nikhil Kumar | Sumit Dalal "A Study of Digital Media-Based Voice Activity Detection Protocols" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-3, June 2023, URL: https://www.ijtsrd.com.com/papers/ijtsrd56376.pdf Paper URL: https://www.ijtsrd.com.com/engineering/electronics-and-communication-engineering/56376/a-study-of-digital-mediabased-voice-activity-detection-protocols/nikhil-kumar
The effect of gamma value on support vector machine performance with differen...IJECEIAES
The support vector machine (SVM) is regarded as one of the supervised machine learning algorithms that provide data analysis for classification and regression. The technique is applied in many fields, such as bioinformatics, face recognition, text and hypertext categorization, generalized predictive control and many others. The performance of an SVM is affected by parameters used in the training phase, and their settings can have a profound impact on the resulting classifier. This paper investigates SVM performance as a function of the gamma parameter for the kernels used. It studies the impact of the gamma value on SVM classifier efficiency using different kernels on datasets with various descriptions. The SVM classifier was implemented in Python. The kernel functions investigated are polynomial, radial basis function (RBF) and sigmoid. All datasets come from the UC Irvine Machine Learning Repository. In general, the results show an uneven effect on the classification accuracy of the three kernels across the datasets. Changing the gamma value influences the polynomial and sigmoid kernels, depending on the dataset used, while the performance of the RBF kernel is more stable across gamma values, with only slight changes in accuracy.
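To make the role of gamma concrete, here is a minimal sketch of the standard RBF kernel, k(x, y) = exp(-gamma * ||x - y||^2). It only shows where gamma enters the kernel; it does not reproduce the paper's experiments, and the function name is an assumption.

```python
import math


def rbf_kernel(x, y, gamma):
    """Standard RBF kernel value for two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

A larger gamma makes the similarity fall off faster with distance, so each training point influences a smaller neighbourhood.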
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...ijcsit
Speech processing is a crucial and intensive field of research in the development of robust and efficient speech recognition systems. However, speech recognition accuracy is still limited by variation in context, speaker variability, and environmental conditions. In this paper, we present a curvelet-based Feature Extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the characteristics of the curvelet transform, which reduces the computational complexity and the feature vector size; the resulting features offer better accuracy and a varying window size, which makes them suitable for non-stationary signals. For better word classification and recognition, a discrete hidden Markov model (HMM) can be used, as it considers the time distribution of speech signals. The HMM classification method attained maximum identification rates of 80.1% for informal, 86% for scientific phrases, and 63.8% for control. The objective of this study is to characterize the feature extraction methods and the classification phase in speech recognition systems. The various approaches available for developing speech recognition systems are compared, along with their merits and demerits. The statistical results show that signal recognition accuracy increases when the discrete curvelet transform is used instead of conventional methods.
PERFORMANCE ANALYSIS OF BARKER CODE BASED ON THEIR CORRELATION PROPERTY IN MU...ijistjournal
Spread-spectrum communication, with its inherent interference attenuation capability, has over the years become an increasingly popular technique for use in many different systems. It offers beneficial and attractive features such as anti-jamming, security, and multiple access. This thesis deals with the pseudo codes used in spread-spectrum communication systems. The cross-correlation and auto-correlation properties of the long Barker code are analyzed. The length of the code and its auto-correlation and cross-correlation properties help determine the most suitable code for a particular communication environment. We have tried to find codes with suitable auto-correlation properties along with low cross-correlation values. The Barker code has good auto-correlation properties, and we have found pairs with low cross-correlation so that they can be used in a multi-user environment.
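The auto-correlation property mentioned here can be checked directly. The sketch below computes the aperiodic correlation of the well-known length-13 Barker code; the function name is hypothetical and both sequences are assumed to have the same length.

```python
# Length-13 Barker code in +1/-1 form.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def aperiodic_correlation(a, b):
    """Aperiodic correlation of two equal-length sequences for lags 0 .. len(a) - 1."""
    n = len(a)
    return [sum(a[i] * b[i + lag] for i in range(n - lag)) for lag in range(n)]


autocorrelation = aperiodic_correlation(BARKER_13, BARKER_13)
peak = autocorrelation[0]                                     # main lobe at zero lag
max_sidelobe = max(abs(value) for value in autocorrelation[1:])
```

Passing two different codes to the same function gives their cross-correlation, which is the quantity to keep low when choosing code pairs for several users.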
Classification improvement of spoken arabic language based on radial basis fu...IJECEIAES
An important task in human-computer interaction is language recognition and classification. In the Arab world, there is a persistent need for Arabic spoken language recognition to help those who have lost the use of their upper limbs do what they want through speech-based computer interaction. However, Arabic automatic speech recognition (AASR) has not received the desired attention from researchers. In this paper, the Radial Basis Function (RBF) is used to improve the recognition of spoken Arabic letters. The recognition and classification process consists of three steps: preprocessing, feature extraction and classification (recognition). Arabic Language Letters (ALL) recognition combines statistical features with the Temporal Radial Basis Function for different letter situations and noisy conditions. Recognition rates from 90% to 99.375% were obtained in the speaker-independent setting, outperforming earlier works by nearly 2.045%. The simulation was carried out in Matlab 2015b.
A computationally efficient learning model to classify audio signal attributesIJECEIAES
The era of machine learning has opened up groundbreaking opportunities in the field of medical diagnosis. However, fast and accurate diagnosis of diseases and medical conditions requires proper analysis and classification of digital signal data. For example, identifying brain tumors requires brain magnetic resonance imaging (MRI) data to be classified appropriately, and evaluating the operating condition of the human heart requires pulse signal analysis. Several studies have used machine learning (ML) modeling to classify speech signals, but very few have explored the classification of audio signal attributes in the context of intelligent healthcare monitoring. This study therefore aims to introduce novel mathematical modeling to analyze and classify synthetic pulse audio signal attributes with cost-effective computation. The numerical model is composed of several functional blocks in which deep neural network-based learning (DNNL) plays a crucial role during the training phase and is further combined with a recurrent structure of long-short term memory (R-LSTM) feedback connections (FCs). The design is then evaluated in a numerical computing environment in terms of accuracy and computational aspects. The classification results show that the proposed approach attains approximately 85% accuracy, comparable to the baseline approaches in accuracy and execution time.
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...ijtsrd
Acoustic Scene Classification (ASC) classifies audio signals to infer the context of the recorded environment. An audio scene includes a mixture of background sound and a variety of sound events. In this paper, we present a combination of the maximal overlap wavelet packet transform (MODWPT) at level 5 and six time-domain and frequency-domain features (energy entropy, short-time energy, spectral roll-off, spectral centroid, spectral flux and zero crossing rate), summarized by their statistic values (average and standard deviation). We use the DCASE Challenge 2016 dataset to show the properties of machine learning classifiers. Several classifiers can address the ASC task. We compare the properties of K-nearest neighbors (KNN), Support Vector Machine (SVM), and Ensemble Bagged Trees classifiers on the combined wavelet and spectral features. Choosing the best classification method and feature extraction is essential for the ASC task. In this system, we extract 32 MODWPT energy features at level 5, 32 relative energy features and 6 statistic values from the audio signal, and then apply the extracted features to the different classifiers. Mie Mie Oo | Lwin Lwin Oo "Acoustic Scene Classification by using Combination of MODWPT and Spectral Features" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd27992.pdf Paper URL: https://www.ijtsrd.com/computer-science/multimedia/27992/acoustic-scene-classification-by-using-combination-of-modwpt-and-spectral-features/mie-mie-oo
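Two of the time-domain features named in this abstract are simple enough to sketch. The definitions below are common textbook forms, computed on one frame of samples; the paper's exact normalisation is not given, so the details here are assumptions.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(sample * sample for sample in frame) / len(frame)
```

In a full pipeline these values would be computed per frame and then summarised by their average and standard deviation over the recording, as the abstract describes.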
Investigation of the performance of multi-input multi-output detectors based...IJECEIAES
The next generation of wireless cellular communication networks must be energy efficient, extremely reliable, and have low latency, leading to the necessity of using algorithms based on deep neural networks (DNN) which have better bit error rate (BER) or symbol error rate (SER) performance than traditional complex multi-antenna or multi-input multi-output (MIMO) detectors. This paper examines deep neural networks and deep iterative detectors such as OAMP-Net based on information theory criteria such as maximum correntropy criterion (MCC) for the implementation of MIMO detectors in non-Gaussian environments, and the results illustrate that the proposed method has better BER or SER performance.
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
The performance of various acoustic feature extraction methods has been compared in this work using
Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic
features are a series of vectors that represents the speech signals. They can be classified in either words or
sub word units such as phonemes. In this work, at first linear predictive coding (LPC) is used as acoustic
vector extraction technique. LPC has been chosen due to its widespread popularity. Then other vector
extraction techniques like Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction
(PLP) have also been used. These two methods closely resemble the human auditory system. These feature
vectors are then trained using the LSTM neural network. Then the obtained models of different phonemes
are compared with different statistical tools namely Bhattacharyya Distance and Mahalanobis Distance to
investigate the nature of those acoustic features.
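As an illustration of the two statistical tools named above, the sketch below gives their closed forms for the one-dimensional Gaussian case. The paper works with feature vectors, so this scalar version is a simplification made for brevity, and the function names are hypothetical.

```python
import math


def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate normal distributions."""
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    variance_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + variance_term


def mahalanobis_1d(x, mu, var):
    """Mahalanobis distance of a scalar x from a distribution with mean mu and variance var."""
    return abs(x - mu) / math.sqrt(var)
```

The Bhattacharyya distance compares two distributions, so it fits comparing phoneme models with each other, while the Mahalanobis distance measures how far a single observation lies from one model.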
Enhancing speaker verification accuracy with deep ensemble learning and inclu...IJECEIAES
Effective speaker identification is essential for achieving robust speaker recognition in real-world applications such as mobile devices, security, and entertainment while ensuring high accuracy. However, deep learning models trained on large datasets with diverse demographic and environmental factors may lead to increased misclassification and longer processing times. This study proposes incorporating ethnicity and gender information as critical parameters in a deep learning model to enhance accuracy. Two convolutional neural network (CNN) models classify gender and ethnicity, followed by a Siamese deep learning model trained with critical parameters and additional features for speaker verification. The proposed model was tested using the VoxCeleb 2 database, which includes over one million utterances from 6,112 celebrities. In an evaluation after 500 epochs, equal error rate (EER) and minimum decision cost function (minDCF) showed notable results, scoring 1.68 and 0.10, respectively. The proposed model outperforms existing deep learning models, demonstrating improved performance in terms of reduced misclassification errors and faster processing times.
Speech emotion recognition with light gradient boosting decision trees machineIJECEIAES
Speech emotion recognition aims to identify the emotion expressed in the speech by analyzing the audio signals. In this work, data augmentation is first performed on the audio samples to increase the number of samples for better model learning. The audio samples are comprehensively encoded as the frequency and temporal domain features. In the classification, a light gradient boosting machine is leveraged. The hyperparameter tuning of the light gradient boosting machine is performed to determine the optimal hyperparameter settings. As the speech emotion recognition datasets are imbalanced, the class weights are regulated to be inversely proportional to the sample distribution where minority classes are assigned higher class weights. The experimental results demonstrate that the proposed method outshines the state-of-the-art methods with 84.91% accuracy on the Berlin database of emotional speech (emo-DB) dataset, 67.72% on the Ryerson audio-visual database of emotional speech and song (RAVDESS) dataset, and 62.94% on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
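The class weighting described here, inversely proportional to how often each class occurs, can be sketched as follows. The normalisation n / (k * count) is one common choice and an assumption on our part; the abstract does not state the exact formula.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to a weight inversely proportional to its sample count."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}
```

With 6 "angry", 3 "sad" and 1 "fear" sample, the rarest class receives the largest weight, so its errors count more during training.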
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
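The intersection over union (IoU) figures quoted above follow a simple definition, sketched here for a single class on flattened binary masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def iou(predicted, truth):
    """Intersection over union of two equal-length binary masks (values 0 or 1)."""
    intersection = sum(1 for p, t in zip(predicted, truth) if p and t)
    union = sum(1 for p, t in zip(predicted, truth) if p or t)
    return intersection / union if union else 1.0
```

A mean IoU averages this value over classes, while a weighted IoU weights each class by its pixel count, which is why a large background class can lift the weighted figure well above the mean.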
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us... IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par... IJECEIAES
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
An improved modulation technique suitable for a three level flying capacitor ... IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zone IJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It surveys the features used, such as sea surface temperature (SST) and sea surface height (SSH), and the classifiers applied to them. This study underscores the importance of examining potential fishing zones using advanced analytical techniques. It thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms to potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models such as support vector machines (SVM), naïve Bayes, and artificial neural networks (ANN). In one previous result, SVM classified fisheries test data with 97.6% accuracy, compared with 94.2% for naïve Bayes. Based on recent works in this area, several recommendations for future work are presented to further improve the performance of potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f... IJECEIAES
As demand for smaller, faster, and more powerful devices rises, Moore's law continues to be followed closely. The industry has worked hard to make small devices that boost productivity, with the goal of optimizing device density. Researchers are reducing interconnect delays to improve circuit performance. This has led to three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and reduce interconnects. Electrical interference is a major concern in 3D integrated circuits. Researchers have developed and tested through-silicon vias (TSVs) and substrates to decrease electrical signal interference. This study presents a novel noise coupling reduction method using several electrical interference models. The new approach achieves a 22% drop in electrical interference from signal-carrying to victim TSVs and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p... IJECEIAES
Climate change's impact on the planet has forced the United Nations and governments to promote green energy and electric transportation. Deployments of photovoltaic (PV) and electric vehicle (EV) systems have gained stronger momentum due to their numerous advantages over fossil fuel alternatives. The advantages go beyond sustainability to include financial support and stability. This paper introduces a hybrid PV and EV system to support industrial and commercial plants. It covers the theoretical framework of the proposed hybrid system, including the equations required to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram, which sets the priorities and requirements of the system, is presented. The proposed approach allows such setups to improve their power stability, especially during power outages. The presented information helps researchers and plant owners complete the necessary analysis while promoting the deployment of clean energy. The results of a case study of a dairy farmer support the theoretical work and highlight the benefits for existing plants. The short return on investment of the proposed approach supports the novelty of the paper's approach to a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
Bibliometric analysis highlighting the role of women in addressing climate ch... IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and droughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing climate change. The findings are
discussed in relation to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency, and offer recommendations on how
to increase the role of women in addressing climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter... IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from grid-connected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are presented.
Enhancing battery system identification: nonlinear autoregressive modeling fo... IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a survey IJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ... IJECEIAES
One of the problems associated with power systems is the islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis yields overall weights
of 24.7% for passive methods, 7.8% for active methods, 5.6% for hybrid methods,
14.5% for remote methods, 26.6% for signal processing-based methods,
and 20.8% for computational intelligence-based methods, based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi... IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m², the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b... IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
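As a rough illustration of the conventional fixed-step perturb and observe (P&O) rule that this abstract takes as its starting point, a minimal Python sketch follows. It is not the modified method of the paper: the function names, the 0.5 V step, and the toy power curve mentioned afterwards are assumptions made only for illustration.

```python
def perturb_and_observe(v, p, v_prev, p_prev, step=0.5):
    """Return the next reference voltage using the classic fixed-step P&O rule.

    v, p           -- present operating voltage and measured power
    v_prev, p_prev -- voltage and power at the previous sample
    step           -- perturbation size (an assumed, fixed value)
    """
    dv = v - v_prev
    dp = p - p_prev
    if dp == 0:
        return v  # no change in power: hold the operating point
    # Power rose: keep perturbing in the same direction.
    # Power fell: reverse the direction of the perturbation.
    if (dp > 0) == (dv > 0):
        return v + step
    return v - step


def track(power_curve, v0, steps, step=0.5):
    """Run P&O for a number of samples on a hypothetical power curve P(V)."""
    v_prev, v = v0, v0 + step
    for _ in range(steps):
        v_next = perturb_and_observe(
            v, power_curve(v), v_prev, power_curve(v_prev), step
        )
        v_prev, v = v, v_next
    return v
```

On a toy curve such as `lambda v: 100 - (v - 17) ** 2`, the operating point climbs toward the peak and then keeps hopping around it by one step, which is the steady-state oscillation that motivates modified step-size schemes.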
Adaptive synchronous sliding control for a robot manipulator based on neural ... IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronizes
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed: the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de... IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embedded-system design approach targeting FPGA technologies that is fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that users can seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m... IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba... IJECEIAES
Rapidly and remotely monitoring the status parameters of solar cell systems (solar irradiance, temperature, and humidity) is critical to enhancing their efficiency. Hence, in the present article, an improved smart prototype of an internet of things (IoT) technique based on an embedded system using the NodeMCU ESP8266 (ESP-12E) was implemented experimentally. Three regions in Egypt (Luxor, Cairo, and El-Beheira) were chosen to study their solar irradiance profile, temperature, and humidity with the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were visualized live by Ubidots through the hypertext transfer protocol (HTTP). The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m², respectively, during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system make it a strong candidate for monitoring solar cell systems. Moreover, the solar power radiation results for the three regions strongly favor Luxor and Cairo over El-Beheira as suitable places to build a solar cell station.
An efficient security framework for intrusion detection and prevention in int... IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions expose several security loopholes, leading to performance degradation and threats to data security in IoT operations. Therefore, IoT security systems must monitor and restrict unwanted events in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been developed for identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to misclassification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention rely on supervised learning, which requires a large amount of labeled training data. Such complex datasets are impossible to source in a large network like IoT. To address this problem, this study introduces an efficient learning mechanism to strengthen IoT security. The proposed algorithm combines supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with related works, the experimental outcome shows that the model performs well on a benchmark dataset, with an improved detection accuracy of approximately 99.21%.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions Victor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Understanding Inductive Bias in Machine Learning SUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
Low power architecture of logic gates using adiabatic techniques nooriasukmaningtyas
The growing need for portable systems to limit power consumption in very high density ultra-large-scale-integration chips has recently led to rapid and inventive progress in low-power design. The most effective technique for energy-efficient hardware is adiabatic logic circuit design. This paper presents two adiabatic approaches for the design of low power circuits: modified positive feedback adiabatic logic (modified PFAL) and direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the basic components of any digital circuit design. By improving the performance of basic gates, one can improve the performance of the whole system. In this paper, low power designs of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using these approaches, and their results are analyzed for power dissipation, delay, power-delay product, and rise time and compared with other adiabatic techniques and with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs using the DC-DB PFAL technique outperform the modified PFAL technique at 10 MHz, with improvements of 65% for the NOR gate, 7% for the NAND gate, and 34% for the XNOR gate.
A review on techniques and modelling methodologies used for checking electrom... nooriasukmaningtyas
The proper functioning of the integrated circuit (IC) in an inhibiting electromagnetic environment has been a serious concern throughout the decades of revolution in electronics, from discrete devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry, and smart vehicles in particular, face design issues such as susceptibility to electromagnetic interference (EMI). Because of EMI, electronic control devices calculate incorrect outputs and sensors give misleading values, which can prove fatal in automotive applications. In this paper, the authors review, non-exhaustively, research work concerned with the investigation of EMI in ICs and the prediction of this EMI using various modelling methodologies and measurement setups.
Literature Review Basics and Understanding Reference Management.pptx Dr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on Binary Heap data structure. It is similar to the selection sort where we first find the minimum element and place the minimum element at the beginning. Repeat the same process for the remaining elements.
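The heapify and build-heap steps named in the title can be sketched in Python as follows. This is a minimal illustration under one assumption: it uses the common max-heap variant, which repeatedly moves the current maximum to the end of the array (the mirror image of the "minimum to the beginning" description above). The function names are chosen here for illustration.

```python
def heapify(arr, n, i):
    """Sift arr[i] down so the subtree rooted at i is a max-heap (heap size n)."""
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and arr[left] > arr[largest]:
            largest = left
        if right < n and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            return
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest


def build_heap(arr):
    """Turn a list into a max-heap in place, working bottom-up."""
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)


def heap_sort(arr):
    """Sort a list in ascending order, in place, and return it."""
    build_heap(arr)
    for end in range(len(arr) - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]  # move the current maximum to its final slot
        heapify(arr, end, 0)                 # restore the heap on the shrunken prefix
    return arr
```

Because a Python list is already a dynamic array, the same code works unchanged after elements are appended; only `build_heap` needs to be run again before sorting.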
2. Int J Elec & Comp Eng ISSN: 2088-8708
Bayesian distance metric learning and its application in automatic speaker recognition ... (Satyanand Singh)
efficient. Recent brain-imaging studies [5, 6] have revealed many details of how humans perform
cognitive-based speaker recognition, which may inspire new directions for ASR approaches.
To begin with, given the breadth of the research area, it is helpful to clarify what the term
speaker recognition covers. It comprises two alternative tasks: speaker identification and
speaker verification. In speaker identification, the task is to identify an unknown speaker
from a database of known speakers. There are two kinds of ASR system, open-set and closed-set.
If every test speaker is known to belong to the enrolled set, the task is closed-set. If, on the
other hand, the test speaker may also come from outside the predefined set of known speakers,
the task becomes open-set, and a world model or universal background model (UBM) [7] is therefore required.
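The difference between the closed-set and open-set decisions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: scores are taken to be log-likelihoods of the test utterance under each enrolled speaker model and under the UBM, and the function name, the argument names, and the threshold of 0.0 are hypothetical.

```python
def identify_speaker(speaker_scores, ubm_score=None, threshold=0.0):
    """Pick the best-matching enrolled speaker.

    speaker_scores -- dict mapping speaker id to a log-likelihood score
    ubm_score      -- log-likelihood under the background model; None means
                      the closed-set case, where no rejection is possible
    threshold      -- minimum log-likelihood ratio to accept (assumed value)
    """
    if not speaker_scores:
        raise ValueError("at least one enrolled speaker is required")
    best = max(speaker_scores, key=speaker_scores.get)
    if ubm_score is None:
        return best  # closed-set: the best-scoring enrolled speaker always wins
    # Open-set: accept only if the best speaker beats the background model.
    if speaker_scores[best] - ubm_score < threshold:
        return None  # treated as an unknown, out-of-set speaker
    return best
```

In the closed-set call the best-scoring speaker is always returned, whereas the open-set call can return `None` when no enrolled model explains the utterance better than the background model.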
Speaker identification can be based on a voice stream that is text-dependent or text-independent.
The distinction matters most in speaker-verification systems, in which a claimant speaks a predefined text,
such as a password or personal identification number (PIN), to gain access to a system. Throughout this paper,
the emphasis is on text-independent automatic speaker recognition systems.
Several algorithmic and computational advances have enabled the noteworthy performance of
state-of-the-art ASR systems. Phonotactic approaches, namely a phoneme recognizer followed by language models
(Phone Recognition and Language Modeling, PRLM) and parallel PRLM, have been shown to be very
effective [8]. In this phonotactic modeling framework, a set of tokenizers is used to transcribe
the speech data into token strings or lattices, which are later scored by n-gram language
models [9] or mapped into a bag-of-trigrams feature vector for a support vector machine (SVM).
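The mapping from a token string to a bag-of-trigrams feature vector can be sketched as follows. This is a minimal Python illustration, assuming the tokenizer output is already available as a sequence of token labels; the function name and the use of relative frequencies as feature values are assumptions, not details taken from the cited systems.

```python
from collections import Counter


def bag_of_trigrams(tokens):
    """Map a token sequence to a sparse vector of trigram relative frequencies."""
    tokens = list(tokens)
    # Every run of three consecutive tokens is one trigram.
    counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    total = sum(counts.values())
    if total == 0:
        return {}  # fewer than three tokens: no trigram can be formed
    return {trigram: count / total for trigram, count in counts.items()}
```

The resulting sparse dictionary can be expanded to a fixed-length vector over a trigram vocabulary before it is passed to a linear classifier such as an SVM.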
Although a traditional hidden Markov model (HMM) based recognizer generally serves as
the tokenizer, other kinds of tokenization can also be used in this framework [10], for example Gaussian
mixture model (GMM) tokenization [11], universal phone recognition (UPR) [12], the articulatory
attribute-based approach [13], and the deep neural network based phone recognizer [14], to name a few.
With the introduction of shifted delta-cepstral (SDC) acoustic features [15], promising results
have been reported using the GMM framework with factor analysis [16, 17] and the supervector model [18]. Furthermore,
maximum mutual information (MMI) based discriminative training [19] has also been reported for
LID. In this work, I concentrate on acoustic-level systems.
The acoustic-phonetic approach, which is normally taken by experts trained in it, requires
quantitative acoustic measurements from speech samples and statistical analysis of the results. In
general, comparable phonetic units are extracted from the known and questioned speech samples, and various
acoustic parameters measured from these segments are compared. Logistic regression (LR) can be
conveniently used in this approach because it relies on numerical parameters [20].
Although the acoustic-phonetic approach is more objective, it has some
subjective elements. For instance, an acoustic-phonetician may identify a speech sample as being
affected by stress and then perform the objective analysis. In any case, whether the speaker
was really under stress at that moment is a subjective quantity determined by the examiner through his or her
experience.
To date, total variability i-vector modeling has gained significant attention
in both the LID and speaker verification (SV) areas because of its remarkable efficiency, low system complexity, and compact
model size. In i-vector modeling, a single factor analysis is first used as a front end to produce a low
dimensional total variability space that jointly models language, speaker, and channel variability.
Then, within this i-vector space, variability compensation techniques such as within-class
covariance normalization (WCCN), linear discriminant analysis (LDA), and nuisance attribute
projection (NAP) [18] are performed to reduce the variability for subsequent modeling (e.g., using
SVM, LR, neural networks, and probabilistic linear discriminant analysis (PLDA)) for language
identification (LID).
In this paper, the conventional i-vectors are extended to label-regularized supervised i-vectors by
concatenating the label vector and the linear regression matrix at the end of the mean supervector and
the i-vector factor loading matrix, respectively [21, 22]. The appended label vector can also be the
parameter vector to be regressed, for example age or paralinguistic measures, which makes the proposed
system suitable for regression. The reason for using a linear regression matrix W is that
many back-end classification modules in LID and SV are linear. Additionally, if the regression relation
is not linear, a non-linear mapping can be used as a preprocessing step before creating the label vectors.
The contribution weight of each supervector dimension and each target class in the objective function is
calculated automatically by iterative training. The conventional i-vector framework serves as the baseline.
Finally, motivated by the success of robust features for SV tasks on degraded data,
I also considered Gammatone frequency cepstral coefficient (GFCC) features and
spectro-temporal Gabor features for robust LID on degraded data as additional
performance-enhancing steps. When combined with conventional MFCC and SDC speaker-specific
feature based systems, the overall system performance was further improved.
3. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 9, No. 4, August 2019 : 2960 - 2967
2962
2. THE BASELINE i-VECTOR MODELING
Let us consider a GMM-UBM $\lambda$ with $C$ components, $\lambda_c = \{p_c, \mu_c, \Sigma_c\}$ for
$c = 1, \ldots, C$, and a speech signal $Y = \{y_1, y_2, \ldots, y_L\}$ consisting of $L$ feature frames.
The Baum-Welch (BW) statistics on the UBM are computed as follows:
$$N_c = \sum_{t=1}^{L} P(c \mid y_t, \lambda) \tag{1}$$
where $c = 1, \ldots, C$ is the GMM component index and $P(c \mid y_t, \lambda)$ is the occupancy
probability of frame $y_t$ on component $\lambda_c$.
$$F_c = \sum_{t=1}^{L} P(c \mid y_t, \lambda)\,(y_t - \mu_c) \tag{2}$$
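I can generate the corresponding centered mean supervector $\check{F}$ by concatenating all $\check{F}_c$,
which are defined as follows:
$$\check{F}_c = \frac{F_c}{N_c} = \frac{\sum_{t=1}^{L} P(c \mid y_t, \lambda)\,(y_t - \mu_c)}{\sum_{t=1}^{L} P(c \mid y_t, \lambda)} \tag{3}$$
$\check{F}$ can be projected onto the rectangular total variability matrix $T$ (low-rank factor loading)
and the i-vector $x$ as follows:
$$\check{F} \rightarrow Tx \tag{4}$$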
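As a rough illustration of equations (1) to (3), the statistics can be accumulated as in the sketch below. It assumes a diagonal-covariance UBM, which the text does not state, and the function name `baum_welch_stats` and its argument layout are hypothetical.

```python
import numpy as np

def baum_welch_stats(Y, weights, means, variances):
    """Return N_c (eq. 1), F_c (eq. 2) and the centered means F_c / N_c (eq. 3).

    Y: (L, D) feature frames; weights: (C,); means, variances: (C, D).
    Assumption: the UBM has diagonal covariances.
    """
    Y = np.asarray(Y, dtype=float)
    diff = Y[:, None, :] - means[None, :, :]                  # (L, C, D): y_t - mu_c
    # log of p_c * N(y_t; mu_c, Sigma_c) for every frame and component
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                    # (L, C)
    log_post -= log_post.max(axis=1, keepdims=True)           # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                   # P(c | y_t, lambda)
    N = post.sum(axis=0)                                      # eq. (1), shape (C,)
    F = np.einsum("lc,lcd->cd", post, diff)                   # eq. (2), shape (C, D)
    F_check = F / N[:, None]                                  # eq. (3)
    return N, F, F_check

# Toy usage with a two-component UBM in two dimensions (synthetic data).
rng = np.random.default_rng(0)
ubm_means = np.array([[0.0, 0.0], [5.0, 5.0]])
ubm_vars = np.ones((2, 2))
ubm_weights = np.array([0.5, 0.5])
frames = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (30, 2))])
N, F, F_check = baum_welch_stats(frames, ubm_weights, ubm_means, ubm_vars)
```

Because the occupancy probabilities of each frame sum to one, the entries of `N` add up to the number of frames.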
The total variability matrix $T$, for a GMM with $C$ components and acoustic features of dimension $D$,
can be represented as a $CD \times K$ matrix similar to the eigenvoice matrix $V$. Given $\check{F}$,
the i-vector can be computed as follows:
$$x = \left(I + T^{t} \Sigma^{-1} N T\right)^{-1} T^{t} \Sigma^{-1} N \check{F} \tag{5}$$
where $N$ is a $CD \times CD$ diagonal matrix with diagonal blocks $N_c I$. To reduce the variability,
I applied two channel compensation methods in the total variability space: (i) LDA and (ii) WCCN. LDA
minimizes intra-class variance and WCCN normalizes the cosine kernel. For two i-vectors $x_1$ and $x_2$,
the cosine kernel used with either the PLDA or the SVM classifier is defined as follows:
$$k(x_1, x_2) = \frac{\langle x_1, x_2 \rangle}{\|x_1\|_2 \, \|x_2\|_2} \tag{6}$$
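Equations (5) and (6) can be checked numerically with a small sketch such as the following. It assumes that $\Sigma$ is diagonal and stored as a vector; the function names and the toy dimensions are hypothetical and are not taken from the experiments.

```python
import numpy as np

def extract_ivector(T, sigma_diag, N_c, F_check):
    """Posterior-mean i-vector of eq. (5).

    T: (C*D, K) total variability matrix; sigma_diag: (C*D,) diagonal of Sigma
    (assumed diagonal); N_c: (C,) zeroth-order statistics; F_check: (C, D).
    """
    C, D = F_check.shape
    K = T.shape[1]
    n_diag = np.repeat(N_c, D)                          # diagonal of N (blocks N_c * I)
    weighted_T = T * (n_diag / sigma_diag)[:, None]     # Sigma^{-1} N T
    precision = np.eye(K) + T.T @ weighted_T            # I + T^t Sigma^{-1} N T
    rhs = weighted_T.T @ F_check.reshape(C * D)         # T^t Sigma^{-1} N F_check
    return np.linalg.solve(precision, rhs)

def cosine_kernel(x1, x2):
    """Cosine kernel of eq. (6)."""
    return float(x1 @ x2 / (np.linalg.norm(x1) * np.linalg.norm(x2)))

# Toy usage: C = 3 components, D = 2 features, K = 4 factors (synthetic values).
rng = np.random.default_rng(1)
C, D, K = 3, 2, 4
T = rng.normal(size=(C * D, K))
sigma_diag = rng.uniform(0.5, 2.0, size=C * D)
N_c = np.array([10.0, 5.0, 1.0])
F_check = rng.normal(size=(C, D))
x = extract_ivector(T, sigma_diag, N_c, F_check)
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse in eq. (5), which is a numerical convenience rather than a change to the model.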
3. IMPLEMENTATION OF MODIFIED AND SUPERVISED I-VECTOR
Combining SVM classifiers with GMM supervectors has been a very successful
approach for ASR. Observing that the channel factors also contain speaker-dependent information,
the speaker and channel factors were combined into a single space termed the total variability space.
3.1. Label-regularized supervised i-vector
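Let us assume that the mean supervector is generated by the hidden variable, the i-vector. The steps
involved in label regularization are as follows:
Step I. Define the multivariate Gaussian distributions for the $j$th utterance:
$$P(x_j) = \mathcal{N}(0, I), \qquad P(\check{F}_j \mid x_j) = \mathcal{N}\left(T x_j,\; N_j^{-1} \Sigma\right)$$
where $x_j$ is the i-vector of the $j$th utterance.
Step II. Compute the posterior distribution of the hidden variable i-vector:
$$P(x_j \mid \check{F}_j) = \mathcal{N}\left(\left(I + T^{t} \Sigma^{-1} N_j T\right)^{-1} T^{t} \Sigma^{-1} N_j \check{F}_j,\; \left(I + T^{t} \Sigma^{-1} N_j T\right)^{-1}\right)$$
where $N_j$ is the matrix $N$ of the $j$th utterance.
Step III. Compute the discriminative i-vector with the prior $P(x_j) = \mathcal{N}(0, I)$.
Step IV. Regularize with the label information:
$$P\left(\begin{bmatrix} \check{F}_j \\ L_j \end{bmatrix} \,\middle|\, x_j\right) = \mathcal{N}\left(\begin{bmatrix} T x_j \\ W x_j \end{bmatrix},\; \begin{bmatrix} N_j^{-1} \Sigma_1 & 0 \\ 0 & n_j^{-1} \Sigma_2 \end{bmatrix}\right)$$
where $\check{F}_j$ is the mean supervector, $L_j$ is the label vector, $\Sigma_1$ is the covariance of the
$CD$-dimensional mean supervector, and $\Sigma_2$ is the covariance of the $M$-dimensional label vector.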
4. Int J Elec & Comp Eng ISSN: 2088-8708
Bayesian distance metric learning and its application in automatic speaker recognition ... (Satyanand Singh)
2963
Step V. Design of two types of supervised label vectors.
Type 1:
$$L_{i,j} = \begin{cases} 1 & \text{for class } i \\ 0 & \text{otherwise} \end{cases}$$
The regression matrix $W$ is expected to classify the class label correctly. $M$ denotes
the dimensionality of the label vector $L_j$. If the total number of speakers to recognize in the database
is $H$, then $L_j$ is an $H$-dimensional ($H = M$) binary vector: $H - 1$ elements have the value "0" and
one element has the value "1".
Type 2: $L_j = \bar{X}_{s_j}$, $W = I$. Here $\bar{X}_{s_j}$ specifies the sample mean vector from the
last iteration, and the regression matrix is constrained to be the identity matrix.
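Step VI. Compute the log-likelihood of all $\Gamma$ training utterances of the ASR system:
$$\sum_{j=1}^{\Gamma} \ln P(\check{F}_j, L_j, x_j) = \sum_{j=1}^{\Gamma} \left[ \ln P\left(\begin{bmatrix} \check{F}_j \\ L_j \end{bmatrix} \,\middle|\, x_j\right) + \ln P(x_j) \right]$$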
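Step VII. Compute the objective function $J_m$ for Maximum Likelihood (ML) training:
$$J_m = \sum_{j=1}^{\Gamma} \left[ \frac{1}{2} x_j^{t} x_j + \frac{1}{2} (\check{F}_j - T x_j)^{t} \Sigma_1^{-1} N_j (\check{F}_j - T x_j) + \frac{1}{2} (L_j - W x_j)^{t} \Sigma_2^{-1} n_j (L_j - W x_j) - \frac{1}{2} \ln\left|\Sigma_1^{-1}\right| - \frac{1}{2} \ln\left|\Sigma_2^{-1}\right| \right]$$
Simplifying the objective function gives
$$J_m = \sum_{j=1}^{\Gamma} \left[ \frac{1}{2} x_j^{t} x_j + \frac{1}{2} A^{t} \Sigma_1^{-1} N_j A + \frac{1}{2} B^{t} \Sigma_2^{-1} n_j B - \frac{1}{2} \ln\left|\Sigma_1^{-1}\right| - \frac{1}{2} \ln\left|\Sigma_2^{-1}\right| \right]$$
where $A = \check{F}_j - T x_j$ and $B = L_j - W x_j$.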
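The text does not give a closed form for the supervised i-vector. Under the Gaussian model of Step IV, setting the gradient of the $j$th term of $J_m$ with respect to $x_j$ to zero yields $x_j = (I + T^{t} \Sigma_1^{-1} N_j T + n_j W^{t} \Sigma_2^{-1} W)^{-1} (T^{t} \Sigma_1^{-1} N_j \check{F}_j + n_j W^{t} \Sigma_2^{-1} L_j)$. The sketch below evaluates this expression; it assumes diagonal $\Sigma_1$ and $\Sigma_2$ and a scalar $n_j$, and all function names are hypothetical.

```python
import numpy as np

def supervised_ivector(T, W, sigma1, sigma2, N_diag, n_j, F_check, L_j):
    """Minimizer of the j-th term of J_m with respect to x_j.

    T: (CD, K); W: (M, K); sigma1: (CD,) and sigma2: (M,) are the diagonals of
    Sigma_1 and Sigma_2 (assumed diagonal); N_diag: (CD,) diagonal of N_j;
    n_j: scalar (assumption); F_check: (CD,); L_j: (M,).
    """
    K = T.shape[1]
    prec1 = N_diag / sigma1                      # diagonal of Sigma_1^{-1} N_j
    prec2 = n_j / sigma2                         # diagonal of n_j Sigma_2^{-1}
    precision = (np.eye(K)
                 + T.T @ (T * prec1[:, None])
                 + W.T @ (W * prec2[:, None]))
    rhs = T.T @ (prec1 * F_check) + W.T @ (prec2 * L_j)
    return np.linalg.solve(precision, rhs)

def gradient_of_term(x, T, W, sigma1, sigma2, N_diag, n_j, F_check, L_j):
    """Gradient of the j-th term of J_m with respect to x_j."""
    A = F_check - T @ x
    B = L_j - W @ x
    return x - T.T @ (N_diag / sigma1 * A) - W.T @ (n_j / sigma2 * B)

def one_hot_label(speaker_index, H):
    """Type 1 label vector: one element is 1, the other H - 1 elements are 0."""
    label = np.zeros(H)
    label[speaker_index] = 1.0
    return label

# Toy usage with synthetic values (CD = 6, K = 3, H = M = 4 speakers).
rng = np.random.default_rng(2)
T = rng.normal(size=(6, 3))
W = rng.normal(size=(4, 3))
sigma1 = rng.uniform(0.5, 2.0, size=6)
sigma2 = np.ones(4)
N_diag = np.repeat(np.array([8.0, 3.0, 1.0]), 2)
n_j = 12.0
F_check = rng.normal(size=6)
L_j = one_hot_label(2, 4)
x_sup = supervised_ivector(T, W, sigma1, sigma2, N_diag, n_j, F_check, L_j)
grad = gradient_of_term(x_sup, T, W, sigma1, sigma2, N_diag, n_j, F_check, L_j)
```

With $W = 0$ the expression reduces to the posterior mean of Step II, which is one way to sanity-check the derivation.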
3.2. Modified and simplified i-vector
The cosine distance score is fast and robust, but additional computation is required to normalize
the score. A better generative model would fully model the speech data and produce scores without
normalization or calibration.
Feature extraction and training of an i-vector based ASR system are computationally very expensive.
Let the GMM size be $C$, the feature dimension $D$ and the factor loading size $K$. The computational
cost of generating a single i-vector is $O(K^3 + K^2 C + KCD)$. The main objective is to redefine and
reweight the mean supervector of each utterance so that the imbalance within the supervector can be
compensated. The steps involved in the simplification of the i-vector are as follows:
Step I - i-vector, with approximate computational cost $O(K^3 + K^2 C + KCD)$.
Step II - supervised i-vector, with approximate computational cost $O(K^3 + K^2 C + K(CD + M))$.
Step III - modified i-vector without ID, with approximate computational cost $O(K^3 + KCD)$.
Step IV - modified i-vector with ID, with approximate computational cost $O(KCD)$.
Step V - modified and supervised i-vector without ID, with approximate computational cost
$O(K^3 + K(CD + M))$.
Step VI - modified and supervised i-vector with ID, with approximate computational cost
$O(K(CD + M))$.
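To get a feel for how these cost expressions compare, the dominant operation counts can be tabulated as in the sketch below. Only $K = 600$ is taken from the text (the 600-dimensional vectors used in the experiments); the values of $C$, $D$ and $M$ are hypothetical, and the counts ignore the constants hidden by the $O(\cdot)$ notation.

```python
def ivector_costs(K, C, D, M):
    """Dominant operation counts of the six variants listed in Steps I to VI."""
    return {
        "i-vector": K**3 + K**2 * C + K * C * D,
        "supervised i-vector": K**3 + K**2 * C + K * (C * D + M),
        "modified i-vector without ID": K**3 + K * C * D,
        "modified i-vector with ID": K * C * D,
        "modified and supervised i-vector without ID": K**3 + K * (C * D + M),
        "modified and supervised i-vector with ID": K * (C * D + M),
    }

# K = 600 follows the text; C, D and M are assumed values for illustration only.
costs = ivector_costs(K=600, C=1024, D=60, M=1000)
for name, count in costs.items():
    print(f"{name}: {count:.3e}")
```

For any positive sizes, dropping the $K^2 C$ term and then the $K^3$ term can only lower the count, which is the ordering the steps above describe.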
For the ASR application, I use a simple cosine distance classifier on the simplified and modified
i-vectors of the target speaker utterance $w_{target}$ and the test utterance $w_{test}$, with decision
threshold $\theta$, as follows:
$$score(w_{target}, w_{test}) = \frac{w_{target}^{t}\, w_{test}}{\|w_{target}\| \, \|w_{test}\|} \;\underset{<}{\geq}\; \theta \tag{7}$$
3.3. Linear discriminant analysis for session compensation
Treating each speaker as a class, LDA attempts to define new axes that minimize the intra-class
variance caused by session/channel effects and maximize the variance between classes.
In the total variability representation, there is no explicit compensation for inter-session variability.
However, the low-dimensional representation allows compensation techniques to be applied in the new space,
with the benefit of lower computational cost. I use linear discriminant analysis (LDA) for session
compensation.
The LDA optimization problem can be defined as finding the directions $q$ that maximize the Fisher
criterion
$$J(q) = \frac{q^{t} S_b\, q}{q^{t} S_w\, q}$$
The between-class and within-class covariance matrices are
$$S_b = \sum_{s=1}^{S} (\bar{w}_s - \bar{w})(\bar{w}_s - \bar{w})^{t}, \qquad S_w = \sum_{s=1}^{S} \frac{1}{n_s} \sum_{i=1}^{n_s} (w_i^{s} - \bar{w}_s)(w_i^{s} - \bar{w}_s)^{t}$$
where $n_s$ is the number of utterances of speaker $s$, $\bar{w}_s = (1/n_s) \sum_{i=1}^{n_s} w_i^{s}$ is
the mean of the i-vectors of speaker $s$, $\bar{w}$ is the mean of the speaker population and $S$ is the
total number of speakers. The projection matrix that solves the optimization is composed of the leading
eigenvectors of the matrix $S_w^{-1} S_b$.
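The LDA projection described above can be sketched in a few lines. The function names are hypothetical, and the small ridge term added to $S_w$ is an assumption made here to keep the solve well conditioned; it is not part of the formulation in the text.

```python
import numpy as np

def scatter_matrices(w, speakers):
    """Between-class (S_b) and within-class (S_w) covariance matrices."""
    w = np.asarray(w, dtype=float)
    speakers = np.asarray(speakers)
    d = w.shape[1]
    w_bar = w.mean(axis=0)                         # mean of the whole population
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for s in np.unique(speakers):
        ws = w[speakers == s]                      # i-vectors of speaker s
        ws_bar = ws.mean(axis=0)
        diff = (ws_bar - w_bar)[:, None]
        Sb += diff @ diff.T
        centered = ws - ws_bar
        Sw += centered.T @ centered / len(ws)      # (1 / n_s) * sum over utterances
    return Sb, Sw

def lda_projection(w, speakers, n_dims, ridge=1e-8):
    """Columns are the leading eigenvectors of S_w^{-1} S_b."""
    Sb, Sw = scatter_matrices(w, speakers)
    d = Sw.shape[0]
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + ridge * np.eye(d), Sb))
    order = np.argsort(-evals.real)[:n_dims]       # largest eigenvalues first
    return evecs[:, order].real

# Toy usage: three synthetic "speakers" in three dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
vectors = np.vstack([c + 0.3 * rng.normal(size=(20, 3)) for c in centers])
labels = np.repeat([0, 1, 2], 20)
A = lda_projection(vectors, labels, n_dims=2)
projected = vectors @ A                            # session-compensated vectors
```

With $S$ speakers, $S_b$ has rank at most $S - 1$, so at most $S - 1$ directions carry discriminative information.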
3.4. Distance metric learning technique
Using the modified and supervised i-vector as a low-dimensional representation of the speech utterance,
the cosine distance classifier measures the distance between the target speaker's utterance and the
test utterance. Finding a good distance metric in the feature space to define the distance between
vectors is an important issue in classification. In recent years, extensive research has
been conducted on distance metric learning [23]. I explored two supervised distance metric learning
methods in this paper. From now on, I will use the modified and supervised i-vector to represent an
utterance.
3.5. Component analysis based on neighborhood method
Neighborhood Component Analysis (NCA) learns a distance metric that minimizes the average
classification error under a stochastic neighbor selection rule. Using the transformation matrix $B$,
each modified and supervised i-vector $w_i$ chooses another modified and supervised i-vector $w_j$ as
its neighbor with some probability $p_{i,j}$, which is defined through the Euclidean distance in the
transformed space:
$$p_{i,j} = \frac{\exp\left(-\|B w_i - B w_j\|^{2}\right)}{\sum_{k \neq i} \exp\left(-\|B w_i - B w_k\|^{2}\right)}, \qquad p_{i,i} = 0 \tag{8}$$
The probability that $w_i$ selects a neighbor from the same speaker is $p_i = \sum_{j \in C_i} p_{i,j}$,
where $C_i$ is the set of modified and supervised i-vectors from the same speaker as $w_i$.
The projection matrix $B$ should maximize the expected number of modified and supervised i-vectors that
select neighbors from the same speaker:
$$B = \arg\max_{B} f(B), \qquad f(B) = \sum_{i} \sum_{j \in C_i} p_{i,j} = \sum_{i} p_i \tag{9}$$
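Equations (8) and (9) can be evaluated for a given $B$ as in the following sketch, which computes the neighbor probabilities and the objective $f(B)$ but does not perform the optimization itself. The function name and the toy data are hypothetical.

```python
import numpy as np

def nca_objective(B, w, speakers):
    """Return the matrix p_ij of eq. (8) and the objective f(B) of eq. (9)."""
    w = np.asarray(w, dtype=float)
    speakers = np.asarray(speakers)
    z = w @ B.T                                              # rows are B w_i
    sq_dist = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq_dist, np.inf)                        # enforces p_ii = 0
    logits = -sq_dist
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)                        # eq. (8)
    same_speaker = speakers[:, None] == speakers[None, :]
    p_i = (p * same_speaker).sum(axis=1)                     # sum over j in C_i
    return p, float(p_i.sum())                               # eq. (9)

# Toy usage: two well-separated synthetic speakers with two utterances each.
toy_w = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
toy_spk = np.array([0, 0, 1, 1])
p, f_identity = nca_objective(np.eye(2), toy_w, toy_spk)
_, f_zero = nca_objective(np.zeros((2, 2)), toy_w, toy_spk)
```

With the identity transform the objective is close to the number of utterances, while a zero transform makes every neighbor equally likely and the objective drops accordingly.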
3.6. Bayesian distance metric learning approach
Here I used the conjugate gradient method to obtain the best projection matrix $B$. NCA provides only a
point estimate of the distance metric and can be unreliable when the amount of training data is small.
I use a Bayesian framework to estimate the posterior distribution of the distance metric [8]. Using the
speaker label of each utterance, I can create two sets of constraints: $S$ for same-speaker pairs and $D$
for different-speaker pairs. The probability that two utterances $w_i$ and $w_j$ belong to the same
speaker or to different speakers under a given metric $A$ is:
$$\Pr(y_{i,j} \mid w_i, w_j, A, \mu) = \frac{1}{1 + \exp\left(y_{i,j}\left(\|w_i - w_j\|_A^{2} - \mu\right)\right)} \tag{10}$$
where $\|w_i - w_j\|_A^{2} = (w_i - w_j)^{t} A (w_i - w_j)$. The parameter $\mu$ is the threshold that
separates same-speaker pairs from different-speaker pairs. Only when the distance under the metric $A$
is less than $\mu$ are two utterances likely to be identified as coming from the same speaker. The label
$y_{i,j}$ is defined as follows [24, 25]:
$$y_{i,j} = \begin{cases} +1 & (w_i, w_j) \in S \\ -1 & (w_i, w_j) \in D \end{cases} \tag{11}$$
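A minimal sketch of the pair likelihood in equations (10) and (11) is given below. It evaluates the probability for a fixed metric $A$ and threshold $\mu$ only; estimating the posterior over $A$ is outside its scope. The function name and the example values are hypothetical.

```python
import numpy as np

def pair_probability(w_i, w_j, A, mu, y):
    """Eq. (10): probability of label y (+1 same speaker, -1 different speakers)."""
    diff = np.asarray(w_i, dtype=float) - np.asarray(w_j, dtype=float)
    dist_sq = float(diff @ A @ diff)               # squared distance under metric A
    return 1.0 / (1.0 + np.exp(y * (dist_sq - mu)))

# Toy usage: identity metric, squared distance 1, threshold mu = 2.
A_metric = np.eye(2)
p_same = pair_probability([0.0, 0.0], [1.0, 0.0], A_metric, mu=2.0, y=+1)
p_diff = pair_probability([0.0, 0.0], [1.0, 0.0], A_metric, mu=2.0, y=-1)
```

Because the squared distance (1) is below the threshold (2), the same-speaker label is the more probable one, and the two probabilities sum to one.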
I use NCA as a preprocessing technique so that the vectors are projected into a space
where the nearest neighbors of each modified and supervised i-vector share the same speaker
label with high probability. Bayesian distance metric learning can then model the distance between the
vectors better and more reliably in the new space. Experimental results show the advantages of the
Bayesian distance metric learning approach.
4. EXPERIMENTAL RESULTS
Experiments are performed on the NIST 2008 SRE dataset, which defines 13 different speaker detection
tests according to the duration and type of the training and test data. I present results on the short2-
short3 and 10sec-10sec conditions, where the training and test data are conversational speech of five
minutes duration. In this research, the data are also used for LDA and NCA training, and as the impostor
set in the score normalization step. A 600-dimensional modified and supervised i-vector is extracted
from each utterance. The Equal Error Rate (EER) and the minimum Detection Cost Function (minDCF) are
used as metrics for evaluation.
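For readers who want to reproduce the two metrics from lists of trial scores, a simple threshold sweep is sketched below. The function names are hypothetical, and the default cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01) are an assumption based on the usual NIST SRE 2008 setting, since the text does not state them.

```python
import numpy as np

def error_rates(target_scores, impostor_scores):
    """Miss and false-alarm rates at every observed score used as a threshold."""
    tgt = np.asarray(target_scores, dtype=float)
    imp = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, imp]))
    p_miss = np.array([(tgt < th).mean() for th in thresholds])   # targets rejected
    p_fa = np.array([(imp >= th).mean() for th in thresholds])    # impostors accepted
    return p_miss, p_fa

def equal_error_rate(target_scores, impostor_scores):
    """Operating point where miss and false-alarm rates are closest."""
    p_miss, p_fa = error_rates(target_scores, impostor_scores)
    idx = np.argmin(np.abs(p_miss - p_fa))
    return float(0.5 * (p_miss[idx] + p_fa[idx]))

def min_dcf(target_scores, impostor_scores, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum detection cost over the swept thresholds (cost parameters assumed)."""
    p_miss, p_fa = error_rates(target_scores, impostor_scores)
    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    return float(dcf.min())

# Toy usage with made-up scores (not the scores from the experiments).
toy_targets = [0.6, 0.4, 0.9, 0.8]
toy_impostors = [0.5, 0.1, 0.2, 0.3]
toy_eer = equal_error_rate(toy_targets, toy_impostors)
toy_dcf = min_dcf(toy_targets, toy_impostors)
```

The sweep only visits thresholds at observed scores, so on small trial lists the EER is an approximation of the true crossing point.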
Cosine similarity scoring and Bayesian distance metric learning are compared on the short2-short3
condition of the NIST 2008 SRE dataset. The Bayesian distance metric learning algorithm is referred to as
Bayes_dml, the cosine score after combined score normalization as Cosine Score combined norm, and
Gaussian PLDA as GPLDA. The constraint set $S$ is formed from all possible modified and supervised
i-vector pairs from the same speaker. To form the constraint set $D$, I apply the cosine score to all
possible modified and supervised i-vector pairs from different speakers and select those with the
highest scores, since these pairs are the most informative for the distance metric.
Since the number of all possible different-speaker pairs is very large, I chose twice as many
different-speaker pairs as same-speaker pairs to form $D$. Experiments showed that a larger set of
different-speaker constraints (relative to the number of same-speaker pairs) does not improve performance
but requires more computation, while a smaller set of different-speaker constraints (the same size as
the set of same-speaker constraints) degrades the ASR performance. The comparison of cosine scoring,
Bayes_dml and GPLDA with score normalization on NIST 2008 SRE is shown in Table 1.
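The construction of the constraint sets $S$ and $D$ described above can be sketched as follows. The function name `build_constraints` and the `ratio` parameter are hypothetical; `ratio=2` mirrors the choice of twice as many different-speaker pairs as same-speaker pairs.

```python
import numpy as np

def build_constraints(w, speakers, ratio=2):
    """Return S (all same-speaker pairs) and D (hardest different-speaker pairs)."""
    w = np.asarray(w, dtype=float)
    speakers = np.asarray(speakers)
    unit = w / np.linalg.norm(w, axis=1, keepdims=True)
    cos = unit @ unit.T                                   # cosine score of every pair
    i_idx, j_idx = np.triu_indices(len(w), k=1)           # each unordered pair once
    same = speakers[i_idx] == speakers[j_idx]
    S = list(zip(i_idx[same].tolist(), j_idx[same].tolist()))
    diff_i, diff_j = i_idx[~same], j_idx[~same]
    order = np.argsort(-cos[diff_i, diff_j])              # highest cosine score first
    keep = order[: ratio * len(S)]
    D = list(zip(diff_i[keep].tolist(), diff_j[keep].tolist()))
    return S, D

# Toy usage: six synthetic vectors from three speakers.
rng = np.random.default_rng(0)
toy_vectors = rng.normal(size=(6, 5))
toy_speakers = np.array([0, 0, 1, 1, 2, 2])
S_pairs, D_pairs = build_constraints(toy_vectors, toy_speakers)
```

Enumerating all pairs is quadratic in the number of utterances, so a real system would likely need to restrict or batch this step.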
Table 1. Comparison of cosine scoring, Bayes_dml and GPLDA with score normalization on NIST 2008 SRE
| Combination / Norms | EER | minDCF |
|---|---|---|
| LDA 200 | | |
| Cosine Score | 2.541% | 0.01445 |
| Cosine Score combined norm | 1.790% | 0.0097 |
| Bayes_dml | 2.158% | 0.0106 |
| Bayes_dml+znorm | 2.159% | 0.0107 |
| Bayes_dml+tnorm | 2.158% | 0.0107 |
| GPLDA | 3.12% | 0.0156 |
As Table 1 shows, the cosine score with combined normalization after LDA200 achieves the
best results (EER 1.790%), and GPLDA performs the worst (EER 3.12%). However, Bayes_dml (EER 2.158%)
performs better than the cosine score (EER 2.541%) when the scores are not normalized. The gap between
Bayes_dml and the state-of-the-art cosine score with combined normalization is small. In addition,
score normalization brings almost no benefit for Bayes_dml.
To understand the differences between Cosine Score combined norm and Bayes_dml,
I compare their performance under different combinations of preprocessing techniques. The preprocessing
techniques include LDA and NCA, which are applied before the scoring model. The comparison of cosine
scoring and Bayes_dml on NIST 2008 SRE with different preprocessing combinations is shown in Table 2.
Table 2. Comparison of cosine scoring and Bayes_dml on NIST 2008 SRE with different preprocessing combinations
| Scoring | Combination | EER | minDCF |
|---|---|---|---|
| Cosine Score combined norm | LDA 200 | 1.790% | 0.0097 |
| | NCA150+LDA150 | 43.345% | 0.0987 |
| | NCA200 | 2.345% | 0.0131 |
| | NCA200+LDA100 | 2.017% | 0.0097 |
| | NCA200+LDA200 | 1.767% | 0.0095 |
| | NCA200+LDA600 | 41.078% | 0.0197 |
| | NCA200 | 4.567% | 0.0278 |
| Bayes_dml | LDA 200 | 2.176% | 0.0107 |
| | LDA200+NCA150+LDA150 | 42.345% | 0.1005 |
| | LDA200+NCA200 | 3.034% | 0.179 |
| | LDA200+NCA200+LDA100 | 1.775% | 0.0097 |
| | LDA200+NCA200+LDA200 | 1.817% | 0.1100 |
| | LDA600+NCA200+LDA200 | 43.786% | 0.1001 |
| | NCA200+LDA200 | 3.564% | 0.0178 |
Table 2 gives an idea of how NCA and LDA capture hidden structure in the total variability
space. The worst performance appears in the second and sixth rows. In both cases, the dimension of the
NCA differs from the dimension of the preceding LDA; that is, the NCA performs dimension reduction,
which has a severe impact on the results. In the fourth row, NCA200 only performs a rotation following
LDA200,
and the subsequent LDA100 further reduces the dimension of the feature space, which has little effect
on performance. With the exception of the second and sixth rows, the results in the seventh row are
almost at the same level as the other rows, although NCA200 reduces the dimension of the 600-dimensional
modified and supervised i-vector. The reason may be that this dimension reduction is done in the original
total variability space, while the dimension reductions in the second and sixth rows are made in the
reduced feature space after the LDA. The improvement in the fourth row indicates that the LDA projection
corrects the directions of the feature space learned by NCA. Therefore, I conclude that NCA is not
suited to the role of dimension reduction.
The best performance of the combined cosine score is obtained using LDA200 + NCA200 +
LDA200, and the best performance of Bayes_dml is obtained using LDA200 + NCA200 + LDA100.
Bayes_dml (EER 1.775%) comes close to the cosine score with combined normalization (EER 1.767%),
which is the best result reported here for the short2-short3 condition of the NIST SRE 2008 data.
To check the performance of the LDA200 + NCA200 + LDA200 and LDA200 + NCA200 + LDA100 based ASR
systems, I processed the genuine match scores together with the impostor match scores.
The Detection Error Trade-off (DET) curves on the NIST 2008 SRE dataset, with an EER of about 1.767%
for LDA200 + NCA200 + LDA200 and about 1.776% for LDA200 + NCA200 + LDA100, are
shown in Figure 1.
Figure 1. Equal error rate of LDA200 + NCA200 + LDA200 and Bayes_dml
using LDA200 + NCA200 + LDA100 based ASR system
5. CONCLUSION
This paper presents an ASR application of the Bayesian distance metric learning technique. The
traditional i-vector has been modified by concatenating the label vector and the linear regression
matrix to the mean supervector and the i-vector factor loading matrix, respectively. The modified and
supervised i-vector in the proposed ASR system is highly discriminative because it is regularized with
speaker-specific label information, which enhances the performance of ASR.
The proposed Bayesian distance metric learning technique can be integrated into ASR
to enhance efficiency and outperform the traditional i-vector. The proposed ASR system achieved its
best EER of 1.767% using LDA200 + NCA200 + LDA200 with combined cosine score
normalization, and the best Bayes_dml performance, an EER of 1.776%, was obtained using
LDA200 + NCA200 + LDA100.
REFERENCES
[1] S. Singh, “Forensic and Automatic Speaker Recognition System,” International Journal of Electrical and
Computer Engineering, vol/issue: 8(5), pp. 2804-2811, 2018.
[2] S. Singh, “High Level Speaker Specific Features as an Efficiency Enhancing Parameters in Speaker Recognition
System,” International Journal of Electrical and Computer Engineering, vol/issue: 9(4), 2019.
[3] S. Singh, “The Role of Speech Technology in Biometrics, Forensics and Man-Machine Interface,” International
Journal of Electrical and Computer Engineering, vol/issue: 9(1), pp. 281-288, 2019.
[Figure 1 plot: DET curves; x-axis: False Alarm probability (in %); y-axis: Miss probability (in %); curves: LDA200 + NCA200 + LDA100 and LDA200 + NCA200 + LDA200]
[4] S. Singh, et al., “Short Duration Voice Data Speaker Recognition System Using Novel Fuzzy Vector Quantization
Algorithms,” 2016 IEEE International Instrumentation and Measurement Technology Conference, Taipei, Taiwan
pp. 1-6, 2016.
[5] P. Belin, et al., “Voice-selective areas in human auditory cortex,” Nature, vol. 403, pp. 309-312, 2000.
[6] S. Singh, “Evaluation of Sparsification algorithm and Its Application in Speaker Recognition System,”
International Journal of Applied Engineering Research, vol/issue: 13(17), pp. 13015-13021, 2018.
[7] E. Formisano, et al., “‘Who’ is saying ‘what’? Brain-based decoding of human voice and speech,” Science, vol. 322,
pp. 970-973, 2008.
[8] D. A. Reynolds, et al., “Speaker verification using adapted Gaussian mixture models,” Digital Signal Processing,
vol/issue: 10(1-3), pp. 19-41, 2000.
[9] M. Zissman, “Language identification using phoneme recognition and phonotactic language modeling,”
Proc. ICASSP, pp. 3503-3506, 1995.
[10] S. Singh, “Support Vector Machine Based Approaches For Real Time Automatic Speaker Recognition System,”
International Journal of Applied Engineering Research, vol/issue: 13(10), pp. 8561-8567, 2018.
[11] S. Singh, et al., “Speaker Specific Phone Sequence and Support Vector Machines Telephonic Based Speaker
Recognition System,” International Journal of Applied Engineering Research, vol/issue: 12(19), pp. 8026-8033,
2017.
[12] H. Li, et al., “Spoken language recognition: From fundamentals to practice,” Proceedings of the IEEE, vol. 101,
pp. 1136-1159, 2013.
[13] P. T. Carrasquillo, et al., “Approaches to language identification using gaussian mixture models and shifted delta
cepstral features,” Proc. ICSLP, pp. 89-92, 2002.
[14] S. M. Siniscalchi, et al., “Exploiting context-dependency and acoustic resolution of universal speech attribute
models in spoken language recognition,” Proc. INTERSPEECH, pp. 2718-2721, 2010.
[15] G. Hinton, et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four
research groups,” IEEE Signal Processing Magazine, vol. 29, pp. 82-97, 2012.
[16] L. Deng and X. Li, “Machine learning paradigms for speech recognition: An overview,” IEEE Transactions on
Audio, Speech, and Language Processing, vol. 21, pp. 1060-1089, 2013.
[17] S. Singh, “Speaker Recognition by Gaussian Filter Based Feature Extraction and Proposed Fuzzy Vector
Quantization Modeling Technique,” International Journal of Applied Engineering Research, vol/issue: 13(16),
pp. 12798-12804, 2018.
[18] P. Kenny, et al., “A study of interspeaker variability in speaker verification,” IEEE Transactions on Audio, Speech,
and Language Processing, vol. 16, pp. 980-988, 2008.
[19] W. Campbell, et al., “Support vector machines using gmm supervectors for speaker verification,” IEEE Signal
Processing Letters, vol. 13, pp. 308-311, 2006.
[20] L. Burget, et al., “Discriminative training techniques for acoustic language identification,” Proc. ICASSP, 2006.
[21] S. Singh, et al., “Short Duration Voice Data Speaker Recognition System Using Novel Fuzzy Vector Quantization
Algorithms,” IEEE International Instrumentation and Measurement Technology Conference, 2016.
[22] S. Singh, et al., “Efficient Modelling Technique based Speaker Recognition under Limited Speech Data,”
International Journal of Image, Graphics and Signal Processing (IJIGSP), vol/issue: 8(11), pp. 41-48, 2016.
[23] D. Martinez, et al., “Language recognition in ivectors space,” Proc. INTERSPEECH, pp. 861-864, 2011.
[24] S. Singh, et al., “A Novel Algorithm of Sparse Representations for Speech Compression/Enhancement and Its
Application in Speaker Recognition System,” International Journal of Computational and Applied Mathematics,
vol/issue: 11(1), pp. 89-104, 2016.
[25] S. Singh and A. Singh, “Accuracy Comparison using Different Modeling Techniques under Limited Speech Data of
Speaker Recognition Systems,” Global Journal of Science Frontier Research: F Mathematics and Decision
Sciences, vol/issue: 16(2), pp. 1-17, 2016.