In many applications, speech recognition must operate at some distance between the speakers and the microphones. This is called distant speech recognition (DSR). Under this condition, speech recognition must deal with reverberation. Deep learning has become the main technology for speech recognition, and a Deep Neural Network (DNN) in hybrid with a Hidden Markov Model (HMM) is the commonly used architecture. However, this system is still not robust against reverberation. Previous studies use Convolutional Neural Networks (CNN), a variation of neural network, to improve the robustness of speech recognition against noise. CNN has a pooling property, which is used to find local correlation between neighboring dimensions in the features. With this property, CNN can be used for feature learning that emphasizes the information in neighboring frames. In this study we use CNN to deal with reverberation. We also propose to apply feature transformation techniques, linear discriminant analysis (LDA) and maximum likelihood linear transformation (MLLT), to mel frequency cepstral coefficients (MFCC) before feeding them to the CNN. We argue that transforming features could produce more discriminative features for the CNN, and hence improve the robustness of speech recognition against reverberation. Our evaluations on the Meeting Recorder Digits (MRD) subset of the Aurora-5 database confirm that the use of LDA and MLLT transformations improves the robustness of speech recognition, with a 20% relative error reduction compared to a standard DNN-based speech recognizer using the same number of hidden layers.
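The "relative error reduction" figure quoted above is a ratio of error rates, not a difference in percentage points. The short sketch below shows the arithmetic only. The two error rates are made-up placeholder values, not results from the paper, and the function name is hypothetical.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction of the baseline error that the new system removes."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Placeholder error rates, chosen only to illustrate the arithmetic.
baseline_wer = 10.0  # percent, hypothetical standard DNN system
proposed_wer = 8.0   # percent, hypothetical CNN system on transformed features
rer = relative_error_reduction(baseline_wer, proposed_wer)
print(f"relative error reduction: {rer:.0%}")
```

With these placeholder numbers, a drop of two percentage points corresponds to a 20% relative reduction.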
This document summarizes research on using dynamic frequency hopping (DFH) to increase spectral efficiency in cellular systems. It presents:
1) Analytical results showing DFH can significantly improve frequency outage probabilities compared to random frequency hopping.
2) Simulation results comparing the performance of DFH techniques like full replacement, worst dwell, and threshold-based approaches to random frequency hopping. Performance is measured by error rate distributions.
3) Findings that DFH techniques can support substantially more users than random frequency hopping by tracking interference conditions and adapting frequency hop patterns.
This document compares the performance of three mobile ad hoc network (MANET) routing protocols: AODV, FSR, and IERP. It uses the QualNet network simulator to evaluate these protocols based on various metrics like throughput, average jitter, average end-to-end delay, and packet delivery ratio. The protocols are evaluated under different node speeds on a grid topology network with 90 nodes over an area of 1500x1500 meters. Simulation results show that AODV generally performs best in terms of throughput and packet delivery ratio across varying node speeds, while FSR performs worst for these metrics. IERP shows the worst performance for average end-to-end delay and average jitter as node speed increases.
FEATURE SELECTION USING FISHER’S RATIO TECHNIQUE FOR AUTOMATIC ...IJCI JOURNAL
Automatic Speech Recognition (ASR) involves mainly two steps; feature extraction and classification (pattern recognition). Mel Frequency Cepstral Coefficient (MFCC) is used as one of the prominent feature extraction techniques in ASR. Usually, the set of all 12 MFCC coefficients is used as the feature vector in the classification step. But the question is whether the same or improved classification accuracy can be achieved by using a subset of 12 MFCC as feature vector. In this paper, Fisher’s ratio technique is used for selecting a subset of 12 MFCC coefficients that contribute more in discriminating a pattern. The selected coefficients are used in classification with Hidden Markov Model (HMM) algorithm. The classification accuracies that we get by using 12 coefficients and by using the selected coefficients are compare
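As a rough illustration of the selection step described above, the sketch below ranks features by one common two-class form of Fisher's ratio: the squared difference of class means divided by the sum of class variances. The exact formulation used in the paper is not given in this summary, so this form, the function names, and the toy data are all assumptions.

```python
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Fisher's ratio of one feature: squared mean gap over summed variances.

    Assumes at least one class has non-zero variance for this feature.
    """
    gap = mean(class_a) - mean(class_b)
    return gap * gap / (pvariance(class_a) + pvariance(class_b))


def rank_features(frames_a, frames_b):
    """Return feature indices ordered from most to least discriminative."""
    n_features = len(frames_a[0])
    scores = []
    for j in range(n_features):
        col_a = [frame[j] for frame in frames_a]
        col_b = [frame[j] for frame in frames_b]
        scores.append((fisher_ratio(col_a, col_b), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Toy frames with 3 coefficients each (real MFCC vectors would have 12).
frames_a = [[1.0, 5.0, 0.2], [1.2, 5.1, 0.9], [0.8, 4.9, 0.4]]
frames_b = [[3.0, 5.0, 0.5], [3.2, 5.2, 0.1], [2.8, 4.8, 0.7]]
ranking = rank_features(frames_a, frames_b)
selected = ranking[:2]  # keep the two highest-scoring coefficients
```

In this toy data the first coefficient separates the two classes cleanly, so it ranks first. The second coefficient has nearly identical class means and ranks last.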
This document presents a method to estimate the weights of main materials (copper, iron, oil) for transformers using an artificial neural network (ANN) with the Levenberg-Marquardt backpropagation algorithm. Training data consisting of 24 input/output pairs obtained from a transformer manufacturing company are used to train the ANN, with inputs like short circuit impedance, installation height, and temperature and outputs of material weights. The trained ANN is then able to accurately estimate the material weights of transformers, providing a method to forecast transformer costs for manufacturers.
This document proposes two genetic algorithm based methodologies called Minimum Spanning Tree First (MSTF) and Shortest Paths First (SPF) for designing customized and energy optimized irregular Network-on-Chip (NoC) topologies tailored to an application's communication characteristics. The MSTF methodology first constructs a minimum spanning tree and then extends the topology by adding shortest energy paths, while SPF first finds the shortest energy paths and then constructs the minimum spanning tree. Experimental results on random benchmarks show the SPF methodology reduces average dynamic communication energy by 18.5% on average compared to MSTF. SPF also achieves lower latency and similar throughput. Comparisons with regular 2D mesh NoCs and an intelligent mapping
A SURVEY ON OPTIMIZATION BASED SPECTRUM SENSING TECHNIQUES TO REDUCE ISI AND ...IJNSA Journal
Cognitive radio (CR) is an emerging technology in OFDM-based wireless systems and is very important for spectrum sensing. With cognitive radio, data can be transferred at high rates with a low bit error rate. The key idea of OFDM is to split the total transmission bandwidth into subcarriers, which further reduces the intersymbol interference (ISI) and peak-to-average power ratio (PAPR) in the signal. Many optimization-based spectrum sensing techniques exist for efficient sensing, but each has its own advantages and disadvantages. This motivates a comprehensive study of reducing PAPR and ISI in terms of FPGA-based partial configuration. The first part of the review compares the OFDM characteristics of the signal with several optimization-based ISI reduction techniques. The second part compares the various spectrum sensing techniques in the cognitive radio engine and their application in FPGA.
Identification of frequency domain using quantum based optimization neural ne...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
This document summarizes techniques to reduce peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM) systems. It first introduces OFDM and discusses how high PAPR is a drawback. It then describes several categories of PAPR reduction techniques: signal distortion techniques like clipping and filtering or peak windowing; signal scrambling techniques like selected mapping, partial transmit sequence, interleaving, tone reservation, and tone injection; and coding techniques like block coding. The document presents simulation results comparing 16-QAM and 64-QAM modulation schemes under various SNR and PAPR levels. It concludes that no single technique achieves large PAPR reduction with high efficiency and low complexity.
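To make the PAPR quantity concrete, the sketch below computes it for a single OFDM symbol using a plain inverse DFT. It only measures PAPR and does not implement any of the reduction techniques listed above. The function names and the 8-subcarrier example are assumptions made for illustration.

```python
import cmath
import math


def idft(freq_symbols):
    """Inverse DFT: map N subcarrier symbols to N time-domain samples."""
    n = len(freq_symbols)
    return [
        sum(x * cmath.exp(2j * math.pi * k * t / n)
            for k, x in enumerate(freq_symbols)) / n
        for t in range(n)
    ]


def papr_db(time_samples):
    """Peak-to-average power ratio of one OFDM symbol, in dB."""
    powers = [abs(s) ** 2 for s in time_samples]
    average = sum(powers) / len(powers)
    return 10 * math.log10(max(powers) / average)


n_sub = 8
# Identical symbols on every subcarrier add up coherently at one sample,
# which is the worst case: the ratio equals the number of subcarriers.
worst_case = papr_db(idft([1 + 0j] * n_sub))
# A single active subcarrier gives a constant-envelope signal (ratio of 1).
single_tone = papr_db(idft([1 + 0j] + [0j] * (n_sub - 1)))
```

The worst case grows with the number of subcarriers, which is why high PAPR is treated as a drawback of OFDM.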
The document discusses network survivability and protection techniques in optical networks. It covers protection in ring networks like UPSR and BLSR, protection in mesh networks using path and line protection, and multi-layer network resilience. The key aspects are the need for fast restoration at the optical layer to minimize outage times, different protection schemes, and the interaction between layers for resilient multi-layer networks.
Deep neural networks have recently achieved significant improvements in speech recognition accuracy compared to traditional hidden Markov models with Gaussian mixture models. The key advances have been new training methods for deep neural networks with many hidden layers. Researchers from several groups have found that training deep neural networks in two stages - first unsupervised pre-training of features followed by supervised discriminative fine-tuning - works well for acoustic modeling and speech recognition. On a variety of large vocabulary speech recognition tasks, deep neural networks trained this way have outperformed highly optimized Gaussian mixture model systems, sometimes by a large margin.
This document proposes using self-organizing maps (SOMs), an unsupervised artificial neural network technique, to categorize sensory data from wireless sensor networks (WSNs) in order to conserve battery power. The proposed system trains a 2x3 SOM on a base station node to categorize data from active sensor nodes. After training, the SOM defines categories that sensor nodes then transmit instead of raw data, reducing transmissions and saving up to 48.5% of battery power. Evaluation of the approach considers battery savings versus number of training samples and transmission interval.
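The idea of sending a category index instead of raw readings can be sketched with a very small self-organizing map. This is a minimal illustration under several assumptions: readings are two-dimensional and already scaled to [0, 1], the learning-rate and neighbourhood schedules are arbitrary, and all names and sample values are hypothetical. It is not the system evaluated in the summarized work.

```python
import random


def best_unit(weights, sample):
    """Return the index of the map unit closest to the sample."""
    def dist2(unit):
        return sum((w - s) ** 2 for w, s in zip(weights[unit], sample))
    return min(range(len(weights)), key=dist2)


def train_som(samples, rows=2, cols=3, epochs=40, seed=0):
    """Train a small SOM and return rows*cols weight vectors (row-major)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    for epoch in range(epochs):
        rate = 0.5 * (1 - epoch / epochs)         # decaying learning rate
        radius = 1 if epoch < epochs // 2 else 0  # shrinking neighbourhood
        for sample in samples:
            win_r, win_c = divmod(best_unit(weights, sample), cols)
            for unit, w in enumerate(weights):
                r, c = divmod(unit, cols)
                # Update the winner and its grid neighbours within the radius.
                if abs(r - win_r) + abs(c - win_c) <= radius:
                    for d in range(dim):
                        w[d] += rate * (sample[d] - w[d])
    return weights


# Hypothetical readings (for example temperature, humidity), scaled to [0, 1].
readings = [[0.10, 0.20], [0.12, 0.18], [0.85, 0.90], [0.88, 0.86]]
som = train_som(readings)
# A sensor node would transmit this small integer instead of the raw reading.
category = best_unit(som, [0.11, 0.19])
```

With a 2x3 map there are six categories, so the transmitted value fits in three bits.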
This document describes the simulation of an OFDM system using MATLAB. It discusses how OFDM works and its advantages for wireless communications. The simulator allows analyzing the performance of BPSK, QPSK, and QAM modulation under Rayleigh fading and multipath channels. It measures bit error rate as a function of signal to noise ratio. The GUI interface facilitates changing parameters and visualizing the transmitted and received OFDM frames, spectra, and constellations. Simulation results demonstrate the variation in performance between modulation techniques in different channel conditions.
Mobility and Propagation Models in Multi-hop Cognitive Radio Networks szhb
This document discusses how mobility affects the performance of multi-hop cognitive radio networks under different propagation models. It summarizes the results of simulations run with the Network Simulator 2 (NS2) that tested free space, two-ray ground, and shadowing propagation models carrying MPEG4 video traffic over a cognitive radio network with mobile nodes. The key findings are: 1) Free space propagation provided the best throughput and lowest loss ratio compared to two-ray ground and shadowing models; 2) All propagation models performed significantly worse with mobile nodes compared to stationary nodes; 3) Mobility reduced average performance by disrupting signal strength and increasing noise as nodes moved.
Deep learning refers to machine learning techniques that use multiple layers of non-linear information processing to learn representations of data. It has emerged as an area of machine learning research that is impacting signal and information processing. Deep learning can be used for unsupervised learning with models like deep belief networks, supervised learning with deep neural networks, or hybrid approaches that combine unsupervised and supervised techniques.
Parallel WaveGAN, Yamamoto, Ryuichi, Eunwoo Song, and Jae-Min Kim. "Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram." ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. review by June-Woo Kim
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique CSCJournals
An automatic speaker recognition system is used to recognize an unknown speaker among several reference speakers by making use of speaker-specific information from their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition technique. Our baseline speaker recognition system, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database, and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique by performing gender recognition before speaker recognition so as to improve the accuracy/timeliness of our baseline speaker recognition system. Based on the experiments conducted on the MIT database, we demonstrate that our proposed system improves the accuracy over the baseline system by approximately 2%, while reducing the computational time by more than 30%.
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...IOSR Journals
The document discusses a signal processing approach using wavelet transforms to identify identical reads from DNA sequencing data. It proposes representing DNA sequences numerically based on electron-ion interaction pseudo potentials and applying multi-level Haar wavelet transforms. This reduces sequence lengths to one-eighth their original size, allowing efficient element-by-element comparisons to find identical reads. Testing on Bacillus sequencing data showed the method identified up to 17% identical reads, reducing redundant data by up to 11% and computation time compared to string processing methods.
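The length reduction to one-eighth follows from keeping only the approximation coefficients of a three-level Haar transform, since each level halves the sequence. The sketch below illustrates that step with the commonly tabulated EIIP values for the four nucleotides. The function names and example reads are assumptions. Because averaging can map different reads to the same signature, a match here should be treated as a candidate that still needs a full comparison.

```python
import math

# Commonly tabulated electron-ion interaction pseudopotential values.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def haar_approximation(signal, levels=3):
    """Keep only the Haar approximation coefficients after `levels` steps."""
    out = list(signal)
    for _ in range(levels):
        if len(out) % 2:
            raise ValueError("length must be divisible by 2**levels")
        # Each step replaces adjacent pairs by their scaled sum.
        out = [(out[i] + out[i + 1]) / math.sqrt(2)
               for i in range(0, len(out), 2)]
    return out


def signature(read):
    """Map a DNA read to its numeric form, then to a 1/8-length signature."""
    return haar_approximation([EIIP[base] for base in read])


read_a = "ACGTACGTACGTACGT"
read_b = "ACGTACGTACGTACGT"  # identical to read_a
read_c = "ACGTACGTACGTACGA"  # differs in the last base
sig_a = signature(read_a)
candidate_match = sig_a == signature(read_b)
mismatch = sig_a != signature(read_c)
```

Comparing two 16-base reads then takes two element comparisons instead of sixteen.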
This document reviews techniques used in spoken-word recognition systems. It discusses popular feature extraction techniques like MFCC, LPC, DWT, WPD that are used to represent speech signals in a compact form before classification. Classification techniques discussed are ANN, HMM, DTW, and VQ. The document provides a brief overview of each technique and their advantages. It also presents the generalized workflow of a spoken-word recognition system including stages of speech acquisition, pre-emphasis, feature extraction, modeling, classification, and output of recognized text.
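Of the classification techniques named above, dynamic time warping (DTW) is the simplest to show in a few lines. The sketch below is the textbook dynamic-programming form on one-dimensional sequences with an absolute-difference local cost. Real systems would compare sequences of feature vectors such as MFCC frames, and the function name is an assumption.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two 1-D sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] is the best alignment cost of the first i and j elements.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # seq_b element is repeated
                cost[i][j - 1],      # seq_a element is repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


template = [1.0, 2.0, 3.0]
slower_utterance = [1.0, 2.0, 2.0, 3.0]  # same shape, stretched in time
stretched_cost = dtw_distance(template, slower_utterance)
```

The stretched sequence aligns with zero cost, which is the property that makes DTW tolerant of differences in speaking rate.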
Performance Analysis of A Ds-Cdma System by using Rayleigh and Nakagami-M Fad...IRJET Journal
1) The document analyzes the performance of a DS-CDMA system using Rayleigh and Nakagami-M fading channels.
2) It studies expressions for signal-to-noise plus interference ratio with and without receive diversity under Nakagami-m fading.
3) The analysis is extended to a Rake receiver with maximal ratio combining technique and multiple receive antennas to determine the improvement in bit error rate and receiver sensitivity due to diversity.
An Efficient Resource Utilization Scheme for Video Transmission over Wireless...ijsrd.com
In this paper we propose an energy efficient video transmission strategy for wireless sensor networks, which combines wavelet-based image decomposition and cooperative communication. The proposed scheme uses the selective decode and forward (SDF) cooperation, so that a relay node collaborates with the source by forwarding only a lower-resolution version of the original video, obtained via discrete wavelet transform (DWT). We show that the proposed SDF-DWT strategy is more energy efficient than non-cooperative single-hop and multi-hop, also outperforming the regular SDF scheme. In addition, we show that our method can achieve the energy efficiency of incremental DF (IDF), without the need of a feedback channel.
IRJET-Spectrum Allocation Policies for Flex Grid Network with Data Rate Limit...IRJET Journal
This document discusses spectrum allocation policies for flex grid networks with data rate limited transmission. It begins with an abstract that outlines the tradeoff between data rate, allocated frequency slots, and modulation format that must be considered for spectrum allocation. It then discusses the objectives of identifying optical paths, path lengths, selecting modulation schemes, and finding optimal routes. The methodology section covers factors considered for spectrum allocation like modulation formats, noise, crosstalk, and transmission distances limited by noise and crosstalk for different modulation formats and fiber core counts. Implementation details algorithms for network setup, finding shortest paths, candidate path selection, and spectrum allocation. Results show fragmentation increases with demand but is constant for some core fibers. Higher core counts provide advantages and lower request
1) The document proposes an Energy Efficient Traffic Routing mechanism based on Point Synchronization Fitness Functioning (EETR-PSFF) to reduce energy consumption in wireless sensor networks.
2) EETR-PSFF aims to improve energy efficiency and bandwidth utilization by enabling sensor nodes to sleep periodically and allocating traffic slots actively.
3) It further aims to reduce the average energy consumption during wake-up processes using a Point Synchronization Fitness Function that sums the coverage values and divides by a product rule relating to the number of source nodes.
This document lists several signal processing projects completed for IEEE in 2015. It includes projects on topics like target source separation using deep neural networks and NMF, image denoising using non-local means and a soft threshold, improving fingerprint image quality in the wavelet domain, brain MRI segmentation using discriminative clustering and feature selection, dynamic texture classification using hidden Markov models, an EEG-based biometric system using eigenvector centrality in brain networks, and cooperative mesh networks with decode-and-forward relays. The projects are implemented in languages like Matlab, C#, and NS2 and cover techniques such as NMF, deep neural networks, wavelets, hidden Markov models, and more.
This document proposes a video genre classification method using only audio features extracted from video clips. It uses Multivariate Adaptive Regression Splines (MARS) to build classification models for different genres based on low-level audio features such as MFCCs, zero crossing rate, and short-time energy, extracted from a dataset of news, cartoons, sports, music and dramas video clips. The models are able to accurately classify video genres with an overall classification rate of 91.83% based on the important audio features identified for each genre by the MARS models.
This document contains titles and descriptions of 12 networking projects. It includes projects on topics like efficient full-text retrieval in peer-to-peer networks, density estimation in wireless ad hoc networks, detection of flooding DDoS attacks, game-theoretic pricing for video streaming, throughput and energy efficiency in wireless networks, and more. Each project description provides details on the research problem addressed, proposed solution, and relevant references. The document serves as a reference for potential final year project titles in networking.
Soft Frequency Reuse (SFR) in LTE-A Heterogeneous Networks based upon Power R...IJECEIAES
As the traffic demand grows and the RF environment changes, the mobile network relies on techniques such as SFR in a Heterogeneous Network (HetNet) to overcome capacity and link budget limitations and maintain user experience. Inter-Cell Interference (ICI) strongly affects the Signal-to-Interference plus Noise Ratio (SINR) of active UEs, especially cell-edge users, which leads to a significant degradation in the total throughput. In this paper we evaluate the performance of SFR in a HetNet system in order to deal with interference. Simulation results show that the power ratio control in the SFR HetNet system does not have much effect on the total achieved capacity of the overall cell.
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION ijma
This work compares the performance of various acoustic feature extraction methods using a Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic features are a series of vectors that represent the speech signals. They can be classified into either words or sub-word units such as phonemes. In this work, linear predictive coding (LPC) is used first as the acoustic vector extraction technique. LPC has been chosen due to its widespread popularity. Then other vector extraction techniques, Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), have also been used. These two methods closely resemble the human auditory system. The LSTM neural network is then trained on these feature vectors. The obtained models of different phonemes are then compared using different statistical tools, namely Bhattacharyya Distance and Mahalanobis Distance, to investigate the nature of those acoustic features.
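The two statistical tools named above have simple closed forms when each phoneme model is summarized as a Gaussian with diagonal covariance. The sketch below uses that simplification, which is an assumption and not a detail taken from the paper. The function names are likewise hypothetical.

```python
import math


def mahalanobis(x, mean, var):
    """Mahalanobis distance of x from a diagonal-covariance Gaussian."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))


def bhattacharyya(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    total = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = (v1 + v2) / 2  # per-dimension average variance
        # Mean-separation term plus variance-mismatch term.
        total += (m1 - m2) ** 2 / (8 * v)
        total += 0.5 * math.log(v / math.sqrt(v1 * v2))
    return total


# Toy one-dimensional "phoneme models" with equal variance.
separation = bhattacharyya([0.0], [1.0], [2.0], [1.0])
# Distance of a point from a 2-D model with variances 4 and 1.
point_distance = mahalanobis([2.0, 0.0], [0.0, 0.0], [4.0, 1.0])
```

The Bhattacharyya distance compares two distributions, so it grows when either the means or the variances differ. The Mahalanobis distance instead measures how far a single point lies from one distribution, in units of its spread.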
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
The performance of various acoustic feature extraction methods has been compared in this work using
Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic
features are a series of vectors that represents the speech signals. They can be classified in either words or
sub word units such as phonemes. In this work, at first linear predictive coding (LPC) is used as acoustic
vector extraction technique. LPC has been chosen due to its widespread popularity. Then other vector
extraction techniques like Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction
(PLP) have also been used. These two methods closely resemble the human auditory system. These feature
vectors are then trained using the LSTM neural network. Then the obtained models of different phonemes
are compared with different statistical tools namely Bhattacharyya Distance and Mahalanobis Distance to
investigate the nature of those acoustic features.
The document discusses network survivability and protection techniques in optical networks. It covers protection in ring networks like UPSR and BLSR, protection in mesh networks using path and line protection, and multi-layer network resilience. The key aspects are the need for fast restoration at the optical layer to minimize outage times, different protection schemes, and the interaction between layers for resilient multi-layer networks.
Deep neural networks have recently achieved significant improvements in speech recognition accuracy compared to traditional hidden Markov models with Gaussian mixture models. The key advances have been new training methods for deep neural networks with many hidden layers. Researchers from several groups have found that training deep neural networks in two stages - first unsupervised pre-training of features followed by supervised discriminative fine-tuning - works well for acoustic modeling and speech recognition. On a variety of large vocabulary speech recognition tasks, deep neural networks trained this way have outperformed highly optimized Gaussian mixture model systems, sometimes by a large margin.
This document proposes using self-organizing maps (SOMs), an unsupervised artificial neural network technique, to categorize sensory data from wireless sensor networks (WSNs) in order to conserve battery power. The proposed system trains a 2x3 SOM on a base station node to categorize data from active sensor nodes. After training, the SOM defines categories that sensor nodes then transmit instead of raw data, reducing transmissions and saving up to 48.5% of battery power. Evaluation of the approach considers battery savings versus number of training samples and transmission interval.
This document describes the simulation of an OFDM system using MATLAB. It discusses how OFDM works and its advantages for wireless communications. The simulator allows analyzing the performance of BPSK, QPSK, and QAM modulation under Rayleigh fading and multipath channels. It measures bit error rate as a function of signal to noise ratio. The GUI interface facilitates changing parameters and visualizing the transmitted and received OFDM frames, spectra, and constellations. Simulation results demonstrate the variation in performance between modulation techniques in different channel conditions.
Mobility and Propagation Models in Multi-hop Cognitive Radio Networksszhb
This document discusses how mobility affects the performance of multi-hop cognitive radio networks under different propagation models. It summarizes the results of simulations run with the Network Simulator 2 (NS2) that tested free space, two-ray ground, and shadowing propagation models carrying MPEG4 video traffic over a cognitive radio network with mobile nodes. The key findings are: 1) Free space propagation provided the best throughput and lowest loss ratio compared to two-ray ground and shadowing models; 2) All propagation models performed significantly worse with mobile nodes compared to stationary nodes; 3) Mobility reduced average performance by disrupting signal strength and increasing noise as nodes moved.
Deep learning refers to machine learning techniques that use multiple layers of non-linear information processing to learn representations of data. It has emerged as an area of machine learning research that is impacting signal and information processing. Deep learning can be used for unsupervised learning with models like deep belief networks, supervised learning with deep neural networks, or hybrid approaches that combine unsupervised and supervised techniques.
Parallel WaveGAN, Yamamoto, Ryuichi, Eunwoo Song, and Jae-Min Kim. "Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram." ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. review by June-Woo Kim
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueCSCJournals
An automatic speaker recognition system identifies an unknown speaker among several reference speakers by using speaker-specific information from their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition technique. Our baseline speaker recognition system, built using statistical modeling techniques, achieves an accuracy of 81% on the standard MIT database, and our baseline gender recognition system achieves an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique that performs gender recognition before speaker recognition to improve the accuracy and timeliness of the baseline speaker recognition system. Based on experiments conducted on the MIT database, we demonstrate that the proposed system improves accuracy over the baseline by approximately 2% while reducing computation time by more than 30%.
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin... - IOSR Journals
The document discusses a signal processing approach using wavelet transforms to identify identical reads from DNA sequencing data. It proposes representing DNA sequences numerically based on electron-ion interaction pseudo potentials and applying multi-level Haar wavelet transforms. This reduces sequence lengths to one-eighth their original size, allowing efficient element-by-element comparisons to find identical reads. Testing on Bacillus sequencing data showed the method identified up to 17% identical reads, reducing redundant data by up to 11% and computation time compared to string processing methods.
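The length reduction to one-eighth follows from applying three Haar levels, each of which halves the sequence. The sketch below illustrates the idea under stated assumptions: the EIIP values are the commonly cited ones (the summary does not list them), only the approximation coefficients are kept, and the function names and tolerance are hypothetical. It is not the authors' implementation.

```python
import math

# Commonly cited EIIP values per nucleotide (assumed; not given in the summary).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

def encode(seq):
    """Map a DNA string to its numeric EIIP representation."""
    return [EIIP[base] for base in seq.upper()]

def haar_approx(signal, levels=3):
    """Keep only the Haar approximation coefficients after `levels` steps.
    Each step halves the length, so 3 levels give one-eighth of the input."""
    out = list(signal)
    for _ in range(levels):
        if len(out) % 2:              # pad odd lengths by repeating the last sample
            out.append(out[-1])
        out = [(out[i] + out[i + 1]) / math.sqrt(2) for i in range(0, len(out), 2)]
    return out

def same_coefficients(read_a, read_b, levels=3, tol=1e-12):
    """Element-by-element comparison of the reduced representations."""
    ca = haar_approx(encode(read_a), levels)
    cb = haar_approx(encode(read_b), levels)
    return len(ca) == len(cb) and all(abs(x - y) <= tol for x, y in zip(ca, cb))

print(len(haar_approx(encode("ACGTACGTACGTACGT"))))   # 16 bases reduce to 2 values
```

Because approximation coefficients are local averages, different reads can share them (for example, two reads that differ only by swapping adjacent bases), so in this sketch a match is best treated as a candidate to confirm against the original reads.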
This document reviews techniques used in spoken-word recognition systems. It discusses popular feature extraction techniques like MFCC, LPC, DWT, WPD that are used to represent speech signals in a compact form before classification. Classification techniques discussed are ANN, HMM, DTW, and VQ. The document provides a brief overview of each technique and their advantages. It also presents the generalized workflow of a spoken-word recognition system including stages of speech acquisition, pre-emphasis, feature extraction, modeling, classification, and output of recognized text.
Performance Analysis of A Ds-Cdma System by using Rayleigh and Nakagami-M Fad... - IRJET Journal
1) The document analyzes the performance of a DS-CDMA system using Rayleigh and Nakagami-M fading channels.
2) It studies expressions for signal-to-noise plus interference ratio with and without receive diversity under Nakagami-m fading.
3) The analysis is extended to a Rake receiver with maximal ratio combining technique and multiple receive antennas to determine the improvement in bit error rate and receiver sensitivity due to diversity.
An Efficient Resource Utilization Scheme for Video Transmission over Wireless... - ijsrd.com
In this paper we propose an energy efficient video transmission strategy for wireless sensor networks, which combines wavelet-based image decomposition and cooperative communication. The proposed scheme uses the selective decode and forward (SDF) cooperation, so that a relay node collaborates with the source by forwarding only a lower-resolution version of the original video, obtained via discrete wavelet transform (DWT). We show that the proposed SDF-DWT strategy is more energy efficient than non-cooperative single-hop and multi-hop, also outperforming the regular SDF scheme. In addition, we show that our method can achieve the energy efficiency of incremental DF (IDF), without the need of a feedback channel.
IRJET-Spectrum Allocation Policies for Flex Grid Network with Data Rate Limit... - IRJET Journal
This document discusses spectrum allocation policies for flex grid networks with data rate limited transmission. It begins with an abstract that outlines the tradeoff between data rate, allocated frequency slots, and modulation format that must be considered for spectrum allocation. It then discusses the objectives of identifying optical paths, path lengths, selecting modulation schemes, and finding optimal routes. The methodology section covers factors considered for spectrum allocation like modulation formats, noise, crosstalk, and transmission distances limited by noise and crosstalk for different modulation formats and fiber core counts. Implementation details algorithms for network setup, finding shortest paths, candidate path selection, and spectrum allocation. Results show fragmentation increases with demand but is constant for some core fibers. Higher core counts provide advantages and lower request
1) The document proposes an Energy Efficient Traffic Routing mechanism based on Point Synchronization Fitness Functioning (EETR-PSFF) to reduce energy consumption in wireless sensor networks.
2) EETR-PSFF aims to improve energy efficiency and bandwidth utilization by enabling sensor nodes to sleep periodically and allocating traffic slots actively.
3) It further aims to reduce the average energy consumption during wake-up processes using a Point Synchronization Fitness Function that sums the coverage values and divides by a product rule relating to the number of source nodes.
This document lists several signal processing projects completed for IEEE in 2015. It includes projects on topics like target source separation using deep neural networks and NMF, image denoising using non-local means and a soft threshold, improving fingerprint image quality in the wavelet domain, brain MRI segmentation using discriminative clustering and feature selection, dynamic texture classification using hidden Markov models, an EEG-based biometric system using eigenvector centrality in brain networks, and cooperative mesh networks with decode-and-forward relays. The projects are implemented in languages like Matlab, C#, and NS2 and cover techniques such as NMF, deep neural networks, wavelets, hidden Markov models, and more.
This document proposes a video genre classification method using only audio features extracted from video clips. It uses Multivariate Adaptive Regression Splines (MARS) to build classification models for different genres based on low-level audio features like MFCCs, zero crossing rate, short-time energy, etc. extracted from a dataset of news, cartoons, sports, music and dahmas video clips. The models are able to accurately classify video genres with an overall classification rate of 91.83% based on the important audio features identified for each genre by the MARS models.
This document contains titles and descriptions of 12 networking projects. It includes projects on topics like efficient full-text retrieval in peer-to-peer networks, density estimation in wireless ad hoc networks, detection of flooding DDoS attacks, game-theoretic pricing for video streaming, throughput and energy efficiency in wireless networks, and more. Each project description provides details on the research problem addressed, proposed solution, and relevant references. The document serves as a reference for potential final year project titles in networking.
Soft Frequency Reuse (SFR) in LTE-A Heterogeneous Networks based upon Power R... - IJECEIAES
As traffic demand grows and the RF environment changes, the mobile network relies on techniques such as SFR in a Heterogeneous Network (HetNet) to overcome capacity and link budget limitations and maintain user experience. Inter-Cell Interference (ICI) strongly affects the Signal-to-Interference plus Noise Ratio (SINR) of active UEs, especially cell-edge users, which leads to a significant degradation in total throughput. In this paper we evaluate the performance of SFR in a HetNet system in order to deal with interference. Simulation results show that power ratio control in the SFR HetNet system doesn’t have much effect on the total achieved capacity of the overall cell.
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION - ijma
The performance of various acoustic feature extraction methods has been compared in this work using
Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic
features are a series of vectors that represents the speech signals. They can be classified in either words or
sub word units such as phonemes. In this work, at first linear predictive coding (LPC) is used as acoustic
vector extraction technique. LPC has been chosen due to its widespread popularity. Then other vector
extraction techniques like Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction
(PLP) have also been used. These two methods closely resemble the human auditory system. These feature
vectors are then trained using the LSTM neural network. Then the obtained models of different phonemes
are compared with different statistical tools namely Bhattacharyya Distance and Mahalanobis Distance to
investigate the nature of those acoustic features.
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION - ijma
This document summarizes a study that compares different acoustic feature extraction methods (LPC, MFCC, PLP) for a Bangla speech recognition system using LSTM neural networks. It finds that PLP outperforms MFCC and LPC based on statistical distance measurements of phoneme coefficients. PLP shows better distinction between phonemes compared to MFCC and LPC. While RNN/LSTM are inherently slow, combining PLP with faster networks like Transformers may improve performance for large datasets.
Deep convolutional neural networks-based features for Indonesian large vocabu... - IAESIJAI
This document describes a study that used convolutional neural networks (CNNs) to extract features for Indonesian large vocabulary speech recognition. The CNN model was trained discriminatively on speech data that had undergone speed perturbation, unlike typical deep learning models that are trained generatively. Evaluations showed the proposed CNN-DNN method achieved a 7.26% error reduction over DBN-DNN using MFCC features and a 9.01% error reduction over DBN-DNN using filterbank features on an Indonesian speech dataset. An additional 6.13% error reduction was achieved compared to a generatively trained CNN-DNN model. The study aims to address the challenge of limited data for non-mainstream languages like Indones
Intelligent Arabic letters speech recognition system based on mel frequency c... - IJECEIAES
Speech recognition is one of the important applications of artificial intelligence (AI). Speech recognition aims to recognize spoken words regardless of who is speaking. The process of voice recognition involves extracting meaningful features from spoken words and then classifying these features into their classes. This paper presents a neural network classification system for Arabic letters. The paper studies the effect of changing the multi-layer perceptron (MLP) artificial neural network (ANN) properties to obtain an optimized performance. The proposed system consists of two main stages: first, the recorded spoken letters are transformed from the time domain into the frequency domain using fast Fourier transform (FFT), and features are extracted using mel frequency cepstral coefficients (MFCC). Second, the extracted features are classified using the MLP ANN with the back-propagation (BP) learning algorithm. The obtained results show that the proposed system along with the extracted features can classify Arabic spoken letters using two neural network hidden layers with an accuracy of around 86%.
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S... - IDES Editor
In this paper, improvement of an ASR system for
Hindi language, based on Vector quantized MFCC as feature
vectors and HMM as classifier, is discussed. MFCC features
are usually pre-processed before being used for recognition.
One of these pre-processing is to create delta and delta-delta
coefficients and append them to MFCC to create feature vector.
This paper focuses on all digits in Hindi (Zero to Nine), which
is based on isolated word structure. Performance of the system
is evaluated by accurate Recognition Rate (RR). The effect of
the combination of the Delta MFCC (DMFCC) feature along
with the Delta-Delta MFCC (DDMFCC) feature shows
approximately 2.5% further improvement in the RR, with no
additional computational costs involved. RR of the system for
the speakers involved in the training phase is found to give
better recognition accuracy than that for the speakers who
were not involved in the training phase. Word wise RR is
observed to be good in some digits with distinct phones.
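The delta and delta-delta pre-processing described in this abstract can be sketched as follows. This is a minimal illustration using the regression formula commonly used for delta features, not the paper's code: the window of 2 frames, the edge handling (repeating the first and last frame), and the function names are all assumptions.

```python
def deltas(frames, window=2):
    """Regression-based time derivative of a feature sequence:
    d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..window.
    `frames` is a list of equal-length feature vectors, one per frame."""
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    result = []
    for t in range(n_frames):
        row = []
        for d in range(len(frames[0])):
            num = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, n_frames - 1)][d]   # clamp at the last frame
                behind = frames[max(t - n, 0)][d]             # clamp at the first frame
                num += n * (ahead - behind)
            row.append(num / denom)
        result.append(row)
    return result

def add_deltas(frames, window=2):
    """Append delta and delta-delta coefficients to each static vector."""
    d1 = deltas(frames, window)        # delta (DMFCC)
    d2 = deltas(d1, window)            # delta-delta (DDMFCC)
    return [c + a + b for c, a, b in zip(frames, d1, d2)]

# Toy input: one "coefficient" per frame that rises linearly over time.
toy = [[float(t)] for t in range(10)]
print(add_deltas(toy)[4])
```

Each output vector is three times the length of the static vector, so 13 static MFCCs would give a 39-dimensional feature vector.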
The document discusses various literature on OFDM and MC-CDMA techniques. It summarizes 17 research papers on topics like using wavelet packets instead of Fourier transform in MC-CDMA to improve bandwidth efficiency and reduce interference. It also discusses using techniques like DWT, Radon transform, and antenna diversity with MC-CDMA and comparing the BER performance of different approaches in various channel conditions like AWGN, Rayleigh fading and frequency selective fading channels. The rationale given is that replacing Fourier transform with wavelet packets in MC-CDMA can eliminate the need for cyclic prefix and improve spectral efficiency.
The document surveys 17 literature sources on multi-carrier modulation techniques like OFDM and MC-CDMA. Several sources propose using wavelet transforms instead of Fourier transforms to improve bandwidth efficiency and reduce interference for MC-CDMA systems. Simulation results from the literature show that wavelet packet based MC-CDMA can outperform FFT based MC-CDMA in terms of lower bit error rates, especially in frequency selective fading channels. The rationale given is that wavelet transforms eliminate the need for cyclic prefixes, thereby improving spectral efficiency over traditional MC-CDMA schemes.
Advantages And Disadvantages Of 5.1 Ofdm-IDMA Scheme - Julie Kwhl
Multicarrier modulation divides the bandwidth into multiple narrowband subchannels, providing advantages like increased resistance to fading and intersymbol interference, but at the cost of higher peak-to-average power ratio and increased complexity compared to single carrier systems. It allows high data transmission over bandwidth-limited and frequency-selective channels but is more sensitive to carrier frequency offsets and phase noise. While multicarrier modulation provides several benefits, its main disadvantages are high peak-to-average power ratio and increased system complexity.
We applied various architectures of deep neural networks for sound event detection and compared their performance using two different datasets. Feed forward neural network (FNN), convolutional neural network (CNN), recurrent neural network (RNN) and convolutional recurrent neural network (CRNN) were implemented using hyper-parameters optimized for each architecture and dataset. The results show that the performance of deep neural networks varied significantly depending on the learning rate, which can be optimized by conducting a series of experiments on the validation data over predetermined ranges. Among the implemented architectures, the CRNN performed best under all testing conditions, followed by CNN. Although RNN was effective in tracking the time-correlation information in audio signals, it exhibited inferior performance compared to the CNN and the CRNN. Accordingly, it is necessary to develop more optimization strategies for implementing RNN in sound event detection.
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM) WITH CONDITIONAL RANDOM FIELDS (... - kevig
This study investigates the effectiveness of Knowledge Named Entity Recognition in Online Judges (OJs). OJs lack topic classification and are limited to problem IDs only. As a result, a lot of time is consumed in finding programming problems, and more specifically in finding knowledge entities. A Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Fields (CRF) model is applied to recognize the knowledge named entities in the solution reports. For the test run, more than 2000 solution reports are crawled from the Online Judges and processed for the model output. The stability of the model is also assessed with the higher F1 value. The results obtained through the proposed BiLSTM-CRF model are more effectual (F1: 98.96%) and efficient in lead-time.
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL... - kevig
The document proposes a streaming punctuation technique that leverages bidirectional context for continuous speech recognition. It introduces a novel approach of streaming punctuation that discards decoder segmentation and shifts punctuation decision making to a powerful Transformer model. Experimental results show that streaming punctuation improves segmentation accuracy by 13.9% and achieves an average BLEU score gain of 0.66 for downstream machine translation tasks.
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional... - kevig
While speech recognition Word Error Rate (WER) has reached human parity for English, continuous speech recognition scenarios such as voice typing and meeting transcriptions still suffer from segmentation and punctuation problems, resulting from irregular pausing patterns or slow speakers. Transformer sequence tagging models are effective at capturing long bi-directional context, which is crucial for automatic punctuation. Automatic Speech Recognition (ASR) production systems, however, are constrained by real-time requirements, making it hard to incorporate the right context when making punctuation decisions. Context within the segments produced by ASR decoders can be helpful but limiting in overall punctuation performance for a continuous speech session. In this paper, we propose a streaming approach for punctuation or re-punctuation of ASR output using dynamic decoding windows and measure its impact on punctuation and segmentation accuracy across scenarios. The new system tackles over-segmentation issues, improving segmentation F0.5-score by 13.9%. Streaming punctuation achieves an average BLEU score improvement of 0.66 for the downstream task of Machine Translation (MT).
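For readers unfamiliar with the F0.5 metric used for segmentation here, the sketch below shows the general F-beta definition, in which beta below 1 weights precision more heavily than recall. The function name and the precision and recall values are illustrative assumptions, not figures from the paper.

```python
def f_beta(precision, recall, beta=0.5):
    """General F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0                      # avoid dividing by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical precision/recall pair, for illustration only.
p, r = 0.8, 0.5
print(f_beta(p, r, beta=0.5))   # leans toward precision
print(f_beta(p, r, beta=1.0))   # ordinary F1 (harmonic mean)
```

With precision higher than recall, the F0.5 value comes out above the F1 value, which reflects the heavier penalty F0.5 places on false segment boundaries than on missed ones.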
In a Mobile Ad hoc Network (MANET), due to mobility, limited battery power and poor features of nodes,
network partitioning and nodes disconnecting occur frequently. To improve data availability, database
systems create multiple copies of each data object and allocate them on different nodes. This paper
proposes Automated Re-allocator of Replicas Over MANET (ARROM), that addresses these issues.
ARROM reduces the average response time of requests between clients and database servers by
reallocating replicas frequently. In addition, ARROM increases the average throughput in the network. Our
performance study indicates that ARROM improves average response time and average network
throughput in MANET as compared to a recent existing scheme.
Investigation of the performance of multi-input multi-output detectors based... - IJECEIAES
The next generation of wireless cellular communication networks must be energy efficient, extremely reliable, and have low latency, leading to the necessity of using algorithms based on deep neural networks (DNN) which have better bit error rate (BER) or symbol error rate (SER) performance than traditional complex multi-antenna or multi-input multi-output (MIMO) detectors. This paper examines deep neural networks and deep iterative detectors such as OAMP-Net based on information theory criteria such as maximum correntropy criterion (MCC) for the implementation of MIMO detectors in non-Gaussian environments, and the results illustrate that the proposed method has better BER or SER performance.
This document reviews the OFDM-IDMA technique and its implementations. It begins with introductions to OFDM and OFDM-IDMA. OFDM-IDMA uses interleaving instead of spreading sequences to distinguish users, avoiding bandwidth expansion without coding gain. The document then summarizes various implementations of OFDM-IDMA using discrete wavelet transform, MIMO systems, and implementations on FPGA. It also discusses implementations using finite Radon transform and discrete wavelet transform. Finally, it proposes future work on implementing OFDM-IDMA using Radon transform and performing comparative analysis of wavelet, FFT, and Radon-based OFDM-IDMA systems over AWGN and Rayleigh fading channels.
Wavelet Based Noise Robust Features for Speaker Recognition - CSCJournals
Extraction and selection of the best parametric representation of the acoustic signal is the most important task in designing any speaker recognition system. A wide range of possibilities exists for parametrically representing the speech signal, such as Linear Prediction Coding (LPC), Mel frequency Cepstrum coefficients (MFCC) and others. MFCC are currently the most popular choice for any speaker recognition system, though one of their shortcomings is that the signal is assumed to be stationary within the given time frame, so they cannot analyze non-stationary signals and are therefore not suitable for noisy speech signals. To overcome this problem, several researchers have used different types of AM-FM modulation/demodulation techniques for extracting features from the speech signal. Some approaches propose using wavelet filterbanks for extracting the features. In this paper, a technique for extracting features by combining the above-mentioned approaches is proposed. Features are extracted from the envelope of the signal and then passed through a wavelet filterbank. It is found that the proposed method outperforms the existing feature extraction techniques.
IRJET- A Review on Audible Sound Analysis based on State Clustering throu... - IRJET Journal
This document reviews progress in acoustic modeling for statistical parametric speech synthesis, from early hidden Markov models to recent neural network approaches. It discusses how hidden Markov models were previously dominant but artificial neural networks are now replacing them due to improvements in naturalness. The document also examines developing accurate audio classifiers using machine learning techniques on public datasets and improving classification accuracy beyond current levels of 50-79% by employing strategies like cross-validation.
Hardback solution to accelerate multimedia computation through mgp in cmp - eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Similar to Convolutional Neural Network and Feature Transformation for Distant Speech Recognition (20)
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... - IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoring - IJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us... - IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par... - IJECEIAES
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
An improved modulation technique suitable for a three level flying capacitor ... - IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zone - IJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features such as sea surface temperature (SST) and sea surface height (SSH), along with the classifiers used to classify the data. This study underscores the importance of examining potential fishing zones using advanced analytical techniques. It thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms for classification of potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models like support vector machines, naïve Bayes, and artificial neural networks (ANN). In previous results, a support vector machine (SVM) classified test data for fisheries classification with 97.6% accuracy, compared with 94.2% for naive Bayes. By considering the recent works in this area, several recommendations for future works are presented to further improve the performance of the potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f...IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make little devices that boost productivity. The goal is to optimize device density. Scientists are reducing connection delays to improve circuit performance. This helped them understand three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and lower interconnects. Electrical involvement is a big worry with 3D integrates circuits. Researchers have developed and tested through silicon via (TSV) and substrates to decrease electrical wave involvement. This study illustrates a novel noise coupling reduction method using several electrical involvement models. A 22% drop in electrical involvement from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and draughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing the climate change. The analysis's
findings discussed the relevant to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency and recommendations on how
to increase role of the women in addressing the climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 8, No. 6, December 2018: 5381-5388
systems are still unsatisfactory for these conditions [7]. Noise and reverberation distort the speech signals,
causing large degradation in the performance of ASR systems. This may discourage users from using
ASR applications.
Many studies have proposed techniques to improve the accuracy of ASR in noisy and reverberant
conditions. One approach is to enhance the noisy features by applying noise removal techniques [8]. Others
design discriminative, handcrafted features that are more robust against noise and reverberation [9].
Many works also propose adapting the acoustic models to noisy conditions [10]. In DNN frameworks,
however, many methods proposed for HMM-GMM systems may not work as well [7]. For deep learning
frameworks, various architectures, such as the recurrent neural network (RNN) [11] and the convolutional
neural network (CNN) [12], have been investigated to find better systems. In these approaches, the number of
hidden layers is increased to produce more discriminative features before fine-tuning in the last layers. However,
this may significantly increase the computational time for training.
Currently, CNNs are gaining interest among researchers. Originally, they were used in computer vision
[13]. Some studies [14]-[17] indicate that they perform better than DNNs for large vocabulary tasks. We argue that
properties of CNN such as pooling could be beneficial in reverberant conditions. However, these studies mostly deal
with noise only; the use of CNN for dealing with reverberation has not yet been explored.
One advantage of deep learning frameworks is the ability of the network to learn discriminative
features from the input data [18]. Studies show that transforming features before feeding them to the networks
may benefit the performance of deep learning systems. Numerous approaches can be
implemented in the feature domain to improve the performance of ASR systems in deep architectures for large
vocabulary systems. Some of them are linear discriminant analysis (LDA) [19], heteroscedastic linear
discriminant analysis (HLDA) [20], maximum likelihood linear transform (MLLT) [21], feature-based
minimum phone error (fMPE) [22], and combinations of these transformations.
In this study, we propose CNN with feature transformations for improving the robustness of ASR
against reverberation. We apply LDA and MLLT to the features before feeding them to the CNN. We argue that
applying them may also improve the robustness of speech recognition in reverberant conditions while still using
a relatively small number of hidden layers. We evaluate the use of feature transformations (i.e., LDA and
MLLT) on Mel-frequency cepstral coefficients (MFCC). We capture the context information of speech by
splicing the features with several preceding and succeeding frames and then apply LDA to reduce the
dimensionality. After that, we apply MLLT to the reduced features. Finally, we feed the transformed features
as acoustic input to the CNN.
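The splicing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `splice_frames`, the context width, and the edge handling (repeating the first and last frame) are assumptions, since the text only says "several preceding and succeeding frames".

```python
import numpy as np

def splice_frames(features: np.ndarray, left: int = 4, right: int = 4) -> np.ndarray:
    """Stack each frame with `left` preceding and `right` succeeding frames.

    features: (num_frames, dim) array, e.g. one MFCC vector per frame.
    Returns an array of shape (num_frames, (left + 1 + right) * dim).
    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    num_frames = features.shape[0]
    padded = np.concatenate(
        [np.repeat(features[:1], left, axis=0),
         features,
         np.repeat(features[-1:], right, axis=0)],
        axis=0,
    )
    # Column block k holds the frame at offset (k - left) from the centre frame.
    return np.concatenate(
        [padded[k:k + num_frames] for k in range(left + 1 + right)], axis=1
    )

# Toy data: 5 frames of 2-dimensional features, one frame of context per side.
toy = np.arange(5 * 2, dtype=float).reshape(5, 2)
spliced = splice_frames(toy, left=1, right=1)
print(spliced.shape)  # (5, 6)
```

The spliced vectors grow linearly with the context width, which is why a dimensionality reduction such as LDA is applied afterwards.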
The rest of the paper is organized as follows. Section 2 provides the theoretical background for our
system. In this section, we briefly describe the features we use, the feature transformations, and CNN.
In Section 3, we explain our proposed system. In Section 4, we explain our experimental setup to
evaluate our method and discuss the results. We conclude the paper in Section 5.
2. THEORETICAL BACKGROUND
2.1. Speech Features
Many features have been proposed for ASR. MFCC is arguably the most popular one. MFCC is a
handcrafted feature that is extracted using a two-stage Fourier transform. The aim is to decorrelate
speech components in the time and frequency domains. By doing so, speech units, such as phonemes, can be
modeled using mixtures of Gaussians with only diagonal covariances. In the MFCC extraction process, the
speech signal is chunked into a sequence of frames with fixed duration, usually around 25-50 ms each.
Speech is assumed to be stationary within each frame, and the Fourier transform is applied to obtain its
spectral components. Usually, the power spectrum is used, obtained by squaring the magnitude. Then, the
spectrum is mapped onto a mel-scaled filterbank to emphasize the lower frequency region. After that,
the log operation is applied to the output of the mel filterbank before a second Fourier transform is applied, in this case
using only the real part of the Fourier transform, to decorrelate the components in the frequency domain.
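The extraction steps above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the feature extractor used in the paper: the 16 kHz sampling rate, 25 ms frames with a 10 ms shift, Hamming window, 23 filters, 13 coefficients, and the DCT-II form of the real-part (cosine) transform are all assumed values, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for i in range(num_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):          # rising edge of the triangle
            fbank[i, k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge of the triangle
            fbank[i, k] = (hi - k) / (hi - mid)
    return fbank

def mfcc(signal, sample_rate=16000, frame_ms=25, shift_ms=10,
         num_filters=23, num_ceps=13, n_fft=512):
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    num_frames = 1 + (len(signal) - frame_len) // shift
    # 1) Chunk the signal into fixed-length frames (assumed stationary).
    idx = np.arange(frame_len)[None, :] + shift * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)  # window choice is an assumption
    # 2) First Fourier transform, then the power spectrum.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # 3) Mel filterbank and 4) log compression (small floor avoids log(0)).
    log_mel = np.log(power @ mel_filterbank(num_filters, n_fft, sample_rate).T + 1e-10)
    # 5) Second transform with a cosine (real-part) basis, here a DCT-II.
    n = np.arange(num_filters)
    basis = np.cos(np.pi * np.outer(np.arange(num_ceps), n + 0.5) / num_filters)
    return log_mel @ basis.T

rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)   # one second of synthetic "audio"
feats = mfcc(signal)
print(feats.shape)                    # (98, 13)
```

The intermediate `log_mel` array is the log mel-filterbank output before the second transform, so stopping at that step yields features that retain more correlation across frequency.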
While MFCC shows good results when the training and testing conditions are the same, it
suffers when there is high variability in the data. Speech is highly variable due to intra-speaker variability,
inter-speaker variability, and environmental variability, e.g., when speech is noisy or contaminated. Many
studies propose different features to improve the robustness of ASR. Perceptual linear prediction (PLP) is one example. The main
difference between PLP and MFCC is that PLP applies a cube root function instead of the log. The objective is to
reduce the sensitivity of the features in the low-energy region, which is the most sensitive to noise. Another difference
is that PLP uses the Bark scale instead of the mel scale used in MFCC.
MFCC performs reasonably well in HMM-GMM systems. Since its components are largely uncorrelated, it is
adequate to model each HMM state using a mixture of Gaussians with only diagonal covariances.
However, the two-stage Fourier transform removes the correlation between speech components in the
time-frequency domain. These correlations may still be needed in the recognition process, and their removal may be one of the
reasons that ASR systems are not robust. In DNN-HMM systems, some studies show that it is more beneficial to use more
“raw” features. One of them is FBANK [23]. FBANK has the same extraction process as MFCC,
except that it omits the second-stage Fourier transform, so some correlations in frequency still exist.
Convolutional Neural Network and Feature Transformation for Distant… (Hilman F. Pardede)
2.2. Feature Transformation
Transforming features to other domain spaces is often found effective in many classification tasks in
machine learning. The use of high-dimensional features is often ineffective because it may lead to overfitting.
Therefore, the dimensionality of the features is often reduced. LDA and principal component analysis (PCA) are examples of
dimensionality reduction techniques. LDA [24] is applied in a supervised manner, while PCA is an unsupervised technique.
LDA is usually applied in the preprocessing stage to reduce n-dimensional features
to an m-dimensional space (m<n). The objective is to project the
feature space into a lower dimensional space while making the features more discriminative. The lower
dimensional feature space is chosen such that it emphasizes the distances between classes more than the distances within
a class. Mathematically, the criterion can be written as:
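$$J(\theta) = \frac{\det\left(\theta^{T} \Sigma_{i}\, \theta\right)}{\det\left(\theta^{T} \Sigma_{e}\, \theta\right)} \qquad (1)$$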
where $\Sigma_i$ is the between-class covariance, $\Sigma_e$ is the within-class covariance, $\theta$ is the
projection (transformation) matrix applied to the features, and $J(\theta)$ is the cost function to be maximized.
The solution is obtained by taking the first m eigenvectors of the matrix $\Sigma_e^{-1} \Sigma_i$ after sorting
the eigenvalues from the largest. For more information on applying LDA to speech features, refer to [19].
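The eigenvector solution described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: the function name `lda_projection` and the toy two-class data are hypothetical, and the covariances are estimated with simple sample averages.

```python
import numpy as np

def lda_projection(features, labels, m):
    """Return an (n, m) projection built from the top-m eigenvectors of inv(S_e) @ S_i."""
    n = features.shape[1]
    global_mean = features.mean(axis=0)
    within = np.zeros((n, n))   # Sigma_e: within-class covariance
    between = np.zeros((n, n))  # Sigma_i: between-class covariance
    for c in np.unique(labels):
        x = features[labels == c]
        mu = x.mean(axis=0)
        within += (x - mu).T @ (x - mu)
        d = (mu - global_mean)[:, None]
        between += len(x) * (d @ d.T)
    within /= len(features)
    between /= len(features)
    # Eigen-decomposition of inv(Sigma_e) @ Sigma_i, sorted from the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(within, between))
    order = np.argsort(eigvals.real)[::-1][:m]
    return eigvecs[:, order].real

# Toy data (assumed): two classes in 3 dimensions, separated along the first axis.
rng = np.random.default_rng(0)
class0 = rng.normal(size=(200, 3))
class1 = rng.normal(size=(200, 3)) + np.array([5.0, 0.0, 0.0])
features = np.vstack([class0, class1])
labels = np.repeat([0, 1], 200)
theta = lda_projection(features, labels, m=1)
projected = features @ theta  # features reduced from 3 dimensions to 1
```

In this toy case the single retained direction is dominated by the first axis, which is the only axis along which the class means differ.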
Meanwhile, MLLT [25] is applied in HMM-GMM systems to relax one of their assumptions. In HMM-GMM
systems, the feature elements are assumed to be independent of each other, so Gaussians with only
diagonal covariances are used. While this speeds up training, the assumption does not necessarily hold,
because speech components may still be related to each other in the feature space. MLLT [25], which is
also known as semi-tied covariance (STC) [26], linearly transforms the sample data to a new space in which
the data are Gaussian distributed, in order to relax this assumption. MLLT thus implicitly captures the
correlation between the feature elements by using a constrained covariance model.
MLLT works as follows. It uses eigendecomposition to decompose a full covariance matrix over the set of
Gaussian components, while each component maintains its “diagonal” characteristics. A full covariance
matrix can be decomposed using the following formula:
$$\boldsymbol{\Sigma}^{(m)} = \mathbf{H}^{(r)}\, \boldsymbol{\Sigma}^{(m)}_{diag}\, \mathbf{H}^{(r)T} \qquad (2)$$
where m is the index of the Gaussian component and r is the index of the class. Each component m has three
parameters: weight, mean, and the diagonal elements of the semi-tied covariance matrix. So, each covariance
matrix $\boldsymbol{\Sigma}^{(m)}$ can be decomposed into two parts: the diagonal covariance matrix of
component m, $\boldsymbol{\Sigma}^{(m)}_{diag}$, and a full matrix shared by the Gaussian components in
class r, $\mathbf{H}^{(r)}$ (named the semi-tied transform). We denote by $\mathbf{A}^{(r)}$ the inverse of
$\mathbf{H}^{(r)}$. In ASR, the covariance matrices are trained in the maximum likelihood sense on the
training data and are optimized with respect to $\hat{\mathbf{A}}^{(r)}$, the means of the Gaussians
$\mu^{(m)}$, and the diagonal covariance matrices $\hat{\boldsymbol{\Sigma}}^{(m)}_{diag}$. So, the cost
function J can be written as:
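$$J(\mathcal{M}, \hat{\mathcal{M}}) = \sum_{m \in M^{(r)},\, \tau} \gamma_m(\tau) \log\left( \frac{\left|\hat{\mathbf{A}}^{(r)}\right|^{2}}{\left|\mathrm{diag}\left(\hat{\mathbf{A}}^{(r)} \mathbf{W}^{(m)} \hat{\mathbf{A}}^{(r)T}\right)\right|} \right) \qquad (3)$$
where:
$$\mathbf{W}^{(m)} = \frac{\sum_{\tau} \gamma_m(\tau) \left(\mathbf{o}(\tau) - \hat{\mu}^{(m)}\right)\left(\mathbf{o}(\tau) - \hat{\mu}^{(m)}\right)^{T}}{\sum_{\tau} \gamma_m(\tau)} \qquad (4)$$
$$\beta = \sum_{m \in M^{(r)},\, \tau} \gamma_m(\tau) \qquad (5)$$
and
$$\gamma_m(\tau) = p\left(q_m(\tau) \mid \mathcal{M}, \mathbf{O}_T\right) \qquad (6)$$
The notation $q_m(\tau)$ is the Gaussian component m at time $\tau$, $\mathcal{M}$ is the current set of model
parameters, $\mathbf{O}_T$ is the training data, and $\mathbf{o}(\tau)$ is the observed feature. Then the
maximum likelihood estimate of the mean is:
$$\hat{\mu}^{(m)} = \frac{\sum_{\tau} \gamma_m(\tau)\, \mathbf{o}(\tau)}{\sum_{\tau} \gamma_m(\tau)} \qquad (7)$$
and the covariance matrix estimate is:
$$\hat{\boldsymbol{\Sigma}}^{(m)}_{diag} = \mathrm{diag}\left(\mathbf{A}^{(r)} \mathbf{W}^{(m)} \mathbf{A}^{(r)T}\right) \qquad (8)$$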
Calculating $\mathbf{A}^{(r)}$ is nontrivial. To estimate it, $\mathbf{A}^{(r)}$ is first initialized with an identity
matrix, then $\hat{\boldsymbol{\Sigma}}^{(m)}_{diag}$ is estimated using Equation (8), and then
$\mathbf{A}^{(r)}$ is updated using Eq. (2).
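The alternating estimation can be sketched as below. This is a hedged illustration rather than the procedure used in the experiments: it assumes a single class r (one global transform), replaces the occupancies with fixed per-component counts, and updates the transform row by row using the closed-form update described for semi-tied covariances in [26], which this paper does not spell out. The names `mllt_objective` and `estimate_mllt` and the toy statistics are hypothetical.

```python
import numpy as np

def mllt_objective(A, W, counts):
    """Cost in the form of Eq. (3), with fixed counts in place of the occupancies."""
    log_det_sq = 2.0 * np.linalg.slogdet(A)[1]
    return sum(c * (log_det_sq - np.log(np.diag(A @ Wm @ A.T)).sum())
               for c, Wm in zip(counts, W))

def estimate_mllt(W, counts, num_iters=10):
    """Alternate between the diagonal variances (Eq. (8)) and the transform A."""
    dim = W[0].shape[0]
    beta = float(np.sum(counts))          # total occupancy
    A = np.eye(dim)                       # initialize with the identity matrix
    for _ in range(num_iters):
        # Eq. (8): diagonal variances of each component in the transformed space.
        variances = [np.diag(A @ Wm @ A.T) for Wm in W]
        for i in range(dim):
            # Row-wise statistics and update, following [26] (an assumption here).
            G = sum(c / v[i] * Wm for c, v, Wm in zip(counts, variances, W))
            cofactor = np.linalg.inv(A)[:, i]  # proportional to the cofactors of row i
            direction = cofactor @ np.linalg.inv(G)
            A[i] = direction * np.sqrt(beta / (direction @ cofactor))
    return A

# Toy per-component scatter matrices W^(m) with correlated dimensions (assumed data).
rng = np.random.default_rng(0)
W = []
for _ in range(2):
    B = rng.normal(size=(3, 3))
    W.append(B @ B.T + 0.1 * np.eye(3))
counts = [100.0, 150.0]
A = estimate_mllt(W, counts)
```

Comparing `mllt_objective(A, W, counts)` with `mllt_objective(np.eye(3), W, counts)` shows how much the learned transform improves the cost over the identity initialization.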
2.3. Convolutional Neural Network
A typical convolutional network structure is illustrated in Figure 1. Convolutional Neural Network (CNN) is
a special kind of deep neural network. It differs from a DNN, in which all neurons in one layer are connected
to all neurons of the next layer, which may not be effective when the features have large dimensions. CNN
introduces two special types of network layers, called the convolutional layer and the pooling layer. Each
neuron of the convolutional layer receives inputs from a set of filters applied to the lower layer. The filter
outputs are obtained by multiplying a small local part of the input with a weight matrix, and the filters are
replicated across the whole input space. Localized filters that share the same weights form feature maps.
After the convolution, a pooling layer takes inputs from a local part of the convolutional layer and generates
a lower-resolution version of the filter activations.
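The two layer types can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions: a one-dimensional input, no bias or nonlinearity, non-overlapping max pooling, and hypothetical names and sizes (`convolve_valid`, `max_pool`, 40 inputs, 2 filters of width 8). As in most CNN implementations, the "convolution" is computed as a sliding dot product.

```python
import numpy as np

def convolve_valid(x, filters):
    """Slide each shared-weight filter over x; returns (num_filters, positions)."""
    width = filters.shape[1]
    patches = np.lib.stride_tricks.sliding_window_view(x, width)  # (positions, width)
    return filters @ patches.T

def max_pool(feature_maps, pool_size=3):
    """Non-overlapping max pooling along the position axis."""
    num_maps, length = feature_maps.shape
    usable = (length // pool_size) * pool_size  # drop positions that do not fill a window
    return feature_maps[:, :usable].reshape(num_maps, -1, pool_size).max(axis=2)

rng = np.random.default_rng(0)
x = rng.normal(size=40)                    # one input vector (assumed size)
filters = rng.normal(size=(2, 8))          # two filters, each seeing 8 neighboring inputs
feature_maps = convolve_valid(x, filters)  # shape (2, 33)
pooled = max_pool(feature_maps)            # shape (2, 11): lower-resolution activations

# A peak that moves by one position inside a pooling window gives the same pooled output.
peak_a = max_pool(np.array([[0.0, 1.0, 0.0, 0.0, 0.0, 0.0]]))
peak_b = max_pool(np.array([[0.0, 0.0, 1.0, 0.0, 0.0, 0.0]]))
```

The last two lines show the tolerance to small shifts that pooling provides: `peak_a` and `peak_b` are identical even though the activation moved by one position.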
In implementations for speech recognition, after a few CNN layers, fully connected layers of a generative
deep neural network model (DBN-DNN) are applied to combine the local patterns extracted from all
positions in the lower layer for final recognition [27]. In this paper, we use two CNN layers followed by four
DBN layers, giving a total of six hidden layers for pretraining. A DNN is then applied on top of them for
supervised learning.
Figure 1. A typical CNN architecture
3. THE PROPOSED SYSTEM
Figure 2 shows what is currently the most commonly used ASR system. A DBN-DNN, based on Karel’s
implementation of DNN in KALDI [28], is used as a baseline. The DBN configuration in this experiment uses
6 hidden layers with 2048 neurons each, with a Gaussian-Bernoulli RBM as the first layer, connecting to the
Gaussian acoustic inputs, and Bernoulli-Bernoulli RBM layers afterward. The stack of pre-training layers is
followed by a DNN with one hidden layer (1024 neurons) and a softmax output layer. We denote this system
as BASELINE2 in this paper.
We also train a conventional HMM-GMM system, which we denote as BASELINE1. For this, we model each
digit with a 16-state left-to-right HMM, where each state is modelled using a mixture of three Gaussians.
For the pause model (sil), we use a 3-state HMM with 6 Gaussian components.
Figure 3 shows the system proposed in this study. We use MFCC as the basis for the static features. We use
13-dimensional static features, which are spliced with the 4 preceding and 4 succeeding frames to capture
the context of the speech, producing 117 dimensions. We then apply LDA to reduce the features to 40
dimensions, and apply MLLT to the output of LDA before feeding it into the CNN. We denote the system
using only LDA as PROPOSED1 and the system using both LDA and MLLT as PROPOSED2.
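The splicing step can be sketched as follows, to make the 13 × 9 = 117 dimension count concrete. This is a minimal sketch, not the toolkit code used in the experiments: the function name `splice` is hypothetical, random numbers stand in for real MFCCs, and repeating the first and last frames at the utterance edges is an assumption.

```python
import numpy as np

def splice(frames, context=4):
    """Concatenate each frame with `context` frames on each side (edge frames repeated)."""
    num_frames = frames.shape[0]
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)

mfcc = np.random.default_rng(0).normal(size=(50, 13))  # 50 frames of 13-dim static MFCC
spliced = splice(mfcc)                                 # shape (50, 117)
```

The middle 13 columns of each spliced row are the current frame; the LDA projection to 40 dimensions would then be applied to these 117-dimensional vectors.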
Figure 2. The Baseline System: MFCC with delta transformation before feeding to DBN-DNN
Figure 3. The Proposed System: The Feature Transformation (LDA and MLLT) is applied on MFCC before
feeding it into a 2-layer CNN. The output of the CNN is fed into a 4-layer DBN and a 1-layer DNN
For CNN pre-training, we use two CNN layers followed by four DBN layers. For the CNN layers, we use 128
neurons for the first hidden layer and 256 neurons for the second. Max pooling with a pool size of three is
used in this study. For the DBN, we use the standard 1024 neurons per hidden layer. These settings are the
same as in [15], where they were found to be a good setting for speech recognition. The output of the DNN
is used to estimate the posterior probabilities of the HMM states in the hybrid deep learning and HMM system.
4. EXPERIMENTS
4.1. The Setup
The experiments are evaluated on a speech corpus for an isolated digit recognition task. We use the TIDigits
corpus to train acoustic models in clean conditions; it consists of 8623 utterances pronounced by 111
male and 114 female adult speakers. For test data, we use the reverberant version of TIDigits, that is, the
Meeting Recorder Digits (MRD) subset of the Aurora-5 corpus [29]. The corpus comprises real recordings
made in hands-free mode in a meeting room. The speech data was collected from 24 speakers at the
International Computer Science Institute in Berkeley, resulting in 2400 utterances for each microphone. The
recordings were made using four microphones (labeled 6, 7, E, and F) placed at the middle of the table. The
recordings thus contain reverberant acoustic conditions resulting from hands-free recording in a
meeting room. The performance is measured using word error rate (WER).
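For reference, WER is conventionally computed from the word-level edit distance between the reference transcription and the recognizer output. The sketch below shows that standard definition; the function name `word_error_rate` and the example digit strings are hypothetical, the reference is assumed to be non-empty, and the scoring tool actually used in the experiments is not specified in this paper.

```python
def word_error_rate(reference, hypothesis):
    """WER (%) = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,                           # deletion
                            curr[j - 1] + 1,                       # insertion
                            prev[j - 1] + (ref_word != hyp_word)))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)

one_substitution = word_error_rate("one two three", "one nine three")
one_insertion = word_error_rate("one two", "one two three")
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.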
4.2. Results and Discussion
Table 1 shows the evaluation of the proposed methods (PROPOSED1 and PROPOSED2).
For comparison, the performance of BASELINE1 and BASELINE2 is shown as well. The table clearly
indicates a consistent reduction of word error rates in reverberant conditions. PROPOSED1 achieves 37.67%
and 27.78% relative improvements over BASELINE1 and BASELINE2, respectively, while PROPOSED2
achieves 38.69% and 28.94% relative improvements over BASELINE1 and BASELINE2, respectively.
Applying MLLT after LDA (PROPOSED2) is slightly better than applying LDA alone.
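The relative improvements are relative WER reductions with respect to a baseline. A small sketch of the arithmetic, using the average WERs reported in Table 1, is given below; the function name `relative_improvement` is hypothetical, and because the inputs are the rounded table values, the results can differ from the figures quoted in the text in the last digit.

```python
def relative_improvement(baseline_wer, proposed_wer):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer

# Average WERs (%) over the MRD conditions, taken from Table 1.
average_wer = {"BASELINE1": 49.00, "BASELINE2": 42.28,
               "PROPOSED1": 30.54, "PROPOSED2": 30.04}

improvements = {
    (proposed, baseline): relative_improvement(average_wer[baseline], average_wer[proposed])
    for proposed in ("PROPOSED1", "PROPOSED2")
    for baseline in ("BASELINE1", "BASELINE2")
}
```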
This may be explained as follows. When reverberation exists, the resulting reverberant speech is a sum of
the signal and delayed versions of the same signal. Reverberant speech therefore contains information from
previous frames. This increases the correlation between neighboring frames, and hence may increase the
local correlations between nearby frames. In the CNN architecture, the properties of locality, convolution,
and pooling may be beneficial in such conditions. Since the emphasis is on local neurons first, the network
can learn from local information and produce good features based on the clean part of speech from early
frames (since they are relatively clean compared to the late part of speech). When speech is corrupted by
reverberation, it may cause some frequency shift (delay in the time-frequency domain).
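The effect of adding a delayed copy on the correlation structure can be illustrated with a toy example. This sketch is only indicative: it works on raw samples rather than feature frames, uses white noise as a stand-in for speech, and models reverberation as a single echo with an assumed delay and gain, whereas a real room response contains many reflections. The names `add_echo` and `lag_correlation` are hypothetical.

```python
import numpy as np

def add_echo(signal, delay, gain):
    """Return the signal plus one delayed, attenuated copy of itself."""
    out = signal.copy()
    out[delay:] += gain * signal[:-delay]
    return out

def lag_correlation(signal, lag):
    """Correlation coefficient between the signal and itself shifted by `lag`."""
    return np.corrcoef(signal[:-lag], signal[lag:])[0, 1]

clean = np.random.default_rng(0).normal(size=20000)  # white noise: uncorrelated samples
reverberant = add_echo(clean, delay=5, gain=0.6)

clean_corr = lag_correlation(clean, 5)         # close to zero
reverb_corr = lag_correlation(reverberant, 5)  # clearly positive
```

For this single-echo model the correlation at the echo delay approaches gain / (1 + gain²), so the delayed copy introduces a dependency between nearby samples that is absent in the clean signal.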
These delays are difficult to handle with other models such as GMMs and DNNs, where many Gaussians and
hidden units need to be optimized for all possible pattern shifts [27]. With the pooling property of CNN, the
same feature value calculated at different locations is collected together and represented by a single value,
which may come from the cleaner part of speech. Therefore, the features extracted by the pooling ply may
minimize the effect of the delays caused by reverberation. LDA finds the features with large variances and
the most separated class means. So, when it is applied to the features, it is very likely to choose the most
distinguishing spectra (the dominant spectra), which may contain the phoneme information. When CNN is
applied, due to max-pooling, this information is maintained up to the top layers, producing more
discriminative features and hence improving the performance.
Table 1. WER (%) of the Proposed Method in Comparison with the Baselines
            Conditions
Model       Clean   MRD 6   MRD 7   MRD E   MRD F   Average (MRD)
BASELINE1   0.64    46.66   54.56   50.20   44.59   49.00
BASELINE2   0.80    39.93   47.99   42.69   38.51   42.28
PROPOSED1   0.90    28.10   33.90   33.03   27.14   30.54
PROPOSED2   0.82    27.73   33.20   32.62   26.61   30.04
5. CONCLUSION
In this study, we evaluate the use of LDA and MLLT in CNN-based speech recognition to improve the
robustness of speech recognition against reverberation. Our experiments confirm that the proposed method
is more robust than standard DNN-HMM and HMM-GMM systems. The weight sharing, pooling, and
locality properties of CNN can improve the recognition accuracy on all transformed features compared to
the standard fully connected DNN.
We need to state that the evaluated tasks are digit recognition tasks. Therefore, the long-term dependency
that exists in speech may not be as significant as in continuous speech. It would thus be interesting to see
how each architecture fares on continuous speech tasks. Since reverberation time is also heavily influenced
by the size of the room, it would also be interesting to see how deep architectures perform in different room
settings. This is our future plan.
REFERENCES
[1] M. L. Seltzer, D. Yu, and Y. Wang, “An investigation of deep neural networks for noise robust speech
recognition,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp.
7398–7402.
[2] T. Yoshioka and M. J. Gales, “Environmentally robust asr front-end for deep neural network acoustic models,”
Computer Speech & Language, vol. 31, no. 1, pp. 65–86, 2015.
[3] R. Errattahi and A. El Hannani, “Recent advances in lvcsr: A benchmark comparison of performances,”
International Journal of Electrical and Computer Engineering (IJECE), vol. 7, no. 6, pp. 3358–3368, 2017.
[4] M. F. Alghifari, T. S. Gunawan, and M. Kartiwi, “Speech emotion recognition using deep feedforward neural
network,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 10, no. 2, pp. 554–561, 2018.
[5] K. F. Akingbade, O. M. Umanna, and I. A. Alimi, “Voice-based door access control system using the mel
frequency cepstrum coefficients and gaussian mixture model,” International Journal of Electrical and Computer
Engineering, vol. 4, no. 5, p. 643, 2014.
[6] S. N. Endah, S. Adhy, and S. Sutikno, “Comparison of feature extraction mel frequency cepstral coefficients and
linear predictive coding in automatic speech recognition for indonesian,” TELKOMNIKA (Telecommunication
Computing Electronics and Control), vol. 15, no. 1, pp. 292–298, 2017.
[7] M. L. Seltzer, D. Yu, and Y. Wang, “An investigation of deep neural networks for noise robust speech
recognition,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp.
7398–7402.
[8] P. Lockwood and J. Boudy, “Experiments with a nonlinear spectral subtractor (nss), hidden markov models and the
projection, for robust speech recognition in cars,” Speech Communication, vol. 11, no. 2, pp. 215 – 228, 1992.
[9] C. Kim and R. M. Stern, “Power-normalized cepstral coefficients (pncc) for robust speech recognition,” IEEE/ACM
Trans. Audio, Speech and Lang. Proc., vol. 24, no. 7, pp. 1315–1329, Jul. 2016.
[10] M. J. F. Gales and S. J. Young, “Robust continuous speech recognition using parallel model combination,”
IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 352–359, Sep 1996.
[11] C. Weng, D. Yu, S. Watanabe, and B. H. F. Juang, “Recurrent deep neural networks for robust speech recognition,”
in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2014, pp. 5532–5536.
[12] Y. Zhang, W. Chan, and N. Jaitly, “Very deep convolutional networks for end-to-end speech recognition,” in 2017
IEEE International Conference on Acoustics, Speech and Signal Processing, March 2017, pp. 4845–4849.
[13] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, “Face recognition: A convolutional neural-network
approach,” IEEE transactions on neural networks, vol. 8, no. 1, pp. 98–113, 1997.
[14] P. Swietojanski, A. Ghoshal, and S. Renals, “Convolutional neural networks for distant speech recognition,” IEEE
Signal Processing Letters, vol. 21, no. 9, pp. 1120–1124, 2014.
[15] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, and B. Ramabhadran, “Deep convolutional neural networks for
lvcsr,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8614–8618.
[16] T. N. Sainath, B. Kingsbury, A.-r. Mohamed, G. E. Dahl, G. Saon, H. Soltau, T. Beran, A. Y. Aravkin, and B.
Ramabhadran, “Improvements to deep convolutional neural networks for lvcsr,” in 2013 IEEE Workshop on
Automatic Speech Recognition and Understanding (ASRU), 2013, pp. 315–320.
[17] O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, and G. Penn, “Applying convolutional neural networks concepts to
hybrid nn-hmm model for speech recognition,” in 2012 IEEE International Conference on Acoustics, Speech and
Signal Processing, 2012, pp. 4277–4280.
[18] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE
transactions on pattern analysis and machine intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[19] R. Haeb-Umbach and H. Ney, “Linear discriminant analysis for improved large vocabulary continuous speech
recognition,” in 1992 IEEE International Conference on Acoustics, Speech and Signal Processing, 1992, pp. 13–
16.
[20] L. Burget, “Combination of speech features using smoothed heteroscedastic linear discriminant analysis.” in
Interspeech, 2004.
[21] R. A. Gopinath, “Maximum likelihood modeling with gaussian distributions for classification,” in 1998 IEEE
International Conference on Acoustics, Speech and Signal Processing, 1998, pp. 661–664.
[22] D. Povey, B. Kingsbury, L. Mangu, G. Saon, H. Soltau, and G. Zweig, “fmpe: Discriminatively trained features for
speech recognition,” in 2005 IEEE International Conference on Acoustics, Speech and Signal Processing, 2005,
pp. I–961.
[23] T. Yoshioka, A. Ragni, and M. J. Gales, “Investigation of unsupervised adaptation of dnn acoustic models with
filter bank input,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, 2014, pp.
6344–6348.
[24] S. Geirhofer, “Feature reduction with linear discriminant analysis and its performance on phoneme recognition,”
Department of Electrical and Computer Engineering: University of Illinois at Urbana-Champaign, 2004.
[25] M. Gales, “Maximum likelihood linear transformations for hmm-based speech recognition,” Computer Speech &
Language, vol. 12, no. 2, pp. 75 – 98, 1998.
[26] M. J. Gales, “Semi-tied covariance matrices for hidden markov models,” IEEE transactions on speech and audio
processing, vol. 7, no. 3, pp. 272–281, 1999.
[27] O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, “Convolutional neural networks for
speech recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, pp.
1533–1545, 2014.
[28] K. Veselý, A. Ghoshal, L. Burget, and D. Povey, “Sequence-discriminative training of deep neural networks,” in
INTERSPEECH, 2013, pp. 2345–2349.
[29] H. Hirsch, “Aurora-5 experimental framework for the performance evaluation of speech recognition in case of a
hands-free speech input in noisy environments,” Niederrhein Univ. of Applied Sciences, 2007.
BIOGRAPHIES OF AUTHORS
Hilman Pardede is a researcher at Research Center for Informatics, Indonesian Institute of
Sciences. He obtained his Bachelor Degree in Electrical Engineering from University of
Indonesia in 2004 and Master of Engineering from the University of Western Australia in 2009.
He received his Doctor of Engineering from Tokyo Institute of Technology in 2013. He was a
postdoctoral researcher at Fondazione Bruno Kessler in Trento, Italy, from 2013 to 2015. His research
interests include speech recognition, pattern recognition, signal processing, machine learning,
and artificial intelligence. He is an IEEE member and a reviewer for Speech Communications
(Elsevier) and International Journal of Machine Learning and Cybernetics (Springer). He has
also served as a reviewer for several international conferences.
Asri Rizki Yuliani is a researcher at Research Center for Informatics, Indonesian Institute of
Sciences. She earned a bachelor's degree in Computer Science from the University of Teknologi
Malaysia in 2009 and a master's degree in Information Management from Yuan Ze University in
2013. Her research interests include speech recognition, pattern recognition, and machine
learning.
Rika Sustika is a researcher at Research Center for Informatics, Indonesian Institute of Sciences
(LIPI). She earned bachelor's and master's degrees in Electrical Engineering from Bandung Institute
of Technology (ITB). Her research interests are in the area of signal processing. Since January
2017, she has been with the machine learning research group. She is interested in using deep learning
in applications such as speech recognition, image recognition, and natural language processing.