This document provides an analytical review of feature extraction techniques for automatic speech recognition. It discusses several common feature extraction methods including mel spectral coefficients, cepstral transformation, and mel frequency cepstral coefficients (MFCC). MFCC are widely used in speech recognition as they reflect the human auditory perception and produce de-correlated coefficients. The document also covers vector space representation of features and different distance metrics like Euclidean, city block, and weighted Euclidean that can be used for classification of unknown vectors.
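As a sketch of the distance metrics mentioned above, the following Python snippet compares an unknown feature vector against a reference using Euclidean, city-block, and weighted Euclidean distances (the vector values and weights are illustrative, not taken from the document):

```python
import math

def euclidean(x, y):
    # Straight-line (L2) distance between two feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    # Manhattan (L1) distance: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def weighted_euclidean(x, y, w):
    # Each dimension is scaled by a weight, e.g. an inverse feature variance.
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(x, y, w)))

ref = [1.0, 2.0, 3.0]   # hypothetical reference feature vector
unk = [2.0, 2.0, 5.0]   # hypothetical unknown vector
print(euclidean(ref, unk))    # sqrt(1 + 0 + 4)
print(city_block(ref, unk))   # 1 + 0 + 2 = 3
print(weighted_euclidean(ref, unk, [1.0, 1.0, 0.25]))  # sqrt(1 + 0 + 1)
```

Classification of an unknown vector then amounts to picking the reference class with the smallest distance.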
Introduction to Multiple Signal Classifier (MUSIC), by Milkessa Negeri
This document provides an introduction to the MUSIC algorithm, which is used to estimate the frequency content of a signal or autocorrelation matrix using an eigenspace method. It assumes a signal consists of complex exponentials in noise. MUSIC is a high-resolution algorithm that uses the eigenvectors of the autocorrelation matrix to separate the signal and noise subspaces. The document also describes how MUSIC can be used for adaptive beamforming to enhance a desired signal while suppressing interference using an array of sensors. It compares MUSIC to the ESPRIT algorithm for direction of arrival estimation.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Direction of Arrival Estimation Using MUSIC Algorithm, by eSAT Journals
Abstract The performance of a smart antenna depends greatly on the effectiveness of the DOA estimation algorithm. This paper analyzes the performance of the MUSIC (Multiple Signal Classification) algorithm for DOA estimation. Simulation results show that MUSIC provides better angular resolution as the number of array elements, the spacing between array elements, and the number of samples increase. All simulations are carried out in MATLAB. Keywords: DOA (Direction of arrival), MUSIC (Multiple signal classification), ULA (Uniform linear array)
Subspace Based DOA Estimation Techniques, by eSAT Journals
Abstract Subspace-based techniques are used for Direction of Arrival (DOA) estimation in this work. These techniques exploit the eigenstructure of the data covariance matrix and include MUSIC, Root-MUSIC, and ESPRIT. The aim is to analyze the performance of DOA estimation algorithms in challenging environments, such as low signal-to-noise ratio and closely spaced sources. The evaluation is performed on a Uniform Linear Array (ULA). Simulation results show how varying these parameters affects DOA estimation, and that the MUSIC algorithm has better accuracy than Root-MUSIC and ESPRIT. Keywords: DOA, MUSIC, Root-MUSIC, and ESPRIT
The document discusses convolution and its applications in digital signal processing. It begins with an introduction to convolution and its mathematical definitions for both continuous and discrete time signals. It then discusses various types of convolution including linear and circular convolution. The properties of convolution such as commutativity, associativity and distributivity are also covered. Applications of convolution in areas such as statistics, optics, acoustics, electrical engineering and digital signal processing are summarized. Finally, the document discusses symmetric convolution and its advantages over traditional convolution methods.
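The discrete-time definitions above can be sketched in Python; both the linear and circular forms are shown (a didactic O(N*M) loop, not an optimized FFT-based implementation):

```python
def conv(x, h):
    # Discrete linear convolution: y[n] = sum_k x[k] * h[n - k].
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y

def circular_conv(x, h):
    # Circular convolution of two equal-length sequences (indices wrap mod N).
    N = len(x)
    return [sum(x[k] * h[(n - k) % N] for k in range(N)) for n in range(N)]

print(conv([1, 2, 3], [1, 1]))            # [1.0, 3.0, 5.0, 3.0]
print(circular_conv([1, 2, 3], [1, 1, 0]))  # [4, 3, 5]
```

Commutativity can be checked directly: conv(x, h) equals conv(h, x) for any pair of sequences.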
Several methods have been proposed to approximate the sum of correlated lognormal random variables (RVs). However, the accuracy of each method depends strongly on the region of the resulting distribution being examined and on the individual lognormal parameters, i.e., the mean and variance; no single method provides the needed accuracy for all cases. This paper proposes a universal yet very simple approximation for the sum of correlated lognormals based on a log skew normal approximation. The main contribution of this work is an analytical method for estimating the log skew normal parameters. The proposed method provides a highly accurate approximation to the sum of correlated lognormal distributions over the whole range of dB spreads for any correlation coefficient. Simulation results show that our method outperforms all previously proposed methods and achieves an accuracy within 0.01 dB in all cases.
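For context, a minimal sketch of the classical Fenton-Wilkinson moment-matching baseline (for independent lognormals) is shown below; note this is not the paper's log skew normal method, only the kind of approximation it improves upon:

```python
import math

def fenton_wilkinson(params):
    # params: list of (mu, sigma) natural-log parameters of independent
    # lognormal RVs. Fit a single lognormal to the sum by matching its
    # mean and variance (the classical Fenton-Wilkinson approximation).
    mean = sum(math.exp(mu + 0.5 * s * s) for mu, s in params)
    var = sum((math.exp(s * s) - 1.0) * math.exp(2.0 * mu + s * s)
              for mu, s in params)
    sigma2 = math.log(1.0 + var / mean ** 2)
    mu_s = math.log(mean) - 0.5 * sigma2
    return mu_s, math.sqrt(sigma2)

# Fit a lognormal to the sum of two standard lognormals.
mu_s, sig_s = fenton_wilkinson([(0.0, 1.0), (0.0, 1.0)])
```

By construction the fitted lognormal reproduces the exact mean of the sum, exp(mu_s + sigma_s^2 / 2); accuracy in the distribution tails is where methods like the paper's log skew normal approach differ.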
This document discusses energy detection of unknown signals in fading environments. It proposes modeling the received signal power distribution under combined slow and fast fading. This allows deriving the distribution of the detector's decision variable in closed form. Specifically:
1) It models the received signal as the sum of the signal and noise, scaled by a complex channel amplitude representing fast and slow fading.
2) It derives an expression for the sufficient statistic at the detector's output and simplifies it under the assumption of a large number of independent samples.
3) It expresses the distribution of the decision variable as an integral of the distribution for a fixed SNR, averaged over the SNR distribution due to fading.
4) It provides the specific
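The decision variable of a basic energy detector can be sketched as follows (a generic normalized-energy statistic, not the paper's closed-form fading distribution):

```python
def energy_statistic(samples):
    # Normalized energy detector: average of squared magnitudes over N samples.
    return sum(abs(y) ** 2 for y in samples) / len(samples)

def detect(samples, threshold):
    # Declare "signal present" when the statistic exceeds the threshold.
    return energy_statistic(samples) > threshold
```

The threshold is chosen from the noise-only distribution of the statistic to meet a target false-alarm rate; under fading, the detection probability is obtained by averaging over the SNR distribution as the document describes.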
The document describes a Wiener filtering algorithm for noise suppression in speech signals. It involves estimating the noise and speech power spectral densities (PSDs) using a noise model and all-pole modeling of speech respectively. An iterative Wiener filter is then constructed using the PSD estimates. The algorithm is improved by adding a voice activity detector to estimate noise PSD only from non-speech frames. Evaluation shows the denoised speech has higher intelligibility and a posteriori SNR compared to noisy speech.
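The core Wiener gain rule can be sketched per frequency bin; the iterative variant described above would re-estimate the PSDs on each pass, which this minimal version omits:

```python
def wiener_gain(speech_psd, noise_psd):
    # Frequency-domain Wiener gain per bin: H(w) = Pss / (Pss + Pnn).
    return [s / (s + n) if s + n > 0 else 0.0
            for s, n in zip(speech_psd, noise_psd)]

def apply_gain(spectrum, gain):
    # Attenuate each spectral bin by its Wiener gain.
    return [g * x for g, x in zip(gain, spectrum)]
```

Bins where the estimated speech PSD dominates pass almost unchanged (gain near 1), while noise-dominated bins are suppressed (gain near 0).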
This document provides an introduction to equalization and summarizes several equalization techniques:
1) Zero forcing equalizers aim to completely eliminate intersymbol interference by inverting the channel response but can amplify noise.
2) The mean square error criterion aims to minimize the error between the received and desired signals when filtered by the equalizer. This can be solved using least squares or adaptive algorithms like LMS.
3) The least mean square algorithm approximates the steepest descent method to iteratively and adaptively update the equalizer filter taps to minimize the mean square error based only on instantaneous measurements. This makes it suitable for time-varying channels.
Improving the Efficiency of Spectral Subtraction Method by Combining it with ..., by IJORCS
In the field of speech signal processing, the spectral subtraction method (SSM) has been successfully used to suppress acoustically added noise. SSM reduces the noise to a satisfactory level, but musical noise is a major drawback of the method. Implementing spectral subtraction requires transforming the speech signal from the time domain to the frequency domain. The wavelet transform, on the other hand, exposes another aspect of the speech signal. In this paper we apply a new approach in which SSM is cascaded with a wavelet thresholding technique (WTT) to improve the quality of the speech signal by largely removing the musical-noise problem. Results of the proposed system have been simulated in MATLAB.
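The basic magnitude-domain spectral subtraction step (before any wavelet thresholding) can be sketched as follows; the over-subtraction factor `alpha` and spectral floor `beta` are common but hypothetical parameter names:

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, beta=0.02):
    # Subtract the noise magnitude estimate from each bin, flooring at
    # beta * |Y| to limit the "musical noise" caused by zeroed bins.
    return [max(y - alpha * n, beta * y) for y, n in zip(noisy_mag, noise_mag)]
```

The spectral floor is precisely the band-aid for musical noise that motivates cascading SSM with wavelet thresholding in this paper.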
Blind equalization is a digital signal processing technique in which the transmitted signal is inferred (equalized) from the received signal, while making use only of the transmitted signal statistics. Hence, the use of the word blind in the name.
The document describes the backpropagation algorithm for training multilayer neural networks. It discusses how backpropagation uses gradient descent to minimize error between network outputs and targets by calculating error gradients with respect to weights. The algorithm iterates over training examples, calculating the error and computing gradients to update the weights. Momentum can be added to help escape local minima. Backpropagation can learn representations in hidden layers and is prone to overfitting without validation data to select the best model.
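A minimal backpropagation sketch for a 2-2-1 sigmoid network on XOR, assuming squared-error loss and plain per-example gradient descent (no momentum), might look like:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(W1, W2, x):
    # Forward pass: two sigmoid hidden units, one sigmoid output.
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in W1]
    return sigmoid(W2[0] * h[0] + W2[1] * h[1] + W2[2])

def loss(W1, W2, data):
    return sum((predict(W1, W2, x) - t) ** 2 for x, t in data)

def train_xor(epochs=5000, lr=0.5, seed=0):
    random.seed(seed)
    # Hidden weights: 2 neurons x (2 inputs + bias); output: 2 inputs + bias.
    W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    W2 = [random.uniform(-1, 1) for _ in range(3)]
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    for _ in range(epochs):
        for x, t in data:
            # forward pass
            h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in W1]
            o = sigmoid(W2[0] * h[0] + W2[1] * h[1] + W2[2])
            # backward pass: output delta, then hidden deltas (chain rule)
            do = (o - t) * o * (1 - o)
            dh = [do * W2[j] * h[j] * (1 - h[j]) for j in range(2)]
            # gradient-descent weight updates
            W2 = [W2[0] - lr * do * h[0], W2[1] - lr * do * h[1], W2[2] - lr * do]
            for j in range(2):
                W1[j] = [W1[j][0] - lr * dh[j] * x[0],
                         W1[j][1] - lr * dh[j] * x[1],
                         W1[j][2] - lr * dh[j]]
    return W1, W2
```

Training drives the squared error well below its random-initialization value; the hidden deltas `dh` are where the error gradient is propagated back through the output weights.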
This document provides an overview of equalizer design in digital communication systems. It discusses the need for equalization to address inter-symbol interference caused by channel limitations. It describes two main equalizer designs: zero-forcing equalizers that apply the inverse channel response and minimum mean square error equalizers that minimize the error between the equalized signal and desired signal. It explains how the tap coefficients of these equalizers can be calculated using linear algebra methods like solving sets of equations. The document concludes by noting that equalization is a key technique in modern communications to compensate for channel distortions.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
This document provides an introduction to discrete time signals and systems. It defines discrete time signals as functions of integer time and discusses various representations including graphical, functional, tabular, and sequence representations. It then describes common elementary discrete time signals like impulse, step, ramp, and exponential signals. The document also classifies signals as energy/power signals, periodic/aperiodic, even/odd, and discusses basic manipulations. Finally, it defines discrete time systems as those that transform an input signal to an output signal according to a rule and classifies systems as static/dynamic and with finite/infinite memory.
The document summarizes a study on fractal image compression of satellite images using range and domain techniques. It discusses fractal image compression methods, including partitioning images into range and domain blocks. Affine transformations are applied to domain blocks to match range blocks. Peak signal-to-noise ratio (PSNR) values are calculated for reconstructed rural and urban satellite images after 4 iterations, showing PSNR of around 17.0 for rural images and 22.0 for urban images. The proposed algorithm partitions the original image into non-overlapping range blocks and selects domain blocks twice the size of range blocks.
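The PSNR figure of merit quoted above can be computed as follows (assuming 8-bit pixels, i.e. a peak value of 255):

```python
import math

def psnr(original, reconstructed, peak=255.0):
    # Peak signal-to-noise ratio in dB between two equal-size pixel lists.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Higher PSNR means a closer reconstruction, which is why the urban images (around 22 dB) fare better here than the rural ones (around 17 dB).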
This document discusses adaptive equalization techniques used in wireless communications. It begins by describing different types of interference such as co-channel, adjacent channel, and inter-symbol interference that affect wireless transmissions. Equalization is introduced as a technique to counter inter-symbol interference by concentrating dispersed symbol energy back into its time interval. Adaptive equalization is specifically discussed as it can track time-varying mobile channel characteristics using algorithms like zero forcing, least mean squares, and recursive least squares. The key components of an adaptive equalizer including its operating modes in training and tracking are also outlined.
The document discusses decision tree learning and the ID3 algorithm. It begins by introducing decision trees and how they are used to classify instances by sorting them from the root node to a leaf node. It then discusses how ID3 builds decision trees in a top-down greedy manner by selecting the attribute that best splits the data at each node based on information gain. The document also covers issues like overfitting, handling continuous attributes, and pruning decision trees.
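The information-gain criterion ID3 uses to pick the splitting attribute at each node can be sketched as:

```python
import math

def entropy(labels):
    # Shannon entropy of a list of class labels, in bits.
    total = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(labels, splits):
    # Gain = entropy(parent) - weighted entropy of the child subsets.
    total = len(labels)
    return entropy(labels) - sum(len(s) / total * entropy(s) for s in splits)
```

A split that perfectly separates the classes, e.g. [1, 1, 0, 0] into [1, 1] and [0, 0], yields the maximum gain of 1 bit; ID3 greedily chooses the attribute with the highest gain at each node.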
Machine learning is presented by Pranay Rajput. The agenda includes an introduction to machine learning, basics, classification, regression, clustering, distance metrics, and use cases. ML allows computer programs to learn from experience to improve performance on tasks. Supervised learning predicts labels or targets while unsupervised learning finds hidden patterns in unlabeled data. Popular algorithms include classification, regression, and clustering. Classification predicts class labels, regression predicts continuous values, and clustering groups similar data points. Distance metrics like Euclidean, Manhattan, and cosine are used in ML models to measure similarity between data points. Common applications involve recommendation systems, computer vision, natural language processing, and fraud detection. Popular frameworks for ML include scikit-learn, TensorFlow, Keras
A New Approach for Speech Enhancement Based On Eigenvalue Spectral Subtraction, by CSCJournals
In this paper, a phase space reconstruction-based method is proposed for speech enhancement. The method embeds the noisy signal into a high-dimensional reconstructed phase space and applies the spectral subtraction idea there. The advantages of the proposed method are fast performance, high SNR, and good MOS. To evaluate it, ten signals from the TIMIT database were mixed with additive white Gaussian noise and the method was then applied. Its efficiency was assessed using both qualitative and quantitative criteria.
Reducing Power Dissipation in FIR Filter: an Analysis, by CSCJournals
This document summarizes and analyzes three existing techniques for reducing power consumption in FIR filters: signed power-of-two representation, steepest descent optimization, and coefficient segmentation. It finds that steepest descent can reduce Hamming distance between coefficients by up to 26%, while coefficient segmentation can achieve up to 47% reduction. However, both techniques degrade filter performance parameters slightly. Signed power-of-two representation provides the most power reduction, 63%, but introduces overhead from additional adders and shifters. The document evaluates these techniques on four low-pass FIR filters and concludes there is a tradeoff between Hamming distance reduction and degradation of filter specifications.
The document summarizes key concepts in equalization and diversity techniques used in mobile communication systems. It discusses linear equalizers like transversal filters and lattice filters. Nonlinear equalizers covered include decision feedback equalization (DFE) and maximum likelihood sequence estimation (MLSE). DFE uses a feedforward filter and feedback filter to cancel intersymbol interference. MLSE estimates sequences using a trellis channel model and the Viterbi algorithm. Diversity techniques like spatial, frequency and time diversity are also introduced to mitigate fading effects.
Space-efficient Approximation Scheme for Maximum Matching in Sparse Graphs, by cseiitgn
This document describes a space efficient approximation scheme for maximum matching in sparse graphs. It begins with an introduction to matching problems and Baker's algorithm for approximating problems on planar graphs. It notes that computing distances is difficult in logspace for planar graphs. The document then outlines previous work on matching algorithms and complexity, and states that the goal is to obtain an approximation scheme for maximum matching that runs in logspace.
A Combined Voice Activity Detector Based On Singular Value Decomposition and ..., by CSCJournals
A voice activity detector (VAD) is used to separate the speech-bearing parts of a signal from the silence parts. In this paper a new VAD algorithm based on singular value decomposition is presented. Feature vector extraction is performed in two stages: in the first, voiced frames are separated from unvoiced and silence frames; in the second, unvoiced frames are separated from silence frames. To do this, the noisy signal is first windowed and a Hankel matrix is formed for each frame. The statistical feature of the proposed system is the slope of each frame's singular-value curve, estimated by linear regression. It is shown that, across different SNRs, this slope is larger for voiced frames than for the other frame types, so it can be used for the first stage. Because the feature vectors of unvoiced and silence frames are highly similar, this feature alone cannot separate those two categories; in the second stage, frequency-domain characteristics are therefore used to distinguish unvoiced frames from silence frames. Simulation results show that high speed and accuracy are the advantages of the proposed system.
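The linear-regression slope feature described above can be sketched generically; here `y` stands in for a frame's singular-value curve (the windowing, Hankel matrix, and SVD steps are omitted):

```python
def slope(y):
    # Least-squares slope of y against indices 0..N-1 (linear regression fit).
    n = len(y)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(y) / n
    num = sum((x - mx) * (yi - my) for x, yi in zip(xs, y))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A steeply decaying singular-value curve (large-magnitude slope) marks a voiced frame in the first classification stage.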
The document discusses developing a low-pass filter for reducing noise in spectral data, specifically analyzing spectra of supernova 2011fe. Several filtering methods are explored, including running average, Savitzky-Golay filtering, discrete Fourier transforms (DFT, DST, DCT), and Wiener filtering. Wiener filtering is determined to be most effective by creating an optimal filter from estimates of signal and noise contributions. A Mathematica program is created to take a spectrum, apply the selected filtering method, and output/plot the filtered spectrum for comparison to the original.
The document discusses the Widrow-Hoff learning rule and the LMS algorithm. It describes how the LMS algorithm uses an approximate steepest descent method to minimize the mean square error of an adaptive linear neuron. It also discusses conditions for stability and convergence of the algorithm, providing examples of using it for tasks like noise cancellation.
Design of Area-Delay-Efficient Fixed-Point LMS Adaptive Filter for EEG Applic..., by IJTET Journal
This paper presents an efficient architecture for implementing a delayed least mean square adaptive filter. A novel partial-product generator achieves lower adaptation delay and area-delay consumption, and a strategy is proposed for optimized, balanced pipelining across the time-consuming combinational blocks of the structure. Synthesis results show that the proposed design offers a smaller area-delay product (ADP) than the best of the existing systolic structures, on average, for filter lengths N = 8, 16, and 32, together with an efficient fixed-point implementation scheme. The EEG (electroencephalogram) records the electrical activity of the brain. During recording, the EEG is contaminated by various artifacts such as power line interference (PLI), muscle artifact (MA), and eye blink artifact (EBA). This paper details the various artifacts that occur in EEG signals, studies an adaptive filter for reducing eye-blink-artifact noise in the EEG signal to increase the signal-to-noise ratio (SNR), and shows that the analytical results match the simulation results.
Gain Comparison between NIFTY and Selected Stocks identified by SOM using Tec..., by IOSR Journals
This document discusses a study that uses self-organizing maps (SOM) and technical indicators to identify stocks with potential for investment gains. The study selects stocks and compares their returns over 1.5 months to the NIFTY index. The stocks identified using SOM and technical indicators performed 37.14% better than the NIFTY index over that period. The document provides background on technical analysis indicators like RSI, MACD, and OBV that were used in the analysis. It also describes how SOM can be used to classify stocks based on technical indicator values and select stocks that closely match the properties of the best performing class.
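As an illustration of one of the indicators mentioned, a simple-average RSI can be computed as follows (note that most charting packages use Wilder's smoothed variant instead):

```python
def rsi(prices, period=14):
    # Relative Strength Index (simple-average form) over the last
    # `period` price changes: 100 - 100 / (1 + avg_gain / avg_loss).
    deltas = [b - a for a, b in zip(prices, prices[1:])][-period:]
    gains = sum(d for d in deltas if d > 0)
    losses = sum(-d for d in deltas if d < 0)
    if losses == 0:
        return 100.0  # all gains: maximally overbought reading
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)
```

Readings near 100 indicate sustained gains (overbought), near 0 sustained losses (oversold), and 50 a balance of gains and losses; such values are among the inputs the study feeds to the SOM.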
Comparison of 60GHz CSRRs Ground Shield and Patterned Ground Shield On-chip B..., by IOSR Journals
This document compares two 60GHz on-chip bandpass filters designed in a 0.18μm CMOS technology. The first filter uses complementary split ring resonators (CSRRs) as the ground shield, while the second uses a patterned ground shield. Simulation results show that the CSRR ground shield filter has a slightly lower insertion loss of -2.682dB, a narrower 3dB bandwidth of 10.8GHz, and a chip size of 0.651mm^2. The patterned ground shield filter has a higher insertion loss of -2.77dB but a wider 3dB bandwidth of 14GHz and a smaller chip size of 0.527mm^2. Both filters demonstrate good return loss and
This document proposes an automated micro-controller based system to manage room lighting energy using dimmable CFL ballasts that considers daylight penetration. Key aspects of the system include:
1) Using a developed 36W CFL dimming ballast that can be PWM controlled to vary the light output and reduce energy consumption.
2) Designing the system for a single room with four CFL lights and zone-based lighting controllers.
3) Sensing the light level in each zone using a photo sensor and controlling the dimming ballast to adjust the electric light contribution as needed to maintain the required light level based on daylight contribution.
4) Gradually increasing or decreasing the electric light output over time
This document provides an introduction to equalization and summarizes several equalization techniques:
1) Zero forcing equalizers aim to completely eliminate intersymbol interference by inverting the channel response but can amplify noise.
2) The mean square error criterion aims to minimize the error between the received and desired signals when filtered by the equalizer. This can be solved using least squares or adaptive algorithms like LMS.
3) The least mean square algorithm approximates the steepest descent method to iteratively and adaptively update the equalizer filter taps to minimize the mean square error based only on instantaneous measurements. This makes it suitable for time-varying channels.
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...IJORCS
In the field of speech signal processing, Spectral subtraction method (SSM) has been successfully implemented to suppress the noise that is added acoustically. SSM does reduce the noise at satisfactory level but musical noise is a major drawback of this method. To implement spectral subtraction method, transformation of speech signal from time domain to frequency domain is required. On the other hand, Wavelet transform displays another aspect of speech signal. In this paper we have applied a new approach in which SSM is cascaded with wavelet thresholding technique (WTT) for improving the quality of speech signal by removing the problem of musical noise to a great extent. Results of this proposed system have been simulated on MATLAB.
Blind equalization is a digital signal processing technique in which the transmitted signal is inferred (equalized) from the received signal, while making use only of the transmitted signal statistics. Hence, the use of the word blind in the name.
The document describes the backpropagation algorithm for training multilayer neural networks. It discusses how backpropagation uses gradient descent to minimize error between network outputs and targets by calculating error gradients with respect to weights. The algorithm iterates over examples, calculates error, computes gradients to update weights. Momentum can be added to help escape local minima. Backpropagation can learn representations in hidden layers and is prone to overfitting without validation data to select the best model.
This document provides an overview of equalizer design in digital communication systems. It discusses the need for equalization to address inter-symbol interference caused by channel limitations. It describes two main equalizer designs: zero-forcing equalizers that apply the inverse channel response and minimum mean square error equalizers that minimize the error between the equalized signal and desired signal. It explains how the tap coefficients of these equalizers can be calculated using linear algebra methods like solving sets of equations. The document concludes by noting that equalization is a key technique in modern communications to compensate for channel distortions.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
This document provides an introduction to discrete time signals and systems. It defines discrete time signals as functions of integer time and discusses various representations including graphical, functional, tabular, and sequence representations. It then describes common elementary discrete time signals like impulse, step, ramp, and exponential signals. The document also classifies signals as energy/power signals, periodic/aperiodic, even/odd, and discusses basic manipulations. Finally, it defines discrete time systems as those that transform an input signal to an output signal according to a rule and classifies systems as static/dynamic and with finite/infinite memory.
The document summarizes a study on fractal image compression of satellite images using range and domain techniques. It discusses fractal image compression methods, including partitioning images into range and domain blocks. Affine transformations are applied to domain blocks to match range blocks. Peak signal-to-noise ratio (PSNR) values are calculated for reconstructed rural and urban satellite images after 4 iterations, showing PSNR of around 17.0 for rural images and 22.0 for urban images. The proposed algorithm partitions the original image into non-overlapping range blocks and selects domain blocks twice the size of range blocks.
This document discusses adaptive equalization techniques used in wireless communications. It begins by describing different types of interference such as co-channel, adjacent channel, and inter-symbol interference that affect wireless transmissions. Equalization is introduced as a technique to counter inter-symbol interference by concentrating dispersed symbol energy back into its time interval. Adaptive equalization is specifically discussed as it can track time-varying mobile channel characteristics using algorithms like zero forcing, least mean squares, and recursive least squares. The key components of an adaptive equalizer including its operating modes in training and tracking are also outlined.
The document discusses decision tree learning and the ID3 algorithm. It begins by introducing decision trees and how they are used to classify instances by sorting them from the root node to a leaf node. It then discusses how ID3 builds decision trees in a top-down greedy manner by selecting the attribute that best splits the data at each node based on information gain. The document also covers issues like overfitting, handling continuous attributes, and pruning decision trees.
Machine learning is presented by Pranay Rajput. The agenda includes an introduction to machine learning, basics, classification, regression, clustering, distance metrics, and use cases. ML allows computer programs to learn from experience to improve performance on tasks. Supervised learning predicts labels or targets, while unsupervised learning finds hidden patterns in unlabeled data. Popular algorithm families include classification, regression, and clustering: classification predicts class labels, regression predicts continuous values, and clustering groups similar data points. Distance metrics like Euclidean, Manhattan, and cosine are used in ML models to measure similarity between data points. Common applications involve recommendation systems, computer vision, natural language processing, and fraud detection. Popular frameworks for ML include scikit-learn, TensorFlow, and Keras.
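The three distance metrics named above can be computed directly; a minimal sketch with made-up feature vectors:

```python
import numpy as np

# Two sample feature vectors (invented data points)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Euclidean distance: straight-line distance between the points
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan (city block) distance: sum of absolute coordinate differences
manhattan = np.sum(np.abs(a - b))

# Cosine distance: 1 minus the cosine of the angle between the vectors
cosine = 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

Euclidean and Manhattan depend on vector magnitudes, while cosine only depends on direction, which is why cosine is often preferred for text features.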
A New Approach for Speech Enhancement Based On Eigenvalue Spectral Subtraction (CSCJournals)
In this paper, a phase space reconstruction-based method is proposed for speech enhancement. The method embeds the noisy signal into a high-dimensional reconstructed phase space and applies the spectral subtraction idea there. The advantages of the proposed method are fast performance, high SNR, and good MOS. To evaluate the method, ten signals from the TIMIT database were mixed with additive white Gaussian noise and the method was applied. Its efficiency was assessed using both qualitative and quantitative criteria.
Reducting Power Dissipation in Fir Filter: an Analysis (CSCJournals)
This document summarizes and analyzes three existing techniques for reducing power consumption in FIR filters: signed power-of-two representation, steepest descent optimization, and coefficient segmentation. It finds that steepest descent can reduce the Hamming distance between coefficients by up to 26%, while coefficient segmentation can achieve up to 47% reduction. However, both techniques slightly degrade filter performance parameters. Signed power-of-two representation provides the most power reduction, 63%, but introduces overhead from additional adders and shifters. The document evaluates these techniques on four low-pass FIR filters and concludes there is a tradeoff between Hamming distance reduction and degradation of filter specifications.
The document summarizes key concepts in equalization and diversity techniques used in mobile communication systems. It discusses linear equalizers like transversal filters and lattice filters. Nonlinear equalizers covered include decision feedback equalization (DFE) and maximum likelihood sequence estimation (MLSE). DFE uses a feedforward filter and feedback filter to cancel intersymbol interference. MLSE estimates sequences using a trellis channel model and the Viterbi algorithm. Diversity techniques like spatial, frequency and time diversity are also introduced to mitigate fading effects.
Space-efficient Approximation Scheme for Maximum Matching in Sparse Graphs (cseiitgn)
This document describes a space efficient approximation scheme for maximum matching in sparse graphs. It begins with an introduction to matching problems and Baker's algorithm for approximating problems on planar graphs. It notes that computing distances is difficult in logspace for planar graphs. The document then outlines previous work on matching algorithms and complexity, and states that the goal is to obtain an approximation scheme for maximum matching that runs in logspace.
A Combined Voice Activity Detector Based On Singular Value Decomposition and ... (CSCJournals)
A voice activity detector (VAD) separates the speech-bearing parts of a signal from its silence parts. In this paper a new VAD algorithm based on singular value decomposition is presented. Feature vector extraction is performed in two stages: in the first, voiced frames are separated from unvoiced and silence frames; in the second, unvoiced frames are separated from silence frames. To do this, the noisy signal is first windowed and a Hankel matrix is formed for each frame. The statistical feature used by the proposed system is the slope of each frame's singular-value curve, obtained by linear regression. It is shown that across different SNRs this slope is larger in magnitude for voiced frames than for the other frame types, and this property is exploited in the first stage. Because the feature vectors of unvoiced and silence frames are highly similar, the same approach cannot separate these two categories, so the second stage uses frequency characteristics to distinguish unvoiced frames from silence frames. Simulation results show that high speed and accuracy are the advantages of the proposed system.
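The slope-of-singular-values feature can be illustrated on synthetic frames; a sketch in which the frame length, Hankel dimensions, and signals are all invented rather than the paper's settings:

```python
import numpy as np

def singular_slope(frame, rows=10):
    """Linear-regression slope of the singular-value curve of a frame's Hankel-style matrix."""
    # Each column of H is a shifted window of the frame (Hankel structure)
    H = np.lib.stride_tricks.sliding_window_view(frame, rows).T   # rows x (len - rows + 1)
    s = np.linalg.svd(H, compute_uv=False)
    slope, _ = np.polyfit(np.arange(s.size), s, 1)
    return float(slope)

rng = np.random.default_rng(5)
n = np.arange(160)
voiced = np.sin(2 * np.pi * 0.05 * n)         # strongly periodic stand-in for a voiced frame
silence = 0.01 * rng.standard_normal(160)     # near-silent frame

slope_voiced = singular_slope(voiced)
slope_silence = singular_slope(silence)
```

A periodic frame concentrates its energy in a few large singular values, so its singular-value curve falls off much more steeply than that of a silent frame.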
The document discusses developing a low-pass filter for reducing noise in spectral data, specifically analyzing spectra of supernova 2011fe. Several filtering methods are explored, including running average, Savitzky-Golay filtering, discrete Fourier transforms (DFT, DST, DCT), and Wiener filtering. Wiener filtering is determined to be most effective by creating an optimal filter from estimates of signal and noise contributions. A Mathematica program is created to take a spectrum, apply the selected filtering method, and output/plot the filtered spectrum for comparison to the original.
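The running average, the simplest of the filters compared, can be sketched in a few lines; the synthetic "spectrum", noise level, and window length below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
clean = np.sin(2 * np.pi * 5 * t)                 # stand-in for the underlying spectrum
noisy = clean + 0.3 * rng.standard_normal(t.size)

window = 5
kernel = np.ones(window) / window                 # normalized boxcar window
smoothed = np.convolve(noisy, kernel, mode="same")

err_noisy = float(np.mean((noisy - clean) ** 2))
err_smooth = float(np.mean((smoothed - clean) ** 2))
```

Wiener filtering improves on this by shaping the filter from estimated signal and noise spectra instead of a fixed boxcar, which is why the document finds it most effective.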
The document discusses the Widrow-Hoff learning rule and the LMS algorithm. It describes how the LMS algorithm uses an approximate steepest descent method to minimize the mean square error of an adaptive linear neuron. It also discusses conditions for stability and convergence of the algorithm, providing examples of using it for tasks like noise cancellation.
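The approximate steepest-descent (Widrow-Hoff) update can be sketched as a small system-identification loop; the target coefficients and step size below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

true_w = np.array([0.5, -0.3, 0.1])   # unknown system to identify (invented)
n_taps = 3
mu = 0.05                              # step size: must be small enough for stability
w = np.zeros(n_taps)                   # adaptive weights, start at zero

x = rng.standard_normal(2000)          # white excitation signal
for n in range(n_taps - 1, x.size):
    u = x[n - n_taps + 1:n + 1][::-1]  # most recent input samples, newest first
    d = float(np.dot(true_w, u))       # desired output from the unknown system
    e = d - float(np.dot(w, u))        # estimation error
    w += mu * e * u                    # Widrow-Hoff (LMS) update
```

Too large a step size makes the recursion diverge, which is the stability condition the document discusses.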
Design Of Area Delay Efficient Fixed-Point Lms Adaptive Filter For EEG Applic... (IJTET Journal)
This paper presents an efficient architecture for implementing a delayed least mean square (DLMS) adaptive filter. A novel partial product generator achieves lower adaptation delay and area-delay consumption, and a strategy is proposed for optimized, balanced pipelining across the time-consuming combinational blocks of the structure. Synthesis results show that the proposed design offers a lower area-delay product (ADP) than the best existing systolic structures, on average, for filter lengths N = 8, 16, and 32. An efficient fixed-point implementation scheme of the proposed architecture is also given. The EEG (electroencephalogram) records the electrical activity of the brain; during recording it is contaminated by various artifacts such as power line interference (PLI), muscle artifact (MA), and eye blink artifact (EBA). The paper details the various artifacts that occur in EEG signals and studies an adaptive filter for reducing eye blink artifact noise from the EEG signal and increasing the signal-to-noise ratio (SNR). The analytical results are shown to match the simulation results.
Gain Comparison between NIFTY and Selected Stocks identified by SOM using Tec... (IOSR Journals)
This document discusses a study that uses self-organizing maps (SOM) and technical indicators to identify stocks with potential for investment gains. The study selects stocks and compares their returns over 1.5 months to the NIFTY index. The stocks identified using SOM and technical indicators performed 37.14% better than the NIFTY index over that period. The document provides background on technical analysis indicators like RSI, MACD, and OBV that were used in the analysis. It also describes how SOM can be used to classify stocks based on technical indicator values and select stocks that closely match the properties of the best performing class.
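Of the indicators listed, RSI is the simplest to state; a sketch in simple-average form (the study may well use Wilder's smoothed variant, and the price series below are invented):

```python
import numpy as np

def rsi(prices, period=14):
    """Relative Strength Index over the last `period` price changes (simple-average form)."""
    deltas = np.diff(np.asarray(prices, float))
    gains = np.clip(deltas[-period:], 0.0, None)    # upward moves
    losses = np.clip(-deltas[-period:], 0.0, None)  # downward moves, as positive numbers
    if losses.mean() == 0.0:
        return 100.0                                # no losses at all: maximally overbought
    rs = gains.mean() / losses.mean()
    return 100.0 - 100.0 / (1.0 + rs)

rsi_up = rsi(np.arange(1.0, 17.0))            # steadily rising prices
rsi_alt = rsi(np.array([1.0, 2.0] * 8)[:15])  # perfectly alternating gains and losses
```

A steadily rising series saturates RSI at 100, while balanced gains and losses sit at the neutral value of 50.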
Comparison of 60GHz CSRRs Ground Shield and Patterned Ground Shield On-chip B... (IOSR Journals)
This document compares two 60GHz on-chip bandpass filters designed using a 0.18μm CMOS technology. The first filter uses complementary split ring resonators (CSRRs) as the ground shield, while the second uses a patterned ground shield. Simulation results show that the CSRR ground shield filter has slightly lower insertion loss of -2.682dB, a narrower 3dB bandwidth of 10.8GHz, and a smaller chip size of 0.651mm^2. The patterned ground shield filter has a higher insertion loss of -2.77dB but a wider 3dB bandwidth of 14GHz and slightly smaller chip size of 0.527mm^2. Both filters demonstrate good return loss and
This document proposes an automated micro-controller based system to manage room lighting energy using dimmable CFL ballasts that considers daylight penetration. Key aspects of the system include:
1) Using a developed 36W CFL dimming ballast that can be PWM controlled to vary the light output and reduce energy consumption.
2) Designing the system for a single room with four CFL lights and zone-based lighting controllers.
3) Sensing the light level in each zone using a photo sensor and controlling the dimming ballast to adjust the electric light contribution as needed to maintain the required light level based on daylight contribution.
4) Gradually increasing or decreasing the electric light output over time.
This document summarizes a research paper that proposes using a genetic algorithm to optimize the placement of Fiber Bragg Grating sensors for structural health monitoring. It begins by introducing FBG sensors and the need for sensor placement optimization when resources are limited. It then provides background on genetic algorithms and describes how they can be applied to the sensor placement problem by coding sensor locations into chromosomes, evaluating fitness, and using genetic operators like selection, crossover and mutation to evolve optimized sensor configurations. The document explains the genetic algorithm approach in detail through sections on coding, initial population, selection, crossover and mutation operations.
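The GA loop described (coding, selection, crossover, mutation) can be sketched on a toy stand-in for the sensor-placement problem; every number here (population size, budget, benefit values, mutation rate) is invented, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(6)

n_pos, pop_size, gens = 20, 30, 60
benefit = rng.random(n_pos)        # hypothetical benefit of a sensor at each position
budget = 5                         # at most 5 sensors may be placed

def fitness(chrom):
    # Reward covered benefit, penalize exceeding the sensor budget
    return benefit @ chrom - 10.0 * max(0.0, chrom.sum() - budget)

# Coding: each chromosome is a 0/1 vector marking chosen sensor positions
pop = (rng.random((pop_size, n_pos)) < 0.25).astype(float)
for _ in range(gens):
    scores = np.array([fitness(c) for c in pop])
    keep = pop[np.argsort(scores)[-pop_size // 2:]]      # selection: keep the better half
    children = []
    while len(children) < pop_size - len(keep):
        p1, p2 = keep[rng.integers(len(keep), size=2)]
        cut = int(rng.integers(1, n_pos))                # single-point crossover
        child = np.concatenate([p1[:cut], p2[cut:]])
        flip = rng.random(n_pos) < 0.02                  # mutation: rare bit flips
        child[flip] = 1.0 - child[flip]
        children.append(child)
    pop = np.vstack([keep] + children)

best = pop[np.argmax([fitness(c) for c in pop])]
```

Keeping the better half each generation preserves the best chromosome, so the best fitness never decreases as the population evolves.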
This document proposes a unified approach to refine measures of central tendency and dispersion. It defines a generalized measure of central tendency as the value that minimizes the deviation between a point and a dataset. Various common measures of central tendency like mean, median, mode, geometric mean and harmonic mean are derived as special cases of this generalized definition. The concept is extended to introduce an "interval of central tendency" and methods to estimate it. Simulation studies show the interval of central tendency can capture more observations than a single point estimate, and allow comparison of different measures. The approach is also applied to derive confidence intervals for the population mean and probability of success in Bernoulli trials.
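The generalized definition (the center as the value minimizing deviation from the dataset) can be checked numerically; a grid-search sketch with invented data:

```python
import numpy as np

data = np.array([1.0, 2.0, 2.0, 3.0, 10.0])
grid = np.linspace(0.0, 12.0, 12001)          # candidate centers, step 0.001

# Squared deviation is minimized by the arithmetic mean
best_sq = grid[np.argmin(((grid[:, None] - data) ** 2).sum(axis=1))]

# Absolute deviation is minimized by the median
best_abs = grid[np.argmin(np.abs(grid[:, None] - data).sum(axis=1))]
```

Swapping the deviation function swaps the recovered measure of central tendency, which is the unification the document describes.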
IOSR Journal of Electronics and Communication Engineering(IOSR-JECE) is an open access international journal that provides rapid publication (within a month) of articles in all areas of electronics and communication engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in electronics and communication engineering. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
1) The document studies the impact of emulsified water-diesel mixtures on engine performance and emissions. Diesel was emulsified with 3%, 5%, and 7% water by volume.
2) Test results showed that adding water to diesel emulsions improved brake thermal efficiency and reduced brake specific fuel consumption compared to pure diesel. Emissions of nitrogen oxides and particulate matter also decreased with the addition of up to 5% water.
3) The presence of water in the emulsions lowers combustion temperatures, reducing nitrogen oxide emissions. It also increases the expansion work and reduces compression work in the engine, improving efficiency.
Research Paper Selection Based On an Ontology and Text Mining Technique Using... (IOSR Journals)
This document proposes an ontology and text mining technique to select research papers. It involves 3 phases: 1) constructing a research ontology using keywords and frequencies from past papers, 2) classifying new papers based on ontology keywords, and 3) clustering papers in each domain using text mining and the K-means algorithm. The technique aims to better group papers and assign them to relevant reviewers by addressing limitations of keyword-based methods. It constructs a research ontology, classifies papers, clusters them based on textual similarities, and systematically assigns papers to reviewers.
The document discusses an improved method for storing feature vectors to detect Android malware. It proposes using a compressed row storage format to efficiently store the statistical features that represent malware families. This involves storing only the non-zero elements of sparse feature matrices in three vectors, which reduces storage needs by 79% compared to conventional methods. This improved storage technique leads to reduced processing time for feature vector generation and malware detection overall. The proposed method aims to enhance Android malware analysis by making feature vector searches and classification faster.
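The compressed-row idea can be sketched in a few lines; the feature matrix below is a made-up stand-in, and this is generic CSR storage rather than necessarily the paper's exact layout:

```python
import numpy as np

def to_csr(dense):
    """Compress a dense matrix into CSR's three vectors: values, column indices, row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)     # keep only the non-zero entries
                col_idx.append(j)    # remember which column each came from
        row_ptr.append(len(values))  # where each row's entries end
    return np.array(values), np.array(col_idx), np.array(row_ptr)

features = np.array([
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 2, 0, 4],
])
vals, cols, ptrs = to_csr(features)
```

For very sparse feature matrices the three short vectors occupy far less memory than the dense array, which is the source of the storage savings reported.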
This document provides an overview of cloud computing, including its key characteristics, service models, deployment models, examples, advantages and limitations. Specifically, it defines cloud computing as the delivery of computing resources such as servers, storage, databases and software over the internet. It describes the main service models of software as a service (SaaS), platform as a service (PaaS) and infrastructure as a service (IaaS). It also outlines the deployment models of public, private and hybrid clouds and discusses some advantages like scalability, cost savings and disadvantages like security issues and dependence on internet connectivity.
This document describes a numerical model developed to simulate tsunami propagation and inundation in the coastal region of Bangladesh. Six potential tsunami scenarios were identified based on earthquake sources in the Bay of Bengal. Initial surface level maps were generated for each scenario using a geological model. The tsunami model was developed using the MIKE21 modeling system with nested grids down to 600m resolution covering the coastal region. The model was calibrated with the 2004 Sumatra tsunami and then simulated each scenario to generate maximum inundation maps. An inundation risk map was then produced combining the results, showing high risk areas like Sundarbans, Nijhum Dwip, and Cox's Bazaar coast.
This document discusses the design and analysis of a digital down converter (DDC) for WiMAX applications using MATLAB. It contains the following key points:
1. It describes the functional blocks and design of a DDC, including a mixer, numerically controlled oscillator (NCO), and FIR filter chain.
2. It discusses WiMAX standards and requirements for DDC design in WiMAX systems.
3. It presents the windowing technique for designing FIR filters and compares different window functions to determine the best filter specifications.
This document summarizes a proposed passive image forensic method to detect resampling forgery in digital images. Resampling is a common operation used in image forgeries to resize or rotate image regions. The proposed method detects periodic correlations introduced during resampling. It uses a k-nearest neighbors algorithm and support vector machine classifier to identify periodicity maps of resampled images. Experimental results on test images show the method achieves high recall and precision rates when detecting resampled regions, outperforming conventional techniques. The method provides a way to detect image manipulations involving resampling without requiring pre-embedded signatures in images.
The document describes a study that investigated using ultrasound as an alternative to traps or poison for repelling and eradicating rodents. Tests were conducted using an ultrasonic repeller on three rodent species: rats, mice, and rabbits. The rodents were observed from a distance when introduced to ultrasound frequencies between 35-50 kHz generated by the repeller. They appeared irritated and tried to escape, showing ultrasound can repel rodents. When caged near the repeller, the rodents became irritated and immobilized. The study concludes ultrasound provides an effective and safer alternative to traditional rodent control methods.
This document analyzes and surveys trust in cloud computing environments. It discusses the need for trust between cloud service users and providers. Several trust issues are identified, including access by insiders, nesting of services, degree of exposure, and risk. Various existing trust models are examined, including those based on service level agreements, virtual environments, and propositional logic terms. User behavior-based trust models are focused on, as it is important to evaluate the trustworthiness of both users and providers. The models are found to be lacking in transparency and parameters for calculating trust values. Overall, the document aims to address trust issues and analyze existing trust models for cloud services.
This document provides a comparative perspective on the postgraduate historical research formats of Sudan and Nigeria. Some key differences discussed include:
- In Sudan, original certificates must be authenticated for admission, while this is not required in Nigeria.
- In Nigeria, only one abstract is allowed and it may not be in another language, while in Sudan abstracts must be provided in both English and Arabic.
- Sudan requires explicitly stating research results and recommendations in the text, while this is not mandatory in Nigeria where historians are not expected to speculate or pass judgment.
- Overall, the document finds some similarities in research stages and requirements but also noteworthy differences in the admission processes and abstract/language requirements between the two countries.
This document summarizes a study on the impact of emotion on prosody analysis in speech. The study analyzed speech samples recorded from actors expressing different emotions like love, anger, calm, sadness and neutral. It measured acoustic parameters like vowel duration, fundamental frequency, jitter and shimmer for the different emotions. The results showed that speech expressing love had longer vowel durations, while sad speech had longer durations for certain vowels. This indicates emotion impacts prosodic features of speech, which is important for applications like speech recognition and synthesis systems.
This document summarizes a mobile app called "I Safe Apps" that is designed to enhance women's safety. The app allows women to alert emergency contacts by pushing a single button, which shares the woman's location and sends an alert. It also provides first aid information and instructions. The app aims to address issues like sexual assault and violence against women by allowing easy access to safety features from a mobile device. It stores emergency contact information and sends alerts to those contacts in the event the SOS button is pressed, helping connect women in distress to help.
This document proposes a methodology for forensic investigation of criminal activity on multitenant cloud web hosting platforms. It presents challenges of investigating crimes in such an environment due to lack of access to physical servers and scattered data across multiple virtual machines and data centers. The proposed architecture uses a MAC Address Derivation Algorithm (MADA) to trace the physical network location of criminals by identifying their IP and MAC addresses from DHCP and firewall logs. When illegal access is detected, the process generates an investigation report by applying MADA to obtain location details based on the user ID of the unauthorized access. This helps law enforcement agencies further investigate cybercrimes on multitenant cloud systems.
This document provides a theoretical investigation of a solar energy driven combined power and refrigeration cycle that uses oil as the heat transfer medium. The cycle integrates a Rankine cycle for power production and an ejector refrigeration cycle for cold production. Thermodynamic analyses of the cycle were conducted to determine first law efficiency of 20% and second law efficiency of 11%. Key cycle components include a heliostat field, central receiver, heat recovery steam generator, turbine, evaporator, condenser and ejector. Effects of parameters such as steam temperature and evaporator temperature on cycle performance were examined.
This document summarizes a research paper that develops an acoustic source localization system using microphone arrays. It proposes using a minimum mean square error (MMSE) estimator to improve the signal-to-noise ratio of speech signals recorded by the microphones, which are distorted by ambient noise. The MMSE estimator models the speech and noise spectral components as random variables. Time delay estimation is then used on the enhanced speech signals to calculate the location of the speech source based on differences in time of arrival at each microphone. The system is evaluated theoretically and experimentally to assess its performance in locating sound sources.
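The time-delay estimation step can be sketched as a cross-correlation peak search; the sample rate, delay, and signal below are invented stand-ins for the enhanced microphone signals:

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 8000                          # assumed sample rate, Hz
s = rng.standard_normal(1024)      # stand-in for an enhanced speech frame

true_delay = 25                    # samples by which mic 2 lags mic 1
mic1 = s
mic2 = np.concatenate([np.zeros(true_delay), s])[: s.size]

# The lag of the cross-correlation peak gives the time difference of arrival
corr = np.correlate(mic2, mic1, mode="full")
lag = int(np.argmax(corr)) - (s.size - 1)
tdoa_seconds = lag / fs
```

Pairwise delays like this one, combined with the known microphone geometry, constrain the source location, which is the system's localization principle.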
The document discusses speech enhancement using a recursive filter. It begins by introducing speech processing and the need for enhancement. It then provides an overview of the recursive filter, which estimates the state of a dynamic system perturbed by noise. The document outlines the process, which involves expressing speech as a state space model and applying the recursive filter equations in a loop: predicting the state ahead and correcting it using measurements to iteratively estimate speech signals with less residual noise.
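The predict/correct recursion can be reduced to a scalar toy problem; the constant hidden state and noise variances below are invented, and a real speech model would use a full state-space formulation:

```python
import numpy as np

rng = np.random.default_rng(3)

true_value = 0.7                                        # constant hidden state to recover
meas = true_value + 0.5 * rng.standard_normal(500)      # noisy measurements

x, p = 0.0, 1.0      # state estimate and its variance
r = 0.25             # measurement noise variance

for z in meas:
    x_pred, p_pred = x, p                # predict: a constant state carries over unchanged
    k = p_pred / (p_pred + r)            # gain: how much to trust the new measurement
    x = x_pred + k * (z - x_pred)        # correct the prediction with the measurement
    p = (1.0 - k) * p_pred               # the estimate variance shrinks each step
```

Each pass blends the prediction with the measurement, so the estimate steadily tightens around the true value while the residual noise shrinks.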
This document discusses imaging human speech as a 3D surface in order to better understand speech patterns and improve speech recognition technology. It presents methods for filtering waveforms to separate frequencies, generating waveform profiles, and transforming coordinates to display waveforms as a rotatable 3D surface. Examining speech over varying time scales and from different 3D perspectives reveals nuanced patterns that could help discriminate unique features of individual phonemes.
This document summarizes the steps to perform colored inversion (CI) on seismic data to obtain relative acoustic impedance values. CI involves: 1) Fitting a function to the log spectrum to model it, 2) Computing the difference between the modeled log spectrum and the seismic spectrum, 3) Converting the difference spectrum to an inversion operator, 4) Convolving the operator with the seismic data to obtain relative impedance values. As a quality control, the output impedance spectrum can be checked against the input log spectrum. The document provides code to implement this CI workflow using open-source Python libraries on a dataset from the Netherlands. CI produces informative relative impedance images to aid seismic interpretation.
This document provides an introduction to signal processing techniques for analytical chemistry. It discusses basic operations like addition, subtraction, multiplication and division of signals. It also covers smoothing, differentiation, resolution enhancement, harmonic analysis, convolution and other techniques. Key aspects of signals and noise are described, including distinguishing signal from noise, measuring signal-to-noise ratio, and improving signals through techniques like ensemble averaging. Examples are provided using the free SPECTRUM software and MATLAB.
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip-connections. It is optimized on both time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noises, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly on the raw waveform which further improve model performance and its generalization abilities. We perform evaluations on several standard benchmarks, both using objective metrics and human judgements. The proposed model matches state-of-the-art performance of both causal and non-causal methods while working directly on the raw waveform.

Index Terms: Speech enhancement, speech denoising, neural networks, raw waveform
This document discusses a study investigating the combined use of Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) features in automatic speech recognition systems. It begins by outlining the challenges of automatic speech recognition and then describes the MFCC and LPC algorithms for extracting basic speech features. The study suggests combining MFCC and LPC-based recognition subsystems to improve reliability. Neural networks are used for training and recognition, and results show the combined approach improves recognition quality compared to individual methods.
The document discusses single carrier transmission using LabVIEW & NI-USRP. It covers several topics:
1. Symbol synchronization using the maximum output energy solution which introduces an adaptive element to find the optimal sampling time that maximizes output power.
2. The role of pseudo-noise sequences in frame synchronization, which provide properties like balance and unpredictability needed for random sequences.
3. The Moose algorithm for carrier frequency offset estimation and correction which exploits least squares to determine the phase shift between training sequences and correct sample phases.
4. The effects of multipath propagation including fading caused by constructive/destructive interference from multiple propagation paths, and intersymbol interference when path delays cause symbol interference.
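The Moose estimate in item 3 can be sketched in a few lines; the training length and frequency offset below are invented, and the estimator is only unambiguous for offsets below 1/(2N) cycles per sample:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 64
eps_true = 0.004                 # CFO in cycles/sample; must satisfy |eps| < 1/(2N)

t_seq = rng.standard_normal(N) + 1j * rng.standard_normal(N)
tx = np.concatenate([t_seq, t_seq])            # training symbol transmitted twice

n = np.arange(2 * N)
rx = tx * np.exp(2j * np.pi * eps_true * n)    # channel rotates each sample by the CFO

# Moose estimate: phase of the correlation between the two halves, scaled by 2*pi*N
corr = np.sum(np.conj(rx[:N]) * rx[N:])
eps_hat = float(np.angle(corr) / (2 * np.pi * N))
```

Because the two halves carry identical data, their sample-by-sample products isolate the phase ramp accumulated over N samples, from which the offset follows directly.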
Multiscale Entropy Analysis (MSE) is a method for measuring the complexity of time series data across multiple temporal scales. It involves coarse-graining the time series into multiple scales and calculating a sample entropy value at each scale to quantify the regularity. When applied to physiological signals, MSE reveals greater complexity in original data versus surrogate data, unlike single-scale entropy analyses. The software provided calculates MSE for physiological time series and outputs sample entropy values over a range of scales. Outliers can impact results by changing the time series variance, and filtering can alter MSE curves.
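The coarse-graining step of MSE can be sketched directly; the ramp input below is an invented example, and a full MSE implementation would follow it with a sample entropy computation at each scale:

```python
import numpy as np

def coarse_grain(x, scale):
    """MSE coarse-graining: average consecutive non-overlapping windows of length `scale`."""
    n = (len(x) // scale) * scale          # drop the incomplete trailing window
    return np.asarray(x, float)[:n].reshape(-1, scale).mean(axis=1)

y = coarse_grain(np.arange(12.0), 3)
```

Each scale thus yields a shorter, smoothed series, and sample entropy computed on these series traces out the MSE curve.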
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin... (Venkata Sudhir Vedurla)
This document presents an analysis of acoustic echo cancellation for speech processing using the LMS adaptive filtering algorithm. It begins with an abstract that outlines the challenges of conventional echo cancellation techniques and the need for a computationally efficient, rapidly converging algorithm. It then provides background on acoustic echo, the principles of echo cancellation, discrete time signals, speech signals, and an overview of the LMS adaptive filtering algorithm and its application to echo cancellation. The document analyzes the performance of the LMS algorithm for echo cancellation by examining how the step size parameter affects convergence and steady state error. It concludes that the LMS algorithm is well-suited for echo cancellation due to its computational simplicity, though the step size must be carefully selected for optimal performance.
Numerical Investigation of Multilayer Fractal FSS (IJMER)
Numerical investigations are presented for a multilayer frequency selective surface (FSS) with Koch fractal (levels 1 and 2) conducting patch elements. The structure investigated is obtained using two FSS screens separated by an air-gap layer, and three different air-gap heights were considered. The results obtained with the numerical method were compared with another technique and with the commercial software Ansoft DesignerTM, and good agreement was observed in terms of the bandwidth.
Design of optimized Interval Arithmetic Multiplier (VLSICS Design)
Many DSP and control applications require the user to know how various numerical errors (uncertainty) affect the result. This uncertainty is eliminated by replacing non-interval values with intervals. Since most DSPs operate in real-time environments, fast processors are required to implement interval arithmetic. The goal is to develop a platform in which interval arithmetic operations are performed at the same computational speed as present-day signal processors. We therefore propose the design and implementation of an interval arithmetic multiplier that operates on IEEE 754 numbers. The proposed unit consists of a floating-point CSD multiplier and an interval operation selector, and it implements an algorithm that is faster than the conventional interval-multiplier algorithm. The cost overhead of the proposed unit is 30% with respect to a conventional floating-point multiplier, yet its performance is better than that of a conventional CSD floating-point multiplier, as it can perform interval multiplication, floating-point multiplication, and interval comparisons.
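Interval multiplication itself is simple to state: the product interval's endpoints are the extremes of the four endpoint products. A plain-Python sketch of that rule (the hardware design above implements it in floating point, not in software like this):

```python
def interval_mul(a, b):
    """Interval product [a0,a1]*[b0,b1]: the extremes of the four endpoint products."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

result = interval_mul((-1.0, 2.0), (3.0, 4.0))
```

Because the signs of the endpoints decide which products are extremal, hardware implementations typically select among the cases rather than compute all four products.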
On The Fundamental Aspects of Demodulation (CSCJournals)
When the instantaneous amplitude, phase and frequency of a carrier wave are modulated with the information signal for transmission, it is known that the receiver works on the basis of the received signal and a knowledge of the carrier frequency. The question is: if the receiver does not have a priori information about the carrier frequency, is it possible to carry out the demodulation process? This tutorial lecture answers this question by looking into the very fundamental process by which the modulated wave is generated. It critically examines the energy separation algorithm for signal analysis and suggests a modification for distortionless demodulation of an FM signal and recovery of sub-carrier signals.
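The energy separation algorithm mentioned builds on the Teager-Kaiser energy operator; a minimal numeric check of the operator, with invented signal parameters:

```python
import numpy as np

def tkeo(x):
    """Discrete Teager-Kaiser energy operator: psi[n] = x[n]**2 - x[n-1]*x[n+1]."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

A, omega = 2.0, 0.2                 # invented amplitude and digital frequency (rad/sample)
n = np.arange(400)
x = A * np.cos(omega * n)
psi = tkeo(x)

# For a pure cosine the operator equals A**2 * sin(omega)**2 at every interior sample
expected = (A * np.sin(omega)) ** 2
max_dev = float(np.max(np.abs(psi - expected)))
```

Because the operator's output couples amplitude and frequency, energy separation applies it to the signal and its difference to split the two, which is the mechanism the lecture examines.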
The document discusses the controversy around purchasing a dedicated HDTV antenna. While they are marketed as being needed to receive high definition broadcasts, in reality all an antenna does is receive radio frequencies, including those used for HDTV broadcasts. A regular TV antenna can receive both standard definition and HDTV broadcasts as long as it covers the VHF and UHF bands. There is no technical need to purchase a specialized "HDTV antenna" to receive HD channels over the air. The document questions the value and necessity of paying more for an antenna marketed specifically for HDTV rather than a regular TV antenna.
Engineering Research Publication: International Journal of Engineering & Technical Research, ISSN 2321-0869 (O), 2454-4698 (P), www.erpublication.org
IRJET- Compressed Sensing based Modified Orthogonal Matching Pursuit in DTTV ...IRJET Journal
This document discusses a modified orthogonal matching pursuit algorithm used for channel estimation in digital terrestrial television systems. It proposes using compressed sensing based channel estimation at the receiver to eliminate sparse information. Thresholding is used to remove noise from the channel estimation and improve signal quality. Simulation results show that bit error rate decreases when the received signal power from different transmitters is almost equal.
An Algorithm For Vector Quantizer DesignAngie Miller
The document presents an algorithm for designing vector quantizers. The algorithm is efficient, intuitive, and can be used for quantizers with general distortion measures and large block lengths. It is based on Lloyd's approach but does not require differentiation, making it applicable even when the data distribution has discrete components. The algorithm finds quantizers that meet necessary optimality conditions. Examples show it converges well and finds near-optimal quantizers for memoryless Gaussian sources. It is also used successfully to quantize LPC speech parameters with a complicated distortion measure.
Similar to Analytical Review of Feature Extraction Techniques for Automatic Speech Recognition (20)
This document provides a technical review of secure banking using RSA and AES encryption methodologies. It discusses how RSA and AES are commonly used encryption standards for secure data transmission between ATMs and bank servers. The document first provides background on ATM security measures and risks of attacks. It then reviews related work analyzing encryption techniques. The document proposes using a one-time password in addition to a PIN for ATM authentication. It concludes that implementing encryption standards like RSA and AES can make transactions more secure and build trust in online banking.
This document analyzes the performance of various modulation schemes for achieving energy efficient communication over fading channels in wireless sensor networks. It finds that for long transmission distances, low-order modulations like BPSK are optimal due to their lower SNR requirements. However, as transmission distance decreases, higher-order modulations like 16-QAM and 64-QAM become more optimal since they can transmit more bits per symbol, outweighing their higher SNR needs. Simulations show lifetime extensions up to 550% are possible in short-range networks by using higher-order modulations instead of just BPSK. The optimal modulation depends on transmission distance and balancing the energy used by electronic components versus power amplifiers.
This document provides a review of mobility management techniques in vehicular ad hoc networks (VANETs). It discusses three modes of communication in VANETs: vehicle-to-infrastructure (V2I), vehicle-to-vehicle (V2V), and hybrid vehicle (HV) communication. For each communication mode, different mobility management schemes are required due to their unique characteristics. The document also discusses mobility management challenges in VANETs and outlines some open research issues in improving mobility management for seamless communication in these dynamic networks.
This document provides a review of different techniques for segmenting brain MRI images to detect tumors. It compares the K-means and Fuzzy C-means clustering algorithms. K-means is an exclusive clustering algorithm that groups data points into distinct clusters, while Fuzzy C-means is an overlapping clustering algorithm that allows data points to belong to multiple clusters. The document finds that Fuzzy C-means requires more time for brain tumor detection compared to other methods like hierarchical clustering or K-means. It also reviews related work applying these clustering algorithms to segment brain MRI images.
1) The document simulates and compares the performance of AODV and DSDV routing protocols in a mobile ad hoc network under three conditions: when users are fixed, when users move towards the base station, and when users move away from the base station.
2) The results show that both protocols have higher packet delivery and lower packet loss when users are either fixed or moving towards the base station, since signal strength is better in those scenarios. Performance degrades when users move away from the base station due to weaker signals.
3) AODV generally has better performance than DSDV, with higher throughput and packet delivery rates observed across the different user mobility conditions.
This document describes the design and implementation of 4-bit QPSK and 256-bit QAM modulation techniques using MATLAB. It compares the two techniques based on SNR, BER, and efficiency. The key steps of implementing each technique in MATLAB are outlined, including generating random bits, modulation, adding noise, and measuring BER. Simulation results show scatter plots and eye diagrams of the modulated signals. A table compares the results, showing that 256-bit QAM provides better performance than 4-bit QPSK. The document concludes that QAM modulation is more effective for digital transmission systems.
The document proposes a hybrid technique using Anisotropic Scale Invariant Feature Transform (A-SIFT) and Robust Ensemble Support Vector Machine (RESVM) to accurately identify faces in images. A-SIFT improves upon traditional SIFT by applying anisotropic scaling to extract richer directional keypoints. Keypoints are processed with RESVM and hypothesis testing to increase accuracy above 95% by repeatedly reprocessing images until the threshold is met. The technique was tested on similar and different facial images and achieved better results than SIFT in retrieval time and reduced keypoints.
This document studies the effects of dielectric superstrate thickness on microstrip patch antenna parameters. Three types of probes-fed patch antennas (rectangular, circular, and square) were designed to operate at 2.4 GHz using Arlondiclad 880 substrate. The antennas were tested with and without an Arlondiclad 880 superstrate of varying thicknesses. It was found that adding a superstrate slightly degraded performance by lowering the resonant frequency and increasing return loss and VSWR, while decreasing bandwidth and gain. Specifically, increasing the superstrate thickness or dielectric constant resulted in greater changes to the antenna parameters.
This document describes a wireless environment monitoring system that utilizes soil energy as a sustainable power source for wireless sensors. The system uses a microbial fuel cell to generate electricity from the microbial activity in soil. Two microbial fuel cells were created using different soil types and various additives to produce different current and voltage outputs. An electronic circuit was designed on a printed circuit board with components like a microcontroller and ZigBee transceiver. Sensors for temperature and humidity were connected to the circuit to monitor the environment wirelessly. The system provides a low-cost way to power remote sensors without needing battery replacement and avoids the high costs of wiring a power source.
1) The document proposes a model for a frequency tunable inverted-F antenna that uses ferrite material.
2) The resonant frequency of the antenna can be significantly shifted from 2.41GHz to 3.15GHz, a 31% shift, by increasing the static magnetic field placed on the ferrite material.
3) Altering the permeability of the ferrite allows tuning of the antenna's resonant frequency without changing the physical dimensions, providing flexibility to operate over a wide frequency range.
This document summarizes a research paper that presents a speech enhancement method using stationary wavelet transform. The method first classifies speech into voiced, unvoiced, and silence regions based on short-time energy. It then applies different thresholding techniques to the wavelet coefficients of each region - modified hard thresholding for voiced speech, semi-soft thresholding for unvoiced speech, and setting coefficients to zero for silence. Experimental results using speech from the TIMIT database corrupted with white Gaussian noise at various SNR levels show improved performance over other popular denoising methods.
This document reviews the design of an energy-optimized wireless sensor node that encrypts data for transmission. It discusses how sensing schemes that group nodes into clusters and transmit aggregated data can reduce energy consumption compared to individual node transmissions. The proposed node design calculates the minimum transmission power needed based on received signal strength and uses a periodic sleep/wake cycle to optimize energy when not sensing or transmitting. It aims to encrypt data at both the node and network level to further optimize energy usage for wireless communication.
This document discusses group consumption modes. It analyzes factors that impact group consumption, including external environmental factors like technological developments enabling new forms of online and offline interactions, as well as internal motivational factors at both the group and individual level. The document then proposes that group consumption modes can be divided into four types based on two dimensions: vertical (group relationship intensity) and horizontal (consumption action period). These four types are instrument-oriented, information-oriented, enjoyment-oriented, and relationship-oriented consumption modes. Finally, the document notes that consumption modes are dynamic and can evolve over time.
The document summarizes a study of different microstrip patch antenna configurations with slotted ground planes. Three antenna designs were proposed and their performance evaluated through simulation: a conventional square patch, an elliptical patch, and a star-shaped patch. All antennas were mounted on an FR4 substrate. The effects of adding different slot patterns to the ground plane on resonance frequency, bandwidth, gain and efficiency were analyzed parametrically. Key findings were that reshaping the patch and adding slots increased bandwidth and shifted resonance frequency. The elliptical and star patches in particular performed better than the conventional design. Three antenna configurations were selected for fabrication and measurement based on the simulations: a conventional patch with a slot under the patch, an elliptical patch with slots
1) The document describes a study conducted to improve call drop rates in a GSM network through RF optimization.
2) Drive testing was performed before and after optimization using TEMS software to record network parameters like RxLevel, RxQuality, and events.
3) Analysis found call drops were occurring due to issues like handover failures between sectors, interference from adjacent channels, and overshooting due to antenna tilt.
4) Corrective actions taken included defining neighbors between sectors, adjusting frequencies to reduce interference, and lowering the mechanical tilt of an antenna.
5) Post-optimization drive testing showed improvements in RxLevel, RxQuality, and a reduction in dropped calls.
This document describes the design of an intelligent autonomous wheeled robot that uses RF transmission for communication. The robot has two modes - automatic mode where it can make its own decisions, and user control mode where a user can control it remotely. It is designed using a microcontroller and can perform tasks like object recognition using computer vision and color detection in MATLAB, as well as wall painting using pneumatic systems. The robot's movement is controlled by DC motors and it uses sensors like ultrasonic sensors and gas sensors to navigate autonomously. RF transmission allows communication between the robot and a remote control unit. The overall aim is to develop a low-cost robotic system for industrial applications like material handling.
This document reviews cryptography techniques to secure the Ad-hoc On-Demand Distance Vector (AODV) routing protocol in mobile ad-hoc networks. It discusses various types of attacks on AODV like impersonation, denial of service, eavesdropping, black hole attacks, wormhole attacks, and Sybil attacks. It then proposes using the RC6 cryptography algorithm to secure AODV by encrypting data packets and detecting and removing malicious nodes launching black hole attacks. Simulation results show that after applying RC6, the packet delivery ratio and throughput of AODV increase while delay decreases, improving the security and performance of the network under attack.
The document describes a proposed modification to the conventional Booth multiplier that aims to increase its speed by applying concepts from Vedic mathematics. Specifically, it utilizes the Urdhva Tiryakbhyam formula to generate all partial products concurrently rather than sequentially. The proposed 8x8 bit multiplier was coded in VHDL, simulated, and found to have a path delay 44.35% lower than a conventional Booth multiplier, demonstrating its potential for higher speed.
This document discusses image deblurring techniques. It begins by introducing image restoration and focusing on image deblurring. It then discusses challenges with image deblurring being an ill-posed problem. It reviews existing approaches to screen image deconvolution including estimating point spread functions and iteratively estimating blur kernels and sharp images. The document also discusses handling spatially variant blur and summarizes the relationship between the proposed method and previous work for different blur types. It proposes using color filters in the aperture to exploit parallax cues for segmentation and blur estimation. Finally, it proposes moving the image sensor circularly during exposure to prevent high frequency attenuation from motion blur.
This document describes modeling an adaptive controller for an aircraft roll control system using PID, fuzzy-PID, and genetic algorithm. It begins by introducing the aircraft roll control system and motivation for developing an adaptive controller to minimize errors from noisy analog sensor signals. It then provides the mathematical model of aircraft roll dynamics and describes modeling the real-time flight control system in MATLAB/Simulink. The document evaluates PID, fuzzy-PID, and PID-GA (genetic algorithm) controllers for aircraft roll control and finds that the PID-GA controller delivers the best performance.
An improved modulation technique suitable for a three level flying capacitor ...IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
artificial intelligence and data science contents.pptxGauravCar
What is artificial intelligence? Artificial intelligence is the ability of a computer or computer-controlled robot to perform tasks that are commonly associated with the intellectual processes characteristic of humans, such as the ability to reason.
› ...
Artificial intelligence (AI) | Definitio
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Analytical Review of Feature Extraction Techniques for Automatic Speech Recognition
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 14, Issue 4 (Sep. - Oct. 2013), PP 61-67
www.iosrjournals.org
Rajesh Makhijani¹, Ravindra Gupta²
¹(M. Tech Scholar, IT Dept, Sri Satya Sai Institute of Science & Technology, Sehore (M.P.), India)
²(Assoc. Professor, IT Dept, Sri Satya Sai Institute of Science & Technology, Sehore (M.P.), India)
Abstract: Speech recognition is a multileveled pattern recognition task, in which acoustical signals are
examined and structured into a hierarchy of sub-word units (e.g., phonemes), words, phrases, and sentences.
Each level may provide additional temporal constraints, e.g., known word pronunciations or legal word
sequences, which can compensate for errors or uncertainties at lower levels. This hierarchy of constraints can
best be exploited by combining decisions probabilistically at all lower levels, and making discrete decisions
only at the highest level.
Keywords: ASR (Automatic Speech Recognition); Dynamic Time Warping; FET (Feature Extraction Technique).
I. Introduction
In speech recognition, the main goal of the feature extraction step is to compute a parsimonious
sequence of feature vectors providing a compact representation of the given input signal. The feature extraction
process is usually performed in three stages, in which the first stage is called the speech analysis or the acoustic
front end. It performs some kind of spectro-temporal analysis of the signal and generates raw features
describing the envelope of the power spectrum of short speech intervals. The second stage compiles an
extended feature vector composed of static and dynamic features. Finally, the last stage (which is not always
present) transforms these extended feature vectors into more compact and robust vectors that are then supplied
to the recognizer. Although there is no real consensus as to what the optimal feature sets should look like, one usually would like them to have the following properties: they should allow an automatic system to discriminate between different though similar-sounding speech sounds, they should allow for the automatic creation of acoustic models for these sounds without the need for an excessive amount of training data, and they should exhibit statistics which are largely invariant across speakers and speaking environments.
Mel Spectral Coefficients
The human ear does not show a linear frequency resolution but builds several groups of frequencies
and integrates the spectral energies within a given group. Furthermore, the mid-frequency and bandwidth of
these groups are non–linearly distributed. The non–linear warping of the frequency axis can be modeled by the
so-called mel scale. The frequency groups are assumed to be linearly distributed along the mel scale. The so-called mel frequency f_mel can be computed from the frequency f as follows:

f_mel(f) = 2595 · log₁₀(1 + f / 700 Hz)    (1)
The human ear has high frequency resolution in the low-frequency parts of the spectrum and low frequency resolution in the high-frequency parts. The coefficients of the power spectrum |V(n)|² are now transformed to reflect the frequency resolution of the human ear.
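As a quick illustration, Eq. (1) can be implemented directly. The following is a minimal sketch in plain Python (the function names are our own; the inverse warping is included for convenience):

```python
import math

def hz_to_mel(f_hz):
    """Warp a frequency in Hz onto the mel scale, Eq. (1)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel):
    """Inverse warping, solving Eq. (1) for f."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)
```

Note that the warping is roughly linear below 1 kHz (hz_to_mel(1000.0) is close to 1000) and logarithmic above, matching the ear's decreasing frequency resolution at high frequencies.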
Cepstral Transformation
Since the transmission function of the vocal tract H(f) is multiplied with the spectrum of the excitation signal X(f), we had those unwanted "ripples" in the spectrum. For the speech recognition task, a smoothed spectrum is required which should represent H(f) but not X(f). To cope with this problem, cepstral analysis is used. We can separate the product of spectral functions into the interesting vocal tract spectrum and the part describing the excitation and emission properties:
S(f) = X(f) · H(f) · R(f) = H(f) · U(f)    (2)
We can now transform the product of the spectral functions to a sum by taking the logarithm on both sides of the equation:

log S(f) = log(H(f) · U(f)) = log H(f) + log U(f)    (3)
This also holds for the absolute values of the power spectrum and for their squares:

log |S(f)|² = log(|H(f)|² · |U(f)|²) = log |H(f)|² + log |U(f)|²    (4)
In Fig. 1, we see an example of the log power spectrum, which contains unwanted ripples caused by the excitation signal U(f) = X(f) · R(f).

Fig. 1: Log power spectrum of the vowel /a:/ (f_s = 11 kHz, N = 512). The ripples in the spectrum are caused by X(f)
In the log-spectral domain we could now subtract the unwanted portion of the signal, if we knew |U(f)|² exactly. But all we know is that U(f) produces the "ripples", which now are an additive component in the log-spectral domain, and that if we would interpret this log-spectrum as a time signal, the "ripples" would have a "high frequency" compared to the spectral shape of |H(f)|. To get rid of the influence of U(f), one would have to get rid of the "high-frequency" parts of the log-spectrum (remember, we are dealing with the spectral coefficients as if they would represent a time signal). This would be a kind of low-pass filtering. The filtering can be done by transforming the log-spectrum back into the time domain (in the following, FT⁻¹ denotes the inverse Fourier transform):
s(d) = FT⁻¹{log |S(f)|²} = FT⁻¹{log |H(f)|²} + FT⁻¹{log |U(f)|²}    (5)
The inverse Fourier transform brings us back to the time domain (d is also called the delay or quefrency), giving the so-called cepstrum (a reversed "spectrum"). The resulting cepstrum is real-valued, since |U(f)|² and |H(f)|² are both real-valued and both are even: |U(f)|² = |U(−f)|² and |H(f)|² = |H(−f)|². Applying the inverse DFT to the log power spectrum coefficients log |V(n)|² yields the cepstrum shown in Fig. 2.
Fig. 2: Cepstrum of the vowel /a:/ (f_s = 11 kHz, N = 512). The ripples in the spectrum result in a peak in the cepstrum
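To make Eq. (5) concrete, the sketch below computes the real cepstrum of a short signal using plain O(N²) DFT loops for clarity (an FFT would be used in practice; the small floor added before the logarithm is our own guard against log 0):

```python
import cmath
import math

def real_cepstrum(signal):
    """Cepstrum per Eq. (5): inverse DFT of the log power spectrum."""
    n = len(signal)
    # Forward DFT -> power spectrum |S(f)|^2
    power = []
    for k in range(n):
        s = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        power.append(abs(s) ** 2)
    # Log power spectrum; the tiny floor avoids log(0) at empty bins
    log_power = [math.log(p + 1e-12) for p in power]
    # Inverse DFT; the result is real-valued because the
    # log power spectrum is real and even
    ceps = []
    for d in range(n):
        c = sum(log_power[k] * cmath.exp(2j * math.pi * k * d / n)
                for k in range(n)) / n
        ceps.append(c.real)
    return ceps
```

Because the log power spectrum of a real signal is even, the resulting cepstrum is itself symmetric in the quefrency index d, mirroring the discussion above.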
Mel Cepstrum
After becoming familiar with the cepstral transformation and cepstral smoothing, the mel cepstrum is computed, which is commonly used in speech recognition. As stated above, for speech recognition the mel spectrum is used to reflect the perception characteristics of the human ear. In analogy to computing the cepstrum, we now take the logarithm of the mel power spectrum (instead of the power spectrum itself) and transform it into the cepstral domain to compute the so-called mel cepstrum. Only the first Q (fewer than 14) coefficients of the mel cepstrum are used in typical speech recognition systems. The restriction to the first Q coefficients reflects the low-pass liftering process described above.

Since the mel power spectrum is symmetric due to Eq. (5), the Fourier transform can be replaced by a simple cosine transform:
c(q) = Σ_{k=0}^{K−1} log G(k) · cos(πq(2k + 1) / (2K));    q = 0, 1, …, Q − 1    (6)
While successive coefficients G(k) of the mel power spectrum are correlated, the Mel Frequency Cepstral Coefficients (MFCC) resulting from the cosine transform in Eq. (6) are de-correlated. The MFCC are used directly for further processing in the speech recognition system instead of being transformed back to the frequency domain.
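Eq. (6) is just a cosine transform of the log mel filter-bank energies G(k), which can be sketched in a few lines. This is a plain-Python illustration, not the paper's implementation: the mel filter bank is assumed to have been computed already, and the small floor added before the logarithm is our own:

```python
import math

def mel_cepstrum(mel_power, num_coeffs=13):
    """MFCCs per Eq. (6): cosine transform of the log mel power spectrum.

    `mel_power` holds the K mel filter-bank energies G(0)..G(K-1);
    only the first `num_coeffs` (Q) coefficients are returned.
    """
    K = len(mel_power)
    # Log of the mel power spectrum; the floor avoids log(0)
    log_g = [math.log(g + 1e-12) for g in mel_power]
    mfcc = []
    for q in range(num_coeffs):
        c = sum(log_g[k] * math.cos(math.pi * q * (2 * k + 1) / (2 * K))
                for k in range(K))
        mfcc.append(c)
    return mfcc
```

For a flat filter-bank output, all higher-order coefficients vanish and only c(0) remains, which matches the de-correlating behaviour of the cosine transform described above.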
II. Feature And Vector Space
Until now, we have seen that the speech signal can be characterized by a set of parameters (features),
which will be measured in short intervals of time during a preprocessing step. Before we start to look at the
speech recognition task, we will first get familiar with the concept of feature vectors and vector space.
If we have a set of numbers representing certain features of an object we want to describe, it is useful for further
processing to construct a vector out of these numbers by assigning each measured value to one component of the
vector. As an example, think of an air conditioning system which will measure the temperature and relative
humidity in the office. If we measure those parameters every second or so and we put the temperature into the
first component and the humidity into the second component of a vector, we will get a series of two-dimensional vectors describing how the air in the office changes in time. Since these so-called feature vectors
have two components, we can interpret the vectors as points in a two–dimensional vector space. Thus we can
draw a two-dimensional map of our measurements as sketched below. Each point in our map represents the temperature and humidity in our office at a given time. As we know, there are certain values of temperature and humidity which we find more comfortable than others. In the map the comfortable value pairs are shown as points labeled "+" and the less comfortable ones are shown as "−". We can see that they form regions of convenience and inconvenience, respectively.
Let's assume we would want to know whether a value pair we measured in our office would be judged as comfortable or uncomfortable by us. One way to find out is to initially run a test series trying out many value pairs and labeling each point either "+" or "−" in order to draw a map like the one shown below.
Fig. 3: A map of feature vectors
Now if we have measured a new value pair and are to judge whether it will be convenient to a person, we would have to judge whether it lies within the regions marked in our map as "+" or within those marked as "−". This is our first example of a classification task: we have two classes ("comfortable" and "uncomfortable") and a vector in feature space which has to be assigned to one of these classes. But how do we describe the shape of the regions, and how can we decide whether a measured vector lies within or outside a given region?
Classification of Vectors
A. Prototype Vectors
The problem of how to represent the regions of "comfortable" and "uncomfortable" feature vectors of our classification task can be solved by several approaches. One of the easiest is to select several of the feature vectors we measured in our experiments for each of our classes (in our example we have only two classes) and to declare the selected vectors as "prototypes" representing their class. We will later discuss how one can find a good selection of prototypes using the "k-means algorithm". For now, we simply assume that we were able to make a good choice of the prototypes, as shown in Fig. 4.
Fig. 4: Selected prototypes
B. Nearest Neighbor Classification
The classification of an unknown vector now proceeds as follows: measure the distance of the unknown vector to all classes, then assign the unknown vector to the class with the smallest distance. The distance of the unknown vector to a given class is defined as the smallest distance between the unknown vector and any of the prototypes representing that class. One could also verbalize the classification task as: find the nearest prototype to the unknown vector and assign the unknown vector to the class this "nearest neighbor" represents (hence the name). Fig. 4 shows the unknown vector and the two "nearest neighbors" among the prototypes of the two classes. The classification task we described can be formalized as follows: Let Ω = {ω_0, ω_1, ..., ω_{V−1}} be the set of classes, V being the total number of classes. Each class ω_v is represented by its prototype vectors p_{k,ω_v}, where k = 0, 1, ..., (K_{ω_v} − 1). Let x denote the unclassified vector, and let the distance measure between the vector and a prototype be denoted d(x, p_{k,ω_v}) (e.g., the Euclidean distance; we will discuss several distance measures later). Then the class distance between x and the class ω_v is defined as:

d_{ω_v}(x) = min_k d(x, p_{k,ω_v}),   k = 0, 1, ..., (K_{ω_v} − 1)    (7)
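The nearest neighbor rule built on the class distance of Eq. (7) can be sketched in a few lines of Python. This is a minimal illustration only: the "comfortable"/"uncomfortable" prototype values (temperature, humidity) below are hypothetical, and the squared Euclidean distance serves as the distance measure d.

```python
def sq_euclidean(x, p):
    # Squared Euclidean distance between two equal-length vectors.
    # Monotonic in the true distance, so it yields the same nearest class.
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))

def classify(x, prototypes):
    # prototypes: dict mapping class label -> list of prototype vectors.
    # Class distance (Eq. 7): smallest distance from x to any prototype of
    # that class; x is assigned to the class with the smallest class distance.
    class_dist = {
        label: min(sq_euclidean(x, p) for p in protos)
        for label, protos in prototypes.items()
    }
    return min(class_dist, key=class_dist.get)

# Hypothetical prototypes: (temperature in deg C, relative humidity in %)
prototypes = {
    "+": [(20.0, 50.0), (21.0, 55.0)],   # "comfortable"
    "-": [(15.0, 80.0), (28.0, 30.0)],   # "uncomfortable"
}
print(classify((20.5, 52.0), prototypes))  # -> +
```

The nearest prototype to (20.5, 52.0) belongs to class "+", so the unknown vector is assigned to "comfortable".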
III. Distance Measurement
So far, we have found a way to classify an unknown vector by calculation of its class–distances to
predefined classes, which in turn are defined by the distances to their individual prototype vectors. Now we will
briefly look at some commonly used distance measures. Depending on the application at hand, each of the
distance measures has its pros and cons, and we will discuss their most important properties.
Euclidean Distance
The Euclidean distance measure is the "standard" distance measure between two vectors in feature space (with dimension DIM) as we know it from school:

d²_Euclid(x, p) = Σ_{i=0}^{DIM−1} (x_i − p_i)²    (8)

To calculate the Euclidean distance measure, we have to compute the sum of the squares of the differences between the individual components of x and p. This can also be written as the following scalar product:

d²_Euclid(x, p) = (x − p)′ · (x − p)    (9)

where ′ denotes the vector transpose. Note that both Eq. (8) and Eq. (9) compute the square of the Euclidean distance, d², instead of d. The Euclidean distance is probably the most commonly used distance measure in pattern recognition.
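The equivalence of the component-wise sum in Eq. (8) and the scalar product in Eq. (9) can be checked numerically, e.g. with NumPy (the example vectors are made up):

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0])
p = np.array([1.0, 1.0, 2.0])

# Eq. (8): sum of squared component differences
d2_sum = np.sum((x - p) ** 2)

# Eq. (9): the same value written as a scalar product (x - p)' (x - p)
d2_dot = (x - p) @ (x - p)

print(d2_sum)  # -> 8.0
print(d2_dot)  # -> 8.0
```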
City Block Distance
The computation of the Euclidean distance involves computing the squares of the individual differences, thus requiring many multiplications. To reduce the computational complexity, one can instead use the absolute values of the differences. This is similar to measuring the distance between two points on a street map: we go three blocks to the East, then two blocks to the South (instead of going straight through the buildings, as the Euclidean distance would assume). We then sum the absolute values of the differences over all dimensions of the vector space.
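A minimal sketch of the city block (Manhattan) distance, using the three-blocks-East, two-blocks-South picture from the text:

```python
def city_block(x, p):
    # Sum of absolute component differences: no multiplications needed,
    # which is the computational advantage over the Euclidean distance.
    return sum(abs(xi - pi) for xi, pi in zip(x, p))

# Three blocks East, two blocks South on a street map:
print(city_block((0, 0), (3, -2)))  # -> 5
```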
Weighted Euclidean Distance
Both the Euclidean distance and the city block distance treat the individual dimensions of the feature space equally, i.e., the distance in each dimension contributes in the same way to the overall distance. But if we remember our example from section 2.1, we see that in real-world applications the individual dimensions will also have different scales. While in our office the temperature values will typically range between 18 and 22 degrees Celsius, the humidity will range from 40 to 60 percent relative humidity. While a small difference in humidity of, e.g., 4 percent relative humidity might not even be noticed by a person, a temperature difference of 4 degrees Celsius certainly will. In Fig. 5, we see a more abstract example involving two classes and two dimensions. The dimension x_1 has a wider range of values than dimension x_2, so all the measured values (or prototypes) are spread more widely along the axis x_1 than along the axis x_2. Obviously, a Euclidean or city block distance measure would give the wrong result, classifying the unknown vector as "class A" instead of "class B", which would (probably) be the correct result.
Fig. 5: Two dimensions with different scales
To cope with this problem, the different scales of the dimensions of our feature vectors have to be compensated for when computing the distance. This can be done by multiplying each contributing term by a scaling factor specific to the respective dimension. This leads us to the so-called Weighted Euclidean Distance.
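A sketch of the weighted Euclidean distance on the temperature/humidity example above. The weights are an assumption for illustration: one common choice, not prescribed by the text, is the inverse variance of each dimension.

```python
def weighted_euclidean(x, p, w):
    # Squared weighted Euclidean distance: each squared component
    # difference is scaled by a per-dimension weight, compensating
    # for the different value ranges of the dimensions.
    return sum(wi * (xi - pi) ** 2 for xi, pi, wi in zip(x, p, w))

# Temperature (deg C) vs. relative humidity (%). A 4-degree temperature
# difference should count far more than a 4-percent humidity difference.
# Assumed weights: w_i = 1 / variance_i (hypothetical standard deviations
# of 1.5 deg C and 6 % humidity).
w = (1 / 1.5 ** 2, 1 / 6.0 ** 2)

print(weighted_euclidean((20, 50), (24, 50), w))  # temperature off by 4: large
print(weighted_euclidean((20, 50), (20, 54), w))  # humidity off by 4: small
```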
Mahalanobis Distance
So far, we can deal with different scales of our features using the weighted Euclidean distance measure. This works very well if there is no correlation between the individual features, as would be the case if the features we selected for our vector space were statistically independent of each other. But what if they are not?
Fig. 6 shows an example in which the features 𝑥1 and 𝑥2 are correlated.
Obviously, for both classes 𝐴 and 𝐵, a high value of 𝑥1 correlates with a high value for 𝑥2 (with respect
to the mean vector (center) of the class), which is indicated by the orientation of the two ellipses. In this case,
we would want the distance measure to regard both the correlation and scale properties of the features. Here a
simple scale transformation will not be sufficient. Instead, the correlations between the individual components
of the feature vector will have to be regarded when computing the distance between two vectors. This leads us
to a new distance measure, the so–called Mahalanobis Distance.
Fig. 6: Correlated Features
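The Mahalanobis distance can be sketched as the quadratic form (x − μ)′ C⁻¹ (x − μ), where C is the covariance matrix of a class. The mean vector and covariance matrix below are hypothetical, chosen to mimic the strongly correlated features of Fig. 6:

```python
import numpy as np

def mahalanobis2(x, mean, cov):
    # Squared Mahalanobis distance: (x - mean)' C^{-1} (x - mean).
    # The inverse covariance matrix accounts for both the scale and the
    # correlation of the feature dimensions.
    diff = np.asarray(x, float) - np.asarray(mean, float)
    return float(diff @ np.linalg.inv(cov) @ diff)

# Hypothetical class whose features x1 and x2 are strongly correlated:
mean = np.array([0.0, 0.0])
cov = np.array([[4.0, 3.0],
                [3.0, 4.0]])

# A point lying along the correlation direction of the ellipse is "closer"
# than a point the same Euclidean distance away but across it:
print(mahalanobis2((2.0, 2.0), mean, cov))   # along the ellipse axis: small
print(mahalanobis2((2.0, -2.0), mean, cov))  # across the axis: larger
```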
IV. Dynamic Time Warping
In the last section, we were dealing with the task of classifying single vectors to a given set of classes
which were represented by prototype vectors computed from a set of training vectors. Several distance measures
were presented, some of them using additional sets of parameters (e.g., the covariance matrices) which also had
to be computed from training vectors.
How does this relate to speech recognition?
As we know, our speech signal is represented by a series of feature vectors which are computed every 10 ms. A whole word will comprise dozens of those vectors, and the number of vectors (the duration) of a word depends on how fast a person is speaking. Therefore, our classification task is different from what we saw before: in speech recognition, we have to classify not only single vectors, but sequences of vectors. Let's assume we want to recognize a few command words or digits. For an utterance of a word w which is T_X vectors long, we will get a sequence of vectors X = {x_0, x_1, ..., x_{T_X−1}} from the acoustic preprocessing stage. What we need here is a way to compute a "distance" between this unknown sequence of vectors X and known sequences of vectors W_k = {w_{k,0}, w_{k,1}, ..., w_{k,T_{W_k}−1}} which are prototypes for the words we want to recognize. Let our vocabulary (here: the set of classes Ω) contain V different words w_0, w_1, ..., w_{V−1}. In analogy to the nearest neighbor classification task from section 2.1, we will allow a word w_v (here: class ω_v ∈ Ω) to be represented by a set of prototypes W_{k,ω_v}, k = 0, 1, ..., (K_{ω_v} − 1), to reflect all the possible variations due to different pronunciations or even different speakers.
Fig. 7: Possible assignment between the vector pairs of X and W
The Dynamic Programming Algorithm
In the following formal framework, we will iterate through the matrix column by column, starting with the leftmost column and working each column from bottom to top. For ease of notation, we define d(i, j) to be the distance d(w_i, x_j) between the two vectors w_i and x_j.
Let δ_j(i) be the accumulated distance δ(i, j) at grid point (i, j) and δ_{j−1}(i) the accumulated distance δ(i, j − 1) at grid point (i, j − 1). It should be mentioned that it is possible to use a single array for the time indices j and j − 1, overwriting the old values of the array with the new ones. However, for clarity, the algorithm using two arrays is described here, and the formulation of a single-array algorithm is left to the reader.
To keep track of all the selections among the path hypotheses during the optimization, we have to store each
path alternative chosen for every grid point. We could for every grid point (𝑖, 𝑗) either store the indices 𝑘 and 𝑙
of the predecessor point (𝑘, 𝑙) or we could only store a code number for one of the three path alternatives
(horizontal, diagonal and vertical path) and compute the predecessor point (𝑘, 𝑙) out of the code and the current
point (𝑖, 𝑗). While the description of the DTW classification algorithm might let us think that one would
compute all the distances sequentially and then select the minimum distance, it is more useful in practical
applications to compute all the distances between the unknown vector sequence and the class prototypes in
parallel. This is possible since the DTW algorithm needs only the values for time index 𝑡 and (𝑡 − 1) and
therefore there is no need to wait until the utterance of the unknown vector sequence is completed. Instead, one
can start with the recognition process immediately as soon as the utterance begins (we will not deal with the
question of how to recognize the start and end of an utterance here).
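The column-by-column accumulation described above can be sketched in the two-array form. This is a minimal illustration on toy 1-D "feature vectors"; a real recognizer would use, e.g., MFCC vectors and one of the vector distance measures from section III:

```python
def dtw(W, X, dist):
    # Accumulated distance delta_j(i): prototype frame W[i] vs. unknown
    # frame X[j], filled column by column from left to right, each column
    # from bottom to top. Allowed predecessors of grid point (i, j):
    # horizontal (i, j-1), diagonal (i-1, j-1), and vertical (i-1, j).
    INF = float("inf")
    prev = None                      # column j-1 of accumulated distances
    for j in range(len(X)):
        curr = [0.0] * len(W)        # column j
        for i in range(len(W)):
            d = dist(W[i], X[j])
            if i == 0 and j == 0:
                curr[i] = d          # start point of every path
                continue
            horizontal = prev[i] if j > 0 else INF
            diagonal = prev[i - 1] if (i > 0 and j > 0) else INF
            vertical = curr[i - 1] if i > 0 else INF
            curr[i] = d + min(horizontal, diagonal, vertical)
        prev = curr
    return prev[-1]                  # upper-right grid point

# The stretched middle of X is absorbed by horizontal path steps,
# so the warped distance to the shorter prototype is still zero:
dist = lambda a, b: abs(a - b)
print(dtw([1, 2, 3], [1, 2, 2, 2, 3], dist))  # -> 0.0
```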
To do so, we have to reorganize our search space a little bit. First, let’s assume the total number of all
prototypes over all classes is given by 𝑀. If we want to compute the distances to all 𝑀 prototypes
simultaneously, we have to keep track of the accumulated distances between the unknown vector sequence and
the prototype sequences individually. Hence, instead of the single column (or two columns, depending on the implementation) we used to hold the accumulated distance values for all grid points, we now have to provide M columns during the DTW procedure. Now we introduce an additional "virtual" grid point together with a
specialized local path alternative for this point: The possible predecessors for this point are defined to be the
upper–right grid points of the individual grid matrices of the prototypes. In other words, the virtual grid point
can only be reached from the end of each prototype word, and among all the possible prototype words, the one
with the smallest accumulated distance is chosen. By introducing this virtual grid point, the classification task
itself (selecting the class with the smallest class distance) is integrated into the framework of finding the optimal
path.
Now all we have to do is run the DTW algorithm for each time index j along all columns of all prototype sequences. At the last time slot (T_X − 1) we perform the optimization step for the virtual grid point, i.e., the predecessor of the virtual grid point is chosen to be the prototype word having the smallest accumulated distance. Note that the search space we have to consider is spanned by the length of the unknown
vector sequence on one hand and the sum of the length of all prototype sequences of all classes on the other
hand. The backtracking procedure can of course be restricted to keeping track of the final optimization step, when the best predecessor for the virtual grid point is chosen. The classification task is then performed by assigning the unknown vector sequence to the class of the prototype whose word-end grid point was chosen.
Of course, this is just a different (and quite complicated) definition of how we can perform the DTW
classification task. Therefore, only a verbal description was given and we did not bother with a formal
description. However, by the reformulation of the DTW classification we learned a few things:
- The DTW algorithm can be used for real-time computation of the distances.
- The classification task has been integrated into the search for the optimal path.
- Instead of the accumulated distance, the optimal path itself is now important for the classification task.
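The role of the virtual grid point can be illustrated with a sequential stand-in: instead of advancing all M prototype columns in parallel, the sketch below simply compares the word-end accumulated distances of all prototypes and picks the word owning the smallest one. The vocabulary and 1-D prototype sequences are hypothetical.

```python
import math

def dtw(W, X, dist):
    # Column-by-column DTW accumulated distance (two-array form);
    # returns the value at the upper-right grid point (the word end).
    prev = None
    for j in range(len(X)):
        curr = [0.0] * len(W)
        for i in range(len(W)):
            d = dist(W[i], X[j])
            if i == 0 and j == 0:
                curr[i] = d
                continue
            horizontal = prev[i] if j > 0 else math.inf
            diagonal = prev[i - 1] if (i > 0 and j > 0) else math.inf
            vertical = curr[i - 1] if i > 0 else math.inf
            curr[i] = d + min(horizontal, diagonal, vertical)
        prev = curr
    return prev[-1]

def classify(X, vocab, dist):
    # Sequential stand-in for the "virtual grid point": among all word-end
    # accumulated distances over all prototypes of all words, choose the
    # word whose best prototype has the smallest one.
    return min(vocab, key=lambda w: min(dtw(W, X, dist) for W in vocab[w]))

# Hypothetical vocabulary of 1-D prototype sequences:
dist = lambda a, b: abs(a - b)
vocab = {"up": [[1, 3, 5], [1, 4, 5]],
         "down": [[5, 3, 1]]}
print(classify([1, 1, 3, 5, 5], vocab, dist))  # -> up
```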
V. Conclusion
Speech is the primary and most convenient means of communication between people. Whether driven by technological curiosity to build machines that mimic humans or by the desire to automate work with machines, research in speech and speaker recognition, as a first step toward natural human-machine communication, has attracted much enthusiasm over the past five decades. We have also encountered a number of practical limitations which hinder a widespread deployment of applications and services. In most speech recognition tasks, human subjects produce one to two orders of magnitude fewer errors than machines, and there is now increasing interest in finding ways to bridge this performance gap. What we know about human speech processing is still very limited; significant advances will come from studies in acoustic-phonetics, speech perception, linguistics, and psychoacoustics. Future systems need an efficient way of representing, storing, and retrieving the knowledge required for natural conversation. This paper has attempted to provide a comprehensive survey of research on speech recognition and to trace its year-wise progress to date. Although significant progress has been made in the last two decades, there is still work to be done, and we believe that a robust speech recognition system should remain effective under the full range of environmental conditions, speaker variability, etc. Speech recognition is a challenging and interesting problem in and of itself. We have attempted in this paper to provide a comprehensive, if cursory, look at how much speech recognition technology has progressed in the last 60 years. Speech recognition is one of the most integrating areas of machine intelligence, since humans perform speech recognition as a daily activity. It has attracted scientists as an important discipline, has created a technological impact on society, and is expected to flourish further in this area of human-machine interaction. We hope this paper brings about understanding and inspiration amongst the research communities of ASR.