There are a lot of papers on automatic classification between normal and pathological voices, but they have the lack in the degree of severity estimation of the identified voice disorders. Building a model of pathological and normal voices identification, that can also evaluate the degree of severity
of the identified voice disorders among students. In the present work, we present an automatic classifier using acoustical measurements on registered sustained vowels /a/ and pattern recognition tools based on neural networks. The training set was done by classifying students’ recorded voices based on threshold from the literature. We retrieve the pitch, jitter, shimmer and harmonic-to-noise ratio values of the speech utterance /a/, which constitute the input vector of the neural network. The degree of severity is estimated to evaluate how the parameters are far from the standard values based on the percent of normal and pathological values. In this work, the base data used for testing the proposed algorithm of the neural network is formed by healthy
and pathological voices from German database of voice disorders. The
performance of the proposed algorithm is evaluated in a term of the accuracy
(97.9%), sensitivity (1.6%), and specificity (95.1%). The classification rate is
90% for normal class and 95% for pathological class.
Adaptive wavelet thresholding with robust hybrid features for text-independe...IJECEIAES
The robustness of speaker identification system over additive noise channel is crucial for real-world applications. In speaker identification (SID) systems, the extracted features from each speech frame are an essential factor for building a reliable identification system. For clean environments, the identification system works well; in noisy environments, there is an additive noise, which is affect the system. To eliminate the problem of additive noise and to achieve a high accuracy in speaker identification system a proposed algorithm for feature extraction based on speech enhancement and a combined features is presents. In this paper, a wavelet thresholding pre-processing stage, and feature warping (FW) techniques are used with two combined features named power normalized cepstral coefficients (PNCC) and gammatone frequency cepstral coefficients (GFCC) to improve the identification system robustness against different types of additive noises. Universal background model Gaussian mixture model (UBM-GMM) is used for features matching between the claim and actual speakers. The results showed performance improvement for the proposed feature extraction algorithm of identification system comparing with conventional features over most types of noises and different SNR ratios.
Eavesdropping is the act of secretly listening to the private conversation of others without their consent .In this paper explains how we can identify an active eavesdropper in the communication channel and how to drop it. We derive a new security attack from the pilot contamination phenomenon using PSO. Among the variety of threats and risks that wireless LANs are facing, session hijacking attacks are common and serious ones. Current techniques for detecting session hijacking attacks are mainly based on spoofable and predictable parameters such as sequence numbers, which can be guessed by the attackers. To enhance the reliability of intrusion detection systems, mechanisms that utilize the unspoofable PHY layer characteristics are needed. We derive a new security attack from the pilot contamination phenomenon, which targets at systems using reverse training to obtain the CSI at the transmitter for precoder design. A PSO based optimal filter is then designed for the purpose of detection. It’s the fastest and high accuracy optimization technique. We show that using a Wavelet Transform (WT), the colored noise with complex Power Spectral Density (PSD) in our case can be approximately whitened. Since a larger Signal to Noise Ratio (SNR) increases the detection rate and decreases the false alarm rate, the SNR is maximized by analyzing the signal at specific frequency ranges. The detection mechanism is validated using both simulation results.
A novel convolutional neural network based dysphonic voice detection algorit...IJECEIAES
This paper presents a convolutional neural network (CNN) based noninvasive pathological voice detection algorithm using signal processing approach. The proposed algorithm extracts an acoustic feature, called chromagram, from voice samples and applies this feature to the input of a CNN for classification. The main advantage of chromagram is that it can mimic the way humans perceive pitch in sounds and hence can be considered useful to detect dysphonic voices, as the pitch in the generated sounds varies depending on the pathological conditions. The simulation results show that classification accuracy of 85% can be achieved with the chromagram. A comparison of the performances for the proposed algorithm with those of other related works is also presented.
Voice Assessments for Detecting Patients with Parkinson’s Diseases in Differe...IJECEIAES
Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to detect patients with Parkinson’s disease (PD). So we have computed 19 dysphonia measures from sustained vowels collected from 375 voice samples from healthy and people suffer from PD. All the features are analysed and the more relevant ones are selected by the Principal component analysis (PCA) to classify the subjects in 4 classes according to the UPDRS (unified Parkinson’s disease Rating Scale) score. We used kfolds cross validation method with (k=4) validation scheme; 75% for training and 25% for testing, along with the Support Vector Machines (SVM) with its different types of kernels. The best result obtained was 92.5% using the PCA and the linear SVM.
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALIJCSEIT Journal
In this paper, a system, developed for speech encoding, analysis, synthesis and gender identification is
presented. A typical gender recognition system can be divided into front-end system and back-end system.
The task of the front-end system is to extract the gender related information from a speech signal and
represents it by a set of vectors called feature. Features like power spectrum density, frequency at
maximum power carry speaker information. The feature is extracted using First Fourier Transform (FFT)
algorithm. The task of the back-end system (also called classifier) is to create a gender model to recognize
the gender from his/her speech signal in recognition phase. This paper also presents the digital processing
of a speech signals (pronounced “A” and “B”) which are taken from 10 persons, 5 of them are Male and
the rest of them are Female. Power Spectrum Estimation of the signal is examined .The frequency at
maximum power of the English Phonemes is extracted from the estimated power spectrum. The system uses
threshold technique as identification tool. The recognition accuracy of this system is 80% on average.
Adaptive wavelet thresholding with robust hybrid features for text-independe...IJECEIAES
The robustness of speaker identification system over additive noise channel is crucial for real-world applications. In speaker identification (SID) systems, the extracted features from each speech frame are an essential factor for building a reliable identification system. For clean environments, the identification system works well; in noisy environments, there is an additive noise, which is affect the system. To eliminate the problem of additive noise and to achieve a high accuracy in speaker identification system a proposed algorithm for feature extraction based on speech enhancement and a combined features is presents. In this paper, a wavelet thresholding pre-processing stage, and feature warping (FW) techniques are used with two combined features named power normalized cepstral coefficients (PNCC) and gammatone frequency cepstral coefficients (GFCC) to improve the identification system robustness against different types of additive noises. Universal background model Gaussian mixture model (UBM-GMM) is used for features matching between the claim and actual speakers. The results showed performance improvement for the proposed feature extraction algorithm of identification system comparing with conventional features over most types of noises and different SNR ratios.
Eavesdropping is the act of secretly listening to the private conversation of others without their consent .In this paper explains how we can identify an active eavesdropper in the communication channel and how to drop it. We derive a new security attack from the pilot contamination phenomenon using PSO. Among the variety of threats and risks that wireless LANs are facing, session hijacking attacks are common and serious ones. Current techniques for detecting session hijacking attacks are mainly based on spoofable and predictable parameters such as sequence numbers, which can be guessed by the attackers. To enhance the reliability of intrusion detection systems, mechanisms that utilize the unspoofable PHY layer characteristics are needed. We derive a new security attack from the pilot contamination phenomenon, which targets at systems using reverse training to obtain the CSI at the transmitter for precoder design. A PSO based optimal filter is then designed for the purpose of detection. It’s the fastest and high accuracy optimization technique. We show that using a Wavelet Transform (WT), the colored noise with complex Power Spectral Density (PSD) in our case can be approximately whitened. Since a larger Signal to Noise Ratio (SNR) increases the detection rate and decreases the false alarm rate, the SNR is maximized by analyzing the signal at specific frequency ranges. The detection mechanism is validated using both simulation results.
A novel convolutional neural network based dysphonic voice detection algorit...IJECEIAES
This paper presents a convolutional neural network (CNN) based noninvasive pathological voice detection algorithm using signal processing approach. The proposed algorithm extracts an acoustic feature, called chromagram, from voice samples and applies this feature to the input of a CNN for classification. The main advantage of chromagram is that it can mimic the way humans perceive pitch in sounds and hence can be considered useful to detect dysphonic voices, as the pitch in the generated sounds varies depending on the pathological conditions. The simulation results show that classification accuracy of 85% can be achieved with the chromagram. A comparison of the performances for the proposed algorithm with those of other related works is also presented.
Voice Assessments for Detecting Patients with Parkinson’s Diseases in Differe...IJECEIAES
Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to detect patients with Parkinson’s disease (PD). So we have computed 19 dysphonia measures from sustained vowels collected from 375 voice samples from healthy and people suffer from PD. All the features are analysed and the more relevant ones are selected by the Principal component analysis (PCA) to classify the subjects in 4 classes according to the UPDRS (unified Parkinson’s disease Rating Scale) score. We used kfolds cross validation method with (k=4) validation scheme; 75% for training and 25% for testing, along with the Support Vector Machines (SVM) with its different types of kernels. The best result obtained was 92.5% using the PCA and the linear SVM.
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALIJCSEIT Journal
In this paper, a system, developed for speech encoding, analysis, synthesis and gender identification is
presented. A typical gender recognition system can be divided into front-end system and back-end system.
The task of the front-end system is to extract the gender related information from a speech signal and
represents it by a set of vectors called feature. Features like power spectrum density, frequency at
maximum power carry speaker information. The feature is extracted using First Fourier Transform (FFT)
algorithm. The task of the back-end system (also called classifier) is to create a gender model to recognize
the gender from his/her speech signal in recognition phase. This paper also presents the digital processing
of a speech signals (pronounced “A” and “B”) which are taken from 10 persons, 5 of them are Male and
the rest of them are Female. Power Spectrum Estimation of the signal is examined .The frequency at
maximum power of the English Phonemes is extracted from the estimated power spectrum. The system uses
threshold technique as identification tool. The recognition accuracy of this system is 80% on average.
Gender voice recognition stands for an imperative research field in acoustics and speech processing as human voice shows very remarkable aspects. This study investigates speech signals to devise a gender classifier by speech analysis to forecast the gender of the speaker by investigating diverse parameters of the voice sample. A database has 2270 voice samples of celebrities, both male and female. Through Mel frequency cepstrum coefficient (MFCC), vector quantization (VQ), and machine learning algorithm (J 48), an accuracy of about 100% is achieved by the proposed classification technique based on data mining and Java script.
ELM and K-nn machine learning in classification of Breath sounds signals IJECEIAES
The acquisition of Breath sounds (BS) signals from a human respiratory system with an electronic stethoscope, provide and offer prominent information which helps the doctors to diagnosis and classification of pulmonary diseases. Unfortunately, this BS signals with other biological signals have a non-stationary nature according to the variation of the lung volume, and this nature makes it difficult to analyze and classify between several diseases. In this study, we were focused on comparing the ability of the extreme learning machine (ELM) and k-nearest neighbour (K-nn) machine learning algorithms in the classification of adventitious and normal breath sounds. To do so, the empirical mode decomposition (EMD) was used in this work to analyze BS, this method is rarely used in the breath sounds analysis. After the EMD decomposition of the signals into Intrinsic Mode Functions (IMFs), the Hjorth descriptors (Activity) and Permutation Entropy (PE) features were extracted from each IMFs and combined for classification stage. The study has found that the combination of features (activity and PE) yielded an accuracy of 90.71%, 95% using ELM and K-nn respectively in binary classification (normal and abnormal breath sounds), and 83.57%, 86.42% in multiclass classification (five classes).
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing IJECEIAES
The speech enhancement algorithms are utilized to overcome multiple limitation factors in recent applications such as mobile phone and communication channel. The challenges focus on corrupted speech solution between noise reduction and signal distortion. We used a modified Wiener filter and compressive sensing (CS) to investigate and evaluate the improvement of speech quality. This new method adapted noise estimation and Wiener filter gain function in which to increase weight amplitude spectrum and improve mitigation of interested signals. The CS is then applied using the gradient projection for sparse reconstruction (GPSR) technique as a study system to empirically investigate the interactive effects of the corrupted noise and obtain better perceptual improvement aspects to listener fatigue with noiseless reduction conditions. The proposed algorithm shows an enhancement in testing performance evaluation of objective assessment tests outperform compared to other conventional algorithms at various noise type conditions of 0, 5, 10, 15 dB SNRs. Therefore, the proposed algorithm significantly achieved the speech quality improvement and efficiently obtained higher performance resulting in better noise reduction compare to other conventional algorithms.
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...cscpconf
Analysis of speech for recognition of stress is important for identification of emotional state of
person. This can be done using ‘Linear Techniques’, which has different parameters like pitch,
vocal tract spectrum, formant frequencies, Duration, MFCC etc. which are used for extraction
of features from speech. TEO-CB-Auto-Env is the method which is non-linear method of
features extraction. Analysis is done using TU-Berlin (Technical University of Berlin) German
database. Here emotion recognition is done for different emotions like neutral, happy, disgust,
sad, boredom and anger. Emotion recognition is used in lie detector, database access systems, and in military for recognition of soldiers’ emotion identification during the war.
Recognition of emotional states using EEG signals based on time-frequency ana...IJECEIAES
The recognition of emotions is a vast significance and a high developing field of research in the recent years. The applications of emotion recognition have left an exceptional mark in various fields including education and research. Traditional approaches used facial expressions or voice intonation to detect emotions, however, facial gestures and spoken language can lead to biased and ambiguous results. This is why, researchers have started to use electroencephalogram (EEG) technique which is well defined method for emotion recognition. Some approaches used standard and pre-defined methods of the signal processing area and some worked with either fewer channels or fewer subjects to record EEG signals for their research. This paper proposed an emotion detection method based on time-frequency domain statistical features. Box-and-whisker plot is used to select the optimal features, which are later feed to SVM classifier for training and testing the DEAP dataset, where 32 participants with different gender and age groups are considered. The experimental results show that the proposed method exhibits 92.36% accuracy for our tested dataset. In addition, the proposed method outperforms than the state-of-art methods by exhibiting higher accuracy.
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...IJERA Editor
This work presents an application of Fundamental Frequency (Pitch), Linear Predictive Cepstral Coefficient
(LPCC) and Mel Frequency Cepstral Coefficient (MFCC) in identification of sex of the speaker in speech
recognition research. The aim of this article is to compare the performance of these three methods for
identification of sex of the speakers. A successful speech recognition system can help in non critical operations
such as presenting the driving route to the driver, dialing a phone number, light switch turn on/off, the coffee
machine on/off etc. apart from speaker verification-caste wise, community wise and locality wise including
identification of sex. Here an attempt has been made to identify the sex of Bodo speakers through vowel
utterance by following Pitch value, LPCC and MFCC techniques. It is found here that the feature vector
organization of LPCC coefficients provides a more promising way of speech-speaker recognition in case of
Bodo Language than that of Pitch and MFCC.
Attention gated encoder-decoder for ultrasonic signal denoisingIAESIJAI
Ultrasound imaging is one of the most widely used non-destructive testing methods. The transducer emits pulses that travel through the imaged samples and are reflected by echo-forming impedance. The resulting ultrasonic signals usually contain noise. Most of the traditional noise reduction algorithms require high skills and prior knowledge of noise distribution, which has a crucial impact on their performances. As a result, these methods generally yield a loss of information, significantly influencing the final data and deeply limiting both sensitivity and resolution of imaging devices in medical and industrial applications. In the present study, a denoising method based on an attention-gated convolutional autoencoder is proposed to fill this gap. To evaluate its performance, the suggested protocol is compared to widely used methods such as butterworth filtering (BF), discrete wavelet transforms (DWT), principal component analysis (PCA), and convolutional autoencoder (CAE) methods. Results proved that better denoising can be achieved especially when the original signal-to-noise ratio (SNR) is very low and the sound waves’ traces are distorted by noise. Moreover, the initial SNR was improved by up to 30 dB and the resulting Pearson correlation coefficient was maintained over 99% even for ultrasonic signals with poor initial SNR.
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...CSCJournals
Modern automatic speech recognition (ASR) systems typically use a bank of linear filters as the first step in performing frequency analysis of speech. On the other hand, the cochlea, which is responsible for frequency analysis in the human auditory system, is known to have a compressive non-linear frequency response which depends on input stimulus level. It will be shown in this paper that it presents a new method on the use of the gammachirp auditory filter based on a continuous wavelet analysis. The essential characteristic of this model is that it proposes an analysis by wavelet packet transformation on the frequency bands that come closer the critical bands of the ear that differs from the existing model based on an analysis by a short term Fourier transformation (STFT). The prosodic features such as pitch, formant frequency, jitter and shimmer are extracted from the fundamental frequency contour and added to baseline spectral features, specifically, Mel Frequency Cepstral Coefficients (MFCC) for human speech, Gammachirp Filterbank Cepstral Coefficient (GFCC) and Gammachirp Wavelet Frequency Cepstral Coefficient (GWFCC). The results show that the gammachirp wavelet gives results that are comparable to ones obtained by MFCC and GFCC. Experimental results show the best performance of this architecture. This paper implements the GW and examines its application to a specific example of speech. Implications for noise robust speech analysis are also discussed within AURORA databases.
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUESAM Publications
Audio signals which include speech, music and environmental sounds are important types of media. The problem of distinguishing audio signals into these different audio types is thus becoming increasingly significant. A human listener can easily distinguish between different audio types by just listening to a short segment of an audio signal. However, solving this problem using computers has proven to be very difficult. Nevertheless, many systems with modest accuracy could still be implemented. The experimental results demonstrate the effectiveness of our classification system. The complete system is developed in ANN Techniques with Autonomic Computing system
Improving the Proactive Routing Protocol using Depth First Iterative Deepenin...Yayah Zakaria
Owing to the wireless and mobility nature, nodes in a mobile ad hoc network are not within the transmission range. It needs to transfer data through the multi-intermediate nodes. Opportunistic data forwarding is an assuring solution to make use of the broadcast environment of wireless communication links. Due to absence of source routing capability with efficient proactive routing protocol, it is not widely used. To rectify the
problem, we proposed memory and routing efficient proactive routing protocol using Depth-First Iterative-Deepening and hello messaging scheme. This protocol can conserve the topology information in every node in the network. In experimental analysis and discussion, we implemented the proposed work using NS2 simulator tool and proved that the proposed technique is performed well in terms of average delay, buffer and throughput.
Improvement at Network Planning using Heuristic Algorithm to Minimize Cost of...Yayah Zakaria
Wireless Mesh Networks (WMN) consists of wireless stations that are connected with each other in a semi-static configuration. Depending on the configuration of a WMN, different paths between nodes offer different levels of efficiency. One areas of research with regard to WMN is cost minimization. A Modified Binary Particle Swarm Optimization (MBPSO) approach was used to optimize cost. However, minimized cost does not
guarantee network performance. This paper thus, modified the minimization function to take into consideration the distance between the different nodes so as to enable better performance while maintaining cost balance. The results were positive with the PDR showing an approximate increase of 17.83% whereas the E2E delay saw an approximate decrease of 8.33%.
More Related Content
Similar to Improved Algorithm for Pathological and Normal Voices Identification
Gender voice recognition stands for an imperative research field in acoustics and speech processing as human voice shows very remarkable aspects. This study investigates speech signals to devise a gender classifier by speech analysis to forecast the gender of the speaker by investigating diverse parameters of the voice sample. A database has 2270 voice samples of celebrities, both male and female. Through Mel frequency cepstrum coefficient (MFCC), vector quantization (VQ), and machine learning algorithm (J 48), an accuracy of about 100% is achieved by the proposed classification technique based on data mining and Java script.
ELM and K-nn machine learning in classification of Breath sounds signals IJECEIAES
The acquisition of Breath sounds (BS) signals from a human respiratory system with an electronic stethoscope, provide and offer prominent information which helps the doctors to diagnosis and classification of pulmonary diseases. Unfortunately, this BS signals with other biological signals have a non-stationary nature according to the variation of the lung volume, and this nature makes it difficult to analyze and classify between several diseases. In this study, we were focused on comparing the ability of the extreme learning machine (ELM) and k-nearest neighbour (K-nn) machine learning algorithms in the classification of adventitious and normal breath sounds. To do so, the empirical mode decomposition (EMD) was used in this work to analyze BS, this method is rarely used in the breath sounds analysis. After the EMD decomposition of the signals into Intrinsic Mode Functions (IMFs), the Hjorth descriptors (Activity) and Permutation Entropy (PE) features were extracted from each IMFs and combined for classification stage. The study has found that the combination of features (activity and PE) yielded an accuracy of 90.71%, 95% using ELM and K-nn respectively in binary classification (normal and abnormal breath sounds), and 83.57%, 86.42% in multiclass classification (five classes).
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing IJECEIAES
The speech enhancement algorithms are utilized to overcome multiple limitation factors in recent applications such as mobile phone and communication channel. The challenges focus on corrupted speech solution between noise reduction and signal distortion. We used a modified Wiener filter and compressive sensing (CS) to investigate and evaluate the improvement of speech quality. This new method adapted noise estimation and Wiener filter gain function in which to increase weight amplitude spectrum and improve mitigation of interested signals. The CS is then applied using the gradient projection for sparse reconstruction (GPSR) technique as a study system to empirically investigate the interactive effects of the corrupted noise and obtain better perceptual improvement aspects to listener fatigue with noiseless reduction conditions. The proposed algorithm shows an enhancement in testing performance evaluation of objective assessment tests outperform compared to other conventional algorithms at various noise type conditions of 0, 5, 10, 15 dB SNRs. Therefore, the proposed algorithm significantly achieved the speech quality improvement and efficiently obtained higher performance resulting in better noise reduction compare to other conventional algorithms.
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...cscpconf
Analysis of speech for recognition of stress is important for identification of emotional state of
person. This can be done using ‘Linear Techniques’, which has different parameters like pitch,
vocal tract spectrum, formant frequencies, Duration, MFCC etc. which are used for extraction
of features from speech. TEO-CB-Auto-Env is the method which is non-linear method of
features extraction. Analysis is done using TU-Berlin (Technical University of Berlin) German
database. Here emotion recognition is done for different emotions like neutral, happy, disgust,
sad, boredom and anger. Emotion recognition is used in lie detector, database access systems, and in military for recognition of soldiers’ emotion identification during the war.
Recognition of emotional states using EEG signals based on time-frequency ana...IJECEIAES
The recognition of emotions is a vast significance and a high developing field of research in the recent years. The applications of emotion recognition have left an exceptional mark in various fields including education and research. Traditional approaches used facial expressions or voice intonation to detect emotions, however, facial gestures and spoken language can lead to biased and ambiguous results. This is why, researchers have started to use electroencephalogram (EEG) technique which is well defined method for emotion recognition. Some approaches used standard and pre-defined methods of the signal processing area and some worked with either fewer channels or fewer subjects to record EEG signals for their research. This paper proposed an emotion detection method based on time-frequency domain statistical features. Box-and-whisker plot is used to select the optimal features, which are later feed to SVM classifier for training and testing the DEAP dataset, where 32 participants with different gender and age groups are considered. The experimental results show that the proposed method exhibits 92.36% accuracy for our tested dataset. In addition, the proposed method outperforms than the state-of-art methods by exhibiting higher accuracy.
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...IJERA Editor
This work presents an application of Fundamental Frequency (Pitch), Linear Predictive Cepstral Coefficient
(LPCC) and Mel Frequency Cepstral Coefficient (MFCC) in identification of sex of the speaker in speech
recognition research. The aim of this article is to compare the performance of these three methods for
identification of sex of the speakers. A successful speech recognition system can help in non critical operations
such as presenting the driving route to the driver, dialing a phone number, light switch turn on/off, the coffee
machine on/off etc. apart from speaker verification-caste wise, community wise and locality wise including
identification of sex. Here an attempt has been made to identify the sex of Bodo speakers through vowel
utterance by following Pitch value, LPCC and MFCC techniques. It is found here that the feature vector
organization of LPCC coefficients provides a more promising way of speech-speaker recognition in case of
Bodo Language than that of Pitch and MFCC.
Attention gated encoder-decoder for ultrasonic signal denoisingIAESIJAI
Ultrasound imaging is one of the most widely used non-destructive testing methods. The transducer emits pulses that travel through the imaged samples and are reflected by echo-forming impedance. The resulting ultrasonic signals usually contain noise. Most of the traditional noise reduction algorithms require high skills and prior knowledge of noise distribution, which has a crucial impact on their performances. As a result, these methods generally yield a loss of information, significantly influencing the final data and deeply limiting both sensitivity and resolution of imaging devices in medical and industrial applications. In the present study, a denoising method based on an attention-gated convolutional autoencoder is proposed to fill this gap. To evaluate its performance, the suggested protocol is compared to widely used methods such as butterworth filtering (BF), discrete wavelet transforms (DWT), principal component analysis (PCA), and convolutional autoencoder (CAE) methods. Results proved that better denoising can be achieved especially when the original signal-to-noise ratio (SNR) is very low and the sound waves’ traces are distorted by noise. Moreover, the initial SNR was improved by up to 30 dB and the resulting Pearson correlation coefficient was maintained over 99% even for ultrasonic signals with poor initial SNR.
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...CSCJournals
Modern automatic speech recognition (ASR) systems typically use a bank of linear filters as the first step in performing frequency analysis of speech. On the other hand, the cochlea, which is responsible for frequency analysis in the human auditory system, is known to have a compressive non-linear frequency response which depends on input stimulus level. It will be shown in this paper that it presents a new method on the use of the gammachirp auditory filter based on a continuous wavelet analysis. The essential characteristic of this model is that it proposes an analysis by wavelet packet transformation on the frequency bands that come closer the critical bands of the ear that differs from the existing model based on an analysis by a short term Fourier transformation (STFT). The prosodic features such as pitch, formant frequency, jitter and shimmer are extracted from the fundamental frequency contour and added to baseline spectral features, specifically, Mel Frequency Cepstral Coefficients (MFCC) for human speech, Gammachirp Filterbank Cepstral Coefficient (GFCC) and Gammachirp Wavelet Frequency Cepstral Coefficient (GWFCC). The results show that the gammachirp wavelet gives results that are comparable to ones obtained by MFCC and GFCC. Experimental results show the best performance of this architecture. This paper implements the GW and examines its application to a specific example of speech. Implications for noise robust speech analysis are also discussed within AURORA databases.
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUESAM Publications
Audio signals which include speech, music and environmental sounds are important types of media. The problem of distinguishing audio signals into these different audio types is thus becoming increasingly significant. A human listener can easily distinguish between different audio types by just listening to a short segment of an audio signal. However, solving this problem using computers has proven to be very difficult. Nevertheless, many systems with modest accuracy could still be implemented. The experimental results demonstrate the effectiveness of our classification system. The complete system is developed in ANN Techniques with Autonomic Computing system
Improving the Proactive Routing Protocol using Depth First Iterative Deepenin...Yayah Zakaria
Owing to the wireless and mobility nature, nodes in a mobile ad hoc network are not within the transmission range. It needs to transfer data through the multi-intermediate nodes. Opportunistic data forwarding is an assuring solution to make use of the broadcast environment of wireless communication links. Due to absence of source routing capability with efficient proactive routing protocol, it is not widely used. To rectify the
problem, we proposed memory and routing efficient proactive routing protocol using Depth-First Iterative-Deepening and hello messaging scheme. This protocol can conserve the topology information in every node in the network. In experimental analysis and discussion, we implemented the proposed work using NS2 simulator tool and proved that the proposed technique is performed well in terms of average delay, buffer and throughput.
Improvement at Network Planning using Heuristic Algorithm to Minimize Cost of...Yayah Zakaria
Wireless Mesh Networks (WMN) consists of wireless stations that are connected with each other in a semi-static configuration. Depending on the configuration of a WMN, different paths between nodes offer different levels of efficiency. One areas of research with regard to WMN is cost minimization. A Modified Binary Particle Swarm Optimization (MBPSO) approach was used to optimize cost. However, minimized cost does not
guarantee network performance. This paper thus, modified the minimization function to take into consideration the distance between the different nodes so as to enable better performance while maintaining cost balance. The results were positive with the PDR showing an approximate increase of 17.83% whereas the E2E delay saw an approximate decrease of 8.33%.
A Vertical Handover Algorithm in Integrated Macrocell Femtocell Networks Yayah Zakaria
The explosion in wireless telecommunication technologies has lead to a huge increase in the number of mobile users. The greater dependency on the mobile devices has raised the user’s expectations to always remain best connected. In the process, the user is always desiringgood signal strength even at certain black spots and indoors. Moreover, the exponential growth of
the number of mobile devices has overloaded macrocells. Femtocells have emerged out as a good promising solution for complete coverage indoors and for offloading macrocell. Therefore, a new handover strategy between femtocells and macrocell is proposed in this paper. The proposed handover
algorithm is mainly based on calculating equivalent received signal strength along with dynamic margin for performing handover. The simulation results of proposed algorithm are compared with the traditional algorithm. The proposed strategy shows improvement in two major performance parameters
namely reduction in unnecessary handovers and Packet Loss Ratio. The quantitative analysis further shows 55.27% and 23.03% reduction in packet loss ratio and 61.85% and 36.78% reduction in unnecessary handovers at a speed of 120kmph and 30kmph respectively. Moreover, the proposed algorithm proves to be an efficient solution for both slow and fast moving vehicles.
Symmetric Key based Encryption and Decryption using Lissajous Curve EquationsYayah Zakaria
Sender and receiver both uses two large similar prime numbers and uses parametric equations for swapping values of kx and by product of kx and ky is the common secret key. Generated secret key is used for encryption and decryption using ASCII key matrix of order 16X16. Applying playfair rules for encryption and decryption. Playfair is a digraph substitution cipher. Playfair makes use of pairs of letters for encryption and decryption. This
application makes use of all ASCII characters which makes brute force attack impossible.
Gain Flatness and Noise Figure Optimization of C-Band EDFA in 16-channels WDM...Yayah Zakaria
In this paper, Gain Flatness and Noise Figure of Erbium Doped Fiber Amplifier (EDFA) have been investigated in 16-channels Wavelength Division Multiplexing (WDM). Fiber Bragg Grating (FBG) is used in C-band with the aim to achieve flat EDFA output gain. The proposed model has been studied in detail to evaluate and to enhance the performance of the transmission system in terms of gain, noise figure and eye diagram of the
received signals. To that end, various design parameters have been investigated and optimized, such as frequency spacing, EDF length and temperature. To enhance the transmission system performance in terms of gain flatness, the Gain Flattening Filter (GFF) has been introduced in the design. To prove the efficiency of the new design, the optical transmission
system with optimized design parameters has been compared with a previous works in the literature. The simulation results show satisfactory performance with quasi-equalized gain for each channel of the WDM transmission system.
In this paper partial H-plane band-pass waveguide filter, utilizing a novel resonant structure comprising a metal window along with metal posts has been proposed to compactthe filter size. The metal windows and postshave been implemented transversely in a partial H-plane waveguides, which have
one-quarter cross section size compared to the conventional waveguides in the same frequency range. Partial H-plane band-pass waveguide filter with novel proposed resonant structures has considerably shorter longitudinal length compared to the conventional partial H-plane filters, so that they reduce both cross section size and the total length of the filter compared to
conventional H-plane filters, in the same frequency range. In the presented design procedure, the size and shape of each metal window and metal posts has been determined by fitting the transfer function of the proposed resonant structure to that of a desiredone, which is obtained from a suitable equivalent
circuit model. The design process is based on optimization using
electromagnetic simulator software, HFSS. A proposed partial H-plane bandpass
filter has been designed and simulated to verify usefulness and
performanceof the design method.
Vehicular Ad Hoc Networks: Growth and Survey for Three Layers Yayah Zakaria
A vehicular ad hoc network (VANET) is a mobile ad hoc network that allows wireless communication between vehicles, as well as between vehicles and roadside equipment. Communication between vehicles promotes safety and reliability, and can be a source of entertainment. We investigated the historical development, characteristics, and application fields of VANET and briefly introduced them in this study. Advantages and disadvantages were discussed based on our analysis and comparison of various classes of MAC and routing protocols applied to VANET. Ideas and breakthrough directions for inter-vehicle communication designs were proposed based on the
characteristics of VANET. This article also illustrates physical, MAC, and network layer in details which represent the three layers of VANET. The main works of the active research institute on VANET were introduced to help researchers track related advanced research achievements on the subject.
Barriers and Challenges to Telecardiology Adoption in Malaysia Context Yayah Zakaria
Mainly in infrastructure deficient communities, telecardiology is considered as a complement to insufficient cardiac care. Telecardiology can reduce travelling and waiting time, enables information sharing in shorter time and facilitate care in rural and remote areas. A qualitative study examined the perspectives of health care providers: cardiologist and general physician and
health care service receivers: patient and public towards telecardiology adoption. The barriers in telecardiology adoption were identified in this paper. It includes practicality of telecardiology, the need of education for staffs and administrators, ease of use, preferred face-to-face consultation,
cost and confidentiality. Improvements can be done by the implementers based on this study in order to promote telecardiology successfully in Malaysia.
Novel High-Gain Narrowband Waveguide-Fed Filtenna using Genetic Algorithm Yayah Zakaria
Filtenna is an antenna with filtering feature. There are many ways to design a filtenna. In this paper, a high-gain narrowband waveguide-fed aperture filtenna has been proposed and designed. A patterned plane, which is designed using genetic algorithm has been used at the open end of the waveguide fed, mounted on a conducting ground plane. To design the patterned pattern, magnetic field integral equation of the structure has been derived, so it has been solved using method of moments. The proposed filtenna has been simulated with HFSS that confirms the results obtained by method of moments. Finally, an unprinted dielectric as a superstrate has been used to enhance the gain of the filtenna. The filtenna bandwidth is 1.76% (160 MHz) which has the gain of 15.91 dB at the central frequency
of 9.45 GHz.
Uncertain Systems Order Reduction by Aggregation Method Yayah Zakaria
In the field of control engineering, approximating the higher-order system with its reduced model copes with more intricateproblems. These complex problems are addressed due to the usage of computing technologies and advanced algorithms. Reduction techniques enable the system from higherorder to lower-order form retaining the properties of former even after reduction. This document renders a method for demotion of uncertain systems based on State Space Analysis. Numerical examples are illustrated to show the accuracy of the proposed method.
A Three-Point Directional Search Block Matching Algorithm Yayah Zakaria
This paper proposes compact directional asymmetric search patterns, which we have named as three-point directional search (TDS). In most fast search motion estimation algorithms, a symmetric search pattern is usually set at the minimum block distortion point at each step of the search. The design of the
symmetrical pattern in these algorithms relies primarily on the assumption that the direction of convergence is equally alike in each direction with respect to the search center. Therefore, the monotonic property of real -world video sequences is not properly used by these algorithms. The strategy of TDS is to keep searching for the minimum block distortion point in the most probable directions, unlike the previous fast search motion estimation algorithms where all the directions are checked. Therefore, the proposed method significantly reduces the number of search points for locating a motion vector. Compared to conventional fast algorithms, the proposed method has the fastest search speed and most satisfactory PSNR values for
all test sequences.
Human Data Acquisition through Biometrics using LabVIEW Yayah Zakaria
Human Data Acquisition is an innovative work done based on fingerprints of a particular person. Using the fingerprints we can get each and every detail of any individual. Through this, the data acquired can be used in many applications such as Airport Security System, Voting System, and Employee login System, in finding the thieves etc. We in our project have implemented
in Voting System. In this we use the components such as MyDAQ which is data acquisition device. The coding here is in done in a Graphical Programming language named LabVIEW where the execution of any program is done in a sequential way or step by step according to the data received.
A Survey on Block Matching Algorithms for Video Coding Yayah Zakaria
Block matching algorithm (BMA) for motion estimation (ME) is the heart to many motion-compensated video-coding techniques/standards, such as ISO MPEG-1/2/4 and ITU-T H.261/262/263/264/265, to reduce the temporal redundancy between different frames. During the last three decades,
hundreds of fast block matching algorithms have been proposed. The shape and size of search patterns in motion estimation will influence more on the searching speed and quality of performance. This article provides an overview of the famous block matching algorithms and compares their computational complexity and motion prediction quality.
An Accurate Scheme for Distance Measurement using an Ordinary Webcam Yayah Zakaria
Nowadays, image processing has become one of the widely used computer aided science. Two major branches of this scientific field are image enhancement and machine vision. Machine vision has many applications and demands in robotic and defense industries. Detecting distance of objects is
one of the extensive research in the defense industry and robotic industries that a lot of annual projects have been involved in this issue both inside and outside the country. So, in this paper, an accurate algorithm is presented for measuring the distance of the objects from a camera. In this method, a laser
transmitter is used alongside a regular webcam. The laser light is transmitted to the desired object and then the distance of the object is calculated using image processing methods and mathematical and geometric relations. The performance of the proposed algorithm was evaluated using MATLAB software. The accuracy rate of distance detection is up to 99.62%. The results
also has shown that the presented algorithms make the obstacle distance measurement more reliable. Finally, the performance of the proposed algorithm was compared with other methods from different literatures.
Identity Analysis of Egg Based on Digital and Thermal Imaging: Image Processi...Yayah Zakaria
This research was conducted to analyze the identification of eggs. The research processes use two tools, namely thermal imaging camera and smartphone camera. The identification process was done by using Matlab prototype tools. The image has been acquired by means of proficiency level, then analyzed and applied several methods. Image acquisition results of thermal imaging camera are processed using morphological dilation and do the complement in black and white (BW). While the digital image uses the merger method of morphological dilation and opening, and it doesn't need to be complemented. Labeling process is done, and the process of determining centroid and bounding box. The process has been done and it can be applied for identifying of chicken eggs with the accuracy rate of 100%. There are different methods of both images is obtained area (pixels) which is equivalent to the difference is very small as 6 x 10-3.
.
Recognition of Tomato Late Blight by using DWT and Component Analysis Yayah Zakaria
Plant disease recognition concept is one of the successful and important applications of image processing and able to provide accurate and useful information to timely prediction and control of plant diseases. In the study, the wavelet based features computed from RGB images of late blight infected images and healthy images. The extracted features submitted to Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA) and Independent Component Analysis performed (ICA) for reducing dimensions in feature data processing and classification. To recognize and classify late blight from healthy plant images are classified into two classes i.e. late blight infected or healthy. The Euclidean Distance measure is used to compute the distance by these two classes of training and testing dataset for tomato late blight recognition and classification. Finally, the three-component analysis is compared for late blight recognition accuracy. The Kernel Principal Component Analysis (KPCA) yielded overall recognition accuracy with 96.4%.
Fuel Cell Impedance Model Parameters Optimization using a Genetic Algorithm Yayah Zakaria
The objective of this paper is the PEM fuel cell impedance model parameters identification. This work is a part of a larger work which is the diagnosis of the fuel cell which deals with the optimization and the parameters identification of the impedance complex model of the Nexa Ballard 1200 PEM fuel cell. The method used for the identification is a sample genetic algorithm and the proposed impedance model is based on electric parameters, which will be found from a sweeping of well determined frequency bands. In fact, the frequency spectrum is divided into bands according to the behavior of the fuel cell. So, this work is considered a first in the field of impedance spectroscopy So, this work is considered a first in the field of impedance spectroscopy. Indeed, the identification using genetic algorithm requires experimental measures of the fuel cell impedance to optimize and identify the impedance model parameters values. This method is characterized by a good precision compared to the numeric methods. The obtained results prove the effectiveness of this approach.
Noise Characterization in InAlAs/InGaAs/InP pHEMTs for Low Noise Applications Yayah Zakaria
In this paper, a noise revision of an InAlAs/InGaAs/InP psoeudomorphic high electron mobility transistor (pHEMT) in presented. The noise performances of the device were predicted over a range of frequencies from 1GHz to 100GHz. The minimum noise figure (NFmin), the noise resistance (Rn) and optimum source impedance (Zopt) were extracted using two
approaches. A physical model that includes diffusion noise and G-R noise models and an analytical model based on an improved PRC noise model that considers the feedback capacitance Cgd. The two approaches presented matched results allowing a good prediction of the noise behaviour. The
pHEMT was used to design a single stage S-band low noise amplifier (LNA). The LNA demonstrated a gain of 12.6dB with a return loss coefficient of 2.6dB at the input and greater than -7dB in the output and an overall noise figure less than 1dB.
Effect of Mobility on (I-V) Characteristics of Gaas MESFET Yayah Zakaria
We present in this paper an analytical model of the current–voltage (I-V) characteristics for submicron GaAs MESFET transistors. This model takes into account the analysis of the charge distribution in the active region and incorporate a field depended electron mobility, velocity saturation and charge
build-up in the channel. We propose in this frame work an algorithm of simulation based on mathematical expressions obtained previously. We propose a new mobility model describing the electric field-dependent. predictions of the simulator are compared with the experimental data [1] and
have been shown to be good.
Performance Analysis of Post Compensated Long Haul High Speed Coherent Optica...Yayah Zakaria
This paper addresses the performance analysis of OFDM transmission system based on coherent detection over high speed long haul optical links with high spectral efficiency modulation formats such as Quadrature Amplitude Modulation (QAM) as a mapping method prior to the OFDM multicarrier representation. Post compensation is used to compensate for
phase noise effects. Coherent detection for signal transmitted at bit rate of 40 Gbps is successfully achieved up to distance of 3200km. Performance is analyzed in terms of Symbol Error Rate and Error Vector Magnitude by varying Optical Signal to Noise Ratio (OSNR) and varying the length of the fiber i.e transmission distance. Transmission performance is also observed through constellation diagrams at different transmission distances and
different OSNRs.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfKamal Acharya
The College Bus Management system is completely developed by Visual Basic .NET Version. The application is connect with most secured database language MS SQL Server. The application is develop by using best combination of front-end and back-end languages. The application is totally design like flat user interface. This flat user interface is more attractive user interface in 2017. The application is gives more important to the system functionality. The application is to manage the student’s details, driver’s details, bus details, bus route details, bus fees details and more. The application has only one unit for admin. The admin can manage the entire application. The admin can login into the application by using username and password of the admin. The application is develop for big and small colleges. It is more user friendly for non-computer person. Even they can easily learn how to manage the application within hours. The application is more secure by the admin. The system will give an effective output for the VB.Net and SQL Server given as input to the system. The compiled java program given as input to the system, after scanning the program will generate different reports. The application generates the report for users. The admin can view and download the report of the data. The application deliver the excel format reports. Because, excel formatted reports is very easy to understand the income and expense of the college bus. This application is mainly develop for windows operating system users. In 2017, 73% of people enterprises are using windows operating system. So the application will easily install for all the windows operating system users. The application-developed size is very low. The application consumes very low space in disk. Therefore, the user can allocate very minimum local disk space for this application.
Automobile Management System Project Report.pdfKamal Acharya
The proposed project is developed to manage the automobile in the automobile dealer company. The main module in this project is login, automobile management, customer management, sales, complaints and reports. The first module is the login. The automobile showroom owner should login to the project for usage. The username and password are verified and if it is correct, next form opens. If the username and password are not correct, it shows the error message.
When a customer search for a automobile, if the automobile is available, they will be taken to a page that shows the details of the automobile including automobile name, automobile ID, quantity, price etc. “Automobile Management System” is useful for maintaining automobiles, customers effectively and hence helps for establishing good relation between customer and automobile organization. It contains various customized modules for effectively maintaining automobiles and stock information accurately and safely.
When the automobile is sold to the customer, stock will be reduced automatically. When a new purchase is made, stock will be increased automatically. While selecting automobiles for sale, the proposed software will automatically check for total number of available stock of that particular item, if the total stock of that particular item is less than 5, software will notify the user to purchase the particular item.
Also when the user tries to sale items which are not in stock, the system will prompt the user that the stock is not enough. Customers of this system can search for a automobile; can purchase a automobile easily by selecting fast. On the other hand the stock of automobiles can be maintained perfectly by the automobile shop manager overcoming the drawbacks of existing system.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Vaccine management system project report documentation..pdfKamal Acharya
The Division of Vaccine and Immunization is facing increasing difficulty monitoring vaccines and other commodities distribution once they have been distributed from the national stores. With the introduction of new vaccines, more challenges have been anticipated with this additions posing serious threat to the already over strained vaccine supply chain system in Kenya.
2. IJECE ISSN: 2088-8708
Improved Algorithm for Pathological and Normal Voices Identification (Brahim Sabir)
239
the parametric and non-parametric for features selection [4], [5]. The classification of voice pathology can be
seen as pattern recognition so statistical methods are an important approach.
This paper is organized as a follow: in second Section is dedicated to relevant acoustic parameters
that differentiate pathological from normal voices, the classification methods are presented in Section 3. The
degree of severity in Section 4, Section 5 deals with the defined threshold in literature to differentiate
pathological and normal voices. The proposed method is presented in Section 5, the results in Section 6, and
the last Section is reserved for the conclusion and future work.
1.1. Relevant acoustic parameters
Analysis of voice signal is performed by the extraction of acoustic parameters using digital signal
processing techniques [6]. However the amount of these parameters is huge to be analyzed, which lead to
define the relevant ones. Many papers identify HNR (harmonic-to-noise ratio), Pitch, shimmer as
relevant [7], [8] to identify pathological voices among normal ones.
Other papers [9], [10] found that jitter, normalized autocorrelation of the residual signal in pitch
lag, shimmer and noise measures like HNR (Harmonic to Noise Ratio), are used to identify pathological
voices. Amplitude perturbation, voice break analysis, subharmonic analysis, and noise related analysis are the
most relevant ones in [9].
And in [11], pitch, jitter, shimmer, Amplitude Perturbation Quotient (APQ), Pitch Perturbation
Quotient (PPQ), Harmonics to Noise Ratio (HNR), pitch perturbation quotient (PPQ), Normalized Noise
Energy (NNE), Voice Turbulence Index (VTI), Soft Phonation Index (SPI), Frequency Amplitude Tremor
(FATR), Glottal to Noise Excitation (GNE), have been seen as relevant ones. There are also many works that
tested the combination of mentioned features [12].
1.2. Classification methods
Various pattern classification methods have been used such as : Gaussian Mixture Models(GMM)
and artificial neural network (ANN) [13], [14], Bidirectional Neural Network (BNN) [15], multilayer
perceptron (MLP) which achieved a classification rate of 96% [16], support vector machine (SVM) [17-19],
Genetic Algorithm (GA) [20], [21], method based on hidden markov model [22], [23]. The use of modulation
spectra, classification based on multilayer network [8].
The correct classification rate obtained in previous researches to distinguish between pathological
and healthy voices varies significantly: 89.1% [24], 91.8% [25], 99.44% [26], 90.1%, 85.3% and 88.2% [27].
However, the comparison among the researches carried out is very complex due to the wide range of
measures, data sets and classifiers employed. Authors reported detection accuracy from 80% to 99%. The
results depend on efficiency of methods, choice of classifiers and characteristics of databases. The best
classification was obtained using nine acoustic measures and achieving an accuracy of 96.5 % [28].
1.3. Degree of severity
Actually, the hard part of pathological voice detection is to discriminate light or moderate
pathological voices from normal subjects. The degree according to the G parameter of the GRBAS scale
proposed by [29]. On this G-based scale, a normal voice is rated as 0, a slight voice disorder as 1, a moderate
voice disorder as 2 and finally, a severe voice disorder as 3.
2. THRESHOLD IN LITERATURE
Based on the literature, we have considered: pitch, jitter, shimmer and harmonic-to-noise ratio
(HNR) as relevant parameters to identify the pathological voices. In order to classify initially the recorded
utterances, we are based on threshold defined in the literature as listed in Table 1 and Table 2.
Table 1. Recommended Values of Pitches for Male and Female Signals ([7])
Mean Pitch Minimum Pitch Maximum Pitch
Female signal
recommended value
225 Hz for adult females, 155 for adult females, 334Hz for adult ,
Male signal
recommended value
adult males 128 Hz , adult males 85 Hz , adult males 196 Hz,
3. ISSN: 2088-8708
IJECE Vol. 7, No. 1, February 2017 : 238 – 243
240
Table 2. Recommended Values to differentiate Pathological and Normal Voices
Praat [8] Teixeira [8]
Jitter ddp% Female signal
recommended value
<=1.04% <=0.66
Male signal recommended
value
<=1.04% <=0.44
Shimmer dda% Female signal
recommended value
<=3.810% <=2.43
Male signal recommended
value
<=3.810% <=2.01%
HNR (dB) Female signal
recommended value
<20 dB 15.3dB
Male signal recommended
value
<20 dB 17.3 dB
3. PROPOSED METHOD
The corpus used in this paper is composed of 50 voices of male and 50 voices of females aged 19 to
22 (mean: 20.2). The speech material is obtained by sustained vowel /a/ varies from 3 to 5 seconds (mean 3s).
Among the 50 voices, 25 are normal and 25 are pathological based on initial classification. The signals are
recorded keeping mic 5 cm away from the mouth using a Dictaphone (Sony ICDPX240 4GB).
The record consisted in a 3-4 seconds of sustained sound of the vowel /a/ for each student. The
sampling frequency used for recording these signals was 22.05 kHz, with 16 bit resolution and mono. Praat
software is used to extract the acoustic parameters after transferring the recorded utterance from Dictaphone
to a personal computer Dell (intel ® core ™ i7 CPU M640 @ 2.8 Ghz 2.8 Ghz, 4 Go memory) using
audacity software.
Connect the headphone jack to the line input of the PC with a 3.5 jack cable male / male, then
activates audio recording with audio processing software audacity. An initial classification based on
threshold from the literature, in order to classify the recorded utterances on healthy and pathological. The
technic used to identify pathological voices is ANN (artificial neural network): 4 coefficients are extracted
(pitch, HNR, jitter and shimmer) from the signal, these coefficients are the vector input of the net.
The net is formed by 3 layers and the used algorithm is back propagation algorithm. The activation
function is sigmoid. The proposed algorithm was tested by German database (The sample are from 19 to 22
years old) utterances of /a/. The degree of the severity is evaluated by: Degree of Severity in %= (Measured
value –normal value)/( normal value). Figure 1 shows the normal value is related to the threshold defined in
the literature. Macro steps of the proposed algorithm and Figure 2 shows training of ANN with input vector
(4 acoustic parameters).
Figure 1. Macro Steps of the Proposed Algorithm
Input audio
Extract acoustic parameters:
Pitch, Jitter, Shimmer and HNR
-Initial classification based on threshold from the literature of acoustic
parameters
-Evaluate the degree of severity
-Feed the neural network
-Train the neural network
Test the neural network
4. IJECE ISSN: 2088-8708
Improved Algorithm for Pathological and Normal Voices Identification (Brahim Sabir)
241
Figure 2. Training of ANN with Input Vector (4 Acoustic Parameters)
4. RESULTS
Description of the dataset as shown in Table 3:
Table 3. Description of the Dataset
Pronounced vowel /a/ Normal pathological
Training number 50 50
Test number 20 20
Correct classification 18 19
Rate of classification 90% 95%
In this work, the testing set we have choose a German database for voice disorder developed by
Putzer in [30] which contain healthy and pathological voice, where each one pronounce vowels [i, a, u] /1-2 s
in wav format at different pitch (low, normal, high). The results obtained in that work indicate a percentage
of correct classification of 90% for normal voices and 95% for pathological voices.
These results prove that the pitch, HNR, jitter and shimmer can be best input parameters for
discrimination and identification of pathological voice using neural network. Also the degree of severity of
the identified pathology was estimated in order to give the clinician clear idea on severity of the
communication disorder. In this process the accuracy, sensitivity and specificity for each threshold were
observed to get the threshold which achieves the best accuracy, and in the same time preserves a high
sensitivity and specificity.
True positive (TP): refer to pathological voices that were classified as pathological by proposed algorithm.
True negative (TN): refer to normal voices that were classified as normal by proposed algorithm.
False positive (FP): refer to normal voices that were incorrectly classified as pathological by proposed
algorithm.
False negative (FN): refer to pathological voices that were incorrectly classified as normal by proposed
algorithm.
1) Specificity = TN/(TN + FP)= 95.1%
2) Sensitivity = TP/(TP + FN)=1.6%
3) Accuracy = (TN + TP)/(TN + FP + TP +FN )=97.9%
Initialization of synaptic weights through each neuron in the hidden layer
Initializing synapse weight bias of each neural layer exit.
Initializing synapse weight of each neuron of the output layer:
Activation function
Function calculates the output ys
Function calculates the error between the observed and desired output
Function calculates the local output layer Gradient
Function lets you choose the maximum error
Function to adjust the weights of the neurons in the output layer
Function calculates the gradient error in the hidden layer
Function to adjust the weights of the hidden layer neurons
Function calculates the global error
Record the weight in the file "poids.txt".
Record the weight of the hidden layer
Add the weight of the output layer
5. ISSN: 2088-8708
IJECE Vol. 7, No. 1, February 2017 : 238 – 243
242
Table 4 shows evaluation of the proposed algorithm.
Table 4. Evaluation of the Proposed Algorithm
5. CONCLUSION
The purpose of this work is to conceive a model to assist the clinicians and the professors to follow
the evolution of the voice disorders among students, based only on acoustic properties of a student’s voice.
The results using the ANN (Artificial neural network) classifier gives 90%for normal class and 95% for
pathological class.
The performance of the proposed algorithm is evaluated in a term of the accuracy (97.9%),
sensitivity(1.6%), and specificity(95.1%). This work can be applied in the field of preventive medicine in
order to achieve early detection of voice pathologies. In addition to that, an estimation of the degree of
severity was proposed. The major advantage of this type of automatic identification tool is a determinism that
is currently lacking in the subjective analysis.
As a possible improvement, system performance can be improved by increasing the corpus
Learning. And as a Future work : identify the type of pathology for each voice disorder( stuttering,
dysphonia…), the study of the continuous speech is a superior objective and an evident next step and
implement the proposed algorithm on a hardware ready to use device as a DSP(Digital signal processing
board) or FPGA (Field Programmable Gate Arrays Board).
ACKNOWLEDGEMENT
The authors fully acknowledged the students Ayoub Ratnane and Abdellah Elhaoui for their
valuable contributions.
REFERENCES
[1] P. Henriquez, et al., “Characterization of healthy and pathological voice through measures based on nonlinear
dynamics,” IEEE Trans on Audio, Speech and Language Processing, vol/issue: 17(6), 2009.
[2] J. Camburn, et al., “Parkinson's disease: Speaking out,” Denver, CO, The National Parkinson Foundation, 1998.
[3] M. Vasilakis and Y. Stylianou, “Voice Pathology Detection Basedeon Short-Term Jitter Estimations in Running
Speech,” Folia Phoniatr Logop, vol. 61, pp. 153–170, 2009.
[4] N. S. Lechon, et al., “Effect of Audio Compression in Automatic Detection of Voice Pathologies,” in IEEE
Transaction on Biomedical Engineering, vol/issue: 55(12), 2008.
[5] N. S. Lechon, et al, “Methodological issues in the development of automatic systems for voice pathology
detection,” in Biomed. Signal Processing Control, vol/issue: 1(2), pp. 120-128, 2006.
[6] L. Salhi, et al., “Voice Disorders Identification Using Multilayer Neural Network,” The International Arab Journal
of Information Technology, vol/issue: 7(2), pp. 177-185, 2010.
[7] Teixeira J. P., et al., “Vocal Acoustic Analysis M Jitter, Shimmer and HNR Parameters,” Procedia Technology,
Elsevier, vol. 9, pp. 1112-1122, 2013.
[8] P. Boersma and Weenink D., “Praat: doing phonetics by computer (Version 5.1.17),” 2009. [Computer program]
Retrieved October 5, 2009, from http://www.praat.org/.
[9] V. Parsa and D. G Jamieson, “Identification of pathological voices using glottal noise measures,” Journal of
Speech, Language & Hearing Research, vol/issue: 43(2), pp. 469–485, 2000.
[10] V. Srinivasan, et al., “Classification of Normal and Pathological Voice using GA and SVM,” International Journal
of Computer Applications, vol/issue: 60(3), 2012.
[11] J. I. G. Llorente, et al., “Support vector machines applied to the detection of Voice disorders,” Nonlinear Analyses
and Algorithms for Speech Processing, pp 219-230, 2005.
[12] J. D. A. Londono, et al., “Automatic detection of pathological voices using complexity measures, noise parameters,
and mel-cepstral coefficients,” IEEE Trans. Biomed. Eng, vol/issue: 58(2), 2011.
[13] B. Boyanov and S. Hadjitodorov, “Acoustic analysis of pathological voices,” IEEE Engineering in Medicine and
biology, pp. 74-82, 1997.
[14] John H. and L. Hansen, “A non linear operator-based speech feature analysis method with application to vocal fold
pathology assessment,” IEEE Transaction on Biomedical Engineering, vol/issue: 45(3), pp. 300-312, 1998.
Classification
Pathological Normal
Pathological TP:58.3 FN:0.8
Normal FP:0.7 TN:13.8
6. IJECE ISSN: 2088-8708
Improved Algorithm for Pathological and Normal Voices Identification (Brahim Sabir)
243
[15] I. Esmaili, et al., “Two-stage feature compensation of clean and telephone speech signals employing bidirectional
neural network,” 10th International Conference on Information Science, Signal Processing and their Applications,
Malaysia, 2010.
[16] N. S. Lechon, et al., “Methodological issues in the development of automatic systems for voice pathology
detection,” Biomedical Signal Processing and Control, vol. 1, pp. 120–128, 2006.
[17] M. Arjmandi and M. Pooyan, “An optimum algorithm in pathological voice quality assessment using wavelet-
pachet-based features, linear discriminant analysis and support vector machine,” Biomedical signal processing and
control, vol. 7, pp. 3-19, 2012.
[18] T. Bocklet, et al., “Evaluation and assessment of speech intelligibility on pathologic voices based upon acoustic
speaker models,” in proceeding of 3rd advanced voice function assessment (AVFA’09) international workshop, pp.
89-92, 2009.
[19] C. M. Vikram and K. Umarani, “Pathological Voice Analysis To Detect Neurological Disorders Using MFCC &
SVM,” International Journal of Advanced Electrical and Electronics Engineering, vol/issue: 2(4), 2013.
[20] D. Pravena, et al., “Pathological Voice Recognition for Vocal Fold Disease,” International Journal of Computer
Applications, vol/issue: 47(13), 2012.
[21] R. Behroozmand and F. Almasganj, “Optimal selection of wavelet-packet-based features using genetic algorithm in
pathological assessment of patients' speech signal with unilateral vocal fold paralysis,” Comput. Biol. Med., vol. 37,
pp. 474–485, 2007.
[22] V. Majidnezhad and I. Kheidorov, “A HMM-Based Met hod for Vocal Fold Pathology Diagnosis,” International
Journal of Computer Science, vol/issue: 9(6), 2012.
[23] A. A. Dibazar, et al., “Pathological Voice Assessment,” IEEE EMBS 2006 NEW YORK, 2006.
[24] R. B. R. R. N. Wormald, et al., “Performance of an automated, remote system to detect vocal fold paralysis,” in
Ann Otol Rhinol Laryngol, vol. 117, pp. 834–838, 2008.
[25] M. Little, et al., “Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection,”
BioMedical Engineering OnLine, vol/issue: 6(1), pp. 23, 2007.
[26] A. Dibazar, et al., “Feature analysis for automatic detection of pathological speech,” Proceedings of the Second
Joint EMBS/BMES Conference, vol. 1, pp. 182–183, 2002.
[27] E. Fonseca and J. Pereira, “Normal versus pathological voice signals,” Engineering in Medicine and Biology
Magazine, IEEE, vol/issue: 28(5), pp. 44–48, 2009.
[28] V. Parsa and D. G. Jamieson, “Acoustic discrimination of pathological voice: Sustained vowels versus continuous
speech,” Journal of Speech, Language, and Hearing Research, vol. 44, pp. 327–339, 2001.
[29] M. Hirano, “Psycho-acoustic evaluation of voice: GRBAS Scale for evaluating the hoarse voice,” Clinical
Examination of voice, Springer Verlag, 1981.
[30] http://stimmdb.coli.uni-saarland.de/