International Journal of Advanced Science and Technology
Vol. 29, No. 05, (2020), pp. 772-777
ISSN: 2005-4238 IJAST
Copyright ⓒ 2020 SERSC
772
Performance estimation based recurrent-convolutional encoder
decoder for speech enhancement
A. Karthik¹, Dr. J. L. Mazher Iqbal²
¹Research Scholar, Veltech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai
²Research Professor, Veltech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai
Abstract
Speech is central to human communication. As we increasingly use recorded speech to
communicate remotely with other people, we also become more and more accustomed to
machines that simply "listen to us". The goal of speech enhancement is to improve the
intelligibility and/or the overall perceptual quality of a degraded speech signal using
audio signal processing techniques. Noise reduction is the most important area of speech
enhancement and is used in many applications such as cell phones, VoIP, teleconferencing
systems, voice recognition, and hearing aids. Speech enhancement is necessary in any
application where a clean speech signal is important for further processing.
Keywords: Speech Recognition, Automatic Speech Recognition (ASR), Recurrent
Convolutional Encoder-Decoder (R-CED) network, PESQ, STOI and CER.
1. Introduction
Speech enhancement techniques focus primarily on removing noise from a speech signal, and
a variety of techniques have been developed to eliminate the different types of noise. In
recent years, learning architectures based on deep neural networks (DNNs) have been very
successful in related areas such as speech recognition. The success of DNNs in automatic
speech recognition (ASR) has led to the study of DNNs for noise suppression in ASR and for
speech enhancement. The central motivation for using DNNs in speech enhancement is that
the corruption of speech by noise is a complex process, and a complex nonlinear model such
as a DNN is well suited to modelling it. Although there is relatively little in-depth work
on the usefulness of DNNs for speech enhancement, they have shown promising results and
can outperform classical speech enhancement (SE) methods. A common aspect of many of these
works is that they are evaluated under matched, or seen, noise conditions. Matched
conditions imply that the types of test noise (e.g., ground noise) are the same as those
used for training. Unlike classical methods, which are motivated by signal processing
considerations, DNN-based methods are data-driven, so matched noise conditions may not be
ideal for evaluating DNNs for speech enhancement.
Speech enhancement is therefore an important research problem in audio signal processing.
Its goal is to improve the quality and intelligibility of speech signals corrupted by
noise, and it is applied in various sectors such as automatic speech recognition, mobile
communication, and hearing aids.
2. Advantages of speech enhancement:
Free up cognitive working space
Allows the user to operate a computer by speaking to it
Eliminates handwriting and spelling problems
Always spells correctly (doesn't always recognize words correctly)
Allows dictation of text, commands
3. Disadvantages of speech enhancement
Assists with only one stage of the writing process; it is not a solution to the writing problem
Difficult to use in classroom settings due to noise interference
Requires large amounts of memory to store voice files
Makes errors and can be frustrating without adequate support
Requires each user to train the software to recognize their voice, which is hard for poor decoders
4. Applications of speech enhancement
Speaker identification
Automatic speech recognition
Biomedical speech recognition
Cell phone speech recognition
5. Related work:
(Wang and Brookes 2018) presented a modulation-domain speech enhancement algorithm based
on the Kalman filter. The proposed estimator jointly models the estimated dynamics of the
spectral amplitudes of speech and noise to obtain a minimum mean squared error (MMSE)
estimate of the speech amplitude spectrum, assuming that speech and noise are additive in
the complex domain. To incorporate the dynamics of the noise amplitudes together with
those of the speech amplitudes, the work proposed a statistical model called "Gaussring",
a mixture of Gaussians whose centres lie on a circle in the complex plane. The performance
of the proposed algorithm was evaluated using STOI (short-time objective intelligibility),
PESQ (perceptual evaluation of speech quality), and segSNR (segmental SNR). For the speech
quality measures, the proposed algorithm was shown to give a consistent improvement over a
wide range of SNRs compared with competing algorithms. Speech recognition experiments also
showed that the Gaussring-based algorithm performs well for two types of noise.
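The Gaussring estimator is not described here in enough detail to reproduce. As a much simpler illustration of the underlying idea (tracking a spectral amplitude across frames with a Kalman filter), the sketch below smooths the amplitude trajectory of a single frequency bin. It assumes a hypothetical linear-Gaussian model with state coefficient `a`, process variance `q`, and a known observation noise variance `r`, and it treats noise as additive in the amplitude domain, which is a simplification; it is not Wang and Brookes' method.

```python
import numpy as np

def kalman_track_amplitude(noisy_amp, a=1.0, q=1e-3, r=0.09):
    """Smooth one frequency bin's amplitude trajectory with a scalar Kalman filter.

    Hypothetical linear-Gaussian model (NOT the Gaussring model):
      state:       s[t] = a * s[t-1] + w[t],  w ~ N(0, q)
      observation: y[t] = s[t] + v[t],        v ~ N(0, r)
    """
    y = np.asarray(noisy_amp, dtype=float)
    est = np.empty_like(y)
    s, p = y[0], r  # initialise from the first frame with observation-level uncertainty
    for t, obs in enumerate(y):
        # Predict: propagate the state and its variance through the dynamic model.
        s_pred = a * s
        p_pred = a * a * p + q
        # Update: the Kalman gain weighs the prediction against the new frame.
        k = p_pred / (p_pred + r)
        s = s_pred + k * (obs - s_pred)
        p = (1.0 - k) * p_pred
        est[t] = s
    return est
```

With `a=1.0` the state follows a random walk, so the filter makes no assumption about the amplitude level; the ratio `q / r` controls how strongly it smooths.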
(Bando, et al. 2018) implemented a semi-supervised speech enhancement technique known as
variational autoencoder-nonnegative matrix factorization (VAE-NMF), which combines a
probabilistic generative model of speech based on a VAE with a noise model based on
non-negative matrix factorization. Only the speech model is pre-trained, using a
sufficient amount of clean speech. Using the speech model as a prior distribution,
posterior estimates of the clean speech can be obtained by Markov chain Monte Carlo (MCMC)
sampling while the noise model is adapted to the noisy environment. Experiments confirmed
that VAE-NMF outperformed conventional supervised techniques based on deep neural networks
in unseen noisy environments. A promising next direction is to extend VAE-NMF to the
multichannel scenario. Since a VAE and a well-studied linear phase model can represent
complicated speech signals and the spatial mixing process, respectively, it would be
effective to integrate these models into a unified probabilistic framework. GAN-based
training of the speech model could also be considered, to learn the probability
distribution of speech more accurately.
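In VAE-NMF the noise spectrogram is modelled with NMF. The sketch below shows only that ingredient in its most basic form: a plain NMF of a non-negative (frequency x frames) spectrogram using the Lee-Seung multiplicative updates for the squared Euclidean cost. Bando et al. embed NMF in a probabilistic model with MCMC inference and a VAE speech prior, none of which is attempted here; the cost function, rank, and iteration count are assumptions for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factorise a non-negative matrix V (freq x frames) as W @ H.

    W holds `rank` spectral basis vectors, H their activations over frames.
    Lee-Seung multiplicative updates for the Frobenius cost ||V - WH||^2.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction,
        # and do not increase the reconstruction error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```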
(Donahue, et al. 2018) introduced frequency-domain speech enhancement generative
adversarial networks (FSEGAN), a technique based on generative adversarial networks (GANs)
that performs speech enhancement in the frequency domain, and showed improvements in
automatic speech recognition (ASR) performance relative to the previous time-domain
method. They also provided evidence that, with retraining, FSEGAN can improve the
performance of existing ASR systems trained with multi-style training (MTR). Experiments
indicated that, for ASR, simpler regression techniques may be preferable to GAN-based
enhancement. FSEGAN appears to generate plausible spectra and could be more valuable for
telephony applications when combined with an invertible feature representation.
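GAN-based enhancers such as FSEGAN train a generator against a discriminator. A common objective in the SEGAN family is the least-squares GAN (LSGAN) loss plus an L1 term that keeps the enhanced output close to the clean target. The function below computes those two losses from discriminator outputs assumed to be computed elsewhere; it is a sketch of the objective only (the networks are omitted), and `l1_weight=100.0` is an assumed value rather than one taken from this paper.

```python
import numpy as np

def lsgan_losses(d_real, d_fake, enhanced, clean, l1_weight=100.0):
    """Least-squares GAN losses with an L1 reconstruction term.

    d_real:   discriminator outputs for (clean, noisy) pairs
    d_fake:   discriminator outputs for (enhanced, noisy) pairs
    enhanced: generator output spectra; clean: target spectra
    """
    # Discriminator pushes outputs for real pairs toward 1 and for fake pairs toward 0.
    d_loss = 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)
    # Generator tries to make the discriminator output 1 for its enhanced spectra...
    adversarial = 0.5 * np.mean((d_fake - 1.0) ** 2)
    # ...while staying close to the clean target (mean absolute error).
    l1 = np.mean(np.abs(enhanced - clean))
    g_loss = adversarial + l1_weight * l1
    return float(d_loss), float(g_loss)
```

The L1 term is what lets such a model behave partly like the "simpler regression" approach mentioned above; setting `l1_weight` very high makes the generator approach a pure regression model.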
(Pascual, et al. 2018) studied adapting a speech enhancement generative adversarial network
to new conditions by fine-tuning the generator with a small amount of data. To examine the
minimum data requirements, they measured performance with several objective metrics on two
new languages, Korean and Catalan. A further objective was to study how test performance
on unseen noise varies as a function of the number of noise types available in the
training set. Adapting the pre-trained English model with only ten minutes of data already
achieved performance comparable to using two orders of magnitude more data. In addition,
they demonstrated relatively stable test performance with respect to the number of
training noise types.
(Zhao, et al. 2018) proposed EHNet, which combines recurrent neural networks and
convolutional neural networks for speech enhancement. EHNet's inductive bias is well
suited to speech enhancement: the convolution kernels can efficiently detect local
patterns in spectrograms, and the bidirectional recurrent connections can automatically
model the dynamic correlations between adjacent frames. Because of the sparse nature of
convolutions, EHNet requires less computation than recurrent neural networks and
multilayer perceptrons (MLPs). The results demonstrated that EHNet consistently
outperforms its competitors across five different metrics. In addition, it generalizes to
unseen noise, which confirms EHNet's effectiveness for speech enhancement.
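The two ingredients just described (convolutions that pick up local spectral patterns, and bidirectional recurrence across frames) can be sketched as a forward pass with random, untrained weights. This is a minimal EHNet-style illustration under stated assumptions: 1-D kernels along frequency only, plain tanh RNN cells, and all layer sizes are hypothetical choices, not EHNet's actual configuration.

```python
import numpy as np

def conv_over_frequency(spec, kernels):
    """Apply each 1-D kernel along the frequency axis of every frame, then ReLU.

    spec: (frames, freq) magnitude spectrogram; kernels: (n_kernels, width).
    Returns (frames, n_kernels * (freq - width + 1)) feature vectors.
    """
    feats = []
    for frame in spec:
        # np.convolve flips its second argument; flipping the kernel first
        # gives the cross-correlation used by CNN layers.
        maps = [np.convolve(frame, k[::-1], mode="valid") for k in kernels]
        feats.append(np.maximum(np.concatenate(maps), 0.0))
    return np.stack(feats)

def rnn_pass(x, w_in, w_rec):
    """Plain tanh RNN over the frame axis; returns every hidden state."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(w_in @ x_t + w_rec @ h)
        states.append(h)
    return np.stack(states)

def ehnet_like_forward(spec, kernels, fwd, bwd, w_out):
    """Conv features per frame -> bidirectional RNN over frames -> non-negative output."""
    feats = conv_over_frequency(spec, kernels)
    h_fwd = rnn_pass(feats, *fwd)               # left-to-right in time
    h_bwd = rnn_pass(feats[::-1], *bwd)[::-1]   # right-to-left, re-aligned to time order
    hidden = np.concatenate([h_fwd, h_bwd], axis=1)
    return np.maximum(hidden @ w_out.T, 0.0)    # predicted clean magnitudes

# Usage with hypothetical sizes and random weights.
rng = np.random.default_rng(0)
frames, freq, n_k, width, n_h = 50, 64, 4, 5, 32
feat_dim = n_k * (freq - width + 1)
spec = np.abs(rng.normal(size=(frames, freq)))
kernels = rng.normal(size=(n_k, width))
fwd = (rng.normal(scale=0.05, size=(n_h, feat_dim)), rng.normal(scale=0.1, size=(n_h, n_h)))
bwd = (rng.normal(scale=0.05, size=(n_h, feat_dim)), rng.normal(scale=0.1, size=(n_h, n_h)))
w_out = rng.normal(scale=0.1, size=(freq, 2 * n_h))
out = ehnet_like_forward(spec, kernels, fwd, bwd, w_out)
```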
6. Challenges to be overcome:
In existing work, the a priori and a posteriori SNRs used by classical decision-directed
techniques become latent variables in a recurrent neural network (RNN), and the estimated
frequency-dependent speech presence probability is used to recursively update these
latent variables. However, RNN training is very unstable when ReLU is used as the
activation function. In addition, because of their activation functions, RNNs are unable
to process very long sequences, cannot easily be stacked into very deep models, and
cannot track long-term dependencies.
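The instability of ReLU in recurrent networks comes from repeated multiplication by the recurrent weight matrix: ReLU does not saturate, so whenever that matrix amplifies the state, activations (and, through the same products, gradients through time) grow exponentially. The toy demo below makes this visible with a hypothetical recurrent matrix `1.5 * I` and no input; tanh, by contrast, keeps every component in (-1, 1).

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def run_rnn(activation, steps=50, hidden=8, scale=1.5, seed=0):
    """Iterate h[t] = f(W h[t-1]) with no input and return the final state norm."""
    rng = np.random.default_rng(seed)
    W = scale * np.eye(hidden)            # recurrent weights with eigenvalues 1.5 (> 1)
    h = np.abs(rng.normal(size=hidden))   # non-negative initial state
    for _ in range(steps):
        h = activation(W @ h)
    return float(np.linalg.norm(h))

relu_norm = run_rnn(relu)     # grows like 1.5**50: explodes
tanh_norm = run_rnn(np.tanh)  # bounded by sqrt(hidden)
```

The flip side is that saturating activations such as tanh shrink gradients instead, which is one reason plain RNNs struggle with long-term dependencies.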
7. Proposed objectives
To improve the accuracy of speech enhancement with an RCNN-based approach to a priori and a posteriori SNR estimation.
To recover the quality of enhanced speech in speech-present regions, and to extend the additive noise framework.
To show the efficiency of speech enhancement using the Recurrent Convolutional Encoder-Decoder (R-CED), which increases and then decreases the feature dimension.
8. Proposed method
To overcome the above challenges, the proposed speech enhancement method recovers
noise-free speech mainly by estimating the a priori and a posteriori SNR. The a priori SNR
can be understood as the true instantaneous power ratio between each spectral component
of clean speech and that of the noise, while the a posteriori SNR is the instantaneous
power ratio between each spectral component of the observed noisy speech and that of the
noise. In this proposed work, a Recurrent Convolutional Encoder-Decoder (R-CED) network is
used.
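To make the two SNR definitions concrete, the sketch below computes, for a noisy STFT `Y` and a noise power spectrum `λ_d`, the a posteriori SNR γ = |Y|²/λ_d for every bin, and an a priori SNR ξ using the classical decision-directed rule of Ephraim and Malah, ξ_t = α|Â_{t-1}|²/λ_d + (1 - α)·max(γ_t - 1, 0), with a Wiener gain ξ/(1 + ξ) producing the amplitude estimate Â. This is the classical estimator rather than the proposed R-CED method; the smoothing factor `alpha=0.98`, the floor `xi_min`, and the assumption that the noise power is known are illustrative choices.

```python
import numpy as np

def decision_directed_snr(noisy_stft, noise_psd, alpha=0.98, xi_min=1e-3):
    """Estimate a posteriori and a priori SNR per (frame, bin).

    noisy_stft: complex array (frames, bins); noise_psd: (bins,) noise power.
    Returns (gamma, xi, enhanced_amplitude).
    """
    power = np.abs(noisy_stft) ** 2
    gamma = power / noise_psd                  # a posteriori SNR
    xi = np.empty_like(gamma)
    amp = np.empty_like(gamma)
    prev_amp_sq = np.zeros(gamma.shape[1])     # |A_hat|^2 of the previous frame
    for t in range(gamma.shape[0]):
        ml = np.maximum(gamma[t] - 1.0, 0.0)   # instantaneous (maximum-likelihood) term
        xi[t] = np.maximum(alpha * prev_amp_sq / noise_psd + (1.0 - alpha) * ml, xi_min)
        gain = xi[t] / (1.0 + xi[t])           # Wiener gain from the a priori SNR
        amp[t] = gain * np.abs(noisy_stft[t])
        prev_amp_sq = amp[t] ** 2
    return gamma, xi, amp
```

The enhanced amplitude is usually recombined with the noisy phase before the inverse STFT.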
R-CED consists of repetitions of a convolution layer, a batch normalization layer, and a
ReLU activation layer. R-CED encodes the features into a higher dimension along the
encoder and compresses them along the decoder. The number of filters is kept symmetric:
it is gradually increased in the encoder and gradually decreased in the decoder. The
trellis map is then initialized, the circuit logic is designed, and LP-norm decoding is
performed. Finally, maximum likelihood estimates are obtained by traversing the trellis
map, which predicts the distortion elements. The decoding process yields the noise-free
speech, and the loss function is computed from the a priori SNR. The MSE of the loss
function is compared with a threshold value: if the MSE is greater than the threshold,
the signal is passed back through the R-CED process; if it is less than the threshold,
the speech is taken as enhanced. The performance of the enhanced speech is then analyzed
using the SNR (signal-to-noise ratio), SDR (signal-to-distortion ratio), and MSE (mean
squared error) metrics.
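The text does not define the trellis map, its states, or the LP-norm cost, so the sketch below only illustrates the generic step of finding a maximum-likelihood path by traversing a trellis, i.e. the Viterbi algorithm over log-probabilities. Interpreting the states as, for example, quantized distortion levels per frame is a hypothetical reading; `log_init`, `log_trans`, and `log_emit` are assumed inputs.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely state path through a trellis.

    log_init:  (S,) log-probability of each state at frame 0
    log_trans: (S, S) log P(state j at t | state i at t-1)
    log_emit:  (T, S) log-likelihood of each frame's observation under each state
    """
    T, S = log_emit.shape
    score = log_init + log_emit[0]          # best log-score of paths ending in each state
    back = np.zeros((T, S), dtype=int)      # back-pointers for path recovery
    for t in range(1, T):
        cand = score[:, None] + log_trans   # cand[i, j]: come from state i, move to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[t]
    # Trace back from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path, float(np.max(score))
```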
Algorithm / Techniques to be used
SNR-based Recurrent-Convolutional Encoder-Decoder (SNR-RCED)
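The R-CED layer pattern described in the proposed method (convolution, batch normalization, ReLU, with the number of filters first increasing and then decreasing) can be sketched as a forward pass over a batch of noisy feature vectors. The filter counts `(12, 16, 20, 24, 32, 24, 20, 16, 12)`, the kernel width, the random untrained weights, and keeping the feature length constant (no pooling) are all hypothetical choices; the SNR-based loss and training are omitted.

```python
import numpy as np

def conv1d_same(x, w):
    """x: (batch, c_in, length); w: (c_out, c_in, k) with odd k; zero 'same' padding."""
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad)))
    length = x.shape[2]
    # windows[b, c, l, i] = input sample at position l - pad + i
    windows = np.stack([xp[:, :, i:i + length] for i in range(k)], axis=-1)
    return np.einsum("bclk,ock->bol", windows, w)

def batch_norm(x, eps=1e-5):
    """Normalise each channel over batch and length (training-mode statistics, no scale/shift)."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def rced_forward(x, filters=(12, 16, 20, 24, 32, 24, 20, 16, 12), width=9, seed=0):
    """Symmetric encoder-decoder of conv -> BN -> ReLU blocks; x: (batch, 1, freq)."""
    rng = np.random.default_rng(seed)
    h = x
    for c_out in filters:
        c_in = h.shape[1]
        w = rng.normal(0.0, np.sqrt(2.0 / (c_in * width)), size=(c_out, c_in, width))
        h = np.maximum(batch_norm(conv1d_same(h, w)), 0.0)
    # Final single-channel projection back to the input feature size (no ReLU).
    w_out = rng.normal(0.0, np.sqrt(1.0 / (h.shape[1] * width)), size=(1, h.shape[1], width))
    return conv1d_same(h, w_out)
```

Because every convolution uses "same" padding, the output has the shape of the input, so it can be compared directly with a clean target through an MSE loss.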
Performance metrics:
PESQ (Perceptual Evaluation of Speech Quality)
STOI (Short-Time Objective Intelligibility)
CER (Character Error Rate)
MSE (Mean Squared Error)
SNR (Signal-to-Noise Ratio)
SDR (Signal-to-Distortion Ratio)
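Several of the listed metrics are short formulas and can be sketched directly. The paper does not say which SDR definition it uses, so the scale-invariant SDR below is one common variant chosen as an assumption; the segmental SNR frame length and the [-10, 35] dB clipping range are also conventional choices rather than values from the text. PESQ and STOI are standardized algorithms too involved for a short sketch and are omitted.

```python
import numpy as np

def mse(clean, enhanced):
    return float(np.mean((clean - enhanced) ** 2))

def snr_db(clean, enhanced):
    """Output SNR: clean-signal energy over residual-error energy, in dB."""
    return float(10 * np.log10(np.sum(clean ** 2) / np.sum((clean - enhanced) ** 2)))

def si_sdr_db(clean, enhanced):
    """Scale-invariant SDR: project the estimate onto the clean signal first."""
    alpha = np.dot(enhanced, clean) / np.dot(clean, clean)
    target = alpha * clean
    return float(10 * np.log10(np.sum(target ** 2) / np.sum((enhanced - target) ** 2)))

def seg_snr_db(clean, enhanced, frame=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR, with each frame's value clipped to [lo, hi] dB."""
    vals = []
    for start in range(0, len(clean) - frame + 1, frame):
        s = clean[start:start + frame]
        e = s - enhanced[start:start + frame]
        ratio = np.sum(s ** 2) / (np.sum(e ** 2) + 1e-12)
        vals.append(np.clip(10 * np.log10(ratio + 1e-12), lo, hi))
    return float(np.mean(vals))

def cer(reference, hypothesis):
    """Character error rate: Levenshtein edit distance divided by reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        cur = [i]
        for j, h in enumerate(hypothesis, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution (0 if equal)
        prev = cur
    return prev[-1] / len(reference)
```

For example, `cer("kitten", "sitting")` needs three edits against a six-character reference, giving 0.5.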
9. Flow of proposed work
Figure 1: Flow of the proposed work
References
[1] H. Zhao, et al., "Convolutional recurrent neural networks for speech enhancement,"
in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), 2018, pp. 2401-2405.
[2] H.-P. Liu, et al., "Bone-conducted speech enhancement using deep denoising
autoencoder," Speech Communication, vol. 104, pp. 106-112, 2018.
[3] Y. Zhao, et al., "Perceptually guided speech enhancement using deep neural
networks," in 2018 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), 2018, pp. 5074-5078.
[4] Q. He, et al., "Multiplicative update of auto-regressive gains for codebook-based
speech enhancement," IEEE/ACM Transactions on Audio, Speech and Language
Processing (TASLP), vol. 25, pp. 457-468, 2017.
[5] R. Henni, et al., "A new efficient two-channel fast transversal adaptive filtering
algorithm for blind speech enhancement and acoustic noise reduction," Computers &
Electrical Engineering, vol. 73, pp. 349-368, 2019.
[6] Y. Xia and R. Stern, "A Priori SNR Estimation Based on a Recurrent Neural
Network for Robust Speech Enhancement," in Interspeech, 2018, pp. 3274-3278.
[7] X. Du, et al., "End-to-End Model for Speech Enhancement by Consistent
Spectrogram Masking," arXiv preprint arXiv:1901.00295, 2019.
[8] R. Bendoumia, "Two-channel forward NLMS algorithm combined with simple
variable step-sizes for speech quality enhancement," Analog Integrated Circuits and
Signal Processing, vol. 98, pp. 27-40, 2019.
[9] Y. Wang and M. Brookes, "Model-based speech enhancement in the modulation
domain," IEEE/ACM Transactions on Audio, Speech and Language Processing
(TASLP), vol. 26, pp. 580-594, 2018.
[10] Y. Bando, et al., "Statistical speech enhancement based on probabilistic integration
of variational autoencoder and non-negative matrix factorization," in 2018 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP),
2018, pp. 716-720.
[11] C. Donahue, et al., "Exploring speech enhancement with generative adversarial
networks for robust speech recognition," in 2018 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5024-5028.
[12] S. Pascual, et al., "Language and noise transfer in speech enhancement generative
adversarial network," in 2018 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), 2018, pp. 5019-5023.
[13] W. Xue, et al., "Modulation-Domain Parametric Multichannel Kalman Filtering for
Speech Enhancement," in 2018 26th European Signal Processing Conference
(EUSIPCO), 2018, pp. 2509-2513.
[14] X. Leng, et al., "On Speech Enhancement Using Microphone Arrays in the Presence
of Co-Directional Interference," in 2018 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 511-515.
[15] Y. Bando, et al., "Speech enhancement based on Bayesian low-rank and sparse
decomposition of multichannel magnitude spectrograms," IEEE/ACM Transactions
on Audio, Speech, and Language Processing, vol. 26, pp. 215-230, 2017.
[16] S. China Venkateswarlu and A. Karthik, "Performance on Speech Enhancement Objective
Quality Measures Using Hybrid Wavelet Thresholding," International Journal of
Engineering and Advanced Technology (Blue Eyes Intelligence Engineering & Sciences
Publication), vol. 8, no. 6, pp. 3523-3533, 2019.
Speech emotion recognition with light gradient boosting decision trees machine
 
Incremental Difference as Feature for Lipreading
Incremental Difference as Feature for LipreadingIncremental Difference as Feature for Lipreading
Incremental Difference as Feature for Lipreading
 
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
 MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
 
Adaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancementAdaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancement
 
Audio Steganography Coding Using the Discreet Wavelet Transforms
Audio Steganography Coding Using the Discreet Wavelet TransformsAudio Steganography Coding Using the Discreet Wavelet Transforms
Audio Steganography Coding Using the Discreet Wavelet Transforms
 
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READINGLIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
 

Recently uploaded

weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
SupreethSP4
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
ankuprajapati0525
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 

Recently uploaded (20)

weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 

Performance estimation based recurrent-convolutional encoder decoder for speech enhancement

International Journal of Advanced Science and Technology
Vol. 29, No. 05, (2020), pp. 772-777
ISSN: 2005-4238 IJAST
Copyright ⓒ 2020 SERSC

Performance estimation based recurrent-convolutional encoder decoder for speech enhancement

A. Karthik¹, Dr. J. L. Mazher Iqbal²
¹ Research Scholar, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai
² Research Professor, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai

Abstract
Speech is central to human communication. As we increasingly use recorded speech to communicate remotely with other people, we are also becoming accustomed to machines that simply "listen to us". The goal of speech enhancement is to improve the intelligibility and/or the overall perceptual quality of a degraded speech signal using audio signal processing techniques. Noise reduction is the most important area of speech enhancement and is used in many applications, such as mobile phones, VoIP, teleconferencing systems, speech recognition and hearing aids. Speech enhancement is necessary wherever a clean speech signal is needed for further processing.

Keywords: Speech Recognition, Automatic Speech Recognition (ASR), Recurrent Convolutional Encoder-Decoder (R-CED) network, PESQ, STOI and CER.

1. Introduction
Speech enhancement techniques focus primarily on removing noise from a speech signal, and many techniques have been developed for the various types of noise encountered. In recent years, learning architectures based on deep neural networks (DNNs) have been very successful in related areas such as speech recognition. The success of DNNs in automatic speech recognition (ASR) has led to their study for noise suppression in ASR and for speech enhancement.
The central premise of using DNNs for speech enhancement is that the corruption of speech by noise is a complex process, and a complex nonlinear model such as a DNN is well suited to modelling it. Although there has been relatively little in-depth work on the usefulness of DNNs for speech enhancement, the results so far are promising and suggest that DNNs could outperform classical speech enhancement (SE) methods. A common aspect of many of these works is that they are evaluated under matched (seen) noise conditions, meaning that the noise types used in testing (e.g. background noise) are the same as those used in training. Unlike classical methods, which are motivated by signal processing considerations, DNN-based methods are data-driven, so matched noise conditions may not be ideal for evaluating DNNs for speech enhancement.

Speech enhancement is a challenging research problem in audio signal processing. Its goal is to improve the quality and intelligibility of speech signals corrupted by noise, and it matters because of its applications in many areas, such as automatic speech recognition, mobile communication and hearing aids.

2. Advantages of speech enhancement
- Frees up cognitive working space
- Allows the user to operate a computer by speaking to it
- Eliminates handwriting and spelling problems
- Always spells correctly (although it does not always recognize words correctly)
- Allows dictation of text and commands

3. Disadvantages of speech enhancement
- Assists with only one stage of the writing process; it is not a solution to the writing problem
- Difficult to use in classroom settings because of noise interference
- Requires large amounts of memory to store voice files
- Makes errors, which can be frustrating without adequate support
- Requires each user to train the software to recognize their voice, which is hard for poor decoders

4. Applications of speech enhancement
- Speaker identification
- Automatic speech recognition
- Biomedical speech recognition
- Mobile phone speech recognition

5. Related work
Wang and Brookes (2018) presented a modulation-domain speech enhancement algorithm based on the Kalman filter. The proposed estimator jointly models the estimated dynamics of the noise and speech spectral amplitudes to obtain a minimum mean squared error (MMSE) estimate of the speech amplitude spectrum, assuming that speech and noise are additive in the complex domain. To model the dynamics of the noise amplitudes together with those of the speech amplitudes, the work proposed a statistical model called "Gaussring", a mixture of Gaussians whose centres lie on a circle in the complex plane.
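The Kalman-filter idea behind the Wang and Brookes estimator can be illustrated in a much simpler setting. The following Python sketch is not their Gaussring-based MMSE estimator: it is a minimal scalar Kalman filter that tracks one spectral amplitude across frames, assuming a random-walk model for the amplitude (a first-order autoregression with `a = 1`) and additive Gaussian observation noise. The function name `kalman_track` and the parameters `a`, `q` and `r` are hypothetical choices for illustration.

```python
import numpy as np


def kalman_track(obs, a=1.0, q=0.01, r=0.1):
    """Scalar Kalman filter for one frequency bin tracked across frames.

    State model (assumed):        x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation model (assumed):  y_t = x_t + v_t,          v_t ~ N(0, r)
    obs is a 1-D array of noisy observations, e.g. one STFT bin's amplitude per frame.
    Returns the filtered (causal) state estimates.
    """
    obs = np.asarray(obs, dtype=float)
    x_est = np.empty_like(obs)
    x, p = obs[0], r  # initialise the state from the first observation
    for t, y in enumerate(obs):
        if t > 0:
            # Predict: propagate the previous estimate and its variance one frame forward.
            x = a * x
            p = a * a * p + q
        # Update: blend prediction and observation using the Kalman gain k.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        x_est[t] = x
    return x_est


# Usage: a slowly varying "amplitude" track corrupted by additive Gaussian noise.
rng = np.random.default_rng(0)
frames = np.arange(200)
clean = 1.0 + 0.5 * np.sin(2 * np.pi * frames / 50)
noisy = clean + rng.normal(scale=np.sqrt(0.1), size=frames.size)
smoothed = kalman_track(noisy)
print("noisy MSE:", np.mean((noisy - clean) ** 2))
print("filtered MSE:", np.mean((smoothed - clean) ** 2))
```

Because the filter is causal and recursive, each frame needs only the previous estimate and its variance; the ratio `q / r` sets how quickly the estimate follows new observations.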
The performance of the Wang and Brookes algorithm was evaluated using short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ) and segmental SNR (segSNR). On the speech quality measures, the algorithm was shown to give consistent improvements over a wide range of SNRs compared with competing algorithms. Speech recognition experiments also showed that the Gaussring-based algorithm performs well for two types of noise.

Bando et al. (2018) implemented a semi-supervised speech enhancement technique known as variational autoencoder and non-negative matrix factorization (VAE-NMF), which combines a probabilistic generative model of speech based on a VAE with a noise model based on non-negative matrix factorization. Only the speech model is pre-trained, using a sufficient amount of clean speech. Using the speech model as a prior distribution, posterior estimates of the clean speech can be obtained by Markov chain Monte Carlo (MCMC) sampling while the noise model adapts to the noisy environment. Experiments confirmed that VAE-NMF outperformed conventional supervised techniques based on deep neural networks in unseen noisy environments. A promising next direction was to extend VAE-NMF to the multichannel scenario: since a VAE and a well-studied linear phase model can represent complicated speech signals and the spatial mixing process, respectively, it would be efficient to integrate these models into a unified probabilistic framework. GAN-based training of the speech model could also be considered, to learn the probability distribution of speech more accurately.

Donahue et al. (2018) introduced the frequency-domain speech enhancement generative adversarial network (FSEGAN), a technique based on generative adversarial networks (GANs) that performs speech enhancement in the frequency domain, and reported improved automatic speech recognition (ASR) performance relative to the earlier time-domain method. They also provided evidence that, with retraining, FSEGAN can improve the performance of existing ASR systems trained with multi-style training (MTR). However, the experiments indicated that for ASR, simpler regression techniques may be preferable to GAN-based enhancement. FSEGAN appears to produce plausible spectra and could be more valuable for telephony applications when combined with an invertible feature representation.

Pascual et al. (2018) studied the adaptation of a speech enhancement generative adversarial network to new conditions by fine-tuning the generator with a minimal amount of data. Examining these minimum data requirements, they obtained stable behaviour across various objective metrics for two different languages: Korean and Catalan.
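Several of the objective metrics used in this literature and in the proposed method are simple to compute from a clean reference and an estimate. The Python sketch below implements MSE, global SNR, segmental SNR and, as a stand-in for SDR, the scale-invariant SDR (SI-SDR). SI-SDR is a widely used variant and is not the BSS Eval SDR definition; the 256-sample frame length and the [-10, 35] dB clipping range for segmental SNR are assumed, commonly used settings. PESQ (ITU-T P.862) and STOI require their reference implementations and are not reproduced here.

```python
import numpy as np


def mse(clean, est):
    """Mean squared error between the clean reference and the estimate."""
    return float(np.mean((clean - est) ** 2))


def snr_db(clean, est):
    """Global SNR in dB: clean-signal energy over residual-error energy."""
    return float(10 * np.log10(np.sum(clean ** 2) / np.sum((clean - est) ** 2)))


def seg_snr_db(clean, est, frame_len=256, lo=-10.0, hi=35.0):
    """Segmental SNR: per-frame SNRs clipped to [lo, hi] dB, then averaged."""
    n_frames = len(clean) // frame_len
    values = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        err = s - est[i * frame_len:(i + 1) * frame_len]
        ratio = (np.sum(s ** 2) + 1e-12) / (np.sum(err ** 2) + 1e-12)
        values.append(np.clip(10 * np.log10(ratio), lo, hi))
    return float(np.mean(values))


def si_sdr_db(clean, est):
    """Scale-invariant SDR: project est onto clean, compare target vs. residual energy."""
    alpha = np.dot(est, clean) / np.dot(clean, clean)
    target = alpha * clean
    return float(10 * np.log10(np.sum(target ** 2) / np.sum((est - target) ** 2)))


# Usage: a 440 Hz tone at 16 kHz with additive white noise of variance 0.01.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = clean + 0.1 * rng.normal(size=clean.size)
for name, fn in [("MSE", mse), ("SNR", snr_db), ("segSNR", seg_snr_db), ("SI-SDR", si_sdr_db)]:
    print(name, round(fn(clean, noisy), 2))
```

Segmental SNR averages in the log domain frame by frame, so it weights quiet and loud frames more evenly than the global SNR does; SI-SDR ignores any overall gain applied to the estimate.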
A further objective of the Pascual et al. study was to examine how test performance on unseen noise varies with the number of different noise types available in the training set. Adapting the pre-trained English model with only ten minutes of data already achieved performance comparable to that obtained with two orders of magnitude more data. The authors also demonstrated that test performance remained relatively stable with respect to the number of training noise types.

Zhao et al. (2018) described EHNet, which combines recurrent and convolutional neural networks for speech enhancement. EHNet's inductive bias is well suited to speech enhancement: the convolution kernels efficiently detect local patterns in spectrograms, and the bidirectional recurrent connections automatically model dynamic correlations between adjacent frames. Owing to the sparse nature of convolutions, EHNet requires less computation than recurrent neural networks and multilayer perceptrons (MLPs). The results showed that EHNet consistently outperforms its competitors across the five metrics and generalizes to unseen noise, confirming its effectiveness for speech enhancement.

6. Challenges to be overcome
In existing work, the a priori and a posteriori SNRs that guide classical decision-directed techniques become latent variables in a recurrent neural network (RNN), and the estimated frequency-dependent speech presence probability is used to update these latent variables recursively. However, RNNs are very unstable when ReLU is used as the activation function. Because of the activation function, RNNs are also unable to process very long sequences, cannot be stacked into very deep models, and cannot track long-term dependencies.
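The a priori and a posteriori SNRs mentioned above are classically obtained with the decision-directed rule of Ephraim and Malah. With noise power spectral density lambda_N, the a posteriori SNR of frame t is gamma_t = |Y_t|^2 / lambda_N, and the decision-directed a priori SNR estimate is xi_t = alpha * |S_hat_{t-1}|^2 / lambda_N + (1 - alpha) * max(gamma_t - 1, 0). The Python sketch below applies this rule per frequency bin and converts xi_t into a Wiener gain xi_t / (1 + xi_t). It assumes the noise PSD is already known; the function name and the defaults `alpha = 0.98` and `xi_min = 1e-3` are illustrative assumptions, not values taken from this paper.

```python
import numpy as np


def decision_directed_wiener(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Wiener gains driven by decision-directed a priori SNR estimates.

    noisy_power: (frames, bins) array of |Y|^2 values from an STFT of the noisy signal
    noise_power: (bins,) array, an assumed-known noise PSD estimate lambda_N
    alpha:       smoothing factor of the decision-directed rule
    xi_min:      floor on the a priori SNR
    Returns a (frames, bins) array of gains in (0, 1).
    """
    frames, bins = noisy_power.shape
    gains = np.empty((frames, bins))
    prev_clean_power = np.zeros(bins)  # |S_hat|^2 from the previous frame
    for t in range(frames):
        gamma = noisy_power[t] / noise_power  # a posteriori SNR
        # A priori SNR: the previous frame's enhanced power blended with the
        # instantaneous maximum-likelihood estimate max(gamma - 1, 0).
        xi = alpha * prev_clean_power / noise_power + (1 - alpha) * np.maximum(gamma - 1.0, 0.0)
        xi = np.maximum(xi, xi_min)
        g = xi / (1.0 + xi)  # Wiener gain
        gains[t] = g
        prev_clean_power = g ** 2 * noisy_power[t]
    return gains


# Usage: four bins of noise-only periodogram values; bin 0 also carries a strong tone.
rng = np.random.default_rng(0)
noisy = rng.exponential(1.0, size=(50, 4))
noisy[:, 0] += 100.0
G = decision_directed_wiener(noisy, noise_power=np.ones(4))
print(G[-1].round(3))
```

The recursion through `prev_clean_power` is what makes the estimate depend on the previous frame; this frame-to-frame dependency is the part that an RNN-based estimator would learn instead of fixing by hand.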
7. Proposed objectives
- To improve the accuracy of speech enhancement in the RCNN approach using a priori and a posteriori SNR estimates.
- To improve the quality of enhanced speech in speech-present regions and to extend the additive noise framework.
- To show the efficiency of speech enhancement using the Recurrent Convolutional Encoder-Decoder (R-CED), whose feature dimension increases along the encoder and decreases along the decoder.

8. Proposed method
To overcome the above challenges, the proposed speech enhancement method recovers noise-free speech mainly by estimating the a priori and a posteriori SNRs. The a priori SNR can be understood as the true instantaneous power ratio between each spectral component of the clean speech and of the noise, while the a posteriori SNR can be viewed as the instantaneous power ratio between each spectral component of the observed noisy speech and of the noise. In this work, a Recurrent Convolutional Encoder-Decoder (R-CED) network is used. R-CED consists of repetitions of a convolution layer, a batch normalization layer and a ReLU activation layer. It encodes the features into a higher dimension along the encoder and achieves compression along the decoder. The number of filters is kept symmetric: it is gradually increased in the encoder and gradually decreased in the decoder. Decoding proceeds by initializing the trellis map, designing the circuit logic and performing Lp-norm decoding; maximum likelihood estimates are then obtained by traversing the trellis map, in which the distortion elements are predicted. The decoding process yields the noise-free speech, and the loss function is computed from the a priori SNR. The MSE is calculated and compared with a threshold value: if it is greater than the threshold, processing returns to the R-CED stage.
If it is smaller than the threshold, the speech is taken as enhanced. The performance of the enhanced speech is then analysed with the SNR (signal-to-noise ratio), SDR (signal-to-distortion ratio) and MSE (mean squared error) metrics.

Algorithm/technique to be used: SNR-based Recurrent-Convolutional Encoder-Decoder (SNR-RCED)

Performance metrics:
- PESQ (Perceptual Evaluation of Speech Quality)
- STOI (Short-Time Objective Intelligibility)
- CER (Character Error Rate)
- MSE (Mean Squared Error)
- SNR (Signal-to-Noise Ratio)
- SDR (Signal-to-Distortion Ratio)
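The R-CED structure described in Section 8 (repeated convolution, batch normalization and ReLU blocks with a symmetric number of filters) can be sketched as a forward pass in NumPy. This is a hypothetical illustration, not the authors' network: the filter counts (12, 16, 20, 16, 12), the kernel width of 9, the 1-D convolution along the frequency axis of a magnitude frame and the random, untrained weights are all assumptions. The sketch contains no recurrent connections, trellis decoding or SNR-based loss, because the text does not specify them; it only shows how the channel dimension grows along the encoder and shrinks along the decoder while the frequency length is preserved.

```python
import numpy as np


def conv1d_same(x, w, b):
    """1-D cross-correlation along the last axis with 'same' padding, stride 1.

    x: (batch, c_in, freq), w: (c_out, c_in, k), b: (c_out,) -> (batch, c_out, freq)
    """
    freq = x.shape[-1]
    k = w.shape[-1]
    left = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (left, k - 1 - left)))
    # windows[b, c, f, i] = xp[b, c, f + i]
    windows = np.stack([xp[:, :, i:i + freq] for i in range(k)], axis=-1)
    return np.einsum("bcfk,ock->bof", windows, w) + b[None, :, None]


def batch_norm(x, eps=1e-5):
    """Normalise each channel over the batch and frequency axes (no learned scale/shift)."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def rced_forward(x, channels=(1, 12, 16, 20, 16, 12), kernel=9, seed=0):
    """Forward pass of a hypothetical R-CED-style stack with random, untrained weights.

    Each block is conv -> batch norm -> ReLU. Channel counts rise along the
    encoder (1 -> 12 -> 16 -> 20) and fall symmetrically along the decoder
    (20 -> 16 -> 12); a final linear conv maps back to one output channel.
    """
    rng = np.random.default_rng(seed)
    h = x
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        w = rng.normal(scale=np.sqrt(2.0 / (c_in * kernel)), size=(c_out, c_in, kernel))
        h = np.maximum(batch_norm(conv1d_same(h, w, np.zeros(c_out))), 0.0)
    w_out = rng.normal(scale=np.sqrt(1.0 / (channels[-1] * kernel)), size=(1, channels[-1], kernel))
    return conv1d_same(h, w_out, np.zeros(1))


# Usage: a batch of 8 magnitude frames with 129 frequency bins (e.g. a 256-point STFT).
frames = np.abs(np.random.default_rng(1).normal(size=(8, 1, 129)))
out = rced_forward(frames)
print(out.shape)  # (8, 1, 129)
```

Training such a stack would additionally require a loss, for example the MSE between the network output and a clean target spectrum, together with the threshold check described above; that loop is not shown here.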
9. Flow of proposed work
Figure 1: Flow of the proposed work

References
[1] H. Zhao, et al., "Convolutional recurrent neural networks for speech enhancement," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 2401-2405.
[2] H.-P. Liu, et al., "Bone-conducted speech enhancement using deep denoising autoencoder," Speech Communication, vol. 104, pp. 106-112, 2018.
[3] Y. Zhao, et al., "Perceptually guided speech enhancement using deep neural networks," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5074-5078.
[4] Q. He, et al., "Multiplicative update of auto-regressive gains for codebook-based speech enhancement," IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 25, pp. 457-468, 2017.
[5] R. Henni, et al., "A new efficient two-channel fast transversal adaptive filtering algorithm for blind speech enhancement and acoustic noise reduction," Computers & Electrical Engineering, vol. 73, pp. 349-368, 2019.
[6] Y. Xia and R. Stern, "A Priori SNR Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement," in Interspeech, 2018, pp. 3274-3278.
[7] X. Du, et al., "End-to-End Model for Speech Enhancement by Consistent Spectrogram Masking," arXiv preprint arXiv:1901.00295, 2019.
[8] R. Bendoumia, "Two-channel forward NLMS algorithm combined with simple variable step-sizes for speech quality enhancement," Analog Integrated Circuits and Signal Processing, vol. 98, pp. 27-40, 2019.
[9] Y. Wang and M. Brookes, "Model-based speech enhancement in the modulation domain," IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 26, pp. 580-594, 2018.
[10] Y. Bando, et al., "Statistical speech enhancement based on probabilistic integration of variational autoencoder and non-negative matrix factorization," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 716-720.
[11] C. Donahue, et al., "Exploring speech enhancement with generative adversarial networks for robust speech recognition," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5024-5028.
[12] S. Pascual, et al., "Language and noise transfer in speech enhancement generative adversarial network," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5019-5023.
[13] W. Xue, et al., "Modulation-Domain Parametric Multichannel Kalman Filtering for Speech Enhancement," in 2018 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 2509-2513.
[14] X. Leng, et al., "On Speech Enhancement Using Microphone Arrays in the Presence of Co-Directional Interference," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 511-515.
[15] Y. Bando, et al., "Speech enhancement based on Bayesian low-rank and sparse decomposition of multichannel magnitude spectrograms," IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 26, pp. 215-230, 2017.
[16] S. China Venkateswarlu and A. Karthik, "Performance on Speech Enhancement Objective Quality Measures Using Hybrid Wavelet Thresholding," International Journal of Engineering and Advanced Technology, Blue Eyes Intelligence Engineering & Sciences Publication, vol. 8, issue 6, pp. 3523-3533, 2019.