This document discusses a proposed Recurrent-Convolutional Encoder-Decoder (R-CED) network for speech enhancement. The R-CED network aims to overcome the limitations of existing methods by estimating the a priori and a posteriori signal-to-noise ratios to separate noise from speech. The network consists of convolutional layers whose filter counts first increase and then decrease, encoding and then decoding the spectral features. Performance is evaluated with metrics such as PESQ, STOI, CER, MSE, SNR, and SDR. The proposed method aims to improve speech enhancement accuracy and recover higher speech quality than competing techniques.
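The encoder-decoder filter schedule described above can be sketched as a toy 1-D convolutional pipeline. The filter counts, kernel size, and input dimensions below are hypothetical illustrations, not values from the paper:

```python
import numpy as np

def conv1d_relu(x, w):
    """Valid 1-D convolution followed by ReLU.
    x: (in_channels, time), w: (out_channels, in_channels, kernel)."""
    co, ci, k = w.shape
    t = x.shape[1] - k + 1
    out = np.zeros((co, t))
    for o in range(co):
        for i in range(t):
            out[o, i] = np.sum(w[o] * x[:, i:i + k])
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
# Encoder grows the filter count, decoder mirrors it back down (hypothetical schedule).
schedule = [12, 16, 24, 16, 12, 1]
x = rng.standard_normal((1, 64))  # one noisy spectral feature row
for n_filters in schedule:
    w = 0.1 * rng.standard_normal((n_filters, x.shape[0], 3))
    x = conv1d_relu(x, w)
# x is now a single enhanced feature row, shortened by the valid convolutions
```

Each valid convolution with kernel 3 trims two time steps, so six layers map a 64-step input to 52 steps with a single output channel.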
In present-day communications, speech signals are contaminated by various kinds of noise that degrade speech quality and adversely impact speech recognition performance. To overcome these issues, a novel speech-enhancement approach based on modified Wiener filtering is developed: the power spectrum of the degraded signal is computed to obtain the noise characteristics from the noisy spectrum. In the next phase, an MMSE technique is applied in which the Gaussian distribution of each signal, i.e. the original and the noisy signal, is analyzed. The Gaussian distribution provides the spectrum estimate and spectral-coefficient parameters used to formulate the probabilistic model. Moreover, a priori SNR computation is incorporated for coefficient updating and noise-presence estimation, operating similarly to a conventional VAD. However, the conventional VAD scheme relies on a hard threshold that cannot deliver satisfactory performance, so a soft-decision threshold is developed to improve the performance of speech enhancement. An extensive simulation study is carried out in MATLAB on the NOIZEUS speech database, and a comparative study shows the proposed approach outperforming the existing technique.
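The a priori SNR update and the resulting soft Wiener gain described above can be sketched with the classic decision-directed rule; the smoothing factor and the toy spectra below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, prev_clean_psd, alpha=0.98):
    """Decision-directed a priori SNR estimate feeding a Wiener gain.
    All inputs are per-bin power spectra of one frame."""
    post_snr = noisy_psd / noise_psd                      # a posteriori SNR
    prio_snr = (alpha * prev_clean_psd / noise_psd
                + (1.0 - alpha) * np.maximum(post_snr - 1.0, 0.0))
    return prio_snr / (1.0 + prio_snr)                    # soft gain in [0, 1)

noisy = np.array([4.0, 1.0, 9.0])         # toy noisy power spectrum
noise = np.array([1.0, 1.0, 1.0])         # toy noise estimate
prev_clean = np.array([2.0, 0.0, 8.0])    # clean estimate from previous frame
g = wiener_gain(noisy, noise, prev_clean)
enhanced_psd = g**2 * noisy               # attenuated power spectrum
```

Bins where the noisy power barely exceeds the noise estimate receive a gain near zero, which is exactly the soft-decision behavior the abstract contrasts with hard-threshold VAD.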
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Dialogue Act Modeling for Automatic Tagging and Recognition (Vipul Munot)
Aims to present a comprehensive framework for modelling and automatic classification of dialogue acts (DAs), founded on well-known statistical methods, and presents results obtained with this approach on a large, widely available corpus of spontaneous conversational speech.
LPC Models and Different Speech Enhancement Techniques - A Review (ijiert bestjournal)
The author has already published one review paper on enhancing the quality of a speech signal by minimizing noise; this is the second paper of the same series. Over the last two decades researchers have made continuous efforts to reduce noise in speech signals. This paper comments on the various studies and analysis proposals of researchers for enhancing speech quality. Models, coding, speech-quality improvement methods, speaker-dependent codebooks, autocorrelation subtraction, speech restoration, low-bit-rate speech production, compression, and enhancement are the various aspects of speech enhancement reviewed here; a forthcoming paper in the series will examine a few of these techniques to analyze the factors affecting them.
IMPROVING MYANMAR AUTOMATIC SPEECH RECOGNITION WITH OPTIMIZATION OF CONVOLUTI... (ijnlc)
Researchers in many nations have developed automatic speech recognition (ASR) to demonstrate national progress in information and communication technology for their languages. This work aims to improve ASR performance for the Myanmar language by varying Convolutional Neural Network (CNN) hyperparameters such as the number of feature maps and the pooling size. Thanks to its locality and pooling operations, a CNN can reduce spectral variation and model the spectral correlations present in the signal; the impact of these hyperparameters on CNN accuracy in ASR tasks is therefore investigated. A 42-hour data set is used for training, and ASR performance is evaluated on two open test sets: web news and recorded data. As Myanmar is a syllable-timed language, a syllable-based ASR system was built and compared with a word-based one. As a result, it achieved a 16.7% word error rate (WER) and an 11.5% syllable error rate (SER) on TestSet1, and 21.83% WER and 15.76% SER on TestSet2.
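The WER and SER figures above are edit-distance-based error rates; a minimal sketch (not the paper's actual scoring tool) is:

```python
def error_rate(ref, hyp):
    """Levenshtein edit distance between token lists, normalized by len(ref).
    Tokens are words for WER or syllables for SER."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # deletions
    for j in range(m + 1):
        d[0][j] = j                      # insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[n][m] / n

wer = error_rate("the cat sat".split(), "the cat sat down".split())  # one insertion
```

For syllable error rate the same function is applied to syllable tokens instead of word tokens.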
Eat It, Review It: A New Approach for Review Prediction (vivatechijri)
Deep learning has achieved significant improvements in various machine learning tasks. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks have been gaining popularity for text sequences, i.e. word prediction. The ability to abstract information from images or text is being widely adopted by organizations around the world; a basic task in deep learning is classification, whether of images or text. Techniques such as RNNs and CNNs have opened the door for data analysis, and emerging variants such as Region CNN and Recurrent CNN are under active development. The proposed system uses a Recurrent Neural Network for review prediction, combined with LSTM so that long sentences can be predicted. The system focuses on context-based review prediction and produces full-length sentences, helping users write proper reviews by understanding their context.
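The abstract names the RNN-plus-LSTM pipeline but does not specify it; purely as an illustration of the recurrent mechanism, a single Elman-style RNN step producing next-token probabilities (all dimensions and weights hypothetical) looks like:

```python
import numpy as np

def rnn_step(x, h, Wx, Wh, Wo):
    """One recurrent step: new hidden state and a softmax over the vocabulary."""
    h_new = np.tanh(Wx @ x + Wh @ h)
    logits = Wo @ h_new
    e = np.exp(logits - logits.max())
    return h_new, e / e.sum()

rng = np.random.default_rng(1)
emb, hid, vocab = 8, 16, 50          # hypothetical sizes
Wx = 0.1 * rng.standard_normal((hid, emb))
Wh = 0.1 * rng.standard_normal((hid, hid))
Wo = 0.1 * rng.standard_normal((vocab, hid))
h = np.zeros(hid)
for _ in range(3):                   # feed three dummy word embeddings
    h, probs = rnn_step(rng.standard_normal(emb), h, Wx, Wh, Wo)
```

An LSTM replaces the plain `tanh` update with gated cell-state updates, which is what lets the described system handle long sentences.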
Novel Approach of Implementing Psychoacoustic Model for MPEG-1 Audio (inventy)
Research Inventy : International Journal of Engineering and Science is published by the group of young academic and industrial researchers with 12 Issues per year. It is an online as well as print version open access journal that provides rapid publication (monthly) of articles in all areas of the subject such as: civil, mechanical, chemical, electronic and computer engineering as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by rapid process within 20 days after acceptance and peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS (ijnlc)
The fundamental techniques used for man-machine communication include speech synthesis, speech recognition, and speech transformation. Feature extraction techniques provide a compressed representation of the speech signal, and HNM analysis and synthesis provide high-quality speech with fewer parameters. Dynamic time warping is a well-known technique for aligning two given multidimensional sequences: it locates an optimal match between them, and the improvement in alignment is estimated from the corresponding distances. The objective of this research is to investigate the effect of dynamic time warping on phrase-, word-, and phoneme-based alignments. Speech signals in the form of twenty-five phrases were recorded; the recorded material was segmented manually and aligned at sentence, word, and phoneme level, and the Mahalanobis distance (MD) was computed between the aligned frames. The investigation showed better alignment in the HNM parametric domain and demonstrated that effective speech alignment can be carried out even at phrase level.
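The dynamic time warping described above can be sketched minimally as follows; a plain Euclidean frame distance stands in for the Mahalanobis distance the paper uses:

```python
import numpy as np

def dtw(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A repeated frame is absorbed at zero extra cost by the warping path.
d = dtw([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])
```

The smaller the accumulated cost after warping, the better the alignment, which is exactly the improvement measure the abstract describes.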
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES (kevig)
Speech synthesis and recognition are the basic techniques used for man-machine communication. This type of communication is valuable when our hands and eyes are busy with some other task such as driving a vehicle, performing surgery, or firing weapons at the enemy. Dynamic time warping (DTW) is widely used for aligning two given multidimensional sequences: it finds an optimal match between them, and since the distance between aligned sequences should be smaller than between unaligned ones, the improvement in alignment may be estimated from the corresponding distances. The technique has applications in speech recognition, speech synthesis, and speaker transformation. The objective of this research is to investigate how much the alignment improves for sentence-based versus phoneme-based manually aligned phrases. Speech signals in the form of twenty-five phrases were recorded from each of six speakers (3 male, 3 female); the material was segmented manually and aligned at sentence and phoneme level. The aligned sentences of different speaker pairs were analyzed using HNM, the HNM parameters were further aligned at frame level using DTW, and Mahalanobis distances were computed for each sentence pair. The investigation showed more than a 20% reduction in the average Mahalanobis distance.
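The Mahalanobis distance used for the frame-level comparison above can be sketched directly; the covariance shown is an illustrative placeholder:

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance between two feature vectors under covariance `cov`."""
    d = np.asarray(x, float) - np.asarray(y, float)
    # solve(cov, d) avoids forming the explicit inverse of the covariance.
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

# With an identity covariance the distance reduces to the Euclidean distance.
d = mahalanobis([1.0, 0.0], [0.0, 0.0], np.eye(2))
```

In practice the covariance would be estimated from the HNM parameter frames, so directions with high natural variability contribute less to the distance.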
Single Channel Speech Enhancement Using Wiener Filter and Compressive Sensing (IJECEIAES)
Speech enhancement algorithms are used to overcome several limiting factors in applications such as mobile phones and communication channels, where the central challenge is the trade-off between noise reduction and signal distortion in corrupted speech. A modified Wiener filter and compressive sensing (CS) are used to investigate and evaluate the improvement in speech quality. The method adapts the noise estimate and the Wiener-filter gain function to increase the weighted amplitude spectrum and better preserve the signal of interest. CS is then applied via the gradient projection for sparse reconstruction (GPSR) technique to empirically investigate the effects of the corrupting noise and to obtain perceptual improvements that reduce listener fatigue. In objective assessment tests the proposed algorithm outperforms conventional algorithms across several noise types at 0, 5, 10, and 15 dB SNR, achieving better speech quality and more effective noise reduction.
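GPSR itself is not reproduced here; as a stand-in, the closely related iterative soft-thresholding (ISTA) scheme for the same l1-regularized sparse reconstruction problem can be sketched as:

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, y, lam=0.1, iters=100):
    """Minimize 0.5*||Ax - y||^2 + lam*||x||_1 by iterative soft-thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x - step * A.T @ (A @ x - y), step * lam)
    return x

# Sanity check: with A = I the solution is the soft-thresholded measurement.
y = np.array([2.0, 0.05, -1.0])
x = ista(np.eye(3), y, lam=0.1)
```

GPSR solves the same objective via gradient projection on a bound-constrained reformulation; ISTA is used here only because it fits in a few lines.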
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp... (CSCJournals)
In this paper a new thresholding-based speech enhancement approach is presented, in which the threshold is determined statistically by applying the Teager energy operation to the Wavelet Packet (WP) coefficients of the noisy speech. The resulting threshold is applied to the WP coefficients with a hard thresholding function to obtain the enhanced speech. Detailed simulations with white, car, pink, and babble noise evaluate the performance of the proposed method. Standard objective measures, spectrogram representations, and subjective listening tests show that the method outperforms existing state-of-the-art thresholding-based speech enhancement approaches from high to low SNR levels.
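The discrete Teager energy operator and the hard-thresholding step described above can be sketched as follows; how the threshold is derived from the Teager energy here is a simplified assumption, not the paper's statistical rule:

```python
import numpy as np

def teager(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    x = np.asarray(x, float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def hard_threshold(coeffs, thr):
    """Zero out wavelet-packet coefficients whose magnitude falls below thr."""
    coeffs = np.asarray(coeffs, float)
    return coeffs * (np.abs(coeffs) >= thr)

wp = np.array([0.9, -0.05, 0.4, 0.02, -0.7])   # toy WP coefficients
thr = np.sqrt(np.mean(np.abs(teager(wp))))     # illustrative threshold choice
denoised = hard_threshold(wp, thr)
```

For a pure sinusoid A*sin(w*n), the Teager energy is the constant A^2*sin^2(w), which is why it serves as a robust instantaneous-energy estimate for setting the threshold.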
Development of Algorithm for Voice Operated Switch for Digital Audio Control ... (IJMER)
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Performance Calculation of Speech Synthesis Methods for Hindi Language (iosrjce)
IOSR journal of VLSI and Signal Processing (IOSRJVSP) is a double blind peer reviewed International Journal that publishes articles which contribute new results in all areas of VLSI Design & Signal Processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI Design & Signal Processing concepts and establishing new collaborations in these areas.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels
Bayesian Distance Metric Learning and Its Application in Automatic Speaker Re... (IJECEIAES)
This paper proposes a state-of-the-art Automatic Speaker Recognition (ASR) system based on Bayesian distance metric learning as a feature extractor. The modeling explores constraints on the distance between modified, simplified i-vector pairs from the same speaker and from different speakers. The distance metric is approximated as a weighted covariance matrix built from the leading eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric. Given a speaker label, the data pairs of different speakers with the highest cosine scores are selected to form a set of speaker constraints; this collection captures the most discriminative variability between speakers in the training data. The Bayesian distance learning approach achieves better performance than the most advanced methods, is insensitive to normalization compared with cosine scoring, and is very effective when training data are limited. The modified supervised i-vector-based system is evaluated on the NIST SRE 2008 database: the best combined cosine-score EER of 1.767% is obtained using LDA200 + NCA200 + LDA200, and the best Bayes_dml EER of 1.775% using LDA200 + NCA200 + LDA100. Bayes_dml surpasses the combined norm of cosine scores and is the best reported result for the short2-short3 condition of NIST SRE 2008.
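The cosine scoring of i-vector pairs referred to above is simply the normalized inner product; a sketch:

```python
import numpy as np

def cosine_score(w1, w2):
    """Cosine similarity between two i-vectors; higher suggests the same speaker."""
    w1, w2 = np.asarray(w1, float), np.asarray(w2, float)
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))

same = cosine_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # parallel vectors
diff = cosine_score([1.0, 0.0], [0.0, 1.0])             # orthogonal vectors
```

A verification trial thresholds this score; the EER quoted in the abstract is the operating point where false accepts and false rejects are equally frequent.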
Speech Emotion Recognition with Light Gradient Boosting Decision Trees Machine (IJECEIAES)
Speech emotion recognition aims to identify the emotion expressed in speech by analyzing the audio signal. In this work, data augmentation is first performed on the audio samples to increase their number for better model learning, and the samples are comprehensively encoded as frequency- and temporal-domain features. A light gradient boosting machine is used for classification, with hyperparameter tuning to determine the optimal settings. As speech emotion recognition datasets are imbalanced, class weights are set inversely proportional to the sample distribution, so minority classes receive higher weights. The experimental results show the proposed method outperforming state-of-the-art methods with 84.91% accuracy on the Berlin database of emotional speech (emo-DB), 67.72% on the Ryerson audio-visual database of emotional speech and song (RAVDESS), and 62.94% on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
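The inverse-proportional class weighting mentioned above can be sketched as follows; the exact normalization the authors use is not stated, so the common n_samples / (n_classes * count) form is assumed:

```python
from collections import Counter

def class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * v) for c, v in counts.items()}

w = class_weights(["happy", "happy", "happy", "sad"])
# The minority class ("sad") receives the larger weight.
```

These weights would then scale each sample's contribution to the boosting loss, counteracting the imbalance described in the abstract.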
Incremental Difference as Feature for Lipreading (IDES Editor)
This paper presents a method of computing incremental difference features, based on scan-line projection and scan-converting lines, for the lipreading problem on a set of isolated word utterances. These features are affine invariants and are found to be effective in identifying similarity between utterances by the speaker in the spatial domain.
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK (ijitcs)
Speech technology is an emerging field, and automatic speech recognition has advanced in recent years. Much research has been performed for many foreign and regional languages, and multilingual speech processing is now attracting research attention. This paper proposes a methodology for developing a bilingual speech identification system for the Assamese and English languages based on an artificial neural network.
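The abstract does not specify the network architecture; purely as an illustration, a minimal two-class feedforward pass (random hypothetical weights over MFCC-style input features) could look like:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One hidden layer with tanh, softmax over the two language classes."""
    h = np.tanh(W1 @ x + b1)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(2)
n_feat, n_hidden = 13, 8                       # e.g. 13 MFCCs per frame (assumed)
W1 = 0.1 * rng.standard_normal((n_hidden, n_feat))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((2, n_hidden))  # classes: Assamese, English
b2 = np.zeros(2)
probs = forward(rng.standard_normal(n_feat), W1, b1, W2, b2)
```

A trained system would learn the weights from labeled bilingual speech and pick the language with the higher output probability.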
Audio Steganography Coding Using the Discreet Wavelet TransformsCSCJournals
The performance of audio steganography compression system using discreet wavelet transform (DWT) is investigated. Audio steganography coding is the technology of transforming stego-speech into efficiently encoded version that can be decoded in the receiver side to produce a close representation of the initial signal (non compressed). Experimental results prove the efficiency of the used compression technique since the compressed stego-speech are perceptually intelligible and indistinguishable from the equivalent initial signal, while being able to recover the initial stego-speech with slight degradation in the quality .
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Performance estimation based recurrent-convolutional encoder decoder for speech enhancement
International Journal of Advanced Science and Technology, Vol. 29, No. 05, (2020), pp. 772-777
ISSN: 2005-4238 IJAST
Copyright ⓒ 2020 SERSC
Performance Estimation Based Recurrent-Convolutional Encoder-Decoder for Speech Enhancement

A. Karthik¹, Dr. J. L. Mazher Iqbal²

¹ Research Scholar, Veltech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai
² Research Professor, Veltech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai
Abstract
Speech is the key to our communication skills. As we increasingly use recorded speech to communicate remotely with other human beings, we grow more and more accustomed to machines that simply "listen to us". The goal of speech enhancement is to improve the intelligibility and/or the overall perceptual quality of a degraded speech signal through audio signal processing techniques. Noise reduction is the most important branch of speech enhancement and is used in many applications, such as cell phones, VoIP, teleconferencing systems, speech recognition, and hearing aids. Speech enhancement is necessary in any application where a clean speech signal is required for further processing.
Keywords: Speech Recognition, Automatic Speech Recognition (ASR), Recurrent-Convolutional Encoder-Decoder (R-CED) network, PESQ, STOI, CER.
1. Introduction
Speech enhancement techniques focus primarily on removing noise from a speech signal; noise comes in various forms, and different techniques exist to suppress each of them. In recent years, learning architectures based on deep neural networks (DNNs) have been very successful in related areas such as speech recognition, and this success has motivated the study of DNNs for noise suppression in automatic speech recognition (ASR) and for speech enhancement. The central premise of using DNNs for speech enhancement is that the corruption of speech by noise is a complex process, so a complex nonlinear model such as a DNN is well suited to modelling it. Although comparatively little in-depth work has examined the usefulness of DNNs for speech enhancement, they have shown promising results and can outperform classical SE methods. A common aspect of many of these works is evaluation under matched, or seen, noise conditions: the noise types used at test time (e.g. background noise) are the same as those used in training. Unlike classical methods, which are motivated by signal-processing considerations, DNN-based methods are data-driven, and matched noise conditions may therefore not be ideal for evaluating DNNs for speech enhancement. Speech enhancement (SE) remains a serious research problem in audio signal processing. The goal is to improve the quality and intelligibility of speech signals corrupted by noise, given its applications in various sectors such as automatic speech recognition, mobile communication, and hearing aids.
2. Advantages of speech enhancement
- Frees up cognitive working space
- Allows the user to operate a computer by speaking to it
- Eliminates handwriting and spelling problems
- Always spells correctly (though it does not always recognize words correctly)
- Allows dictation of text and commands
3. Disadvantages of speech enhancement
- Assists with one stage of the writing process; it is not a solution to the whole writing problem
- Difficult to use in classroom settings because of noise interference
- Requires large amounts of memory to store voice files
- Makes errors, which can be frustrating without adequate support
- Requires each user to train the software to recognize a voice, which is hard for poor decoders
4. Applications of speech enhancement
- Speaker identification
- Automatic speech recognition
- Biomedical speech recognition
- Cell-phone speech recognition
5. Related work
(Wang and Brookes 2018) presented an algorithm for modulation-domain speech enhancement using a Kalman filter. The proposed estimator jointly models the estimated dynamics of the noise and speech spectral amplitudes to obtain a minimum mean squared error (MMSE) estimate of the speech amplitude spectrum, assuming that noise and speech are additive in the complex domain, thereby incorporating the dynamics of the noise amplitudes together with those of the speech amplitudes. To this end, the work proposed the "Gaussring" statistical model, a mixture of Gaussians whose centres lie on a circle in the complex plane. The performance of the proposed algorithm was evaluated using the short-time objective intelligibility (STOI) measure, the perceptual evaluation of speech quality (PESQ) measure, and segmental SNR (segSNR). On the speech-quality measures, the proposed algorithm was shown to provide consistent improvement over a wide range of SNRs compared with competitive algorithms, and speech recognition experiments showed that the Gaussring-based algorithm performs well on two types of noise.
(Bando, et al. 2018) implemented a semi-supervised speech enhancement technique known as variational autoencoder–nonnegative matrix factorization (VAE-NMF), which combines a probabilistic generative model of speech based on a VAE with a noise model based on nonnegative matrix factorization. Only the speech model is pre-trained, using a sufficient amount of clean speech. Using the speech model as a prior distribution, posterior estimates of the clean speech can be obtained with Markov chain Monte Carlo (MCMC) sampling while the noise model adapts to the noisy environment. Experiments confirmed that VAE-NMF outperformed conventional supervised techniques based on deep neural networks in unseen noisy environments. A stimulating next direction is to extend VAE-NMF to the multichannel scenario: since a VAE and a well-studied linear phase model can represent complicated speech signals and the spatial mixing process, respectively, it would be efficient to integrate these models into a unified probabilistic framework. GAN-based training of the speech model could also be considered in order to learn a probability distribution of speech more accurately.
(Donahue, et al. 2018) introduced the frequency-domain Speech Enhancement Generative Adversarial Network (FSEGAN), a technique based on generative adversarial networks (GANs) that performs speech enhancement in the frequency domain, and showed improvements in automatic speech recognition (ASR) performance relative to a previous time-domain method. They further provided evidence that, when the recognizer is retrained, FSEGAN can improve the performance of ASR systems previously trained with multi-style training (MTR). The experiments indicated that, for ASR, simpler regression techniques may be preferable to GAN-based enhancement. FSEGAN produces plausible spectra and could be more valuable for telephony applications when combined with an invertible feature representation.
(Pascual, et al. 2018) studied the adaptation of a speech enhancement generative adversarial network, fine-tuning the generator with the least possible amount of data. To examine the minimum requirements, stable behaviour was obtained in terms of several objective metrics for two different languages, Korean and Catalan. The main objective of the study was the variability of test performance on unseen noise as a function of the number of noise types available in the training set. Adapting the pre-trained English model with ten minutes of data was shown to achieve performance comparable to training with two orders of magnitude more data. In addition, they demonstrated relative stability of test performance with respect to the number of training noise types.
(Zhao, et al. 2018) elucidated EHNET, which combines recurrent neural networks and convolutional neural networks to improve speech. EHNET's inductive bias is well suited to speech enhancement: the convolution kernels can efficiently detect local patterns in spectrograms, and the bidirectional recurrent connections can automatically model the dynamic correlations between adjacent frames. Owing to the local nature of convolutions, EHNET requires fewer computations than a plain recurrent neural network. The results demonstrated that EHNET consistently outperforms its competitors on all five metrics, and that it generalizes to unseen noise, confirming EHNET's effectiveness for speech enhancement.
6. Challenges to be overcome
In the existing work, the classical estimates guided by the a priori and a posteriori SNR decisions become latent variables in the noise-reduction network (NRN), from which the estimated frequency-dependent probability of speech presence is used to recursively update the latent variables. However, the gradient of a recurrent neural network (RNN) is very unstable if ReLU is used as the activation function. As a result, such RNNs are unable to process very long sequences, cannot be stacked into very deep models, and cannot track long-term dependencies.
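The instability described above can be illustrated with a toy forward recurrence. This is a minimal NumPy sketch with hypothetical dimensions and weights, not the NRN from the cited work: with ReLU and a recurrent weight matrix whose gain exceeds one, the hidden-state norm (and with it the backpropagated gradient) grows geometrically over long sequences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy recurrence: h_t = relu(W h_{t-1} + x_t).  With a recurrent weight
# matrix whose spectral radius exceeds 1, ReLU does not saturate, so the
# hidden state grows geometrically over long sequences.
dim, steps = 32, 200
W = rng.normal(scale=2.0 / np.sqrt(dim), size=(dim, dim))  # gain > 1 (hypothetical)
h = np.ones(dim)
norms = []
for t in range(steps):
    x_t = rng.normal(scale=0.1, size=dim)
    h = np.maximum(0.0, W @ h + x_t)   # ReLU activation
    norms.append(np.linalg.norm(h))

# The hidden-state norm explodes instead of staying bounded, which is
# why plain ReLU RNNs struggle with very long inputs and deep stacking.
print(norms[10], norms[-1])
```

A bounded activation such as tanh, or gated units (LSTM/GRU), keeps this recurrence stable, which motivates replacing the recurrent stack with the convolutional encoder-decoder proposed below.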
7. Proposed objectives
- To improve the accuracy of speech enhancement in the RCNN approach through the a priori and a posteriori SNR.
- To recover the quality of enhanced speech in speech-present regions, and to extend the additive-noise framework.
- To show the efficiency of speech enhancement using the increasing and decreasing feature dimensions of the Recurrent-Convolutional Encoder-Decoder (R-CED).
8. Proposed method
To overcome the above challenges, speech enhancement is used to find the noise-free speech, mainly by estimating the a priori and a posteriori SNR. The a priori SNR can be understood as the true instantaneous power ratio between each spectral component of the clean speech and of the noise, while the a posteriori SNR can be viewed as the instantaneous power ratio between each spectral component of the observed noisy speech and of the noise.

In this work, a Recurrent-Convolutional Encoder-Decoder (R-CED) network is used. R-CED consists of repetitions of a convolution layer, batch normalization, and a ReLU activation layer. R-CED encodes the features into a higher dimension along the encoder and achieves compression along the decoder. The number of filters is kept symmetric: at the encoder the number of filters is gradually increased, and at the decoder it is gradually decreased. The decoding stage initializes the trellis map, designs the circuit logic, and performs Lp-norm decoding; finally, maximum-likelihood estimates are obtained by traversing the trellis map, where the distortion elements are predicted. The decoding process yields the noise-free speech, and the loss function is then computed from the a priori SNR. For the loss function, the MSE is calculated and compared with a threshold value: if the MSE is greater than the threshold, the signal is passed back through the R-CED process; if it is less, the speech is considered enhanced. From this enhanced speech, the performance is analysed using the metrics SNR (signal-to-noise ratio), SDR (signal-to-distortion ratio), and MSE (mean squared error).
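The two SNR definitions and the MSE threshold test above can be sketched as follows. This is a minimal illustration with hypothetical spectra and an arbitrary threshold, not the paper's implementation:

```python
import numpy as np

def snr_estimates(clean_spec, noise_spec, noisy_spec, eps=1e-10):
    """Per-bin a priori and a posteriori SNR, following the definitions in
    the text: a priori SNR is the power ratio of clean speech to noise;
    a posteriori SNR is the power ratio of observed noisy speech to noise."""
    noise_power = np.abs(noise_spec) ** 2 + eps
    priori = (np.abs(clean_spec) ** 2) / noise_power
    posteriori = (np.abs(noisy_spec) ** 2) / noise_power
    return priori, posteriori

# Toy magnitude spectra for a single frame (hypothetical values).
clean = np.array([4.0, 2.0, 0.5])
noise = np.array([1.0, 1.0, 1.0])
noisy = clean + noise  # additive-noise assumption

priori, posteriori = snr_estimates(clean, noise, noisy)
mse = float(np.mean((noisy - clean) ** 2))  # loss against the clean target

# Threshold test from the text: if the loss exceeds the threshold, the
# frame is sent back through the R-CED network; otherwise the speech is
# treated as enhanced.  THRESHOLD is a hypothetical value.
THRESHOLD = 0.5
needs_another_pass = mse > THRESHOLD
print(priori, posteriori, needs_another_pass)
```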
Algorithm / techniques to be used
- SNR-based Recurrent-Convolutional Encoder-Decoder (SNR-RCED)
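The symmetric encoder-decoder filter schedule and the repeated convolution / batch-normalization / ReLU unit can be sketched as below. The filter counts and kernel size are hypothetical, and a single kernel stands in for each multi-filter layer:

```python
import numpy as np

# Illustrative R-CED layer schedule: filter counts grow along the encoder
# and shrink symmetrically along the decoder (values are hypothetical).
encoder_filters = [12, 16, 20, 24]
bottleneck = [32]
decoder_filters = encoder_filters[::-1]          # mirror of the encoder
schedule = encoder_filters + bottleneck + decoder_filters

def conv_bn_relu(x, kernel, eps=1e-5):
    """One repeated R-CED unit: convolution, batch norm, ReLU."""
    y = np.convolve(x, kernel, mode="same")       # convolution
    y = (y - y.mean()) / np.sqrt(y.var() + eps)   # batch normalization
    return np.maximum(0.0, y)                     # ReLU activation

rng = np.random.default_rng(1)
frame = rng.normal(size=128)                      # one spectral frame
out = frame
for n_filters in schedule:
    # A real layer applies `n_filters` kernels; one kernel stands in here.
    out = conv_bn_relu(out, rng.normal(size=9) / 3.0)

print(schedule, out.shape)
```

Note the schedule reads the same forwards and backwards, which is the symmetry the text describes, and the feature dimension is preserved frame-to-frame because the convolutions are padded.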
Performance metrics:
- PESQ (Perceptual Evaluation of Speech Quality)
- STOI (Short-Time Objective Intelligibility)
- CER (Character Error Rate)
- MSE (Mean Squared Error)
- SNR (Signal-to-Noise Ratio)
- SDR (Signal-to-Distortion Ratio)
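Of the listed metrics, SNR, SDR, and MSE can be computed directly from waveforms. A minimal sketch follows; the scaling-projection form of SDR used here is one common convention, and the test signal is hypothetical:

```python
import numpy as np

def mse(ref, est):
    """Mean squared error between reference and estimate."""
    return float(np.mean((ref - est) ** 2))

def snr_db(ref, est):
    """Signal-to-noise ratio in dB, treating (est - ref) as the noise."""
    noise = est - ref
    return 10.0 * np.log10(np.sum(ref ** 2) / np.sum(noise ** 2))

def sdr_db(ref, est):
    """Signal-to-distortion ratio in dB: project the estimate onto the
    reference and treat the residual as distortion."""
    alpha = np.dot(est, ref) / np.dot(ref, ref)   # optimal scaling
    target = alpha * ref
    distortion = est - target
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(distortion ** 2))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000.0)  # 440 Hz tone
enhanced = clean + 0.01 * rng.normal(size=clean.shape)      # small residual noise

print(snr_db(clean, enhanced), sdr_db(clean, enhanced), mse(clean, enhanced))
```

PESQ and STOI, by contrast, are perceptual measures that are usually computed with dedicated implementations (for example, the third-party `pesq` and `pystoi` Python packages), and CER is obtained by running an ASR system on the enhanced speech.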
9. Flow of proposed work
Figure 1: Flow of the proposed work
References
[1] H. Zhao, et al., "Convolutional recurrent neural networks for speech enhancement,"
in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), 2018, pp. 2401-2405.
[2] H.-P. Liu, et al., "Bone-conducted speech enhancement using deep denoising
autoencoder," Speech Communication, vol. 104, pp. 106-112, 2018.
[3] Y. Zhao, et al., "Perceptually guided speech enhancement using deep neural
networks," in 2018 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), 2018, pp. 5074-5078.
[4] Q. He, et al., "Multiplicative update of auto-regressive gains for codebook-based
speech enhancement," IEEE/ACM Transactions on Audio, Speech and Language
Processing (TASLP), vol. 25, pp. 457-468, 2017.
[5] R. Henni, et al., "A new efficient two-channel fast transversal adaptive filtering
algorithm for blind speech enhancement and acoustic noise reduction," Computers &
Electrical Engineering, vol. 73, pp. 349-368, 2019.
[6] Y. Xia and R. Stern, "A Priori SNR Estimation Based on a Recurrent Neural
Network for Robust Speech Enhancement," in Interspeech, 2018, pp. 3274-3278.
[7] X. Du, et al., "End-to-End Model for Speech Enhancement by Consistent
Spectrogram Masking," arXiv preprint arXiv:1901.00295, 2019.
[8] R. Bendoumia, "Two-channel forward NLMS algorithm combined with simple
variable step-sizes for speech quality enhancement," Analog Integrated Circuits and
Signal Processing, vol. 98, pp. 27-40, 2019.
[9] Y. Wang and M. Brookes, "Model-based speech enhancement in the modulation
domain," IEEE/ACM Transactions on Audio, Speech and Language Processing
(TASLP), vol. 26, pp. 580-594, 2018.
[10] Y. Bando, et al., "Statistical speech enhancement based on probabilistic integration
of variational autoencoder and non-negative matrix factorization," in 2018 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP),
2018, pp. 716-720.
[11] C. Donahue, et al., "Exploring speech enhancement with generative adversarial
networks for robust speech recognition," in 2018 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5024-5028.
[12] S. Pascual, et al., "Language and noise transfer in speech enhancement generative
adversarial network," in 2018 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), 2018, pp. 5019-5023.
[13] W. Xue, et al., "Modulation-Domain Parametric Multichannel Kalman Filtering for
Speech Enhancement," in 2018 26th European Signal Processing Conference
(EUSIPCO), 2018, pp. 2509-2513.
[14] X. Leng, et al., "On Speech Enhancement Using Microphone Arrays in the Presence
of Co-Directional Interference," in 2018 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 511-515.
[15] Y. Bando, et al., "Speech enhancement based on Bayesian low-rank and sparse
decomposition of multichannel magnitude spectrograms," IEEE/ACM Transactions
on Audio, Speech, and Language Processing, vol. 26, pp. 215-230, 2017.
[16] S. China Venkateswarlu and A. Karthik, "Performance on Speech Enhancement Objective Quality Measures Using Hybrid Wavelet Thresholding," International Journal of Engineering and Advanced Technology, Blue Eyes Intelligence Engineering & Sciences Publication, vol. 8, issue 6, pp. 3523-3533, 2019.