For Details, Contact TSYS Academic Projects in Adyar.
Ph: 9841103123, 044-42607879
Website: http://www.tsysglobalsolutions.com/
Mail Id: tsysglobalsolutions2014@gmail.com.
IEEE/ACM Transactions on Audio, Speech, and Language
Processing
Mel-Cepstrum-Based Quantization Noise Shaping Applied to Neural-
Network-Based Speech Waveform Synthesis
ABSTRACT
This paper presents a mel-cepstrum-based quantization noise shaping method for
improving the quality of synthetic speech generated by neural-network-based speech waveform
synthesis systems. Since mel-cepstral coefficients closely match the characteristics of human
auditory perception, the proposed method effectively masks the white noise introduced by the
quantization typically used in neural-network-based speech waveform synthesis systems. The
paper also describes a computationally efficient implementation of the proposed method using
the structure of the mel-log spectrum approximation filter. Experiments using the WaveNet
generative model, which is a state-of-the-art model for neural-network-based speech waveform
synthesis, showed that speech quality is significantly improved by the proposed method.
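The error-feedback structure that underlies this kind of quantization noise shaping can be sketched in a few lines. The following is a generic first-order sketch with an assumed feedback coefficient, not the paper's mel-cepstrum-derived filter (the paper builds the shaping filter from mel-cepstral coefficients via the mel-log spectrum approximation structure):

```python
import numpy as np

def noise_shaped_quantize(x, n_bits=8, shaping=(0.9,)):
    """Uniform quantizer with error-feedback noise shaping.
    `shaping` holds hypothetical feedback-filter coefficients; the paper
    instead derives this filter from mel-cepstral coefficients so the
    quantization noise is pushed toward perceptually masked frequencies."""
    levels = 2 ** n_bits
    step = 2.0 / levels                      # input assumed in [-1, 1]
    err_hist = np.zeros(len(shaping))        # past quantization errors
    out = np.empty_like(x)
    for n, sample in enumerate(x):
        # feed filtered past errors back into the input before quantizing
        shaped = sample + np.dot(shaping, err_hist)
        q = np.clip(np.round(shaped / step) * step, -1.0, 1.0)
        err_hist = np.roll(err_hist, 1)
        err_hist[0] = shaped - q             # store the newest error
        out[n] = q
    return out
```

The feedback loop leaves the signal untouched but multiplies the quantization noise spectrum by 1 / H(z), so choosing H from a perceptual model (here, mel-cepstral) concentrates the noise where the ear is least sensitive.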
A Multi-Objective Learning and Ensembling Approach to High-Performance
Speech Enhancement with Compact Neural Network Architectures
ABSTRACT
In this study, we propose a novel deep neural network (DNN) architecture for speech
enhancement (SE) via a multi-objective learning and ensembling (MOLE) framework to achieve
a compact and low-latency design while maintaining good performance in quality evaluations.
MOLE follows the boosting concept when combining weak models into a strong classifier and
consists of two compact deep neural networks (DNNs). The first, called the multi-objective
learning DNN (MOL-DNN), takes multiple features, such as log-power spectra (LPS), mel-frequency
cepstral coefficients (MFCCs) and Gammatone frequency cepstral coefficients
(GFCCs), to predict a multi-objective set that includes the clean speech feature, a dynamic noise feature
and ideal ratio mask (IRM). The second, called the multi-objective ensembling DNN (MOE-
DNN), takes the learned features from MOL-DNN as inputs and separately predicts clean LPS
and IRM, clean MFCC and IRM, and clean GFCC and IRM using three sets of weak regression
functions. Finally, a post-processing operation can be applied to the estimated clean features by
leveraging the multiple targets learned from both the MOL-DNN and the MOE-DNN. On speech
corrupted by 15 noise types not seen in model training, the speech enhancement results show that
the MOLE approach, which features a small model size and low run-time latency, can achieve
consistent improvements over both DNN- and long short-term memory (LSTM)-based
techniques in terms of all the objective metrics evaluated in this study for all three cases (the
input contexts contain 1-frame, 4-frame and 7-frame instances). The 1-frame MOLE-based SE
system outperforms the DNN-based SE system with a 7-frame input expansion at a 3-frame
delay, and also achieves better performance than the LSTM-based SE system with a 4-frame, no-delay
expansion, while using only 3 previous frames and incurring 170 times less processing
latency.
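One of the MOL-DNN's prediction targets, the ideal ratio mask (IRM), has a standard closed form; a minimal sketch, assuming per-bin speech and noise power spectra are available during training:

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """IRM per time-frequency bin: ratio of speech power to total power.
    beta=0.5 gives the common square-root form used as a DNN target."""
    return (speech_power / (speech_power + noise_power)) ** beta

def apply_mask(noisy_mag, mask):
    # enhanced magnitude = mask applied to the noisy magnitude spectrum
    return mask * noisy_mag
```

At test time the mask is no longer ideal but predicted by the network from noisy features, and the enhanced magnitude is recombined with the noisy phase for resynthesis.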
Speaker-Adapted Confidence Measures for ASR using Deep Bidirectional
Recurrent Neural Networks
ABSTRACT
In recent years, Deep Bidirectional Recurrent Neural Networks (DBRNN) and DBRNN
with Long Short-Term Memory cells (DBLSTM) have outperformed the most accurate
classifiers for confidence estimation in automatic speech recognition. At the same time, we have
recently shown that speaker adaptation of confidence measures using DBLSTM yields
significant improvements over non-adapted confidence measures. In accordance with these two
recent contributions to the state of the art in confidence estimation, this paper presents a
comprehensive study of speaker-adapted confidence measures using DBRNN and DBLSTM
models. Firstly, we present new empirical evidence of the superiority of RNN-based confidence
classifiers evaluated over a large speech corpus consisting of the English LibriSpeech and the
Spanish poliMedia tasks. Secondly, we show new results on speaker-adapted confidence
measures considering a multi-task framework in which RNN-based confidence classifiers trained
with LibriSpeech are adapted to speakers of the TED-LIUM corpus. These experiments confirm
that speaker-adapted confidence measures outperform their non-adapted counterparts. Lastly, we
describe an unsupervised adaptation method of the acoustic DBLSTM model based on
confidence measures which results in better automatic speech recognition performance.
Mispronunciation Detection in Children’s Reading of Sentences
ABSTRACT
This work proposes an approach to automatically parse children’s reading of sentences by
detecting word pronunciations and extra content, and to classify words as correctly or incorrectly
pronounced. This approach can be directly helpful for automatic assessment of reading level or
for automatic reading tutors, where a correct reading must be identified. We propose a first
segmentation stage to locate candidate word pronunciations based on allowing repetitions and
false starts of a word’s syllables. A decoding grammar based solely on syllables allows silence to
appear during a word pronunciation. At a second stage, word candidates are classified as
mispronounced or not. The feature that best classifies mispronunciations is found to be the log-
likelihood ratio between a free phone loop and a word spotting model in the very close vicinity
of the candidate segmentation. Additional features are combined in multi-feature models to
further improve classification, including: normalizations of the log-likelihood ratio, derivations
from phone likelihoods, and Levenshtein distances between the correct pronunciation and
recognized phonemes through two phoneme recognition approaches. Results show that most
extra events were detected (close to 2% word error rate achieved) and that using automatic
segmentation for mispronunciation classification approaches the performance of manual
segmentation. Although the log-likelihood ratio from a spotting approach is already a good
metric to classify word pronunciations, the combination of additional features provides a relative
reduction of the miss rate of 18% (from 34.03% to 27.79% using manual segmentation and from
35.58% to 29.35% using automatic segmentation, at constant 5% false alarm rate).
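One of the combined features above, the Levenshtein distance between the canonical pronunciation and the recognized phoneme string, is a standard dynamic-programming computation; a self-contained sketch over phoneme lists:

```python
def levenshtein(ref, hyp):
    """Edit distance between two phoneme sequences (lists of symbols):
    the minimum number of insertions, deletions, and substitutions
    needed to turn `ref` into `hyp`."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of ref
    for j in range(n + 1):
        d[0][j] = j                      # insert all of hyp
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]
```

A large distance between the expected and recognized phonemes is evidence of mispronunciation; the classifier combines this with the log-likelihood-ratio features described above.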
Analysis of the Reconstruction of Sparse Signals in the DCT Domain Applied
to Audio Signals
ABSTRACT
Sparse signals can be reconstructed from a reduced set of signal samples using
compressive sensing (CS) methods. The discrete cosine transform (DCT) can provide highly
concentrated representations of audio signals. This property makes the DCT a good sparsity
domain for audio signals. In this paper, the DCT is studied within the context of sparse audio
signal processing using the CS theory and methods. The DCT coefficients of a sparse signal,
calculated with a reduced set of available samples, can be modeled as random variables. It has
been shown that the statistical properties of these variables are closely related to the unique
reconstruction conditions. The main result of the paper is an exact formula for the mean
square reconstruction error in the case of approximately sparse and nonsparse noisy signals,
reconstructed under the sparsity assumption. Based on the presented analysis, a simple and
computationally efficient reconstruction algorithm is proposed. The presented theoretical
concepts and the efficiency of the reconstruction algorithm are verified numerically, including
examples with synthetic and recorded audio signals with unavailable or corrupted samples.
Random disturbances and disturbances simulating clicks or inpainting in audio signals are
considered. Statistical verification is done on a dataset with experimental signals. Results are
compared with some classical and recent methods used in similar signal and disturbance
scenarios.
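The basic recovery setting can be illustrated with a simple alternating-projection scheme: project onto the set of K-sparse DCT spectra, then re-impose the known samples. This is a generic textbook sketch of DCT-domain compressive sensing recovery, not the specific algorithm proposed in the paper:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C *= np.sqrt(2.0 / N)
    C[0] *= 1.0 / np.sqrt(2.0)
    return C

def reconstruct(y, avail, N, K, iters=50):
    """Recover an N-sample signal assumed K-sparse in the DCT domain from
    samples y observed at indices `avail`, by alternating between the
    sparsity constraint and the measurement constraint."""
    C = dct_matrix(N)
    x = np.zeros(N)
    x[avail] = y                               # zero-fill missing samples
    for _ in range(iters):
        X = C @ x
        keep = np.argsort(np.abs(X))[-K:]      # keep K largest coefficients
        Xs = np.zeros(N)
        Xs[keep] = X[keep]
        x = C.T @ Xs                           # back to the time domain
        x[avail] = y                           # re-impose known samples
    return x
```

Missing samples act like additive noise spread over all DCT coefficients, which is exactly the random-variable model the paper analyzes when deriving the reconstruction-error formula.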
Speech Dereverberation with Context-Aware Recurrent Neural Networks
ABSTRACT
In this paper, we propose a model to perform speech dereverberation by estimating the clean
spectral magnitude from its reverberant counterpart. Our models are capable of extracting
features that take into account both short and long-term dependencies in the signal through a
convolutional encoder (which extracts features from a short, bounded context of frames) and a
recurrent neural network for extracting long-term information. Our model outperforms a recently
proposed model that uses different context information depending on the reverberation time,
without requiring any sort of additional input, yielding improvements of up to 0.4 on PESQ, 0.3
on STOI, and 1.0 on POLQA relative to reverberant speech. We also show our model is able to
generalize to real room impulse responses even when only trained with simulated room impulse
responses, different speakers, and high reverberation times. Lastly, listening tests show the
proposed method outperforming benchmark models in reduction of perceived reverberation.
Do we need individual head-related transfer functions for vertical
localization? The case study of a spectral notch distance metric
ABSTRACT
This paper deals with the issue of individualizing the head-related transfer function
(HRTF) rendering process for auditory elevation perception: is it possible to find a
non-individual, personalized HRTF set that allows a listener to localize as accurately
as with his/her individual HRTFs? We propose a psychoacoustically
motivated, anthropometry-based mismatch function between HRTF pairs that exploits the close
relation between the listener’s pinna geometry and localization cues. This is evaluated using an
auditory model that computes a mapping between HRTF spectra and perceived spatial locations.
Results on a large number of subjects in the CIPIC and ARI HRTF databases suggest that there
exists a non-individual HRTF set that allows a listener to achieve vertical
localization as accurate as with individual HRTFs. Furthermore, we find the optimal parametrization of
the proposed mismatch function, i.e. the one that best reflects the information given by the
auditory model. Our findings show that the selection procedure yields statistically significant
improvements with respect to dummy-head HRTFs or random HRTF selection, with potentially
high impact from an applicative point of view.
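The pinna cue most associated with elevation is the spectral notch, so a notch-distance mismatch can be sketched as: locate the deepest local minimum of each HRTF magnitude response in the pinna-cue band, and compare the notch frequencies. This is an illustrative simplification; the paper's mismatch function also incorporates anthropometric terms and a learned parametrization:

```python
import numpy as np

def first_notch_hz(mag_db, freqs, lo=4000.0, hi=16000.0):
    """Frequency of the deepest local minimum (notch) of an HRTF
    magnitude response within the pinna-cue band [lo, hi] Hz."""
    band = (freqs >= lo) & (freqs <= hi)
    m, f = mag_db[band], freqs[band]
    # strict local minima: lower than both neighbours
    idx = np.where((m[1:-1] < m[:-2]) & (m[1:-1] < m[2:]))[0] + 1
    if len(idx) == 0:
        return None
    return f[idx[np.argmin(m[idx])]]

def notch_mismatch(mag_a, mag_b, freqs):
    """Notch-frequency distance between two HRTF magnitude responses."""
    fa = first_notch_hz(mag_a, freqs)
    fb = first_notch_hz(mag_b, freqs)
    return abs(fa - fb) if fa is not None and fb is not None else np.inf
```

Selecting, for each listener, the database HRTF set that minimizes such a mismatch is the personalization step that the auditory model then evaluates against individual HRTFs.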
Interaural Coherence Preservation for Binaural Noise Reduction Using
Partial Noise Estimation and Spectral Postfiltering
ABSTRACT
The objective of binaural speech enhancement algorithms is to reduce the undesired noise
component, while preserving the desired speech source and the binaural cues of all sound
sources. For the scenario of a single desired speech source in a diffuse noise field, an extension
of the binaural multi-channel Wiener filter (MWF), namely the MWF-IC, has been recently
proposed, which aims to preserve the interaural coherence (IC) of the noise component.
However, due to the large complexity of the MWF-IC, in this paper we propose several
alternative algorithms with a lower computational complexity. First, we consider a
quasi-distortionless version of the MWF-IC, denoted as MVDR-IC. Secondly, we propose to
preserve the IC of the noise component using the binaural MWF with partial noise estimation
(MWF-N) and the binaural minimum variance distortionless response beamformer with partial
noise estimation (MVDR-N), for which closed-form expressions exist. In addition, we show that
for the MVDR-N a closed-form expression can be derived for the tradeoff parameter yielding a
desired magnitude squared coherence (MSC) for the output noise component. Since contrary to
the MWF-IC and the MWF-N the MVDR-IC and the MVDR-N do not take into account the
spectro-temporal properties of the speech and the noise components, we propose to apply a
spectral postfilter to the filter outputs, improving the noise reduction performance. The
performance of all algorithms is compared in several diffuse noise scenarios. The simulation
results show that both the MVDR-IC and the MVDR-N are able to preserve the MSC of the
noise component, while generally the MVDR-IC shows a slightly better noise reduction
performance at a larger complexity. Further simulation results show that applying a spectral
postfilter leads to a very similar performance for all considered algorithms in terms of noise
reduction and speech distortion.
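The MVDR beamformer at the core of these variants has the well-known closed form w = R⁻¹d / (dᴴR⁻¹d), which minimizes output noise power subject to a distortionless response toward the target. A minimal sketch of just this base beamformer (the MVDR-N adds a tradeoff parameter for partial noise estimation, which is not shown here):

```python
import numpy as np

def mvdr_weights(R_noise, d):
    """MVDR beamformer weights for noise covariance R_noise and
    steering / relative transfer function vector d:
        w = R^{-1} d / (d^H R^{-1} d)
    The constraint w^H d = 1 keeps the target undistorted."""
    rinv_d = np.linalg.solve(R_noise, d)     # R^{-1} d without explicit inverse
    return rinv_d / (d.conj() @ rinv_d)
```

With a white (identity) noise covariance the weights reduce to a matched filter d / ‖d‖², while a colored covariance steers nulls toward the dominant noise directions.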
Gating Neural Network for Large Vocabulary Audiovisual Speech
Recognition
ABSTRACT
Audio-based automatic speech recognition (A-ASR) systems are affected by noisy
conditions in real-world applications. Adding visual cues to the ASR system is an appealing
alternative to improve the robustness of the system, replicating the audiovisual perception
process used during human interactions. A common problem observed when using audiovisual
automatic speech recognition (AV-ASR) is the drop in performance when speech is clean. In this
case, visual features may not provide complementary information, introducing variability that
negatively affects the performance of the system. The experimental evaluation in this study
clearly demonstrates this problem when we train an audiovisual state-of-the-art hybrid system
with a deep neural network (DNN) and hidden Markov models (HMMs). This study proposes a
framework that addresses this problem, improving, or at least, maintaining the performance
when visual features are used. The proposed approach is a deep learning solution with a gating
layer that diminishes the effect of noisy or uninformative visual features, keeping only useful
information. The framework is implemented with a subset of the audiovisual CRSS-4ENGLISH-
14 corpus which consists of 61 hours of speech from 105 subjects simultaneously collected with
multiple cameras and microphones. The proposed framework is compared with conventional
HMMs with observation models implemented with either a Gaussian mixture model (GMM) or
DNNs. We also compare the system with a multi-stream hidden Markov model (MS-HMM)
system. The experimental evaluation indicates that the proposed framework outperforms
alternative methods under all configurations, showing the robustness of the gating-based
framework for AV-ASR.
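The gating idea can be sketched as a sigmoid layer, driven by both modalities, that scales the visual features before fusion. This is a generic gating sketch with assumed weight shapes, not the paper's exact layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(audio_feat, visual_feat, Wg, bg):
    """Scale the visual stream by a learned sigmoid gate computed from
    both modalities, then concatenate for the recognizer. Wg has shape
    (len(visual_feat), len(audio_feat) + len(visual_feat)); Wg and bg
    are hypothetical trained parameters."""
    z = np.concatenate([audio_feat, visual_feat])
    gate = sigmoid(Wg @ z + bg)          # one gate value per visual dim
    return np.concatenate([audio_feat, gate * visual_feat])
```

When the visual stream is uninformative the gate saturates near zero and the system degrades gracefully to audio-only features, which is exactly the clean-speech failure mode the paper targets.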
Bias-Compensated Informed Sound Source Localization Using Relative
Transfer Functions
ABSTRACT
In this paper, we consider the problem of estimating the target sound direction of arrival
(DoA) for a hearing aid (HA) system, which can connect to a wireless microphone worn by the
talker of interest. The wireless microphone “informs” the HA system about the noise-free target
speech. To estimate the DoA, we consider a maximum-likelihood approach, and we assume that
a database of DoA-dependent relative transfer functions (RTFs) has been measured in advance
and is available. The proposed DoA estimator is able to take the available noise-free target
speech, ambient noise characteristics, and the shadowing effect of the user’s head on the received
signals into account, and it supports both monaural and binaural microphone array configurations.
Moreover, we analytically analyze the bias of the proposed estimator and introduce a modified
estimator that compensates for the bias. We demonstrate that the proposed method
has lower computational complexity and better performance than recent RTF-based estimators.
Furthermore, to decrease the number of parameters required to be wirelessly exchanged between
the HAs in binaural configurations, we propose an information fusion (IF) strategy, which avoids
transmitting microphone signals between the HAs. An important benefit of the proposed IF
strategy is that the number of parameters to be exchanged between the HAs is independent of the
number of HA microphones. Finally, we investigate the performance of variants of the proposed
estimator extensively in different noisy and reverberant situations.
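The informed maximum-likelihood idea can be sketched as a grid search: for each candidate direction, score how well its database RTF, applied to the known noise-free target spectrum, explains the microphone signals. This least-squares stand-in corresponds to the ML criterion only under white noise and omits the paper's bias compensation:

```python
import numpy as np

def informed_doa(Y, S, rtf_db):
    """Pick the DoA whose database RTF best explains the mic signals.
    Y: (M, F) microphone STFT bins; S: (F,) noise-free target spectrum
    from the wireless microphone; rtf_db: dict angle -> (M, F) RTFs
    measured in advance (a hypothetical database layout)."""
    best_angle, best_cost = None, np.inf
    for angle, H in rtf_db.items():
        # residual between observed and predicted mic signals
        cost = np.sum(np.abs(Y - H * S[None, :]) ** 2)
        if cost < best_cost:
            best_angle, best_cost = angle, cost
    return best_angle
```

In the binaural case, the IF strategy described above would exchange only per-direction sufficient statistics of this cost between the hearing aids, rather than the microphone signals Y themselves.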
CONTACT: TSYS Center for Research and Development
(TSYS Academic Projects)
NO: 20/9, 4th Floor, Janaki Complex, Sardar Patel Road,
Adyar, Chennai-600020.
LANDMARK: Above METRO shoes
Visit us: http://www.tsysglobalsolutions.com/
Email: tsysglobalsolutions2014@gmail.com
Tel: 04442607879, +91 98411 03123.

ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
 
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and PhonemesEffect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
 
Effect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal AlignmentsEffect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal Alignments
 
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
 
AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEY
 
LPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A ReviewLPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A Review
 
D111823
D111823D111823
D111823
 
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
 
Deep convolutional neural networks-based features for Indonesian large vocabu...
Deep convolutional neural networks-based features for Indonesian large vocabu...Deep convolutional neural networks-based features for Indonesian large vocabu...
Deep convolutional neural networks-based features for Indonesian large vocabu...
 
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
PERFORMANCE ANALYSIS OF BARKER CODE BASED ON THEIR CORRELATION PROPERTY IN MU...
PERFORMANCE ANALYSIS OF BARKER CODE BASED ON THEIR CORRELATION PROPERTY IN MU...PERFORMANCE ANALYSIS OF BARKER CODE BASED ON THEIR CORRELATION PROPERTY IN MU...
PERFORMANCE ANALYSIS OF BARKER CODE BASED ON THEIR CORRELATION PROPERTY IN MU...
 

Recently uploaded

ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxAnaBeatriceAblay2
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonJericReyAuditor
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Recently uploaded (20)

ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

IEEE Transactions on 2018 Topics with Abstracts in Audio, Speech, and Language Processing

For Details, Contact TSYS Academic Projects in Adyar.
Ph: 9841103123, 044-42607879
Website: http://www.tsysglobalsolutions.com/
Mail Id: tsysglobalsolutions2014@gmail.com

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Mel-Cepstrum-Based Quantization Noise Shaping Applied to Neural-Network-Based Speech Waveform Synthesis

ABSTRACT

This paper presents a mel-cepstrum-based quantization noise shaping method for improving the quality of synthetic speech generated by neural-network-based speech waveform synthesis systems. Since mel-cepstral coefficients closely match the characteristics of human auditory perception, the proposed method effectively masks the white noise introduced by the quantization typically used in such systems. The paper also describes a computationally efficient implementation of the proposed method using the structure of the mel-log spectrum approximation filter. Experiments with the WaveNet generative model, a state-of-the-art model for neural-network-based speech waveform synthesis, showed that the proposed method significantly improves speech quality.

A Multi-Objective Learning and Ensembling Approach to High-Performance Speech Enhancement with Compact Neural Network Architectures

ABSTRACT

In this study, we propose a novel deep neural network (DNN) architecture for speech enhancement (SE) via a multi-objective learning and ensembling (MOLE) framework, achieving a compact, low-latency design while maintaining good performance in quality evaluations. MOLE follows the boosting concept of combining weak models into a strong classifier and consists of two compact DNNs. The first, called the multi-objective learning DNN (MOL-DNN), takes multiple features, such as log-power spectra (LPS), mel-frequency cepstral coefficients (MFCCs), and gammatone frequency cepstral coefficients (GFCCs), to predict a multi-objective set that includes the clean speech feature, a dynamic noise feature, and an ideal ratio mask (IRM). The second, called the multi-objective ensembling DNN (MOE-DNN), takes the learned features from the MOL-DNN as inputs and separately predicts clean LPS and IRM, clean MFCC and IRM, and clean GFCC and IRM using three sets of weak regression functions. Finally, a post-processing operation can be applied to the estimated clean features by leveraging the multiple targets learned from both the MOL-DNN and the MOE-DNN. On speech corrupted by 15 noise types not seen in model training, the speech enhancement results show that the MOLE approach, which features a small model size and low run-time latency, achieves consistent improvements over both DNN- and long short-term memory (LSTM)-based techniques in terms of all the objective metrics evaluated in this study for all three input contexts (1, 4, and 7 frames). The 1-frame MOLE-based SE system outperforms the DNN-based SE system with a 7-frame input expansion at a 3-frame delay, and also achieves better performance than the LSTM-based SE system with a 4-frame, no-delay expansion by including only 3 previous frames, with 170 times less processing latency.
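The ideal ratio mask (IRM) named above as one of MOLE's training targets has a standard textbook definition computed from speech and noise power spectra. A minimal NumPy sketch of that definition (an illustration, not code from the paper):

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power):
    """IRM(t, f) = sqrt(P_s / (P_s + P_n)), computed from the speech and
    noise power spectra: a standard time-frequency masking target for
    DNN-based speech enhancement."""
    return np.sqrt(speech_power / (speech_power + noise_power))

# Toy 2x2 grid of time-frequency power values.
speech = np.array([[4.0, 1.0],
                   [0.0, 9.0]])
noise = np.array([[4.0, 3.0],
                  [1.0, 0.0]])
mask = ideal_ratio_mask(speech, noise)
# Speech-dominated bins get a mask near 1; noise-dominated bins near 0.
```

Multiplying a noisy magnitude spectrogram by such a mask attenuates the noise-dominated bins, which is why the IRM is a convenient regression target alongside the clean LPS, MFCC, and GFCC features.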
Speaker-Adapted Confidence Measures for ASR Using Deep Bidirectional Recurrent Neural Networks

ABSTRACT

In recent years, deep bidirectional recurrent neural networks (DBRNNs) and DBRNNs with long short-term memory cells (DBLSTMs) have outperformed the most accurate classifiers for confidence estimation in automatic speech recognition. At the same time, we have recently shown that speaker adaptation of confidence measures using DBLSTMs yields significant improvements over non-adapted confidence measures. Building on these two recent contributions to the state of the art in confidence estimation, this paper presents a comprehensive study of speaker-adapted confidence measures using DBRNN and DBLSTM models. First, we present new empirical evidence of the superiority of RNN-based confidence classifiers, evaluated over a large speech corpus consisting of the English LibriSpeech and Spanish poliMedia tasks. Second, we show new results on speaker-adapted confidence measures in a multi-task framework in which RNN-based confidence classifiers trained on LibriSpeech are adapted to speakers of the TED-LIUM corpus. These experiments confirm that speaker-adapted confidence measures outperform their non-adapted counterparts. Lastly, we describe an unsupervised adaptation method for the acoustic DBLSTM model based on confidence measures, which results in better automatic speech recognition performance.

Mispronunciation Detection in Children's Reading of Sentences

ABSTRACT

This work proposes an approach to automatically parse children's reading of sentences by detecting word pronunciations and extra content, and to classify words as correctly or incorrectly pronounced. This approach can be directly helpful for automatic assessment of reading level or for automatic reading tutors, where a correct reading must be identified. We propose a first segmentation stage that locates candidate word pronunciations by allowing repetitions and false starts of a word's syllables; a decoding grammar based solely on syllables allows silence to appear during a word pronunciation. At a second stage, word candidates are classified as mispronounced or not. The feature that best classifies mispronunciations is found to be the log-likelihood ratio between a free phone loop and a word-spotting model in the very close vicinity of the candidate segmentation. Additional features are combined in multi-feature models to further improve classification, including normalizations of the log-likelihood ratio, features derived from phone likelihoods, and Levenshtein distances between the correct pronunciation and the phonemes recognized by two phoneme recognition approaches. Results show that most extra events were detected (a word error rate close to 2% was achieved) and that using automatic segmentation for mispronunciation classification approaches the performance of manual segmentation. Although the log-likelihood ratio from a spotting approach is already a good metric for classifying word pronunciations, the combination of additional features provides a relative reduction of the miss rate of 18% (from 34.03% to 27.79% using manual segmentation and from 35.58% to 29.35% using automatic segmentation, at a constant 5% false alarm rate).

Analysis of the Reconstruction of Sparse Signals in the DCT Domain Applied to Audio Signals

ABSTRACT

Sparse signals can be reconstructed from a reduced set of signal samples using compressive sensing (CS) methods. The discrete cosine transform (DCT) can provide highly concentrated representations of audio signals, which makes the DCT a good sparsity domain for them. In this paper, the DCT is studied within the context of sparse audio signal processing using CS theory and methods. The DCT coefficients of a sparse signal, calculated from a reduced set of available samples, can be modeled as random variables. It is shown that the statistical properties of these variables are closely related to the unique reconstruction conditions. The main result of the paper is an exact formula for the mean square reconstruction error in the case of approximately sparse and nonsparse noisy signals reconstructed under the sparsity assumption. Based on the presented analysis, a simple and computationally efficient reconstruction algorithm is proposed. The presented theoretical concepts and the efficiency of the reconstruction algorithm are verified numerically, including examples with synthetic and recorded audio signals with unavailable or corrupted samples. Random disturbances, as well as disturbances simulating clicks or inpainting in audio signals, are considered. Statistical verification is done on a dataset of experimental signals, and results are compared with classical and recent methods used in similar signal and disturbance scenarios.

Speech Dereverberation with Context-Aware Recurrent Neural Networks

ABSTRACT

In this paper, we propose a model that performs speech dereverberation by estimating the spectral magnitude of clean speech from its reverberant counterpart. Our models extract features that account for both short- and long-term dependencies in the signal through a convolutional encoder (which extracts features from a short, bounded context of frames) and a recurrent neural network for extracting long-term information. Our model outperforms a recently proposed model that uses different context information depending on the reverberation time, without requiring any additional input, yielding improvements of up to 0.4 on PESQ, 0.3 on STOI, and 1.0 on POLQA relative to reverberant speech. We also show that our model generalizes to real room impulse responses even when trained only with simulated room impulse responses, different speakers, and high reverberation times. Lastly, listening tests show the proposed method outperforming benchmark models in reducing perceived reverberation.
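The compressive-sensing setting in the sparse-reconstruction abstract above can be illustrated with a toy experiment: a signal with a few nonzero DCT coefficients is recovered from a random half of its samples by generic iterative hard thresholding. This is a standard CS baseline, not the paper's proposed algorithm, and the sizes, seed, and iteration count below are arbitrary choices for the demonstration:

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
N = 128
K = 3

# A signal that is K-sparse in the DCT domain.
coeffs = np.zeros(N)
coeffs[[5, 20, 47]] = [3.0, -2.0, 1.5]
x = idct(coeffs, norm='ortho')

# Only a random half of the samples is "available".
available = rng.random(N) < 0.5

# Iterative hard thresholding: re-impose the known samples, then keep
# only the K largest-magnitude DCT coefficients, and repeat.
est = np.zeros(N)
for _ in range(200):
    est[available] = x[available]
    c = dct(est, norm='ortho')
    small = np.argsort(np.abs(c))[:-K]
    c[small] = 0.0
    est = idct(c, norm='ortho')

err = np.max(np.abs(est - x))  # small when the reconstruction succeeds
```

The missing samples are filled in purely from the sparsity assumption, which is the mechanism whose error statistics the abstract analyzes for approximately sparse and noisy signals.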
Do We Need Individual Head-Related Transfer Functions for Vertical Localization? The Case Study of a Spectral Notch Distance Metric

ABSTRACT

This paper addresses the individualization of the head-related transfer function (HRTF) rendering process for auditory elevation perception: is it possible to find a non-individual, personalized HRTF set that allows a listener to localize as accurately as with his or her individual HRTFs? We propose a psychoacoustically motivated, anthropometry-based mismatch function between HRTF pairs that exploits the close relation between the listener's pinna geometry and localization cues. It is evaluated using an auditory model that computes a mapping between HRTF spectra and perceived spatial locations. Results on a large number of subjects in the CIPIC and ARI HRTF databases suggest that there exists a non-individual HRTF set that allows a listener to achieve vertical localization as accurate as with individual HRTFs. Furthermore, we find the optimal parametrization of the proposed mismatch function, i.e., the one that best reflects the information given by the auditory model. Our findings show that the selection procedure yields statistically significant improvements with respect to dummy-head HRTFs or random HRTF selection, with potentially high impact from an application point of view.
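For context on HRTF distance metrics: the mismatch function above is anthropometry-based, but a plain log-spectral distortion between two magnitude responses is a common baseline distance and is easy to sketch. The following is purely illustrative and is not the paper's proposed metric:

```python
import numpy as np

def log_spectral_distortion(h1, h2, eps=1e-12):
    """RMS difference in dB between two magnitude responses; a common
    baseline distance for comparing HRTF pairs."""
    d = 20.0 * np.log10((np.abs(h1) + eps) / (np.abs(h2) + eps))
    return np.sqrt(np.mean(d ** 2))

h = np.ones(64)  # flat toy "HRTF" magnitude response
same = log_spectral_distortion(h, h)          # identical responses: 0 dB
offset = log_spectral_distortion(h, 0.5 * h)  # constant 6 dB level offset
```

A metric like this treats all frequencies equally; the paper's point is that elevation cues are concentrated in pinna-related spectral notches, so a perceptually motivated mismatch function can select better non-individual HRTFs than a uniform spectral distance.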
Interaural Coherence Preservation for Binaural Noise Reduction Using Partial Noise Estimation and Spectral Postfiltering

ABSTRACT

The objective of binaural speech enhancement algorithms is to reduce the undesired noise component while preserving the desired speech source and the binaural cues of all sound sources. For the scenario of a single desired speech source in a diffuse noise field, an extension of the binaural multi-channel Wiener filter (MWF), namely the MWF-IC, has recently been proposed, which aims to preserve the interaural coherence (IC) of the noise component. However, due to the large computational complexity of the MWF-IC, in this paper we propose several alternative algorithms with lower complexity. First, we consider a quasi-distortionless version of the MWF-IC, denoted MVDR-IC. Second, we propose to preserve the IC of the noise component using the binaural MWF with partial noise estimation (MWF-N) and the binaural minimum-variance-distortionless-response beamformer with partial noise estimation (MVDR-N), for which closed-form expressions exist. In addition, we show that for the MVDR-N a closed-form expression can be derived for the tradeoff parameter yielding a desired magnitude squared coherence (MSC) for the output noise component. Since, unlike the MWF-IC and the MWF-N, the MVDR-IC and the MVDR-N do not take into account the spectro-temporal properties of the speech and noise components, we propose to apply a spectral postfilter to the filter outputs, improving noise reduction performance. The performance of all algorithms is compared in several diffuse noise scenarios. The simulation results show that both the MVDR-IC and the MVDR-N are able to preserve the MSC of the noise component, while the MVDR-IC generally shows slightly better noise reduction performance at a larger complexity. Further simulation results show that applying a spectral postfilter leads to very similar performance for all considered algorithms in terms of noise reduction and speech distortion.
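The magnitude squared coherence (MSC) that the MVDR-N tradeoff parameter targets is a standard quantity, and SciPy estimates it directly with Welch's method. A generic illustration on two toy channels sharing a common component (not the paper's algorithm or data):

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(1)
fs = 16000
n = 4 * fs  # four seconds of toy noise

common = rng.standard_normal(n)                # shared diffuse component
left = common + 0.1 * rng.standard_normal(n)   # left-ear channel
right = common + 0.1 * rng.standard_normal(n)  # right-ear channel

# Welch-based MSC estimate per frequency bin; values lie in [0, 1].
f, msc = coherence(left, right, fs=fs, nperseg=512)
# Here the MSC is close to 1 because the channels share a dominant common
# component; fully independent channel noise would push it toward 0.
```

Preserving the MSC of the residual noise matters perceptually because the interaural coherence of a diffuse field determines how spatially "spread out" the noise sounds to the listener.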
Gating Neural Network for Large Vocabulary Audiovisual Speech Recognition

ABSTRACT

Audio-based automatic speech recognition (A-ASR) systems are affected by noisy conditions in real-world applications. Adding visual cues to the ASR system is an appealing way to improve its robustness, replicating the audiovisual perception process used during human interactions. A common problem observed when using audiovisual automatic speech recognition (AV-ASR) is a drop in performance when speech is clean: in this case, visual features may not provide complementary information, introducing variability that negatively affects system performance. The experimental evaluation in this study clearly demonstrates this problem when we train an audiovisual state-of-the-art hybrid system with a deep neural network (DNN) and hidden Markov models (HMMs). This study proposes a framework that addresses this problem, improving, or at least maintaining, performance when visual features are used. The proposed approach is a deep learning solution with a gating layer that diminishes the effect of noisy or uninformative visual features, keeping only useful information. The framework is implemented with a subset of the audiovisual CRSS-4ENGLISH-14 corpus, which consists of 61 hours of speech from 105 subjects simultaneously collected with multiple cameras and microphones. The proposed framework is compared with conventional HMMs whose observation models are implemented with either a Gaussian mixture model (GMM) or DNNs. We also compare the system with a multi-stream hidden Markov model (MS-HMM) system. The experimental evaluation indicates that the proposed framework outperforms the alternative methods under all configurations, showing the robustness of the gating-based framework for AV-ASR.
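The gating idea described above, scaling visual features by a learned sigmoid gate so that uninformative cues can be suppressed, can be sketched schematically. The weights below are random placeholders standing in for a trained layer, and the layout is a generic gated fusion, not the paper's exact architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_audio, d_visual = 40, 30

# Placeholder weights; in a real system these are learned jointly with
# the rest of the acoustic model.
W = 0.1 * rng.standard_normal((d_visual, d_audio + d_visual))
b = np.zeros(d_visual)

def gated_fusion(audio, visual):
    """Scale each visual dimension by a gate in (0, 1) computed from both
    modalities, then concatenate with the audio features."""
    gate = sigmoid(W @ np.concatenate([audio, visual]) + b)
    return np.concatenate([audio, gate * visual])

fused = gated_fusion(rng.standard_normal(d_audio),
                     rng.standard_normal(d_visual))
# fused has d_audio + d_visual dimensions; gates near 0 effectively
# remove visual dimensions that would only add variability.
```

Because the gate is conditioned on both modalities, the network can learn to pass visual information through in noisy acoustic conditions and attenuate it when the audio alone is already reliable.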
Bias-Compensated Informed Sound Source Localization Using Relative Transfer Functions

ABSTRACT

In this paper, we consider the problem of estimating the direction of arrival (DoA) of a target sound for a hearing aid (HA) system that can connect to a wireless microphone worn by the talker of interest. The wireless microphone "informs" the HA system about the noise-free target speech. To estimate the DoA, we adopt a maximum-likelihood approach and assume that a database of DoA-dependent relative transfer functions (RTFs) has been measured in advance and is available. The proposed DoA estimator takes into account the available noise-free target speech, the ambient noise characteristics, and the shadowing effect of the user's head on the received signals, and it supports both monaural and binaural microphone array configurations. Moreover, we analytically characterize the bias of the proposed estimator and introduce a modified estimator that compensates for this bias. We demonstrate that the proposed method has lower computational complexity and better performance than recent RTF-based estimators. Furthermore, to decrease the number of parameters that must be wirelessly exchanged between the HAs in binaural configurations, we propose an information fusion (IF) strategy that avoids transmitting microphone signals between the HAs. An important benefit of the proposed IF strategy is that the number of parameters to be exchanged is independent of the number of HA microphones. Finally, we extensively investigate the performance of variants of the proposed estimator in different noisy and reverberant situations.

CONTACT: TSYS Center for Research and Development (TSYS Academic Projects)
NO: 20/9, 4th Floor, Janaki Complex, Sardar Patel Road, Adyar, Chennai-600020.
LANDMARK: Above METRO Shoes
Visit us: http://www.tsysglobalsolutions.com/
Email: tsysglobalsolutions2014@gmail.com
Tel: 044-42607879, +91 98411 03123.