A Novel Uncertainty Parameter SR (Signal to Residual Spectrum Ratio) Evalua...sipij
Usually, hearing-impaired people use hearing aids that implement speech enhancement algorithms. Estimation of speech and estimation of noise are the two components of a single-channel speech enhancement system. The main objective of any speech enhancement algorithm is estimation of the noise power spectrum in a non-stationary environment. A VAD (Voice Activity Detector) is used to identify speech pauses, and the noise is estimated only during these pauses. The MMSE (Minimum Mean Square Error) speech enhancement algorithm does not enhance intelligibility, quality, or listener fatigue, which are the perceptual aspects of speech. A novel evaluation approach, SR (Signal to Residual spectrum ratio), based on an uncertainty parameter, is introduced for the benefit of hearing-impaired people in non-stationary environments, to control distortions. Noise is estimated and updated by dividing the original signal into three parts, namely pure speech, quasi-speech, and non-speech frames, based on multiple threshold conditions. Different values of SR and LLR demonstrate the amount of attenuation and amplification distortion. The proposed method is compared with the WAT (Weighted Average Technique) and MMSE (Minimum Mean Square Error) methods using the parameters SR (signal to residual spectrum ratio) and LLR (log likelihood ratio), in terms of segmental SNR and LLR.
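As a rough illustration of the segmental SNR evaluation mentioned above (not the paper's exact formulation; the frame length and clamping range are assumptions), a frame-based segmental SNR between a clean reference and an enhanced signal can be sketched as:

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, eps=1e-12, lo=-10.0, hi=35.0):
    """Frame-wise SNR between a clean reference and an enhanced signal,
    averaged over frames and clamped to a perceptually sensible range."""
    n_frames = min(len(clean), len(enhanced)) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        e = enhanced[i * frame_len:(i + 1) * frame_len]
        noise_energy = np.sum((s - e) ** 2) + eps
        snr = 10.0 * np.log10(np.sum(s ** 2) / noise_energy + eps)
        snrs.append(np.clip(snr, lo, hi))  # limit the influence of outlier frames
    return float(np.mean(snrs))
```

The per-frame clamping is the usual reason segmental SNR tracks perceived distortion better than a single global SNR.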
Cancellation of Noise from Speech Signal using Voice Activity Detection Metho...ijsrd.com
Speech enhancement by suppressing uncorrelated, acoustically added noise has been a challenging research topic for many years. VAD-based methods are a primary choice for real-time applications due to their simplicity and comparatively low computational load. This paper presents a VAD (Voice Activity Detection) technique that can detect the non-speech segments of a speech signal, and shows that it works reliably in an unpredictable noise environment. The technique is usually implemented on microprocessors or DSP processors because of their flexibility, but FPGAs have several advantages over DSP processors; for example, the high cost per logic element of DSP processors makes them unsuitable for large-scale use. Based on the experimental results, the VAD method is implemented on an FPGA chip.
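A minimal sketch of the kind of frame-energy VAD decision described above (the paper's FPGA implementation details are not reproduced; the frame length and the threshold relative to the loudest frame are assumptions):

```python
import numpy as np

def energy_vad(signal, frame_len=160, threshold_db=-30.0):
    """Classify each frame as speech (True) or non-speech (False) by
    comparing its energy, in dB relative to the loudest frame, to a threshold."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)
    energy_db = 10.0 * np.log10(energy + 1e-12)
    return energy_db > (energy_db.max() + threshold_db)
```

Real detectors add hangover smoothing so brief low-energy gaps inside words are not flagged as pauses.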
Speech Enhancement Using Spectral Flatness Measure Based Spectral SubtractionIOSRJVSP
This paper aims to reduce background noise introduced into a speech signal during capture, storage, transmission, and processing using a spectral subtraction algorithm. To account for the fact that colored noise corrupts the speech signal non-uniformly over different frequency bands, the Multi-Band Spectral Subtraction (MBSS) approach is exploited, wherein the amount of noise subtracted from the noisy speech signal is decided by a weighting factor. The choice of optimal weight values determines the performance of the speech enhancement system. In this paper, the weights are decided based on the SFM (Spectral Flatness Measure) rather than the conventional SNR (Signal to Noise Ratio) based rule, since the SFM provides a true distinction between the speech signal and the noise signal. Spectrogram and Mean Opinion Score results show that speech enhanced by the proposed SFM-based MBSS possesses better perceptual quality and improved intelligibility than the existing SNR-based MBSS.
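The SFM itself is the ratio of the geometric mean to the arithmetic mean of the power spectrum: near 1 for a noise-like (flat) spectrum and near 0 for a tonal (peaky) one, which is why it can drive the band weighting. A minimal sketch:

```python
import numpy as np

def spectral_flatness(power_spectrum, eps=1e-12):
    """Spectral Flatness Measure: geometric mean / arithmetic mean of the
    power spectrum. Close to 1 for flat (noise-like) spectra, close to 0
    for peaky (tonal, speech-like) spectra."""
    p = np.asarray(power_spectrum, dtype=float) + eps
    geometric_mean = np.exp(np.mean(np.log(p)))  # log-domain geometric mean
    arithmetic_mean = np.mean(p)
    return geometric_mean / arithmetic_mean
```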
Analysis of PEAQ Model using Wavelet Decomposition Techniquesidescitation
Digital broadcasting, internet audio, and music databases make use of audio compression and coding techniques to reduce high-quality audio signals without impairing their perceptual quality. Audio signal compression is a lossy compression technique that converts the original audio signal into a compressed bitstream. The compressed audio bitstream is decoded at the decoder to produce a close approximation of the original signal. For the purpose of improving the coding, this work attempts to verify the perceptual evaluation of audio quality (PEAQ) model in BS.1387 using wavelet decomposition techniques. Finally, the masking thresholds for sub-bands obtained using wavelet techniques and the Fast Fourier Transform (FFT) are compared.
This is my presentation at a Journal Club. It is based on the article "Auditory inspired machine learning techniques can improve speech intelligibility and quality for hearing-impaired listeners". You can find all the references in the slide at the end. I review very basic techniques in noise reduction and how these techniques are implemented with deep neural networks.
International Journal of Engineering Research and Applications (IJERA) aims to cover the latest outstanding developments in all fields of engineering, technology, and science.
International Journal of Engineering Research and Applications (IJERA) is a team of researchers, not a publication service or private publisher running journals for monetary benefit; we are an association of scientists and academics who focus on supporting authors who want to publish their work. The articles published in our journal can be accessed online, and all articles are archived for real-time access.
Our journal system primarily aims to bring out the research talent and the work done by scientists, academics, engineers, practitioners, scholars, and postgraduate students of engineering and science. The journal covers scientific research in a broad sense rather than a niche area, enabling researchers from various verticals to publish their papers. It also aims to provide a platform for researchers to publish in a shorter time, enabling them to continue their work. All published articles are freely available to scientific researchers in government agencies, to educators, and to the general public. We are making serious efforts to promote our journal across the globe, and we are confident it will act as a scientific platform for all researchers to publish their work online.
IJRET : International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Deep Learning Based Voice Activity Detection and Speech EnhancementNAVER Engineering
Presenter: Juntae Kim (Ph.D. candidate, KAIST)
Date: October 2018
Voice activity detection (VAD) and speech enhancement (SE) are important front-end technologies for a noise-robust speech recognition system.
From the incoming noisy signal, VAD detects only the speech segments, and SE removes the noise while preserving the speech signal.
For VAD and SE, this presentation covers the traditional methods, deep learning based methods, and our papers, as follows:
1. J. Kim and M. Hahn, "Voice Activity Detection Using an Adaptive Context Attention Model," in IEEE Signal Processing Letters, vol. 25, no. 8, pp. 1181-1185, Aug. 2018.
2. J. Kim and M. Hahn, "Speech Enhancement Using a Two Step Network," submitted to IEEE Signal Processing Letters, 2018.
Also, this presentation briefly introduces some experimental results in a real-world environment (far-field, noisy), conducted on an embedded board.
For VAD,
Traditional VAD methods.
Deep learning based VAD methods.
Paper presentation: J. Kim and M. Hahn, "Voice Activity Detection Using an Adaptive Context Attention Model," in IEEE Signal Processing Letters, vol. 25, no. 8, pp. 1181-1185, Aug. 2018.
End point detection based on VAD.
Experimental results of DNN-EPD on embedded board in real-world environment.
For SE,
Traditional SE methods.
Deep learning based SE methods.
Paper presentation: J. Kim and M. Hahn, "Speech Enhancement Using a Two Step Network," submitted to IEEE Signal Processing Letters, 2018.
Experimental results in real-world environment.
International Journal of Computational Engineering Research(IJCER)ijceronline
International Journal of Computational Engineering Research (IJCER) is an international, English-language, monthly online journal. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
IJERA (International Journal of Engineering Research and Applications) is an international, online, ... peer-reviewed journal. For more details or to submit your article, please visit www.ijera.com
Audio Noise Removal – The State of the Artijceronline
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...CSCJournals
This paper proposes a new voicing detection and pitch estimation method that is particularly robust for noisy speech. The method is based on spectral analysis of the speech multi-scale product. The multi-scale product (MP) consists of taking the product of wavelet transform coefficients across scales; the wavelet used is the quadratic spline function. We argue that the spectrum of the multi-scale product can reveal an estimate of a pitch harmonic more accurately, even in heavily noisy scenarios. We evaluate our approach on the Keele database. The experimental results show the robustness of our method for noisy speech, and its good performance for clean speech, in comparison with state-of-the-art algorithms.
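As a rough illustration of the multi-scale product idea (this is not the paper's quadratic-spline wavelet implementation: the smoothed first difference below is a crude stand-in for a wavelet at each dyadic scale, and the scales and minimum lag are assumptions):

```python
import numpy as np

def smoothed_derivative(x, scale):
    """Crude stand-in for a spline-wavelet response at one dyadic scale:
    a moving average of width `scale` followed by a first difference."""
    kernel = np.ones(scale) / scale
    smoothed = np.convolve(x, kernel, mode="same")
    return np.diff(smoothed, prepend=smoothed[0])

def multiscale_product(x, scales=(1, 2, 4)):
    """Pointwise product of smoothed derivatives across scales; it sharpens
    periodic glottal-pulse-like events that persist across scales."""
    mp = np.ones_like(x, dtype=float)
    for s in scales:
        mp *= smoothed_derivative(x, s)
    return mp

def pitch_period(x, min_lag=20):
    """Estimate the pitch period (in samples) as the autocorrelation peak
    of the multi-scale product, ignoring very short lags."""
    mp = multiscale_product(x)
    ac = np.correlate(mp, mp, mode="full")[len(mp) - 1:]
    return int(min_lag + np.argmax(ac[min_lag:]))
```

On a glottal-like sawtooth with a 40-sample period, the product keeps the period-boundary events coherent across scales, so the autocorrelation of the MP peaks at the pitch period.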
The peer-reviewed International Journal of Engineering Inventions (IJEI) was started with a mission to encourage contributions to research in Science and Technology, and to encourage and motivate researchers in challenging areas of Sciences and Technology.
This is a ppt on speech recognition systems, or automated speech recognition systems. I hope it will be helpful for all the people searching for a presentation on this technology.
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...sipij
Previous research has found the autocorrelation domain to be an appropriate domain for signal and noise separation. This paper discusses a simple and effective method for decreasing the effect of noise on the autocorrelation of the clean signal, which can later be used in extracting mel-cepstral parameters for speech recognition. Two different methods are proposed to deal with the error introduced by considering speech and noise completely uncorrelated. The basic approach reduces the effect of noise by estimating its contribution and subtracting it from the noisy speech signal autocorrelation. To improve this method, we consider inserting a speech/noise cross-correlation term into the equations used for estimating the clean speech autocorrelation, using an estimate of it found through a kernel method. Alternatively, we use an estimate of the cross-correlation term obtained through an averaging approach. A further improvement was obtained through the introduction of an overestimation parameter in the basic method. We tested our proposed methods on the Aurora 2 task. The basic method has shown considerable improvement over the standard features and some other robust autocorrelation-based features, and the proposed techniques have further increased the robustness of the basic autocorrelation-based method.
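The basic subtraction step rests on the assumption that, for uncorrelated speech and noise, the autocorrelation of the noisy signal is approximately the sum of the clean and noise autocorrelations. A hedged sketch (the lag count and the overestimation factor `alpha` are illustrative assumptions; the paper's kernel and averaging cross-term estimates are not reproduced):

```python
import numpy as np

def autocorr(x, max_lag):
    """One-sided biased autocorrelation estimate up to max_lag."""
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])

def denoise_autocorr(noisy_frame, noise_autocorr, alpha=1.0):
    """Basic method: assuming speech and noise are uncorrelated, subtract an
    (optionally overestimated, alpha > 1) noise autocorrelation from the
    noisy-frame autocorrelation to approximate the clean autocorrelation."""
    r_noisy = autocorr(noisy_frame, len(noise_autocorr) - 1)
    return r_noisy - alpha * noise_autocorr
```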
Speech Enhancement for Nonstationary Noise Environmentssipij
In this paper, we present a simultaneous detection and estimation approach for speech enhancement in nonstationary noise environments. A detector for speech presence in the short-time Fourier transform domain is combined with an estimator, which jointly minimizes a cost function that takes into account both detection and estimation errors. Under speech presence, the cost is proportional to a quadratic spectral amplitude error, while under speech absence, the distortion depends on a certain attenuation factor. Experimental results demonstrate the advantage of the proposed simultaneous detection and estimation approach, which facilitates suppression of nonstationary noise with a controlled level of speech distortion.
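The paper's joint cost minimization is not reproduced here; as a simplified sketch of the idea, a spectral gain can blend a quadratic-error (Wiener-like) estimation gain under speech presence with a fixed attenuation floor under speech absence, weighted by a speech-presence probability (the gain form and floor value are assumptions):

```python
import numpy as np

def combined_gain(snr_prio, p_speech, g_min=0.1):
    """Blend an estimation gain (used under speech presence) with a fixed
    attenuation floor g_min (used under speech absence), weighted by the
    speech-presence probability p_speech."""
    g_speech = snr_prio / (1.0 + snr_prio)  # Wiener gain from the a priori SNR
    return p_speech * g_speech + (1.0 - p_speech) * g_min
```

The floor g_min is what gives the "controlled level of speech distortion": speech-absent bins are attenuated, not zeroed, so detection errors are less audible.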
Improvement of minimum tracking in Minimum Statistics noise estimation methodCSCJournals
Noise spectrum estimation is a fundamental component of speech enhancement and speech recognition systems. In this paper we propose a new method for minimum tracking in the Minimum Statistics (MS) noise estimation method. The noise estimation algorithm is proposed for highly nonstationary noise environments. This was confirmed with formal listening tests, which indicated that the proposed noise estimation algorithm, when integrated into speech enhancement, was preferred over other noise estimation algorithms.
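A minimal sketch of minimum-statistics-style noise tracking: the noise power per frequency bin is taken as the minimum of the recursively smoothed periodogram over a sliding window of frames. The smoothing constant and window length are assumptions, and the real MS method additionally applies bias compensation, which is omitted here:

```python
import numpy as np

def minimum_statistics_noise(periodograms, alpha=0.85, window=50):
    """Track the noise power per frequency bin as the minimum of the
    recursively smoothed periodogram over a sliding window of frames.
    `periodograms` has shape (frames, bins)."""
    smoothed = np.empty_like(periodograms, dtype=float)
    smoothed[0] = periodograms[0]
    for t in range(1, len(periodograms)):
        smoothed[t] = alpha * smoothed[t - 1] + (1 - alpha) * periodograms[t]
    noise = np.empty_like(smoothed)
    for t in range(len(smoothed)):
        start = max(0, t - window + 1)
        noise[t] = smoothed[start:t + 1].min(axis=0)  # windowed minimum
    return noise
```

The window must be longer than typical speech activity so the minimum still "sees" speech pauses; that is exactly why minimum tracking works without an explicit VAD.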
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...CSCJournals
In this paper a new thresholding based speech enhancement approach is presented, where the threshold is statistically determined by employing the Teager energy operation on the Wavelet Packet (WP) coefficients of noisy speech. The threshold thus obtained is applied on the WP coefficients of the noisy speech by using a hard thresholding function in order to obtain an enhanced speech. Detailed simulations are carried out in the presence of white, car, pink, and babble noises to evaluate the performance of the proposed method. Standard objective measures, spectrogram representations and subjective listening tests show that the proposed method outperforms the existing state-of-the-art thresholding based speech enhancement approaches for noisy speech from high to low levels of SNR.
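The discrete Teager energy operator is psi[n] = x[n]^2 - x[n-1]*x[n+1]. A hedged sketch of a Teager-energy-derived hard threshold, applied here to a generic coefficient array rather than true wavelet packet coefficients (the threshold scale factor is an assumption, not the paper's statistical rule):

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    psi = np.zeros_like(x, dtype=float)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def hard_threshold(coeffs, scale=1.0):
    """Hard-threshold coefficients using a level derived from the Teager
    energy of the coefficient sequence: small (noise-like) coefficients
    are zeroed, large (speech-like) coefficients pass unchanged."""
    thr = scale * np.sqrt(np.mean(np.abs(teager_energy(coeffs))))
    out = coeffs.copy()
    out[np.abs(out) < thr] = 0.0
    return out
```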
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing IJECEIAES
Speech enhancement algorithms are utilized to overcome multiple limiting factors in recent applications such as mobile phones and communication channels. The challenge in corrupted speech is the trade-off between noise reduction and signal distortion. We used a modified Wiener filter and compressive sensing (CS) to investigate and evaluate the improvement in speech quality. This method adapts the noise estimate and the Wiener filter gain function so as to increase the weight of the amplitude spectrum and better preserve the signals of interest. CS is then applied using the gradient projection for sparse reconstruction (GPSR) technique to empirically investigate the interactive effects of the corrupting noise and obtain better perceptual quality with less listener fatigue under noise-reduction conditions. The proposed algorithm shows improved performance in objective assessment tests, outperforming other conventional algorithms for various noise types at SNRs of 0, 5, 10, and 15 dB. Therefore, the proposed algorithm achieves a significant improvement in speech quality and efficiently obtains higher performance, resulting in better noise reduction compared to other conventional algorithms.
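The Wiener filter portion can be sketched as follows; this is a generic frequency-domain Wiener gain, not the paper's modified gain function, and the CS/GPSR stage is not reproduced. The noise power spectrum is assumed known, and the spectral floor is an assumption used to limit musical noise:

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, floor=0.05):
    """Wiener filter gain per frequency bin, estimated from the noisy power
    spectrum and a noise power estimate, with a spectral floor."""
    speech_power = np.maximum(noisy_power - noise_power, 0.0)  # power subtraction
    gain = speech_power / (speech_power + noise_power + 1e-12)
    return np.maximum(gain, floor)

def enhance_frame(noisy_frame, noise_power):
    """Apply the Wiener gain in the FFT domain, return the time-domain frame."""
    spectrum = np.fft.rfft(noisy_frame)
    gain = wiener_gain(np.abs(spectrum) ** 2, noise_power)
    return np.fft.irfft(gain * spectrum, n=len(noisy_frame))
```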
Performance Analysis of Adaptive Noise Canceller Employing NLMS Alg...ijwmn
In voice communication systems, noise cancellation using an adaptive digital filter is a renowned technique for extracting the desired speech signal by eliminating noise from a speech signal corrupted by noise. In this paper, the performance of an adaptive noise canceller of Finite Impulse Response (FIR) type has been analysed employing the NLMS (Normalized Least Mean Square) algorithm. An extensive study has been made to investigate the effects of different parameters, such as the number of filter coefficients, the number of samples, the step size, and the input noise level, on the performance of the adaptive noise cancelling system. All the results have been obtained using computer simulations built on the MATLAB platform.
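A self-contained sketch of an FIR adaptive noise canceller trained with NLMS, of the kind analysed above (the filter length and step size below are illustrative choices, and the parameters studied in the paper map directly onto `num_taps`, `mu`, and the signal lengths):

```python
import numpy as np

def nlms_noise_canceller(reference, primary, num_taps=16, mu=0.1, eps=1e-8):
    """FIR adaptive noise canceller: the filter learns to map the noise
    reference onto the noise component of the primary (speech + noise)
    input, and the error signal is the cleaned speech estimate."""
    w = np.zeros(num_taps)
    error = np.zeros(len(primary))
    for n in range(num_taps - 1, len(primary)):
        x = reference[n - num_taps + 1:n + 1][::-1]  # ref[n], ref[n-1], ...
        y = w @ x                                    # estimate of the noise
        error[n] = primary[n] - y                    # cleaned output sample
        w += (mu / (x @ x + eps)) * error[n] * x     # normalized LMS update
    return error, w
```

Normalizing the update by the input energy is what makes the convergence speed largely independent of the input noise level, one of the parameters the study varies.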
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LANIJNSA Journal
The signal-to-noise ratio (SNR) is one of the important measures for reducing noise. A technique that uses a linear prediction error filter (LPEF) and an adaptive digital filter (ADF) to achieve noise reduction in speech and images degraded by additive background noise is proposed. Since a speech signal can be represented as a stationary signal over a short interval of time, most of the speech signal can be predicted by the LPEF. This estimation is performed by the ADF, which is used for system identification. Noise reduction is achieved by subtracting the reconstructed noise from the speech degraded by additive background noise. Most MR image accelerating methods suffer from degradation of the acquired images, which is often correlated with the degree of acceleration; however, Wideband MRI is a novel technique that transcends such flaws. In this paper we propose the LPEF and ADF for reducing noise in speech, and we also demonstrate that Wideband MRI is capable of obtaining images with quality identical to conventional MR images in terms of SNR in a wireless LAN.
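The LPEF component relies on the short-term predictability of speech: the prediction error filter whitens the predictable part. A sketch using the autocorrelation (normal-equations) method, with the model order as an assumption (the ADF and MRI portions of the paper are not reproduced):

```python
import numpy as np

def lpc_coefficients(x, order):
    """Linear prediction coefficients via the autocorrelation method:
    minimize the energy of e[n] = x[n] - sum_k a_k * x[n-k]."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def prediction_error(x, a):
    """Output of the linear prediction error filter (LPEF)."""
    order = len(a)
    e = x.astype(float).copy()
    for k in range(1, order + 1):
        e[k:] -= a[k - 1] * x[:-k]
    return e
```

For a strongly predictable (speech-like autoregressive) input, the error energy is far below the signal energy, which is what lets the subsequent stage treat the residual as noise.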
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
The performance of various acoustic feature extraction methods has been compared in this work using a Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic features are a series of vectors that represent the speech signals. They can be classified into either words or sub-word units such as phonemes. In this work, linear predictive coding (LPC) is first used as the acoustic vector extraction technique; LPC has been chosen due to its widespread popularity. Then other vector extraction techniques, namely Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), have also been used; these two methods closely resemble the human auditory system. The feature vectors are then used to train the LSTM neural network, and the obtained models of different phonemes are compared with different statistical tools, namely the Bhattacharyya distance and the Mahalanobis distance, to investigate the nature of those acoustic features.
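The statistical comparison tools named above have closed forms for Gaussian models; a sketch (the phoneme models themselves are not reproduced, and treating each phoneme model as a single multivariate Gaussian is an assumption for illustration):

```python
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians, usable
    to compare phoneme models built from different acoustic features."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)          # mean separation
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2                                        # covariance mismatch

def mahalanobis(x, mu, cov):
    """Mahalanobis distance of a feature vector from a Gaussian model."""
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```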
Novel Approach of Implementing Psychoacoustic model for MPEG-1 Audioinventy
Research Inventy : International Journal of Engineering and Science is published by a group of young academic and industrial researchers, with 12 issues per year. It is an online as well as print-version open-access journal that provides rapid (monthly) publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic and computer engineering, as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by a rapid process within 20 days after acceptance, and the peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
Development of Algorithm for Voice Operated Switch for Digital Audio Control ...IJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
129966864160453838[1]
Smoothing Hidden Markov Models by Using an
Adaptive Signal Limiter for Noisy Speech Recognition
Wei-Wen Hung
Department of Electrical Engineering
Ming Chi Institute of Technology
Taishan, 243, Taiwan, Republic of China
E-mail : wwhung@ccsun.mit.edu.tw
FAX : 886-02-2903-6852; Tel. : 886-02-2906-0379
and
Hsiao-Chuan Wang
Department of Electrical Engineering
National Tsing Hua University
Hsinchu, 30043, Taiwan, Republic of China
E-mail : hcwang@ee.nthu.edu.tw
FAX : 886-03-571-5971; Tel. : 886-03-574-2587
Paper No. : 1033. (second review)
Corresponding Author : Hsiao-Chuan Wang
Key Words : hidden Markov model (HMM), hard limiter, adaptive signal limiter
(ASL), autocorrelation function, arcsin transformation.
Smoothing hidden Markov models by using
an adaptive signal limiter for noisy speech recognition
Wei-Wen Hung and Hsiao-Chuan Wang
Department of Electrical Engineering, National Tsing Hua University
Hsinchu, 30043, Taiwan, Republic of China
Abstract. When a speech recognition system is deployed in the real world, environmental interference makes noisy speech signals and reference models mismatched and causes serious degradation in recognition accuracy. To deal with the effect of environmental mismatch, a family of signal limiters has been successfully applied to a template-based DTW recognizer to reduce the variability of speech features in noisy conditions. Though simulation results indicate that heavy smoothing can effectively reduce the variability of speech features at low signal-to-noise ratio (SNR), it also causes the loss of information in speech features. Therefore, we suggest that the smoothing factor of a signal limiter should be related to the SNR and adapted on a frame-by-frame basis. In this paper, an adaptive signal limiter (ASL) is proposed to smooth the instantaneous and dynamic spectral features of reference models and test speech. After smoothing the spectral features, the smoothed covariance matrices of the reference models are obtained by means of maximum likelihood (ML) estimation. A speech recognition task for multispeaker isolated Mandarin digits was conducted to evaluate the effectiveness and robustness of the proposed method. Experimental results indicate that the adaptive signal limiter achieves significant improvement in noisy conditions and is more robust than the hard limiter over a wider range of SNR values.
Key words. hidden Markov model (HMM), hard limiter, adaptive signal limiter (ASL), autocorrelation
function, arcsin transformation.
This research has been partially sponsored by the National Science Council, Taiwan, ROC, under contract
number NSC-88-2614-E-007-002.
LIST OF FIGURES AND TABLES
Fig. 1 Block diagram for implementing a speech recognizer with adaptive signal limiter.
Fig. 2 The various LPC log magnitude spectra of utterance ‘1’ in clean condition.
(a) LPC log magnitude spectra without signal limiter.
(b) LPC log magnitude spectra with hard limiter.
(c) LPC log magnitude spectra with adaptive signal limiter.
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 20 dB, SNR_UB = 30 dB.)
Fig. 3 The various LPC log magnitude spectra of utterance ‘1’ distorted by 20 dB white noise.
(a) LPC log magnitude spectra without signal limiter.
(b) LPC log magnitude spectra with hard limiter.
(c) LPC log magnitude spectra with adaptive signal limiter.
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 20 dB, SNR_UB = 30 dB.)
Fig. 4 The various LPC log magnitude spectra of utterance ‘1’ distorted by 20 dB factory noise.
(a) LPC log magnitude spectra without signal limiter.
(b) LPC log magnitude spectra with hard limiter.
(c) LPC log magnitude spectra with adaptive signal limiter.
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 10 dB, SNR_UB = 40 dB.)
Fig. 5 The average log likelihoods of utterance ‘1’ evaluated on various word models in white noise.
(a) Comparison of average log likelihoods without signal limiter.
(b) Comparison of average log likelihoods with hard limiter.
(c) Comparison of average log likelihoods with adaptive signal limiter.
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 20 dB, SNR_UB = 30 dB.)
Fig. 6 The average log likelihoods of utterance ‘1’ evaluated on various word models in factory noise.
(a) Comparison of average log likelihoods without signal limiter.
(b) Comparison of average log likelihoods with hard limiter.
(c) Comparison of average log likelihoods with adaptive signal limiter.
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 10 dB, SNR_UB = 40 dB.)
Table 1. Comparison of digit recognition rates (%) for white noise.
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 20 dB, SNR_UB = 30 dB.)
Table 2. Comparison of digit recognition rates (%) for factory noise.
1. Introduction
When a speech recognition system trained in a well-defined environment is used in real-world
applications, the acoustic mismatch between training and testing environments degrades its recognition
accuracy severely. This acoustic mismatch is mainly caused by a wide variety of distortion sources, such
as ambient additive noise, channel effects and the speaker's Lombard effect. During the past several decades,
researchers have focused their attention on the mismatch problem and tried to narrow the
mismatch gap. Many algorithms have been proposed and successfully applied to robust
speech recognition. Generally speaking, the methods for handling noisy speech recognition can be
roughly classified into the following approaches (Sankar and Lee, 1996). The first approach tries to
minimize the distance measures between reference models and testing signals by adaptively adjusting
speech signals in feature space. For example, Mansour and Juang (Mansour and Juang, 1989) found that
the norm of a cepstral vector is shrunk under noise contamination. Therefore, they used a first-order
equalization method to adapt the cepstral means of reference models so that the shrinkage of speech
features can be adequately compensated. Likewise, Carlson and Clement (Carlson and Clement, 1994)
also proposed a weighted projection measure (WPM) for recognition of noisy speech in the framework
of continuous density hidden Markov model (CDHMM). In addition, the norm shrinkage of cepstral
means will also lead to the reduction of HMM covariance matrices. Thus, Chien et al., (Chien, 1997a;
Chien et al., 1997b) proposed a variance adapted and mean compensated likelihood measure
(VA-MCLM) to adapt the mean vector and covariance matrix simultaneously.
The second approach estimates a transformation function in model space for transforming reference
models into testing environment and thus the environmental mismatch gap can be effectively reduced. In
the literature, there were a number of techniques compensating ambient noise effect in model space.
Among them, one of the most promising techniques is the so-called parallel model combination (PMC).
In the PMC algorithm, Varga and Moore (Varga and Moore, 1992a) adapted the statistics of reference
models to meet the testing conditions by optimally combining the reference models and noise model in
linear spectral domain. In the following years, several related works were successively reported for
improving the performance of the PMC method. Flores and Young (Flores and Young, 1992) integrated
spectral subtraction (SS) and PMC methods to seek further improvement in recognition accuracy. In
addition, Gales and Young (Gales and Young, 1995) extended PMC scheme to include the effect of
convolutional noise.
In the third approach, a more robust feature representation is developed in signal space so that the
speech feature is invariant or less susceptible to environmental variations. In this approach, Lee and Lin
(Lee and Lin, 1993) developed a family of signal limiters as a preprocessor to smooth speech signals.
When a speech signal is passed through a signal limiter with zero smoothing factor (i.e., a hard limiter),
the hard limiting operation preserves the sign of an input speech signal and ignores its magnitude. Thus,
the hard-limited speech signal is only affected by ambient noises when the signal-to-noise ratio (SNR) is
relatively low. This smoothing process for feature vectors has been shown to be effective for reducing the
variability of feature vectors in a noisy environment and make them less affected by ambient noises over a
wide range of SNR values. Experimental results for recognition of 39-word alpha-digit vocabulary also
demonstrate that an equivalent gain of 5-7 dB in SNR can be achieved for a template-based DTW
recognizer.
However, from the experimental results reported by Lee and Lin (Lee and Lin, 1993), we can also
observe that the recognition accuracy using a hard limiter on clean speech becomes worse. This
phenomenon may be explained as follows. In an utterance, the amplitudes of unvoiced segments are
generally much lower than those of voiced segments. Heavy smoothing can reduce the feature
variability of the speech segments with low SNR, but it also causes the loss of some important
information embedded in the clean segments and the segments with high SNR. Therefore, a signal limiter
with a fixed smoothing factor might not work well for all the segments in a speech utterance. We suggest
that the smoothing factor of a signal limiter should be related to SNR value and adapted on a frame by
frame basis. In this paper, an adaptive signal limiter (ASL) is proposed to smooth the instantaneous and
dynamic spectral features of hidden Markov models (HMM) and testing speech signals. In addition, in
order to moderately reflect the variation of model covariance due to application of signal limiting
operation to the state statistics of word models, the adaptation of covariance matrix is also performed in
the sense of maximum likelihood (ML) estimation.
The layout of this paper is as follows. In the subsequent section, we describe the detailed formulation
of the proposed adaptive signal limiter and its extension to the framework of a continuous density hidden
Markov model. In Section 3, we investigate the behavior of LPC spectra of a speech utterance and its
signal-limited version under the influence of various ambient noises. In addition, a series of experiments
were conducted to compare the discriminability of different signal limiters in various noisy conditions.
Some experiments for recognition of multispeaker isolated Mandarin digits are presented in Section 4
to evaluate the effectiveness and robustness of the proposed method in the presence of ambient noises.
Finally, a conclusion is drawn in Section 5.
2. Smoothing hidden Markov models by using an adaptive signal limiter
In this section, we describe the detailed formulation of the proposed adaptive signal limiter (ASL) and
its extension to the framework of an HMM-based speech recognizer.
2.1 Representation of the underlying hidden Markov models
Conventionally, for a continuous density hidden Markov model (CDHMM), the output likelihood
measure of the t-th frame in the testing utterance Y = {y_t = [c_t, d_t], 1 ≤ t ≤ T_y}, based on the
statistics of the i-th state of word model Λ(w) = {Λ_{w,i} = (μ_{w,i}, Σ_{w,i}), 1 ≤ i ≤ S_w}, can be
characterized by a multivariate Gaussian probability density function (pdf) and formulated as

p(y_t \mid \Lambda_{w,i}) = (2\pi)^{-p} \, |\Sigma_{w,i}|^{-1/2} \cdot \exp\!\left[-\tfrac{1}{2}\,(y_t-\mu_{w,i})^T \, \Sigma_{w,i}^{-1} \, (y_t-\mu_{w,i})\right], \quad (1)
where μ_{w,i} = [c_{w,i}, d_{w,i}] denotes the mean vector of the i-th state of word model Λ(w) and
consists of the p-order cepstral vector c_{w,i} and the p-order delta cepstral vector d_{w,i}. Σ_{w,i} denotes
the covariance matrix of the i-th state of word model Λ(w) and is simplified as a diagonal matrix, i.e.,
Σ_{w,i} = diag[σ²_{w,i}(1), σ²_{w,i}(2), ..., σ²_{w,i}(2p)]. However, in order to adequately reflect the
variation of dynamic spectral features due to the application of a signal limiting operation to instantaneous
spectral features, the representation of state statistics in a conventional hidden Markov model is modified
slightly. In our approach, the mean vector μ_{w,i} = [c_{w,i}, d_{w,i}] of the i-th state of the word model
Λ(w) is indirectly represented by the normalized autocorrelation vectors of a five-frame context
window (Lee and Wang, 1995), that is [r_{w,i,-2}, r_{w,i,-1}, r_{w,i,0}, r_{w,i,1}, r_{w,i,2}], where
r_{w,i,j} = [r_{w,i,j}(1), ..., r_{w,i,j}(p)]^T, j = 0 denotes the instantaneous frame, j = -1, -2 the left context
frames and j = 1, 2 the right context frames. The estimation of those normalized autocorrelation vectors
in a five-frame context window proceeds as follows. Firstly, a conventional hidden Markov model is
trained for each word by means of the segmental k-means algorithm. Then, based upon the obtained
word models, each frame in the training utterances is labeled with its decoded state identity by using the
Viterbi decoding algorithm. The instantaneous, left-context and right-context autocorrelation vectors
corresponding to the same state identity are collected and averaged to obtain the indirect representation
of the underlying hidden Markov models. For example, the normalized autocorrelation vectors of the i-th
state of the word model Λ(w) can be formulated by
[r_{w,i,-2},\, r_{w,i,-1},\, r_{w,i,0},\, r_{w,i,1},\, r_{w,i,2}] = \frac{1}{N_s} \sum_{u,t} \left[r^{u}_{w,t-2},\, r^{u}_{w,t-1},\, r^{u}_{w,t},\, r^{u}_{w,t+1},\, r^{u}_{w,t+2}\right], \quad (2)
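As an illustration of the averaging in Eq. (2), the following NumPy sketch (not the authors' implementation; the function name and array layout are assumptions made here) collects the five-frame context windows of all frames Viterbi-labeled with a given state and averages them:

```python
import numpy as np

def state_context_autocorr(frames, labels, state, context=2):
    """Average the normalized autocorrelation vectors of a five-frame
    context window over all frames labeled with a given state (Eq. (2)).

    frames : (T, p) array of normalized autocorrelation vectors r_t
    labels : length-T array of decoded state identities (from Viterbi)
    state  : the state identity i whose representation is estimated
    Returns (2*context+1, p) averaged vectors [r_{-2}, ..., r_{+2}].
    """
    T, p = frames.shape
    sums = np.zeros((2 * context + 1, p))
    count = 0
    # Skip frames whose context window would leave the utterance.
    for t in range(context, T - context):
        if labels[t] != state:
            continue
        for k, j in enumerate(range(-context, context + 1)):
            sums[k] += frames[t + j]
        count += 1
    return sums / max(count, 1)   # average over the N_s collected frames
```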
where r^u_{w,t} represents the normalized autocorrelation vector of the t-th frame of the u-th training
utterance of word w. The above summation includes all N_s frames which are labeled with the state
identity i of word model Λ(w).
Based upon this indirect representation, the analysis equations of the linear predictive coding (LPC) model
can be expressed in matrix form as

R_{w,i,j} \cdot a_{w,i,j} = r_{w,i,j}, \quad \text{for } j = -2, \ldots, 2, \quad (3)
where R_{w,i,j} is an autocorrelation matrix of the form

R_{w,i,j} = \begin{bmatrix}
r_{w,i,j}(0) & r_{w,i,j}(1) & \cdots & r_{w,i,j}(p-1) \\
r_{w,i,j}(1) & r_{w,i,j}(0) & \cdots & r_{w,i,j}(p-2) \\
\vdots & \vdots & \ddots & \vdots \\
r_{w,i,j}(p-1) & r_{w,i,j}(p-2) & \cdots & r_{w,i,j}(0)
\end{bmatrix}. \quad (4)
Since the autocorrelation matrix is Toeplitz, symmetric and positive definite, the LPC coefficient vector
a_{w,i,j} = [a_{w,i,j}(1), a_{w,i,j}(2), ..., a_{w,i,j}(p)]^T can be solved efficiently by the Levinson-Durbin
recursion method (Rabiner and Juang, 1993). Once we obtain the LPC coefficient vector from Eq. (3), the
corresponding cepstral vector c_{w,i,j} can be recursively calculated by using the LPC-to-cepstral
coefficient conversion formula

c_{w,i,j}(m) = a_{w,i,j}(m) + \sum_{k=1}^{m-1} \frac{k}{m}\, c_{w,i,j}(k)\, a_{w,i,j}(m-k), \quad 1 \le m \le p. \quad (5)
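The pipeline of Eqs. (3)-(5) can be sketched in NumPy as follows; this is an illustrative reconstruction for clarity, not the authors' code, and the function names are hypothetical:

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the Toeplitz system of Eqs. (3)-(4) for the LPC coefficients
    a(1..p) from autocorrelation values r(0..p) via Levinson-Durbin."""
    a = np.zeros(p + 1)       # a[0] unused; a[m] holds a(m)
    err = r[0]                # prediction error energy
    for m in range(1, p + 1):
        acc = r[m] - np.dot(a[1:m], r[m - 1:0:-1])
        k = acc / err         # reflection coefficient
        a_new = a.copy()
        a_new[m] = k
        a_new[1:m] = a[1:m] - k * a[m - 1:0:-1]
        a = a_new
        err *= (1.0 - k * k)
    return a[1:]              # [a(1), ..., a(p)]

def lpc_to_cepstrum(a, p):
    """LPC-to-cepstrum recursion of Eq. (5):
    c(m) = a(m) + sum_{k=1}^{m-1} (k/m) c(k) a(m-k), 1 <= m <= p."""
    c = np.zeros(p + 1)       # c[0] unused; c[m] holds c(m)
    for m in range(1, p + 1):
        c[m] = a[m - 1] + sum((k / m) * c[k] * a[m - k - 1]
                              for k in range(1, m))
    return c[1:]
```

For an AR(1)-like autocorrelation r = [1, 0.5, 0.25] with p = 2, the recursion yields a = [0.5, 0.0], as expected for a first-order process.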
Finally, the cepstral vector of the instantaneous frame, i.e., c_{w,i,j} for j = 0, is used as the mean vector
c_{w,i} of the i-th state of word model Λ(w). In addition, the corresponding delta cepstral vector
d_{w,i} can also be calculated by using the following equation:

d_{w,i} = \frac{\sum_{j=-2}^{2} j \cdot c_{w,i,j}}{\sum_{j=-2}^{2} j^{2}}. \quad (6)
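Eq. (6) is a least-squares slope over the five-frame window; a minimal sketch (hypothetical helper, not from the paper) is:

```python
import numpy as np

def delta_cepstrum(context_ceps):
    """Delta-cepstral regression of Eq. (6): d = sum_j j*c_j / sum_j j^2
    with window offsets j = -2..2.

    context_ceps : (5, p) array of cepstral vectors [c_{-2}, ..., c_{+2}]
    """
    j = np.arange(-2, 3)                        # window offsets
    num = (j[:, None] * context_ceps).sum(axis=0)
    return num / (j ** 2).sum()                 # denominator = 10
```

If the cepstral vectors change linearly across the window, the returned delta vector is exactly that per-frame slope.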
2.2 Formulation of the adaptive signal limiter
For recognition of noisy speech, it has been observed that employing a signal limiter to smooth a
speech signal in the time domain leads to significant performance improvement. The basic theory of a signal
limiter can be roughly described as follows (Lee and Lin, 1993). When a signal x is passed through a
signal limiter, the signal limiting operation is equivalent to performing a nonlinear transformation on the
input signal, so that the corresponding output signal y can be essentially characterized by an error
function of the form:

y = s(x) = \frac{K}{\sqrt{2\pi}\,\sigma} \int_{0}^{x} \exp\!\left(-\frac{t^{2}}{2\sigma^{2}}\right) dt, \quad (7)

where K is a scaling constant and σ² is a tunable factor for adjusting the smoothing degree of the
signal limiting operation.
In light of the above pronounced smoothing property, a signal limiter can be readily extended to the
processing of speech signals in a noisy environment. Consider an input speech signal x, approximated
by a zero-mean, stationary Gaussian process with variance σ_x², which has the density function

g(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\!\left(-\frac{x^{2}}{2\sigma_x^{2}}\right). \quad (8)
Then, the output y of the signal limiter has the density function expressed as (see Appendix A)

h(y) = h(s(x)) = \frac{\sqrt{\delta}}{K} \exp\!\left(-\frac{(\delta-1)\,x^{2}}{2\,\delta\,\sigma_x^{2}}\right), \quad (9)

where x = s^{-1}(y). δ denotes the smoothing factor of the signal limiter and is defined as δ = σ²/σ_x².
The larger the value of δ , the smaller the value of output signal y . When the smoothing factor δ
approaches 0, the corresponding signal limiter changes into a hard limiter of the form

y = f(x) = \begin{cases} K/2 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -K/2 & \text{if } x < 0 \end{cases}. \quad (10)
A signal limiting operation can also be interpreted as an arcsin transformation in the autocorrelation domain.
Assume that the autocorrelation functions of the input speech signal x and its signal-limited output y are
denoted as r_x(τ) and r_y(τ), respectively. Then, the normalized autocorrelation function of the
signal-limited output y can be formulated as (Lee and Lin, 1993)
\tilde{r}_y(\tau) \equiv \frac{r_y(\tau)}{r_y(0)} = \frac{\sin^{-1}\!\left[\tilde{r}_x(\tau)/(1+\delta)\right]}{\sin^{-1}\!\left[1/(1+\delta)\right]}, \quad (11)

where \tilde{r}_x(\tau) \equiv r_x(\tau)/r_x(0) is the normalized autocorrelation function of the input speech signal x.
By properly adjusting the smoothing factor δ, various degrees of smoothing effect can be obtained.
When δ approaches infinity, the normalized autocorrelation function of the input speech signal,
r̃_x(τ), is almost equal to the normalized autocorrelation function of the corresponding signal-limited output,
r̃_y(τ). Furthermore, in the case of δ = 0, the normalized autocorrelation function of the signal-limited
output reduces to the following equation (see Appendix B):

\tilde{r}_y(\tau) = \frac{2}{\pi} \cdot \sin^{-1}\!\left[\tilde{r}_x(\tau)\right]. \quad (12)
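The arcsin mapping of Eq. (11) is compact enough to sketch directly (illustrative code, hypothetical function name):

```python
import numpy as np

def limiter_autocorr(r_norm, delta):
    """Arcsin transformation of Eq. (11): normalized autocorrelation of the
    limiter output, given that of a Gaussian input and smoothing factor delta."""
    return np.arcsin(r_norm / (1.0 + delta)) / np.arcsin(1.0 / (1.0 + delta))
```

At delta = 0 this reproduces the hard-limiter arcsine law of Eq. (12), (2/π)·sin⁻¹(r̃_x), while for very large delta the mapping approaches the identity, i.e., r̃_y ≈ r̃_x.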
(12)
In the work presented by Lee and Lin (Lee and Lin, 1993), a hard limiter is used as a
pre-processor to reduce the variability of feature vectors in noisy conditions. That is, a pre-determined
smoothing factor is used throughout a speech signal. However, it is known that the segments of clean
speech with less energy are influenced most by ambient noises and thus require heavy smoothing. As for
the clean segments and the segments with high SNR, excessive smoothing not only destroys their
distinct features but also reduces the discriminability of speech features in a noisy environment. Therefore,
we propose an adaptive signal limiter (ASL) in which the smoothing factor δ is related to the SNR and
adapted on a frame-by-frame basis. In the proposed adaptive signal limiter, the smoothing factor δ is
empirically formulated as:
\delta(SNR) = \begin{cases}
\delta_{\min} & \text{if } SNR < SNR_{LB} \\[4pt]
\dfrac{\delta_{\max}-\delta_{\min}}{SNR_{UB}-SNR_{LB}} \cdot (SNR - SNR_{LB}) + \delta_{\min} & \text{if } SNR_{LB} \le SNR \le SNR_{UB} \\[4pt]
\delta_{\max} & \text{if } SNR > SNR_{UB}
\end{cases} \quad (13)

and

SNR \equiv 10 \cdot \log_{10}\!\left(\frac{E_s}{E_n}\right), \quad (14)
where δ_min, δ_max, SNR_LB and SNR_UB are tuning constants, E_s is the frame energy of the clean speech
signal and E_n is the noise energy. In the subsequent experiments, the arcsin transformation shown in
Eqs. (11)-(14) is used to compute the normalized autocorrelation of a signal-limited signal rather than
directly applying the nonlinear operation of Eq. (7) to the input signal. This is because the
underlying hidden Markov models are indirectly represented by the LPC-based spectral features, and the
LPC spectral features can be efficiently calculated from the autocorrelation function by means of Eq. (5).
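The piecewise rule of Eq. (13) can be sketched as follows; the default constants here are the white-noise settings used in the experiments (δ_min = 0.0, δ_max = 1.0, SNR_LB = 20 dB, SNR_UB = 30 dB), and the function name is an assumption of this sketch:

```python
def adaptive_smoothing_factor(snr_db, d_min=0.0, d_max=1.0,
                              snr_lb=20.0, snr_ub=30.0):
    """Frame-dependent smoothing factor of Eq. (13): delta_min below SNR_LB,
    delta_max above SNR_UB, linear interpolation in between."""
    if snr_db < snr_lb:
        return d_min
    if snr_db > snr_ub:
        return d_max
    return (d_max - d_min) * (snr_db - snr_lb) / (snr_ub - snr_lb) + d_min
```

Low-SNR frames thus get δ near 0 (hard-limiter-like heavy smoothing), while high-SNR frames get δ_max (light smoothing), in line with the frame-by-frame adaptation argued for above.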
Moreover, compared with the signal limiting operation shown in Eq. (7), the arcsin transformation
requires less computation cost.
2.3 Adaptations of dynamic spectral feature and covariance matrix
When a signal limiting operation is performed on the autocorrelation function of a speech signal, it not
only smooths the instantaneous spectral vectors but also leads to a reduction of the corresponding dynamic
spectral features and model covariance matrices. Therefore, in order to achieve higher consistency, the
adaptation of a model's dynamic spectral features and its covariance matrices is necessary. This
adaptation procedure proceeds as follows. When the t-th frame y_t of a testing utterance Y is
evaluated on the state Λ_{w,i}, the cepstral vectors c_{t,j} of its context frames y_{t,j}, for -2 ≤ j ≤ 2,
are first transformed to give the corresponding normalized autocorrelation vectors r_{t,j}. Then, those
normalized autocorrelation vectors r_{t,j} = [r_{t,j}(1), ..., r_{t,j}(p)]^T are processed by the following
arcsin transformation:
\tilde{r}_{t,j}(\tau) = \frac{\sin^{-1}\!\left[r_{t,j}(\tau)/\big(1+\delta(SNR_{t,j})\big)\right]}{\sin^{-1}\!\left[1/\big(1+\delta(SNR_{t,j})\big)\right]}, \quad \text{for } -2 \le j \le 2 \text{ and } 1 \le \tau \le p. \quad (15)
In the above equation, the variable SNR_{t,j} is determined by

SNR_{t,j} = 10 \cdot \log_{10}\!\left(\frac{E_{t+j}-E_{n}}{E_{n}}\right), \quad (16)
where E_t is the energy of the t-th frame in the testing utterance Y. E_n is the noise energy and can be
roughly estimated by selecting the lowest frame energy in the testing utterance Y, i.e.,
E_n = min{E_1, E_2, ..., E_{T_y}}. Once the smoothed autocorrelation vectors r̃_{t,j}, for -2 ≤ j ≤ 2, are
obtained, the smoothed testing cepstral vector c̃_{t,j} of ỹ_{t,j} can be calculated by means of the LPC-to-cepstrum
conversion formula. Moreover, the corresponding smoothed testing delta cepstral vector d̃_t
can also be solved by using the following equation:
\tilde{d}_t = \frac{\sum_{j=-2}^{2} j \cdot \tilde{c}_{t,j}}{\sum_{j=-2}^{2} j^{2}}, \quad (17)
and thus the smoothed testing feature vector ỹ_t = [c̃_t, d̃_t] can be taken as ỹ_t = [c̃_{t,0}, d̃_t].
Similarly, in order to avoid introducing mismatch between testing speech signals and reference models,
the mean vector of state Λ_{w,i} should also be smoothed by using Eq. (11) with the same smoothing
factor, and thus its smoothed version μ̃_{w,i} = [c̃_{w,i}, d̃_{w,i}] can be obtained. On the other hand, by
substituting μ̃_{w,i} = [c̃_{w,i}, d̃_{w,i}] and ỹ_t = [c̃_t, d̃_t] into Eq. (1), we may obtain
\tilde{p}\big(\tilde{y}_t \mid (\tilde{\mu}_{w,i}, \Sigma_{w,i})\big) = (2\pi)^{-p} \, |\Sigma_{w,i}|^{-1/2} \cdot \exp\!\left[-\tfrac{1}{2}\,(\tilde{y}_t-\tilde{\mu}_{w,i})^T \, \Sigma_{w,i}^{-1} \, (\tilde{y}_t-\tilde{\mu}_{w,i})\right]. \quad (18)
By taking the differential of the logarithm of Eq. (18) with respect to Σ_{w,i} and setting the result to zero, we can
obtain the optimal smoothed covariance matrix Σ̃_{w,i} which maximizes the likelihood function in Eq. (18),
that is (see Appendix C)

\tilde{\Sigma}_{w,i} = \frac{1}{2p}\left[\sum_{m=1}^{p} \frac{\big(\tilde{c}_t(m)-\tilde{c}_{w,i}(m)\big)^{2}}{\sigma^{2}_{w,i}(m)} + \sum_{m=1}^{p} \frac{\big(\tilde{d}_t(m)-\tilde{d}_{w,i}(m)\big)^{2}}{\sigma^{2}_{w,i}(p+m)}\right] \cdot \Sigma_{w,i}. \quad (19)
Finally, the resulting smoothed output likelihood measure can be rewritten as:

\tilde{p}\big(\tilde{y}_t \mid \tilde{\Lambda}_{w,i}\big) = (2\pi)^{-p} \, |\tilde{\Sigma}_{w,i}|^{-1/2} \cdot \exp\!\left[-\tfrac{1}{2}\,(\tilde{y}_t-\tilde{\mu}_{w,i})^T \, \tilde{\Sigma}_{w,i}^{-1} \, (\tilde{y}_t-\tilde{\mu}_{w,i})\right]. \quad (20)
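Eqs. (19) and (20) combine into a single per-state likelihood evaluation; a minimal sketch under the diagonal-covariance assumption (the function name is hypothetical, and the log form is used here for numerical convenience):

```python
import numpy as np

def smoothed_log_likelihood(y, mu, var):
    """Log form of Eq. (20) for a diagonal-covariance Gaussian state, with
    the covariance rescaled by the ML factor of Eq. (19):
    alpha = (1/2p) * sum_m (y_m - mu_m)^2 / var_m, then Sigma~ = alpha*Sigma.

    y, mu, var : length-2p arrays (cepstral + delta-cepstral dimensions);
    assumes y != mu so that alpha > 0.
    """
    d = y - mu
    two_p = y.size
    alpha = np.sum(d * d / var) / two_p      # Eq. (19) scaling factor
    var_s = alpha * var                      # smoothed covariance diagonal
    return (-0.5 * two_p * np.log(2.0 * np.pi)
            - 0.5 * np.sum(np.log(var_s))
            - 0.5 * np.sum(d * d / var_s))   # quadratic term equals p here
```

Note that with this ML scaling the quadratic term is always -p, so states compete purely through the log-determinant of the rescaled covariance.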
2.4 Implementation of a speech recognizer with adaptive signal limiter
In more detail, the overall system diagram for implementing an HMM-based speech recognizer with adaptive signal limiter is depicted in Fig. 1. In the training phase, we first train a set of word models by using the segmental k-means algorithm and Viterbi decoding method (Juang and Rabiner, 1990). Also, the state statistics of a word model are indirectly represented by the normalized autocorrelation vectors of a five-frame context window. When a testing utterance $Y$ is to be recognized, we first use Eq. (15) and Eq. (16) to estimate the frame-dependent smoothing factor and perform the arcsin transformation on the normalized autocorrelation vectors $r_{t,j}$. When the arcsin-transformed vectors $\tilde{r}_{t,j}$ are obtained, we can solve the smoothed cepstral vector $\tilde{c}_{t,j}$ and its delta cepstral vector by the LPC to cepstrum conversion formula and Eq. (17). Moreover, the same smoothing factor is also used to smooth the state statistics of the word models. Once the smoothed autocorrelation vectors $\tilde{r}_{w,i,j}$ are obtained, the smoothed cepstral vectors $\tilde{c}_{w,i,j}$ can likewise be calculated by means of the LPC to cepstrum conversion formula. Moreover, the corresponding smoothed delta cepstral vector $\tilde{d}_{w,i}$ and covariance matrix $\tilde{\Sigma}_{w,i}$ can also be solved by using Eq. (6) and Eq. (19). Finally, by substituting $\tilde{y}_t$, $\tilde{\mu}_{w,i}$ and $\tilde{\Sigma}_{w,i}$ into Eq. (20), we can obtain the smoothed output likelihoods.
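The autocorr.→LPC and LPC→cepstrum blocks in Fig. 1 follow standard recursions: the Levinson-Durbin recursion and the cepstral recursion described in Rabiner and Juang (1993). A self-contained sketch of both steps (function names are illustrative):

```python
import numpy as np

def lpc_from_autocorr(r, p):
    """Levinson-Durbin recursion: autocorrelation r[0..p] -> LPC a[1..p],
    for the predictor convention A(z) = 1 - sum_k a_k z^{-k}."""
    a = np.zeros(p + 1)
    e = r[0]                                   # prediction error energy
    for i in range(1, p + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / e   # reflection coeff.
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a, e = a_new, e * (1.0 - k * k)
    return a[1:], e

def cepstrum_from_lpc(a, n_ceps):
    """LPC-to-cepstrum conversion:
    c[n] = a[n] + sum_{k=1}^{n-1} (k/n) * c[k] * a[n-k]."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        c[n] = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                c[n] += (k / n) * c[k] * a[n - k - 1]
    return c[1:]
```

For a first-order predictor with $a_1 = a$, the recursion reproduces the analytic result $c_n = a^n/n$, which is a convenient sanity check.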
(Figure 1 is about here.)
3. Effectiveness and robustness of the adaptive signal limiter
3.1 Database and experimental conditions
A multispeaker (50 male and 50 female speakers) isolated Mandarin digit recognition experiment (Lee and Wang, 1994) was conducted to demonstrate the effectiveness and robustness of the proposed adaptive signal limiter. There are three sessions of data collection in the digit database. For each session, every speaker uttered a set of 10 Mandarin digits. Speech signals are sampled at 8 kHz. Each frame contains 256 samples with 128 samples overlapped, and is multiplied by a 256-point Hamming window. Endpoints are not detected, so that each utterance still contains about 0.1~0.5 seconds of pre-silence and post-silence. Each digit is modeled as a left-to-right HMM without jumps, in which the output of each state is a 2-mixture Gaussian distribution of feature vectors. Each word model contains seven to nine states, including pre-silence and post-silence states. The feature vector is indirectly represented by the 12-order normalized autocorrelation vectors of a five-frame context window. This representation can then be transformed into a 12-order cepstral vector and a 12-order delta cepstral vector. Moreover, the NOISEX-92 noise database (Varga et al., 1992b) was used for generating noisy speech. The subsequent experiments were conducted to examine the following problems: (1) the influence of signal limiters on the LPC spectra of clean speech, (2) the influence of signal limiters on the LPC spectra of noisy speech, and (3) the effects of signal limiters on speech discriminability in a noisy environment.
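The framing described above (256-sample frames, 128 samples overlapped, 256-point Hamming window) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a signal into overlapping Hamming-windowed frames, matching
    the setup in the text: 256-sample frames with 128-sample overlap."""
    win = np.hamming(frame_len)
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] * win
                     for i in range(n)])

x = np.random.randn(8000)        # one second of signal at 8 kHz
frames = frame_signal(x)         # shape (61, 256)
```

With a 128-sample hop at 8 kHz, consecutive frames are 16 msec apart, matching the 32 msec window / 16 msec shift used in the spectral analysis of Section 3.2.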
3.2 Influence of signal limiters on LPC spectra of clean speech
A sample utterance of Mandarin digit ‘1’ uttered by a male speaker is used to demonstrate the influence of signal limiters on the LPC spectra of clean speech. The 12-order LPC spectrum analysis is performed on a 32 msec window with a 16 msec frame shift. To observe the spectral variation in the frequency domain, we plotted the LPC spectra of 15 consecutive frames extracted from the middle portion of the sample utterance. Figure 2 shows the log LPC spectra of the sample utterance ‘1’ in the cases of without signal limiter, with hard limiter, and with adaptive signal limiter. From this figure, we can observe that the formants of utterance ‘1’ occur at about 200 Hz, 1950 Hz, 3100 Hz and 3350 Hz. After applying a signal limiter, it is noted that parts of the original spectra become smoother and their formant peaks are broadened. In particular, in the case of using a hard limiter, the second, third and fourth formants are severely suppressed. Since the location and spacing of formant frequencies are highly correlated with the shape of the vocal tract, this suppression will reduce the discriminability of speech utterances and lead to misrecognition. On the other hand, we can also find that the spectral shape in the case of using the adaptive signal limiter is almost unaffected. This is mainly because an adaptive signal limiter employing a larger smoothing factor keeps the arcsin-transformed autocorrelation function almost unchanged in the clean condition.
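For reference, the LPC log magnitude spectra compared in Fig. 2 can be computed directly from the LPC coefficients as $20\log_{10}\,|G/A(e^{j2\pi f/f_s})|$. A minimal sketch (variable names and defaults are illustrative):

```python
import numpy as np

def lpc_log_spectrum(a, gain=1.0, n_freq=512, fs=8000):
    """Log magnitude of the all-pole LPC spectrum H(f) = G / |A|,
    with A(z) = 1 - sum_k a_k z^{-k}, evaluated on 0..fs/2 Hz."""
    f = np.linspace(0.0, fs / 2.0, n_freq)
    k = np.arange(1, len(a) + 1)
    A = 1.0 - np.exp(-2j * np.pi * np.outer(f / fs, k)) @ a
    return f, 20.0 * np.log10(gain / np.abs(A))
```

Plotting this quantity for each of the 15 consecutive frames yields spectra of the kind shown in Figs. 2-4, where formant suppression by a hard limiter is directly visible as flattened peaks.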
(Figure 2 is about here.)
3.3 Influence of signal limiters on LPC spectra of noisy speech
In this subsection, we explore the influence of signal limiters on the LPC spectra of noisy speech. This is shown in Fig. 3 and Fig. 4, where we plot the LPC spectra of the same utterance shown in Fig. 2 with 20 dB additive white Gaussian noise and factory noise, respectively. When white noise is added to clean speech, an abnormal formant peak gradually appears in the LPC spectra of the distorted utterance ‘1’ at about 1125 Hz ~ 1625 Hz, as shown in Fig. 3 (a). This phenomenon also happens in the case of adding factory noise to clean speech, where the abnormal formant peak occurs at about 1000 Hz ~ 1375 Hz. However, compared with the baseline case, the spectral distortion in the LPC spectra with a signal limiter is less pronounced. This property verifies the robustness of signal limiters in a noisy environment. In addition, a comparison of Fig. 3, Fig. 4 and Fig. 2 shows that excessively smoothing the autocorrelation function will suppress parts of the formant peaks and lose some important information about the shape of the vocal tract. Instead of using a fixed smoothing factor, an adaptive signal limiter that adaptively adjusts the degree of smoothness can not only effectively reduce the variability of speech features, but also preserve more of the useful spectral information embedded in a speech signal.
(Figure 3 and figure 4 are about here.)
3.4 Effects of signal limiters on speech discriminability in a noisy environment
In this subsection, we evaluate the robustness of signal limiters in noisy conditions. First, the first two sessions of the database were used to train a set of word models by using the segmental k-means algorithm. To generate noisy speech, white Gaussian noise and factory noise were separately added to the 100 utterances of Mandarin digit ‘1’ in the third session. Those distorted utterances were then evaluated on the 10 word models to obtain maximum log likelihoods. For each word model, we can find the average log likelihood by averaging the accumulation of all log likelihoods corresponding to the same word model. In Fig. 5 and Fig. 6, we plot the average log likelihoods of utterance ‘1’ as a function of SNR values in the cases of white Gaussian noise and factory noise, respectively. When the underlying environment becomes noisy, i.e., below an SNR threshold, utterance ‘1’ is easily misrecognized as utterance ‘7’. For white noise, the SNR thresholds occur at about 20 dB, 15 dB and 7 dB for the cases of without signal limiter, with hard limiter and with adaptive signal limiter, respectively. Similarly, for factory noise, the SNR thresholds occur at about 15 dB, 10 dB and 3 dB for the cases of without signal limiter, with hard limiter and with adaptive signal limiter, respectively. These experimental results reveal that an equivalent gain of about 12 ~ 13 dB and 7 ~ 8 dB in SNR can be achieved when the adaptive signal limiter is compared with the baseline and the hard limiter, respectively, for recognition of utterance ‘1’ in noisy conditions.
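The noisy test data described above can be generated by scaling the noise so that the mixture attains a target global SNR before addition. A minimal sketch of this step (not necessarily the paper's exact procedure):

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Scale `noise` so the mixture speech + noise has the requested
    global SNR in dB, then add it to `speech`."""
    ps = np.mean(speech ** 2)              # speech power
    pn = np.mean(noise ** 2)               # raw noise power
    scale = np.sqrt(ps / (pn * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise
```

The same routine serves for the white Gaussian, factory and F16 noise conditions; only the noise source changes.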
(Figure 5 and figure 6 are about here.)
4. Experimental results and discussion
In this section, a multispeaker (50 males and 50 females) recognition of isolated Mandarin digits (Lee and Wang, 1994) was conducted to demonstrate the merits of the proposed method. The experimental setup and underlying database have been described in subsection 3.1. In the experiments we conducted, a conventional hidden Markov model without any signal limiter is referred to as the baseline system. The ambient noises, including white Gaussian noise, F16 noise and factory noise, were separately added to clean speech at predetermined SNRs of 20, 15, 10, 5 and 0 dB to generate various noisy speech signals. Moreover, the parameters used in the proposed adaptive signal limiter under different noisy conditions are determined empirically as follows. Firstly, the smoothing factor δ is initially set to 0 and increased with increment Δδ = 0.1 while SNR_LB and SNR_UB are kept constant. It is observed that when the smoothing factor δ is beyond 1, the smoothing operation has little effect on digit recognition rates. This phenomenon also happens in the cases of using different sets of parameters SNR_LB and SNR_UB. Therefore, the maximum value of the smoothing factor can be well approximated by setting δ_max = 1.0, which is employed throughout all experiments. Similarly, we chose the SNR lower bound from the interval 0~30 dB and the SNR upper bound from the interval 20~50 dB with increments of 5 dB to test which set of SNR parameters achieves better digit recognition accuracy.
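Since Eqs. (15)-(16) are not reproduced in this excerpt, the following sketch assumes a clipped linear mapping from frame SNR to δ between SNR_LB and SNR_UB; the actual mapping is defined by the paper's equations. The direction follows the text: small frame SNR gives a small δ (heavy smoothing), and clean or high-SNR frames get δ near δ_max (features left almost unchanged).

```python
import numpy as np

def smoothing_factor(snr_db, snr_lb=20.0, snr_ub=30.0,
                     delta_min=0.0, delta_max=1.0):
    """Frame-dependent smoothing factor: delta_min below SNR_LB,
    delta_max above SNR_UB, and (assumed) linear in between."""
    t = (snr_db - snr_lb) / (snr_ub - snr_lb)
    return delta_min + (delta_max - delta_min) * np.clip(t, 0.0, 1.0)
```

The defaults mirror the white-noise setting of Table 1 (δ_min = 0.0, δ_max = 1.0, SNR_LB = 20 dB, SNR_UB = 30 dB); the factory-noise and F16 settings swap in their own bounds.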
In Table 1, we assess the recognition accuracy of the baseline, parallel model combination (PMC), baseline with hard limiter, and baseline with adaptive signal limiter for recognition of noisy speech under the influence of white noise. From the experimental results, we can find that the baseline with hard limiter improves the recognition accuracy at low SNR but performs worse at high SNR and in the clean condition. This is mainly because oversmoothing the autocorrelation function severely distorts some important spectral information embedded in the original speech signals. On the other hand, the improvement of the proposed adaptive signal limiter is remarkable, thanks to the adaptive adjustment of the smoothing factor. The adaptive signal limiter further outperforms the hard limiter. This means that using larger smoothing factors for the clean condition and high SNR is as important as using smaller smoothing factors for low SNR.
(Table 1 is about here.)
Moreover, we also find that the PMC method is superior to the proposed adaptive signal limiter technique in recognition accuracy. This superiority arises mainly because the PMC method decomposes the concurrent processes of speech and background noise, so that the environmental mismatch can be effectively reduced by optimally combining those two processes in the linear spectral domain. In contrast, the environmental mismatch is not compensated during the signal limiting operation. The proposed adaptive signal limiter can be considered as a weighting function which neglects the speech segments with low SNR by heavily smoothing their features in the autocorrelation domain. This smoothing operation not only reduces feature variability in noisy conditions but also inevitably deteriorates parts of the characteristics of the speech features. Therefore, it is intuitive that the PMC method has better recognition accuracy compared with the proposed method. However, those comparison results do not indicate that the proposed method is useless for noisy speech recognition. For segments with low SNR (e.g., distorted unvoiced segments), the adaptive signal limiter seems to be more effective than the PMC method in some noisy conditions. This implies that model adaptation is useful for high and medium SNRs while feature smoothing is more feasible for low SNR. As described in the paper by C. H. Lee and C. H. Lin (Lee and Lin, 1993), a signal limiter can be combined with other noise-robust speech recognition techniques to obtain an additional performance improvement. Therefore, it is expected that by properly integrating the adaptive signal limiter with other noise-robust speech recognition techniques, such as the WPM and PMC methods, further improvement in recognition accuracy could be obtained.
Likewise, comparisons of the different methods in the presence of factory noise and F16 noise are illustrated in Table 2 and Table 3, respectively. We can observe that the proposed method consistently achieves remarkable improvements in recognition accuracy. This result verifies the effectiveness and robustness of the adaptive signal limiter for speech recognition in white noise as well as colored noises. As far as computation time is concerned, the adaptive signal limiter needs less computation time than the PMC method; the reduction in CPU time is about 25%. Details of the CPU time for the different methods are shown in Table 4.
(Table 2, Table 3, and Table 4 are about here.)
5. Conclusion
In this paper, we explore the influence of a hard limiter on the LPC spectra of clean and noisy speech. It is found that excessive smoothing in the autocorrelation domain of a speech signal will suppress parts of the formant peaks and reduce the discriminability of speech features in noisy conditions. Based upon this weakness of a hard limiter, an adaptive signal limiter is proposed to improve its robustness. In our approach, the smoothing degree of a signal limiter is related to the SNR value and adaptively determined on a frame by frame basis. That is, the smaller the SNR value of a speech frame, the smaller the smoothing factor of the signal limiter. Experimental results verify that an adaptive signal limiter outperforms a hard limiter at various SNRs. This improvement is mainly because an adaptive signal limiter not only reduces feature variability at low SNR, but also preserves some important information borne in the speech segments with high SNR.
Acknowledgement
The authors would like to thank Dr. Lee-Min Lee of Mingchi Institute of Technology, Taipei, Taiwan,
for his enthusiasm in providing experiences for implementing the new representation of hidden Markov
model with five-frame context window.
References
Carlson, B. A., Clement, M. A., 1994. A projection-based likelihood measure for speech recognition in
noise. IEEE Trans. on Speech and Audio Processing. Vol. 2, pp. 97-102.
Chien, J. T., 1997a. Speech recognition under telephone environments. Ph.D. Thesis. Department of
Electrical Engineering, National Tsing Hua University, Taiwan, R.O.C.
Chien, J. T., Lee, L. M., Wang, H. C., 1997b. Extended studies on projection-based likelihood measure
for noisy speech recognition. revised in IEEE Trans. on Speech and Audio Processing.
Flores, J. A. N., Young, S. J., 1992. Continuous speech recognition in noise using spectral subtraction
and HMM adaptation. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). San
Francisco. Vol. 1, pp. 409-412.
Gales, M. J. F., Young, S. J., 1995. Robust speech recognition in additive and convolutional noise using
parallel model combination. Computer Speech and Language, Vol. 4, pp. 352-359.
Juang, B. H., Rabiner, L. R., 1990. The segmental k-means algorithm for estimating parameters of
hidden Markov models. IEEE Trans. on Acoustics, Speech, Signal Proc., 38(9) : 1639-1641,
September.
Lee, C. H., Lin, C. H., 1993. On the use of a family of signal limiters for recognition of noisy speech.
Speech Communication, Vol. 12, pp. 383-392.
Lee, L. M., Wang, H. C., 1994. A study on adaptation of cepstral and delta cepstral coefficients for
noisy speech recognition. Proc. of Int. Conf. Spoken Language Processing (ICSLP). Yokohama,
Japan. pp. 1011-1014.
Lee, L. M., Wang, H. C., 1995. Representation of hidden Markov model for noise adaptive speech
recognition. Electronics Letters, Vol. 31, No. 8, pp. 616-617.
Mansour, D., Juang, B. H., 1989. A family of distortion measures based upon projection operation for
robust speech recognition. IEEE Trans. on Acoustics, Speech, Signal Processing, Vol. 37, pp.
1659-1671.
Rabiner, L., and Juang, B. H., 1993. Fundamentals of Speech Recognition, Englewood Cliffs, New
Jersey, Prentice-Hall, pp. 112-117.
Sankar, A., Lee, C. H., 1996. A maximum-likelihood approach to stochastic matching for robust speech
recognition. IEEE Trans. on Speech and Audio Processing, Vol. 4, pp. 190-202.
Varga, A. P., Moore, R. K., 1992a. Hidden Markov model decomposition of speech and noise. IEEE
Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), San Francisco. pp. 845-848.
Varga, A. P., Steeneken, H.J.M., Tomlinson, M., Jones, D., 1992b. The NOISEX-92 study on the
effect of additive noise on automatic speech recognition, Technical Report, DRA Speech Research
Unit, Malvern, England.
Table 1. Comparison of digit recognition rates (%) for white noise.
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 20 dB, SNR_UB = 30 dB.)

Methods \ SNRs      clean   20 dB   15 dB   10 dB   5 dB    0 dB
baseline            98.9    80.2    65.7    48.8    25.6    10.6
PMC                 98.7    92.2    84.6    72.7    59.3    47.1
hard limiter        90.6    76.8    68.5    55.8    35.9    21.4
adaptive limiter    95.2    85.1    76.4    68.1    58.5    49.7
Table 2. Comparison of digit recognition rates (%) for factory noise.
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 10 dB, SNR_UB = 40 dB.)

Methods \ SNRs      clean   20 dB   15 dB   10 dB   5 dB    0 dB
baseline            98.9    91.2    81.4    65.9    46.9    25.4
PMC                 98.7    95.0    91.8    82.3    73.2    52.5
hard limiter        90.6    86.3    80.2    71.3    57.5    30.0
adaptive limiter    94.9    91.9    87.8    77.7    69.2    53.3
Table 3. Comparison of digit recognition rates (%) for F16 noise.
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 15 dB, SNR_UB = 35 dB.)

Methods \ SNRs      clean   20 dB   15 dB   10 dB   5 dB    0 dB
baseline            98.9    91.1    78.9    65.2    43.9    21.0
PMC                 98.7    95.9    92.5    87.4    68.1    44.5
hard limiter        90.6    84.9    77.1    67.8    54.6    29.4
adaptive limiter    95.1    91.4    85.3    78.7    61.9    42.2
Table 4. Comparison of computation costs based on a Pentium II-266 MHz
personal computer.

Methods                          baseline   PMC     hard limiter   adaptive limiter
CPU time for recognition (sec)   0.203      4.038   0.291          2.981
Fig. 1. Block diagram for implementing a speech recognizer with adaptive signal limiter.
[Figure 1 shows two paths. Training: training utterances → autocorrelation vectors of context windows → arcsin transform → segmental k-means → word models → find smoothed delta cepstrum and covariance matrix (model adaptation, using δ). Testing: testing utterances → autocorrelation vectors of a context window → estimate smoothing factor δ → arcsin transform → autocorr.→LPC → LPC→cepstrum → find smoothed delta cepstrum → speech recognizer → recognition results.]

(a) LPC log magnitude spectra without signal limiter.
[Plot: magnitude (dB) vs. frequency (Hz) and frame index; baseline, clean condition.]
(b) LPC log magnitude spectra with hard limiter.
[Plot: magnitude (dB) vs. frequency (Hz) and frame index; hard limiter, clean condition.]

(c) LPC log magnitude spectra with adaptive signal limiter.
[Plot: magnitude (dB) vs. frequency (Hz) and frame index; adaptive limiter, clean condition.]
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 20 dB, SNR_UB = 30 dB.)
Fig. 2. The various LPC log magnitude spectra of utterance ‘1’ in clean condition.
(a) LPC log magnitude spectra without signal limiter.
[Plot: magnitude (dB) vs. frequency (Hz) and frame index; baseline, 20 dB white noise.]

(b) LPC log magnitude spectra with hard limiter.
[Plot: magnitude (dB) vs. frequency (Hz) and frame index; hard limiter, 20 dB white noise.]

(c) LPC log magnitude spectra with adaptive signal limiter.
[Plot: magnitude (dB) vs. frequency (Hz) and frame index; adaptive limiter, 20 dB white noise.]
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 20 dB, SNR_UB = 30 dB.)
Fig. 3. The various LPC log magnitude spectra of utterance ‘1’ distorted by 20 dB white noise.
(a) LPC log magnitude spectra without signal limiter.
[Plot: magnitude (dB) vs. frequency (Hz) and frame index; baseline, 20 dB factory noise.]

(b) LPC log magnitude spectra with hard limiter.
[Plot: magnitude (dB) vs. frequency (Hz) and frame index; hard limiter, 20 dB factory noise.]

(c) LPC log magnitude spectra with adaptive signal limiter.
[Plot: magnitude (dB) vs. frequency (Hz) and frame index; adaptive limiter, 20 dB factory noise.]
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 10 dB, SNR_UB = 40 dB.)
Fig. 4. The various LPC log magnitude spectra of utterance ‘1’ distorted by 20 dB factory noise.
(a) Comparison of average log likelihoods without signal limiter.
[Plot: average log likelihoods of word ‘1’ in white noise for models 0-9 vs. SNR (0 dB, 5 dB, 10 dB, 15 dB, 20 dB, clean), baseline system.]

(b) Comparison of average log likelihoods with hard limiter.
[Plot: average log likelihoods of word ‘1’ in white noise for models 0-9 vs. SNR, hard limiter.]

(c) Comparison of average log likelihoods with adaptive signal limiter.
[Plot: average log likelihoods of word ‘1’ in white noise for models 0-9 vs. SNR, adaptive signal limiter.]
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 20 dB, SNR_UB = 30 dB.)
Fig. 5. The average log likelihoods of utterance ‘1’ evaluated on various
word models in white noise.
(a) Comparison of average log likelihoods without signal limiter.
[Plot: average log likelihoods of word ‘1’ in factory noise for models 0-9 vs. SNR (0 dB, 5 dB, 10 dB, 15 dB, 20 dB, clean), baseline system.]

(b) Comparison of average log likelihoods with hard limiter.
[Plot: average log likelihoods of word ‘1’ in factory noise for models 0-9 vs. SNR, hard limiter.]

(c) Comparison of average log likelihoods with adaptive signal limiter.
[Plot: average log likelihoods of word ‘1’ in factory noise for models 0-9 vs. SNR, adaptive signal limiter.]
(δ_min = 0.0, δ_max = 1.0, SNR_LB = 10 dB, SNR_UB = 40 dB.)
Fig. 6. The average log likelihoods of utterance ‘1’ evaluated on various
word models in factory noise.