Adaptive noise estimation algorithm for speech enhancement



  1. Base paper: Adaptive Noise Estimation Algorithm for Speech Enhancement, Institute of Electrical and Electronics Engineers (IEEE)
     Abstract: A fast and robust noise estimation technique for speech is proposed. The noisy speech is decomposed using a critical-band-rate filter bank so that a perceptual modification of Wiener filtering can be applied for speech denoising. The sub-band noise estimate is updated adaptively using a smoothing parameter that depends on the estimated signal-to-noise ratio (SNR). This noise estimation technique can give accurate results even at very low SNRs. Speech denoising using perceptually modified Wiener filtering, combined with the proposed noise estimation technique, yields enhanced speech of good quality.
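The SNR-dependent noise update summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the base paper's exact algorithm: the sub-band powers are assumed to come from a critical-band filter bank that is not shown, the sigmoid mapping from SNR to smoothing factor and its `slope` and `threshold_db` values are hypothetical, and the gain is an ordinary Wiener gain rather than the perceptually modified one.

```python
import math


def smoothing_factor(snr_db, slope=0.3, threshold_db=0.0):
    """Map a sub-band SNR estimate (dB) to a smoothing factor in (0, 1).

    Hypothetical sigmoid mapping (slope and threshold are illustrative):
    high SNR (speech likely present) -> factor near 1, so the noise estimate
    is held; low SNR -> small factor, so the estimate follows the band power.
    """
    z = -slope * (snr_db - threshold_db)
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(z))


def update_noise(noise_prev, band_power, eps=1e-12):
    """One frame of SNR-dependent recursive averaging, one value per sub-band."""
    updated = []
    for n_prev, p in zip(noise_prev, band_power):
        # A posteriori SNR of this band against the previous noise estimate.
        snr_db = 10.0 * math.log10(max(p, eps) / max(n_prev, eps))
        alpha = smoothing_factor(snr_db)
        updated.append(alpha * n_prev + (1.0 - alpha) * p)
    return updated


def wiener_gain(noise, band_power, floor=0.1, eps=1e-12):
    """Plain (not perceptually modified) Wiener gain per sub-band.

    The a priori SNR is approximated by max(P/N - 1, 0); `floor` limits
    attenuation so that no band is muted completely.
    """
    gains = []
    for n, p in zip(noise, band_power):
        xi = max(p / max(n, eps) - 1.0, 0.0)
        gains.append(max(xi / (1.0 + xi), floor))
    return gains


if __name__ == "__main__":
    noise = [1.0, 1.0]
    frame = [100.0, 1.0]  # speech-like burst in band 0, noise only in band 1
    noise = update_noise(noise, frame)
    print(noise)
    print(wiener_gain(noise, frame))
```

Because the smoothing factor approaches 1 when a band's SNR is high, the estimate is effectively held during speech and keeps adapting during pauses and low-SNR frames, without an explicit voice activity detector.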
  2. A:
     (a) Basic Overview of Additive Noise
     (b) Basic Overview of Speech Enhancement System
     (c) Overview of Spectral Subtraction System
  3. A (continued):
     (d) Two-Channel Speech Enhancement
     (e) Voice Activity Detection
     (f) Block Diagram of Subspace Speech Enhancement System
     (g) Block Diagram of Complete Subspace Speech Enhancement with Adaptive Noise Estimation Algorithm
  4. A (continued):
     (h) Block Diagram of PESQ Algorithm
  5. B: Waveforms of (a) the original wave, (b) the noisy (corrupted) wave, and (c) the enhanced wave.
     C: Spectrograms of (a) the original wave, (b) the noisy (corrupted) wave, and (c) the enhanced wave.
  6. Conclusion: This thesis has focused on the design, implementation and testing of an adaptive noise estimation algorithm for signal subspace speech enhancement. This is a novel approach to the subspace method [5], which traditionally uses voice activity detection to estimate the noise in a signal. The proposed method requires no voice activity detection and can therefore update the noise estimate throughout the signal instead of only during silence intervals. This produces a more accurate noise estimate and improves the quality of the enhanced speech. Objective and subjective tests were carried out to evaluate the proposed algorithm. Its results were compared with those of contemporary speech enhancement systems, and it outperformed them in the majority of situations. The proposed algorithm was shown to produce good-quality speech for most noise types, even at low signal-to-noise ratios. The proposed system has potential applications in cellular telephony, audio archive restoration and automatic speech recognition. All of these applications rely heavily on accurate and robust noise estimation to provide high-quality enhanced speech, which makes the proposed method well suited to them.
     Future Work: Recent developments in subspace-based speech enhancement, such as Klein and Kabal's perceptual post-filter [22] and the work of Jabloun and Champagne [34], have exploited auditory masking properties. The algorithm presented here does not make use of these properties, but they could be incorporated relatively easily, potentially improving system performance further. The subspace method is also computationally complex, so future work should also focus on reducing this complexity. The discrete cosine transform (DCT) has been proposed as an alternative to the computationally expensive Karhunen-Loève transform (KLT), and singular value decomposition is another option for reducing complexity. This matters because some applications require real-time speech enhancement, and more efficient algorithms reduce power consumption and processor usage.
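The suggestion of using the DCT in place of the KLT can be illustrated with a short sketch. The assumptions here are not from the source: the noise is white with known variance `noise_var`, each coefficient's clean-signal energy is estimated from a single frame, and the gain follows the form lam / (lam + mu * noise_var) used by the time-domain-constrained estimator of [5], with `mu` an illustrative trade-off parameter. All function names are hypothetical.

```python
import math


def _scale(k, n):
    """Orthonormal DCT-II scaling factor for coefficient k of an n-point frame."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ii(x):
    """Orthonormal DCT-II, written in the direct O(N^2) form for clarity."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct_ii(c):
    """Inverse of the orthonormal DCT-II (the orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def dct_subspace_enhance(frame, noise_var, mu=1.0):
    """Subspace-style enhancement with the DCT as a fixed stand-in for the KLT.

    Assumes white noise of known variance `noise_var`, so each orthonormal DCT
    coefficient carries the same expected noise energy. Coefficients whose
    energy does not exceed `noise_var` are treated as noise-only and zeroed;
    the rest are scaled by lam / (lam + mu * noise_var), where lam is a
    single-frame estimate of the clean-signal energy in that coefficient.
    """
    enhanced = []
    for c in dct_ii(frame):
        lam = max(c * c - noise_var, 0.0)
        gain = lam / (lam + mu * noise_var) if lam > 0.0 else 0.0
        enhanced.append(gain * c)
    return idct_ii(enhanced)
```

The practical appeal is that the DCT basis is fixed, so nothing has to be recomputed per frame and a fast DCT runs in O(N log N), whereas the KLT requires an eigendecomposition of the estimated covariance matrix, typically O(N^3) per update. The direct loops above are kept only for readability; `scipy.fft.dct(x, type=2, norm='ortho')` computes the same orthonormal transform.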
  7. References:
     Ambikairajah, E., Epps, J. and Lin, L. (2001). Wideband speech and audio coding using gammatone filter banks. Proc. ICASSP, pp. 773-776.
     Brandenburg, K.B. and Stoll, G. (1994). ISO-MPEG-1 audio: A generic standard for coding of high-quality digital audio. Journal of the Audio Engineering Society, 42 (10), 780-792.
     Doblinger, G. (1995). Computationally efficient speech enhancement by spectral minima tracking in subbands. Proc. EUROSPEECH'95, Madrid, pp. 1513-1516.
     Gustafsson, S., Jax, P. and Vary, P. (1998). A novel psychoacoustically motivated audio enhancement algorithm preserving background noise characteristics. Proc. ICASSP, pp. 397-400.
     Lim, J.S. and Oppenheim, A.V. (1979). Enhancement and bandwidth compression of noisy speech. Proc. IEEE, 67 (12), 1586-1604.
     Lin, L., Ambikairajah, E. and Holmes, W.H. (2001). Auditory filterbank design using masking curves. Proc. EUROSPEECH, Aalborg, pp. 411-414.
     Lin, L., Holmes, W.H. and Ambikairajah, E. (2002). Speech enhancement based on a perceptual modification of Wiener filtering. Proc. ICSLP, Denver, pp. 781-784.
     Martin, R. (2001). Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Transactions on Speech and Audio Processing, 9 (5), 504-512.
     Virag, N. (1999). Single channel speech enhancement based on masking properties of the human auditory system. IEEE Transactions on Speech and Audio Processing, 7 (2), 126-137.
     Additional references:
     [1] S.F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-27, Apr. 1979.
     [2] History of Automatic Speech Synthesis and Recognition.
     [3] Audio Demonstration: Speech Enhancement for Electronic Hearing Aids.
     [4] S.J. Godsill, P.J. Wolfe and W.N.W. Fong, "Statistical model-based approaches to audio restoration and analysis," Journal of New Music Research, vol. 30, no. 4, pp. 323-338, 2001. Special Issue: Conservation, Restoration and Archiving of Electroacoustic Music.
     [5] Y. Ephraim and H.L. Van Trees, "A signal subspace approach for speech enhancement," IEEE Trans. Speech and Audio Processing, vol. 3, July 1995.
     [6] J.S. Lim and A.V. Oppenheim, "Enhancement and bandwidth compression of noisy speech," Proc. IEEE, vol. 67, no. 12, pp. 1586-1604, Dec. 1979.
     [7] M. Berouti, R. Schwartz and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 208-211, Apr. 1979.
     [8] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-32, no. 6, pp. 1109-1121, Dec. 1984.
     [9] P. Lockwood and J. Boudy, "Experiments with a nonlinear spectral subtractor (NSS), hidden Markov models and projection, for robust recognition in cars," Speech Communication, vol. 11, pp. 215-228, June 1992.
     [10] N. Virag, "Single channel speech enhancement based on masking properties of the human auditory system," IEEE Trans. Speech and Audio Processing, vol. 7, no. 2, Mar. 1999.
     [11] K. Brandenburg, G. Stoll et al., "The ISO/MPEG-Audio codec: A generic standard for coding of high quality digital audio," 92nd AES Convention, preprint 3336, Vienna, 1992.
     [12] S.F. Boll and D.C. Pulsipher, "Suppression of acoustic noise in speech using two microphone adaptive noise cancellation," IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-28, pp. 752-753, Dec. 1980.
     [13] M. Dorbecker and S. Ernst, "Combination of two-channel spectral subtraction and adaptive Wiener post-filtering for noise reduction and dereverberation."
     [14] L.R. Rabiner and M.R. Sambur, "An algorithm for determining the endpoints of isolated utterances," The Bell System Technical Journal, vol. 54, no. 2, pp. 297-315, Feb. 1975.
     [15] R. Martin, "Spectral subtraction based on minimum statistics," Proc. EUSIPCO, pp. 1182-1185, 1994.
     [16] G. Doblinger, "Computationally efficient speech enhancement by spectral minima tracking in subbands," Proc. EUROSPEECH, vol. 2, pp. 1513-1516, 1995.
     [17] R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech and Audio Processing, vol. 9, no. 5, pp. 504-512, July 2001.
     [18] S. Rangachari, P.C. Loizou and Y. Hu, "A noise estimation algorithm with rapid adaptation for highly non-stationary environments," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. I-305-I-308, May 2004.
     [19] L. Lin, W.H. Holmes and E. Ambikairajah, "Subband noise estimation for speech enhancement using a perceptual Wiener filter," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. I-80-I-83, 2003.
     [20] I. Cohen and B. Berdugo, "Noise estimation by minima controlled recursive averaging for robust speech enhancement," IEEE Signal Processing Letters, vol. 9, no. 1, pp. 12-15, Jan. 2002.
     [21] Y. Bresler and A. Macovski, "Exact maximum likelihood parameter estimation of superimposed exponential signals in noise," IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-34, no. 5, pp. 1081-1089, Oct. 1986.
     [22] M. Klein and P. Kabal, "Signal subspace speech enhancement with perceptual post-filtering," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. I-537-I-540, May 2002.
     [23] N. Merhav, "The estimation of model order in exponential families," IEEE Trans. Information Theory, vol. 35, pp. 1109-1114, Sept. 1989.
     [24] S. Gazor and A. Rezayee, "An adaptive KLT approach for speech enhancement," IEEE Trans. Speech and Audio Processing, vol. 9, pp. 87-95, Feb. 2001.
     [25] E. Wan, A. Nelson and R. Peterson, Speech Enhancement Assessment Resource (SpEAR) Database.
     [26] Noisex-92 database, taken from the Signal Processing Information Base website:
     [27] "Subjective performance assessment of telephone-band and wideband digital codecs," Recommendation ITU-T P.830, International Telecommunication Union, Feb. 1996.
     [28] "Perceptual evaluation of speech quality (PESQ)," Recommendation ITU-T P.862, International Telecommunication Union, Feb. 2001.
     [29] M. Klein, "Signal subspace speech enhancement with perceptual post-filtering," Master's thesis, McGill University, Montreal, Canada, 2002.
     [30] N. Ma, M. Bouchard and R.A. Goubran, "Perceptual Kalman filtering for speech enhancement in colored noise," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, May 2004.
     [31] I. Cohen, "Speech enhancement using a noncausal a priori SNR estimator," IEEE Signal Processing Letters, vol. 11, no. 9, Sept. 2004.
     [32] T.S. Gunawan and E. Ambikairajah, "Speech enhancement using temporal masking and fractional Bark gammatone filters," Proc. 10th Australian International Conference on Speech Science and Technology, Dec. 2004.
     [33] Opticom website, PESQ description:
     [34] F. Jabloun and B. Champagne, "Incorporating the human hearing properties in the signal subspace approach for speech enhancement," IEEE Trans. Speech and Audio Processing, vol. 11, no. 6, Nov. 2003.