This document summarizes a research talk on statistical-model-based speech enhancement that reduces noise without generating musical noise artifacts. The talk first reviews conventional methods, such as spectral subtraction and Wiener filtering, which often produce musical noise. It then proposes a biased MOSIE estimator, a generalized Bayesian minimum mean-square error (MMSE) amplitude estimator that reaches a musical-noise-free state through an added bias parameter. Analysis and experiments show that the method reduces noise while keeping the kurtosis ratio fixed at 1.0, which prevents musical noise, and that it yields lower speech distortion than the compared methods. A strong speech prior, however, leaves almost no musical-noise-free state, so the prior must be selected carefully.
2. Outline
1. Research background
2. What is musical-noise-free?
3. Conventional statistical-model-based speech enhancement
4. Proposed method and analysis
5. Experimental evaluation
6. Conclusion
3. Research Background and Goal
Single-channel speech enhancement
Spectral subtraction (SS) [Boll, 1979], Wiener filtering, Bayesian minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator [Ephraim, 1984], MAP estimator [Lotter, 2005], etc.
Harmful distortion owing to musical noise generation
Musical-noise-free speech enhancement
[Miyazaki, Saruwatari et al., IEEE Trans. ASLP 2012]
Noise reduction without any musical noise
We have found that SS (a maximum-likelihood amplitude estimator) has a musical-noise-free state.
Does the generalized Bayesian MMSE-STSA estimator also have a musical-noise-free state?
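For reference, the conventional gains named above can be sketched per time-frequency bin. This is a minimal sketch, assuming power-domain SS with an oversubtraction factor and an amplitude-gain floor, plus the standard Wiener gain written in terms of the a priori SNR ξ; the function and parameter names are hypothetical and are not taken from the talk.

```python
import math


def spectral_subtraction_gain(noisy_power, noise_power, oversub=1.0, floor=0.0):
    """Amplitude gain of power spectral subtraction (hypothetical helper).

    noisy_power: |Y|^2 of one bin; noise_power: estimated noise PSD of that bin.
    oversub: oversubtraction factor; floor: amplitude gain used when the
    subtracted power would be non-positive (flooring).
    """
    residual = noisy_power - oversub * noise_power
    if residual > 0.0:
        # Estimated clean power is |Y|^2 - oversub * noise PSD, so gain = sqrt(residual / |Y|^2)
        return math.sqrt(residual / noisy_power)
    return floor


def wiener_gain(xi):
    """Wiener gain for a priori SNR xi (speech PSD / noise PSD)."""
    return xi / (1.0 + xi)


print(spectral_subtraction_gain(4.0, 1.0))             # 0.866..., i.e. sqrt(3/4)
print(spectral_subtraction_gain(0.5, 1.0, floor=0.1))  # 0.1 (floored)
print(wiener_gain(1.0))                                # 0.5
```

Noise-only bins whose power momentarily exceeds the subtracted estimate keep a nonzero gain while their neighbours are floored; these isolated spectral peaks are what is heard as musical noise.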
4. Relation between Musical Noise and Kurtosis
Proportional relation between human perception (musical noise score) and the log kurtosis ratio [Saruwatari, 2008]
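The kurtosis ratio compares the kurtosis of the noise power spectrum after processing with that before processing; a ratio of 1.0 means no musical noise is added. A minimal sketch of estimating it from noise-only power spectral samples, assuming kurtosis is taken as m4 / m2^2 over raw (non-central) moments of the power values and a natural log for the log ratio (the slides do not state the base); the helper names are hypothetical.

```python
import math
import random


def power_kurtosis(powers):
    """Kurtosis m4 / m2**2 of power spectral samples, using raw moments (assumption)."""
    n = len(powers)
    m2 = sum(p ** 2 for p in powers) / n
    m4 = sum(p ** 4 for p in powers) / n
    return m4 / m2 ** 2


def kurtosis_ratio(noise_power_in, noise_power_out):
    """Kurtosis after processing divided by kurtosis before processing."""
    return power_kurtosis(noise_power_out) / power_kurtosis(noise_power_in)


def log_kurtosis_ratio(noise_power_in, noise_power_out):
    """Natural log of the kurtosis ratio (log base assumed)."""
    return math.log(kurtosis_ratio(noise_power_in, noise_power_out))


random.seed(0)
# Power of a complex Gaussian noise bin is exponentially distributed (kurtosis 6 here).
noise = [random.expovariate(1.0) for _ in range(200_000)]
# A constant gain cuts noise power but leaves the kurtosis unchanged: ratio 1.0 up to rounding.
print(kurtosis_ratio(noise, [0.1 * p for p in noise]))
# Plain power subtraction of the mean noise PSD (1.0) zeros most bins, leaving isolated peaks.
subtracted = [max(p - 1.0, 0.0) for p in noise]
print(kurtosis_ratio(noise, subtracted))
```

The second ratio lands near e ≈ 2.7: for unit-mean exponential noise, E[max(p - 1, 0)^n] = n!/e, so the output kurtosis is 6e instead of 6. A uniform gain, by contrast, reduces noise without raising the ratio, which is the musical-noise-free behaviour the talk targets.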
7. MOSIE (generalized MMSE-STSA) Estimator
Statistical speech amplitude estimator with parametric speech prior [Breithaupt et al., IEEE Trans. 2011]
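As we understand the estimator from Breithaupt et al., MOSIE estimates the speech amplitude A as (E[A^β | Y])^(1/β) under a chi-type amplitude prior with shape ρ (ρ = 1 gives the Gaussian model, ρ < 1 a super-Gaussian one). That yields the gain G = sqrt(ξ/(ρ+ξ)) · [Γ(ρ+β/2)/Γ(ρ) · Φ(1-ρ-β/2, 1; -ν) / Φ(1-ρ, 1; -ν)]^(1/β) / sqrt(γ), with ν = ξγ/(ρ+ξ), ξ the a priori SNR, γ the a posteriori SNR, and Φ the confluent hypergeometric function. The sketch below assumes that form; it is not the talk's biased variant, since how ε enters is not given in this text, and all names are hypothetical.

```python
import math


def hyp1f1_b1(a, x, rel_tol=1e-15, max_terms=10_000):
    """Confluent hypergeometric function 1F1(a; 1; x) by its power series.

    Intended for a > 0 and moderate x >= 0, where every term is positive.
    """
    term = 1.0
    total = 1.0
    for k in range(max_terms):
        # Ratio of consecutive series terms when b = 1: (a + k) * x / (k + 1)**2
        term *= (a + k) * x / (k + 1) ** 2
        total += term
        if term < rel_tol * total:
            break
    return total


def mosie_gain(xi, gamma, rho, beta):
    """Amplitude gain G such that A_hat = G * |Y| (sketch of the assumed MOSIE form)."""
    nu = xi * gamma / (rho + xi)
    # Kummer's transformation turns Phi(1-rho-beta/2, 1; -nu) / Phi(1-rho, 1; -nu)
    # into Phi(rho+beta/2, 1; nu) / Phi(rho, 1; nu), whose series have no cancellation.
    phi_ratio = hyp1f1_b1(rho + beta / 2.0, nu) / hyp1f1_b1(rho, nu)
    moment_term = math.gamma(rho + beta / 2.0) / math.gamma(rho) * phi_ratio
    return math.sqrt(xi / (rho + xi)) * moment_term ** (1.0 / beta) / math.sqrt(gamma)


print(mosie_gain(2.0, 3.0, rho=1.0, beta=2.0))    # sqrt(2/3): Gaussian prior, power criterion
print(mosie_gain(2.0, 3.0, rho=0.5, beta=0.001))  # super-Gaussian prior, small beta
```

Two limits make the form easy to check: with ρ = 1 and β = 2 the gain reduces to sqrt(ξ/(1+ξ) · (1/γ + ξ/(1+ξ))), the MMSE power estimator under a Gaussian prior; as β approaches 0, (E[A^β])^(1/β) approaches exp(E[log A]), i.e. a log-amplitude criterion.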
8. How to Generate Musical-Noise-Free State?
Unfortunately, we cannot find any musical-noise-free state in the conventional MOSIE estimator.
(Figure: curves plotted as the forgetting factor α increases; annotation: "No intersection!")
12. Calculation of Moment for Biased MOSIE (3/4)
3. Moment-cumulant transformation for
4. Cumulant of noise power spectrum
13. Calculation of Moment for Biased MOSIE (4/4)
5. Cumulant-moment transformation for
m1 is used for the NRR, and m2 and m4 are used for the kurtosis; all of them are functions of the bias ε.
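The slides name the pipeline (moment-cumulant transformation, cumulants of the noise power spectrum, cumulant-moment transformation), but the equations themselves are not in this text. The sketch below shows only the generic pieces, assuming raw moments of the noise power spectrum and circular complex Gaussian noise, whose power is exponential with cumulants κn = (n-1)! λ^n for noise PSD λ. The NRR is taken as 10 log10 of input over output mean noise power, consistent with "m1 is used for the NRR". The estimator-specific step that maps input to output statistics as a function of ε is not reproduced, and all names are hypothetical.

```python
import math


def moments_to_cumulants(m1, m2, m3, m4):
    """Raw moments -> cumulants, up to 4th order."""
    k1 = m1
    k2 = m2 - m1 ** 2
    k3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    k4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4
    return k1, k2, k3, k4


def cumulants_to_moments(k1, k2, k3, k4):
    """Cumulants -> raw moments, up to 4th order (inverse of the function above)."""
    m1 = k1
    m2 = k2 + k1 ** 2
    m3 = k3 + 3 * k2 * k1 + k1 ** 3
    m4 = k4 + 4 * k3 * k1 + 3 * k2 ** 2 + 6 * k2 * k1 ** 2 + k1 ** 4
    return m1, m2, m3, m4


def gaussian_noise_power_cumulants(noise_psd):
    """Cumulants (n-1)! * psd**n of the exponentially distributed power of a Gaussian bin."""
    return tuple(math.factorial(n - 1) * noise_psd ** n for n in range(1, 5))


def kurtosis(m2, m4):
    """Kurtosis from raw moments of the power spectrum."""
    return m4 / m2 ** 2


def nrr_db(m1_in, m1_out):
    """Noise reduction rate from mean noise power before and after processing."""
    return 10.0 * math.log10(m1_in / m1_out)


m = cumulants_to_moments(*gaussian_noise_power_cumulants(2.0))
print(m, kurtosis(m[1], m[3]))  # (2.0, 8.0, 48.0, 384.0) 6.0
print(nrr_db(1.0, 0.025))       # about 16 dB
```

Working in cumulants is convenient because they are additive for independent components, while the NRR and kurtosis are read off the moments at the end.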
15. Experiment 1: Existence of Musical-Noise-Free State
Noise: white Gaussian noise at 0-dB SNR
Speech prior: Gaussian model (ρ = 1)
Forgetting factor in DD: 0.98
Noise PSD estimation: minimum statistics method [Martin, 1994]
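The "forgetting factor in DD" is the smoothing constant α of the decision-directed a priori SNR estimate, ξ(l) = α · Â(l-1)^2 / λ + (1 - α) · max(γ(l) - 1, 0), where λ is the noise PSD and γ the a posteriori SNR. A minimal per-bin sketch, assuming this standard decision-directed form of Ephraim and Malah, a fixed noise PSD (minimum statistics would update it per frame), a zero estimate before the first frame, and a pluggable gain rule (a Wiener gain stands in for the MOSIE gain); all names are hypothetical.

```python
def decision_directed_xi(noisy_power, noise_psd, prev_amp, alpha=0.98):
    """Decision-directed a priori SNR for one bin; returns (xi, gamma)."""
    gamma = noisy_power / noise_psd  # a posteriori SNR
    xi = alpha * prev_amp ** 2 / noise_psd + (1.0 - alpha) * max(gamma - 1.0, 0.0)
    return xi, gamma


def enhance_bin(noisy_powers, noise_psd, gain_fn, alpha=0.98):
    """Run DD estimation and a gain rule over the frames of one frequency bin.

    gain_fn(xi, gamma) returns an amplitude gain; the amplitude estimates are returned.
    """
    amps = []
    prev_amp = 0.0  # assumption: no estimate exists before the first frame
    for p in noisy_powers:
        xi, gamma = decision_directed_xi(p, noise_psd, prev_amp, alpha)
        prev_amp = gain_fn(xi, gamma) * p ** 0.5  # A_hat = G * |Y|
        amps.append(prev_amp)
    return amps


def wiener(xi, gamma):
    """Stand-in gain rule; ignores gamma."""
    return xi / (1.0 + xi)


print(decision_directed_xi(5.0, 1.0, 0.0))  # xi about 0.08, gamma 5.0
print(enhance_bin([5.0, 1.0, 0.2, 3.0], 1.0, wiener))
```

With α = 0.98, a sudden rise in γ moves ξ by only 2% of the excess per frame, so the gain changes slowly; the talk's slide on the conventional MOSIE estimator plots its curves as this α increases.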
(Figure panels: theoretical analysis and experimental results, from bias ε = 0 to large ε)
By introducing the bias ε, we find a musical-noise-free state in the statistical-model-based estimator.
16. Experiment 2: Existence of Musical-Noise-Free State
Noise: white Gaussian noise at 0-dB SNR
Speech prior: super-Gaussian model (ρ = 0.5)
Forgetting factor in DD: 0.98
Noise PSD estimation: minimum statistics method [Martin, 1994]
(Figure panels: theoretical analysis and experimental results, from bias ε = 0 to large ε)
A strong speech prior (small ρ) leaves almost no musical-noise-free state in real processing.
17. Experiment 3: Comparison with Other Methods
Speech: 10 utterances
Noise: white Gaussian noise at 0-dB SNR
Speech prior: super-Gaussian model (ρ = 0.5, β = 0.001)
Forgetting factor in DD: 0.98
Noise PSD estimation: minimum statistics method [Martin, 1994]
Target NRR: 16 dB
(Figure: compared methods grouped as "Large musical noise methods" and "No musical noise methods")
(Figure: comparison results; annotations: "Lowest speech distortion", "Richer speech prior")
20. Conclusion
By introducing the bias ε, we find a musical-noise-free state in a Bayesian estimator.
The proposed biased MOSIE estimator achieves lower cepstral distortion while keeping its kurtosis ratio fixed at exactly 1.0.
A strong speech prior (small ρ) leaves almost no musical-noise-free state, so the prior must be selected carefully to maintain the quality of both the speech and the residual noise.
Thank you for your attention!