SlideShare a Scribd company logo
1 of 43
Improved Cepstra Minimum-Mean-
Square-Error Noise Reduction Algorithm
for Robust Speech Recognition
Jinyu Li, Yan Huang, Yifan Gong
Microsoft
Robust Front-End
• Conventional noise-robust front-ends work very well with the Gaussian
mixture models (GMMs).
• Single-channel robust front-ends were reported not helpful to multi-style
deep neural network (DNN) training.
• DNN’s layer-by-layer structure provides a feature extraction strategy that automatically
derives powerful noise-resistant features
t-SNE Plot for Paired Clean and Noisy
Utterances in Training Set (LFB)
t-SNE Plot for Paired Clean and Noisy
Utterances in Training Set (Layer 1)
t-SNE Plot for Paired Clean and Noisy
Utterances in Training Set (Layer 3)
t-SNE Plot for Paired Clean and Noisy
Utterances in Training Set (Layer 5)
t-SNE Plot for Paired Clean and Noisy
Utterances in Testing Set (Layer 1)
t-SNE Plot for Paired Clean and Noisy
Utterances in Testing Set (Layer 3)
t-SNE Plot for Paired Clean and Noisy
Utterances in Testing Set (Layer 5)
Robust Front-End
• Conventional noise-robust front-ends work very well with the Gaussian
mixture models (GMMs).
• Single-channel robust front-ends were reported not helpful to multi-style
deep neural network (DNN) training although multi-channel signal
processing still helps.
• In this study, we show that the single-channel robust front-end is still
beneficial to deep learning models as long as it is well designed.
Cepstra Minimum Mean Square Error
(CMMSE)
• Reported very effective in dealing with noise when used in the GMM-based
acoustic models
• The solution to CMMSE for each element of the dimension-wise MFCC:
𝑐 𝑥 𝑡, 𝑘 = 𝐸 𝑐 𝑥 𝑡, 𝑘 𝒎 𝑦 𝑡 = 𝑏 𝑎 𝑘,𝑏 𝐸 log 𝑚 𝑥 𝑡, 𝑏 |𝒎 𝑦(𝑡)
• The problem is reduced to finding the log-MMSE estimator of the Mel
filter-bank’s output: 𝑚 𝑥 𝑡, 𝑏 ≈ exp 𝐸 log 𝑚 𝑥 𝑡, 𝑏 |𝑚 𝑦(𝑡, 𝑏) given a
weak independent assumption between Mel-filterbanks.
4-Step Processing of CMMSE
• Voice activity detection (VAD): detects the speech probability at every time-
filterbank bin;
• Noise spectrum estimation: uses the estimated speech probability to update
the estimation of noise spectrum;
• Gain estimation: uses the noisy speech spectrum and the estimated noise
spectrum to calculate the gain of every time-filterbank bin;
• Noise reduction: applies the estimated gain to the noisy speech spectrum to
generate the clean spectrum.
CMMSE
spectrum
calculation
cleaned filter-bank spectrum
Input
signal
Mel filtering noise reduction
VAD
(MCRA)
filter-bank gain
estimation
noise spectrum
estimation
(MCRA)
VAD
• CMMSE uses a minimum controlled recursive moving average (MCRA)
noise tracker (Cohen and Berdugo, 2002) to detect the speech probability
𝑝 𝑡, 𝑏 in each filterbank bin b and time t.
• I. Cohen and B. Berdugo, “Noise estimation by minima controlled recursive averaging
for robust speech enhancement,” IEEE signal processing letters, 9(1), pp.12-15, 2002.
Noise Spectrum Estimation
• The noise power spectrum 𝑚 𝑛 𝑡, 𝑏 is estimated using MCRA(Cohen and
Berdugo, 2002) as
𝑚 𝑛 𝑡, 𝑏 = 𝛼 ∗ 𝑚 𝑛 𝑡 − 1, 𝑏 + 1.0 − 𝛼 ∗ 𝑚 𝑦 𝑡, 𝑏
with
𝛼 = 𝛼 𝐷 + 1.0 − 𝛼 𝐷 ∗ 𝑝 𝑡, 𝑏
Gain Estimation
• The gain of time-filterbank bin 𝐺 𝑡, 𝑏 =
𝜉(𝑡,𝑏)
1+ 𝜉(𝑡,𝑏)
exp
1
2 𝑣 𝑡,𝑏
∞ 𝑒−𝜏
𝜏
𝑑𝜏 .
• posterior SNR : 𝛾 𝑡, 𝑏 =
𝑚 𝑦(𝑡,𝑏)
𝑚 𝑛(𝑡,𝑏)
• prior SNR is calculated using a decision-directed approach (DDA) 𝜉 𝑡, 𝑏 = 𝛽 ∗
𝐺 𝑡 − 1, 𝑏 ∗ 𝛾 𝑡 − 1, 𝑏 + 1.0 − 𝛽 ∗ 𝜉 𝑡, 𝑏
• 𝜉 𝑡, 𝑏 = max(𝛾 𝑡, 𝑏 − 1, 0.0)
• 𝑣 𝑡, 𝑏 =
𝜉(𝑡,𝑏)
1+ 𝜉(𝑡,𝑏)
𝛾 𝑡, 𝑏
Noise Reduction
• 𝑚 𝑥 𝑡, 𝑏 ≈ 𝐺 𝑡, 𝑏 𝑚 𝑦 𝑡, 𝑏
CMMSE
spectrum
calculation
cleaned filter-bank spectrum
Input
signal
Mel filtering noise reduction
VAD
(MCRA)
filter-bank gain
estimation
noise spectrum
estimation
(MCRA)
Speech Probability Estimation
• Reliable speech probability estimation is critical to the estimation of noise
spectrum: 𝑚 𝑛 𝑡, 𝑏 = 𝛼 ∗ 𝑚 𝑛 𝑡 − 1, 𝑏 + 1.0 − 𝛼 ∗ 𝑚 𝑥 𝑡, 𝑏 .
• We use IMCRA (Cohen 2003) to estimate the speech probability 𝑝 𝑡, 𝑏 in
each time-filterbank bin.
• I. Cohen, "Noise spectrum estimation in adverse environments: improved minima
controlled recursive averaging," IEEE Trans. Speech and Audio Processing, Vol. 11, No. 5,
pp. 466-475, 2003.
Improving CMMSE
spectrum
calculation
cleaned filter-bank spectrum
Input
signal
Mel filtering noise reduction
VAD
(IMCRA)
filter-bank gain
estimation
noise spectrum
estimation
(IMCRA)
Refined Prior SNR Estimation
• We do further gain estimation to get a converged gain
• 𝜉 𝑡, 𝑏 = 𝐺 𝑡, 𝑏 ∗ 𝛾 𝑡, 𝑏
• 𝑣 𝑡, 𝑏 =
𝜉(𝑡,𝑏)
1+ 𝜉(𝑡,𝑏)
𝛾 𝑡, 𝑏
• 𝐺 𝑡, 𝑏 =
𝜉(𝑡,𝑏)
1+ 𝜉(𝑡,𝑏)
exp
1
2 𝑣 𝑡,𝑏
∞ 𝑒−𝜏
𝜏
𝑑𝜏
Improving CMMSE
spectrum
calculation
cleaned filter-bank spectrum
Input
signal
Mel filtering noise reduction
VAD
(IMCRA)
filter-bank gain
estimation
noise spectrum
estimation
(IMCRA)
OMLSA
• Optimally-modified log-spectral amplitude (OMLSA) speech estimator
(Cohen and Berdugo, 2001) is used to modify time-filterbank gain:
𝐺 𝑡, 𝑏 = 𝐺 𝑡, 𝑏 𝑝 𝑡,𝑏 𝐺0
1−𝑝 𝑡,𝑏
• I. Cohen and B. Berdugo, "Speech enhancement for non-stationary noise
environments," Signal Processing, Vol. 81, No. 11, pp. 2403-2418, 2001.
Gain Smoothing
• It is better to use OMLSA when the speech probability estimation is reliable.
• We do cross filterbank gain smoothing in this stage to partially address the
weak independent assumption of Mel-filterbanks.
𝐺 𝑡, 𝑏 = ( 𝐺 𝑡, 𝑏 − 1 + 𝐺 𝑡, 𝑏 + 𝐺(𝑡, 𝑏 + 1))/3
Improving CMMSE
spectrum
calculation
cleaned filter-bank spectrum
Input
signal
Mel filtering noise reduction
filter-bank gain
smoothing
VAD
(IMCRA)
filter-bank gain
estimation
noise spectrum
estimation
(IMCRA)
Two-stage Processing
• The noise reduction process is not perfect due to the factors such as
imperfect noise estimation.
• A second stage noise reduction can be used to further reduce the noise.
• We use OMLSA together with gain smoothing for the gain modification in
the second stage because the residual noise has less impact to speech
probability estimation after the first stage noise reduction
Improved Cepstra Minimum Mean Square Error
(ICMMSE)
spectrum
calculation
1st
stage cleaned filter-bank spectrum
Input
signal
Mel filtering noise reduction
filter-bank gain
smoothing
VAD
(IMCRA)
filter-bank gain
estimation
noise spectrum
estimation
(IMCRA)
noise reduction
filter-bank gain
OMLSA+smoothing
noise spectrum
estimation
(IMCRA)
VAD
(IMCRA)
filter-bank gain
estimation
final cleaned filter-bank spectrum
Experiments
Task Training Data Acoustic Model
Aurora 2 8440 utterances GMM
Experiments
Task Training Data Acoustic Model
Aurora 2 8440 utterances GMM
Chime 3 1600 real utterances
+ 7138 simulated utterances
feed-forward DNN
Experiments
Task Training Data Acoustic Model
Aurora 2 8440 utterances GMM
Chime 3 1600 real utterances
+ 7138 simulated utterances
feed-forward DNN
Cortana 3400hr Live data LSTM-RNN
Aurora 2
Baseline CMMSE 1-stage
ICMMSE
2-stage
ICMMSE
Clean 1.39 1.48 1.20 1.10
20db 2.69 2.21 1.99 1.85
15db 3.6 3.21 2.91 2.71
10db 6.04 5.65 5.30 4.97
5db 14.38 13.01 12.59 11.57
0db 43.41 38.36 34.02 32.06
-5db 75.93 72.8 68.47 67.45
Avg. (0-20
db)
13.67 12.17 10.87 10.19
-10.00
-5.00
0.00
5.00
10.00
15.00
20.00
25.00
30.00
35.00
Clean 20db 15db 10db 5db 0db -5db Avg. (0-20
db)
Relative WER reduction from baseline
CMMSE 1-stage ICMMSE 2-stage ICMMSE
Improvement Breakdown
Method Avg. WER
Baseline 13.67
CMMSE 12.17
Improvement Breakdown
Method Avg. WER
Baseline 13.67
CMMSE 12.17
+ IMCRA 11.66
Improvement Breakdown
Method Avg. WER
Baseline 13.67
CMMSE 12.17
+ IMCRA 11.66
+OMLSA 10.99
Improvement Breakdown
Method Avg. WER
Baseline 13.67
CMMSE 12.17
+ IMCRA 11.66
+OMLSA 10.99
+refined prior SNR 10.87
Improvement Breakdown
Method Avg. WER
Baseline 13.67
CMMSE 12.17
+ IMCRA 11.66
+OMLSA 10.99
+refined prior SNR 10.87
-OMLSA +gain smoothing
(1-stage ICMMSE)
10.74
Improvement Breakdown
Method Avg. WER
Baseline 13.67
CMMSE 12.17
+ IMCRA 11.66
+OMLSA 10.99
+refined prior SNR 10.87
-OMLSA +gain smoothing
(1-stage ICMMSE)
10.74
+2nd stage
processing
(2-stage ICMMSE)
10.19
Chime 3
Model FE Test Real Simulate
Baseline Clean 7.56 N/A
CMMSE Clean 7.64 N/A
ICMMSE Clean 7.41 N/A
Baseline Noisy 18.95 18.18
CMMSE Noisy 17.32 16.76
ICMMSE Noisy 16.73 15.79 -2.00
0.00
2.00
4.00
6.00
8.00
10.00
12.00
14.00
Clean Real Noisy Simu Noisy
Relative WER reduction from Baseline
CMMSE ICMMSE
Chime 3 WER Breakdown with Real and Simu.
Noisy Test Set
Cortana
WER Baseline CMMSE ICMMSE
20db above 13.17 12.64 12.71
10-20db 20.8 19.83 18.51
0-10db 27.03 26 24.71
0.00
2.00
4.00
6.00
8.00
10.00
12.00
20db above 10-20db 0-10db
Relative WER reduction from Baseline
CMMSE ICMMSE
Conclusion
• A new robust front-end called ICMMSE is proposed to improve the previous
CMMSE front-end with several advanced components
• The IMCRA algorithm helps to generate more accurate speech probability.
• The refined prior SNR estimation helps to get a converged gain.
• Either cross filterbank gain smoothing or OMLSA is helpful to further modify the gain
function.
• The two-stage processing helps to reduce the residual noise after the first-stage processing.
• ICMMSE is superior regardless of the underlying models and evaluation tasks
Result Summary
Task Training Data Acoustic Model Relative
WER
reduction
Aurora 2 8440 utterances GMM 25.46%
Chime 3 1600 real utterances
+ 7138 simulated utterances
feed-forward DNN 11.98%
Cortana 3400hr Live data LSTM-RNN 11.01%
Thank You

More Related Content

What's hot

Digital communication
Digital communicationDigital communication
Digital communicationmeashi
 
Enhancement in frequency domain
Enhancement in frequency domainEnhancement in frequency domain
Enhancement in frequency domainAshish Kumar
 
Frequency Image Processing
Frequency Image ProcessingFrequency Image Processing
Frequency Image ProcessingSuhas Deshpande
 
Development of a Multipurpose Audio Transmission System on the Internet
Development of a Multipurpose Audio Transmission System on the InternetDevelopment of a Multipurpose Audio Transmission System on the Internet
Development of a Multipurpose Audio Transmission System on the InternetTakashi Kishida
 
Dsp 2018 foehu - lec 10 - multi-rate digital signal processing
Dsp 2018 foehu - lec 10 - multi-rate digital signal processingDsp 2018 foehu - lec 10 - multi-rate digital signal processing
Dsp 2018 foehu - lec 10 - multi-rate digital signal processingAmr E. Mohamed
 
Allaboutequalizers
AllaboutequalizersAllaboutequalizers
Allaboutequalizersdouglaslyon
 
Ch6 digital transmission of analog signal pg 99
Ch6 digital transmission of analog signal pg 99Ch6 digital transmission of analog signal pg 99
Ch6 digital transmission of analog signal pg 99Prateek Omer
 
Image Restoration (Digital Image Processing)
Image Restoration (Digital Image Processing)Image Restoration (Digital Image Processing)
Image Restoration (Digital Image Processing)Shajun Nisha
 
DSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter DesignDSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter DesignAmr E. Mohamed
 
Pulse Code Modulation
Pulse Code Modulation Pulse Code Modulation
Pulse Code Modulation ZunAib Ali
 
Junaid Malik - Formal Element
Junaid Malik - Formal ElementJunaid Malik - Formal Element
Junaid Malik - Formal ElementJunaid Malik
 
Minimize MIMO OFDM interference and noise ratio using polynomial-time algorit...
Minimize MIMO OFDM interference and noise ratio using polynomial-time algorit...Minimize MIMO OFDM interference and noise ratio using polynomial-time algorit...
Minimize MIMO OFDM interference and noise ratio using polynomial-time algorit...IJECEIAES
 
solution manual of goldsmith wireless communication
solution manual of goldsmith wireless communicationsolution manual of goldsmith wireless communication
solution manual of goldsmith wireless communicationNIT Raipur
 
Pulse modulation
Pulse modulationPulse modulation
Pulse modulationavocado1111
 

What's hot (20)

Digital communication
Digital communicationDigital communication
Digital communication
 
Lecture 10
Lecture 10Lecture 10
Lecture 10
 
Enhancement in frequency domain
Enhancement in frequency domainEnhancement in frequency domain
Enhancement in frequency domain
 
Frequency Image Processing
Frequency Image ProcessingFrequency Image Processing
Frequency Image Processing
 
Development of a Multipurpose Audio Transmission System on the Internet
Development of a Multipurpose Audio Transmission System on the InternetDevelopment of a Multipurpose Audio Transmission System on the Internet
Development of a Multipurpose Audio Transmission System on the Internet
 
Dsp 2018 foehu - lec 10 - multi-rate digital signal processing
Dsp 2018 foehu - lec 10 - multi-rate digital signal processingDsp 2018 foehu - lec 10 - multi-rate digital signal processing
Dsp 2018 foehu - lec 10 - multi-rate digital signal processing
 
Allaboutequalizers
AllaboutequalizersAllaboutequalizers
Allaboutequalizers
 
Thesis Overview
Thesis Overview Thesis Overview
Thesis Overview
 
Ch6 digital transmission of analog signal pg 99
Ch6 digital transmission of analog signal pg 99Ch6 digital transmission of analog signal pg 99
Ch6 digital transmission of analog signal pg 99
 
Image Restoration (Digital Image Processing)
Image Restoration (Digital Image Processing)Image Restoration (Digital Image Processing)
Image Restoration (Digital Image Processing)
 
Unit i-pcm-vsh
Unit i-pcm-vshUnit i-pcm-vsh
Unit i-pcm-vsh
 
cr1503
cr1503cr1503
cr1503
 
DSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter DesignDSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter Design
 
Pulse Code Modulation
Pulse Code Modulation Pulse Code Modulation
Pulse Code Modulation
 
Junaid Malik - Formal Element
Junaid Malik - Formal ElementJunaid Malik - Formal Element
Junaid Malik - Formal Element
 
Equalisers
EqualisersEqualisers
Equalisers
 
Pcm
PcmPcm
Pcm
 
Minimize MIMO OFDM interference and noise ratio using polynomial-time algorit...
Minimize MIMO OFDM interference and noise ratio using polynomial-time algorit...Minimize MIMO OFDM interference and noise ratio using polynomial-time algorit...
Minimize MIMO OFDM interference and noise ratio using polynomial-time algorit...
 
solution manual of goldsmith wireless communication
solution manual of goldsmith wireless communicationsolution manual of goldsmith wireless communication
solution manual of goldsmith wireless communication
 
Pulse modulation
Pulse modulationPulse modulation
Pulse modulation
 

Similar to Icmmse slides

Introduction to adaptive filtering and its applications.ppt
Introduction to adaptive filtering and its applications.pptIntroduction to adaptive filtering and its applications.ppt
Introduction to adaptive filtering and its applications.pptdebeshidutta2
 
Support Vector Machine Techniques for Nonlinear Equalization
Support Vector Machine Techniques for Nonlinear EqualizationSupport Vector Machine Techniques for Nonlinear Equalization
Support Vector Machine Techniques for Nonlinear EqualizationShamman Noor Shoudha
 
Acoustic echo cancellation using nlms adaptive algorithm ranbeer
Acoustic echo cancellation using nlms adaptive algorithm ranbeerAcoustic echo cancellation using nlms adaptive algorithm ranbeer
Acoustic echo cancellation using nlms adaptive algorithm ranbeerRanbeer Tyagi
 
Learning the mmse channel estimators
Learning the mmse channel estimatorsLearning the mmse channel estimators
Learning the mmse channel estimatorsShamman Noor Shoudha
 
Channel Equalisation
Channel EqualisationChannel Equalisation
Channel EqualisationPoonan Sahoo
 
A Decisive Filtering Selection Approach For Improved Performance Active Noise...
A Decisive Filtering Selection Approach For Improved Performance Active Noise...A Decisive Filtering Selection Approach For Improved Performance Active Noise...
A Decisive Filtering Selection Approach For Improved Performance Active Noise...IOSR Journals
 
"Speech recognition" - Hidden Markov Models @ Papers We Love Bucharest
"Speech recognition" - Hidden Markov Models @ Papers We Love Bucharest"Speech recognition" - Hidden Markov Models @ Papers We Love Bucharest
"Speech recognition" - Hidden Markov Models @ Papers We Love BucharestStefan Adam
 
Analysis of Non Linear Filters with Various Density of Impulse Noise for Diff...
Analysis of Non Linear Filters with Various Density of Impulse Noise for Diff...Analysis of Non Linear Filters with Various Density of Impulse Noise for Diff...
Analysis of Non Linear Filters with Various Density of Impulse Noise for Diff...IJERA Editor
 
Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...
Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...
Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...Ankit Shah
 
DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...Kitamura Laboratory
 
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討Tomoki Koriyama
 
Active noise control real time demo
Active noise control real time demoActive noise control real time demo
Active noise control real time demoSrikanth Konjeti
 
Low power vlsi implementation adaptive noise cancellor based on least means s...
Low power vlsi implementation adaptive noise cancellor based on least means s...Low power vlsi implementation adaptive noise cancellor based on least means s...
Low power vlsi implementation adaptive noise cancellor based on least means s...shaik chand basha
 
the generation of panning laws for irregular speaker arrays using heuristic m...
the generation of panning laws for irregular speaker arrays using heuristic m...the generation of panning laws for irregular speaker arrays using heuristic m...
the generation of panning laws for irregular speaker arrays using heuristic m...Bruce Wiggins
 
Speech enhancement for distant talking speech recognition
Speech enhancement for distant talking speech recognitionSpeech enhancement for distant talking speech recognition
Speech enhancement for distant talking speech recognitionTakuya Yoshioka
 

Similar to Icmmse slides (20)

Introduction to adaptive filtering and its applications.ppt
Introduction to adaptive filtering and its applications.pptIntroduction to adaptive filtering and its applications.ppt
Introduction to adaptive filtering and its applications.ppt
 
Support Vector Machine Techniques for Nonlinear Equalization
Support Vector Machine Techniques for Nonlinear EqualizationSupport Vector Machine Techniques for Nonlinear Equalization
Support Vector Machine Techniques for Nonlinear Equalization
 
Mp3
Mp3Mp3
Mp3
 
Antinoise system & Noise Cancellation
Antinoise system & Noise CancellationAntinoise system & Noise Cancellation
Antinoise system & Noise Cancellation
 
Acoustic echo cancellation using nlms adaptive algorithm ranbeer
Acoustic echo cancellation using nlms adaptive algorithm ranbeerAcoustic echo cancellation using nlms adaptive algorithm ranbeer
Acoustic echo cancellation using nlms adaptive algorithm ranbeer
 
Learning the mmse channel estimators
Learning the mmse channel estimatorsLearning the mmse channel estimators
Learning the mmse channel estimators
 
Channel Equalisation
Channel EqualisationChannel Equalisation
Channel Equalisation
 
A Decisive Filtering Selection Approach For Improved Performance Active Noise...
A Decisive Filtering Selection Approach For Improved Performance Active Noise...A Decisive Filtering Selection Approach For Improved Performance Active Noise...
A Decisive Filtering Selection Approach For Improved Performance Active Noise...
 
"Speech recognition" - Hidden Markov Models @ Papers We Love Bucharest
"Speech recognition" - Hidden Markov Models @ Papers We Love Bucharest"Speech recognition" - Hidden Markov Models @ Papers We Love Bucharest
"Speech recognition" - Hidden Markov Models @ Papers We Love Bucharest
 
Analysis of Non Linear Filters with Various Density of Impulse Noise for Diff...
Analysis of Non Linear Filters with Various Density of Impulse Noise for Diff...Analysis of Non Linear Filters with Various Density of Impulse Noise for Diff...
Analysis of Non Linear Filters with Various Density of Impulse Noise for Diff...
 
Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...
Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...
Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...
 
DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...
 
Symposium Presentation
Symposium PresentationSymposium Presentation
Symposium Presentation
 
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
 
Active noise control real time demo
Active noise control real time demoActive noise control real time demo
Active noise control real time demo
 
Low power vlsi implementation adaptive noise cancellor based on least means s...
Low power vlsi implementation adaptive noise cancellor based on least means s...Low power vlsi implementation adaptive noise cancellor based on least means s...
Low power vlsi implementation adaptive noise cancellor based on least means s...
 
the generation of panning laws for irregular speaker arrays using heuristic m...
the generation of panning laws for irregular speaker arrays using heuristic m...the generation of panning laws for irregular speaker arrays using heuristic m...
the generation of panning laws for irregular speaker arrays using heuristic m...
 
Apsipa2016for ss
Apsipa2016for ssApsipa2016for ss
Apsipa2016for ss
 
Equalization.pdf
Equalization.pdfEqualization.pdf
Equalization.pdf
 
Speech enhancement for distant talking speech recognition
Speech enhancement for distant talking speech recognitionSpeech enhancement for distant talking speech recognition
Speech enhancement for distant talking speech recognition
 

Recently uploaded

Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086anil_gaur
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfRagavanV2
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoordharasingh5698
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 

Recently uploaded (20)

Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 

Icmmse slides

  • 1. Improved Cepstra Minimum-Mean- Square-Error Noise Reduction Algorithm for Robust Speech Recognition Jinyu Li, Yan Huang, Yifan Gong Microsoft
  • 2. Robust Front-End • Conventional noise-robust front-ends work very well with the Gaussian mixture models (GMMs). • Single-channel robust front-ends were reported not helpful to multi-style deep neural network (DNN) training. • DNN’s layer-by-layer structure provides a feature extraction strategy that automatically derives powerful noise-resistant features
  • 3. t-SNE Plot for Paired Clean and Noisy Utterances in Training Set (LFB)
  • 4. t-SNE Plot for Paired Clean and Noisy Utterances in Training Set (Layer 1)
  • 5. t-SNE Plot for Paired Clean and Noisy Utterances in Training Set (Layer 3)
  • 6. t-SNE Plot for Paired Clean and Noisy Utterances in Training Set (Layer 5)
  • 7. t-SNE Plot for Paired Clean and Noisy Utterances in Testing Set (Layer 1)
  • 8. t-SNE Plot for Paired Clean and Noisy Utterances in Testing Set (Layer 3)
  • 9. t-SNE Plot for Paired Clean and Noisy Utterances in Testing Set (Layer 5)
  • 10. Robust Front-End • Conventional noise-robust front-ends work very well with the Gaussian mixture models (GMMs). • Single-channel robust front-ends were reported not helpful to multi-style deep neural network (DNN) training although multi-channel signal processing still helps. • In this study, we show that the single-channel robust front-end is still beneficial to deep learning models as long as it is well designed.
  • 11. Cepstra Minimum Mean Square Error (CMMSE) • Reported very effective in dealing with noise when used in the GMM-based acoustic models • The solution to CMMSE for each element of the dimension-wise MFCC: 𝑐 𝑥 𝑡, 𝑘 = 𝐸 𝑐 𝑥 𝑡, 𝑘 𝒎 𝑦 𝑡 = 𝑏 𝑎 𝑘,𝑏 𝐸 log 𝑚 𝑥 𝑡, 𝑏 |𝒎 𝑦(𝑡) • The problem is reduced to finding the log-MMSE estimator of the Mel filter-bank’s output: 𝑚 𝑥 𝑡, 𝑏 ≈ exp 𝐸 log 𝑚 𝑥 𝑡, 𝑏 |𝑚 𝑦(𝑡, 𝑏) given a weak independent assumption between Mel-filterbanks.
  • 12. 4-Step Processing of CMMSE • Voice activity detection (VAD): detects the speech probability at every time- filterbank bin; • Noise spectrum estimation: uses the estimated speech probability to update the estimation of noise spectrum; • Gain estimation: uses the noisy speech spectrum and the estimated noise spectrum to calculate the gain of every time-filterbank bin; • Noise reduction: applies the estimated gain to the noisy speech spectrum to generate the clean spectrum.
  • 13. CMMSE spectrum calculation cleaned filter-bank spectrum Input signal Mel filtering noise reduction VAD (MCRA) filter-bank gain estimation noise spectrum estimation (MCRA)
  • 14. VAD • CMMSE uses a minimum controlled recursive moving average (MCRA) noise tracker (Cohen and Berdugo, 2002) to detect the speech probability 𝑝 𝑡, 𝑏 in each filterbank bin b and time t. • I. Cohen and B. Berdugo, “Noise estimation by minima controlled recursive averaging for robust speech enhancement,” IEEE signal processing letters, 9(1), pp.12-15, 2002.
  • 15. Noise Spectrum Estimation • The noise power spectrum 𝑚 𝑛 𝑡, 𝑏 is estimated using MCRA(Cohen and Berdugo, 2002) as 𝑚 𝑛 𝑡, 𝑏 = 𝛼 ∗ 𝑚 𝑛 𝑡 − 1, 𝑏 + 1.0 − 𝛼 ∗ 𝑚 𝑦 𝑡, 𝑏 with 𝛼 = 𝛼 𝐷 + 1.0 − 𝛼 𝐷 ∗ 𝑝 𝑡, 𝑏
  • 16. Gain Estimation • The gain of time-filterbank bin 𝐺 𝑡, 𝑏 = 𝜉(𝑡,𝑏) 1+ 𝜉(𝑡,𝑏) exp 1 2 𝑣 𝑡,𝑏 ∞ 𝑒−𝜏 𝜏 𝑑𝜏 . • posterior SNR : 𝛾 𝑡, 𝑏 = 𝑚 𝑦(𝑡,𝑏) 𝑚 𝑛(𝑡,𝑏) • prior SNR is calculated using a decision-directed approach (DDA) 𝜉 𝑡, 𝑏 = 𝛽 ∗ 𝐺 𝑡 − 1, 𝑏 ∗ 𝛾 𝑡 − 1, 𝑏 + 1.0 − 𝛽 ∗ 𝜉 𝑡, 𝑏 • 𝜉 𝑡, 𝑏 = max(𝛾 𝑡, 𝑏 − 1, 0.0) • 𝑣 𝑡, 𝑏 = 𝜉(𝑡,𝑏) 1+ 𝜉(𝑡,𝑏) 𝛾 𝑡, 𝑏
  • 17. Noise Reduction • 𝑚 𝑥 𝑡, 𝑏 ≈ 𝐺 𝑡, 𝑏 𝑚 𝑦 𝑡, 𝑏
  • 18. CMMSE spectrum calculation cleaned filter-bank spectrum Input signal Mel filtering noise reduction VAD (MCRA) filter-bank gain estimation noise spectrum estimation (MCRA)
  • 19. Speech Probability Estimation • Reliable speech probability estimation is critical to the estimation of noise spectrum: 𝑚 𝑛 𝑡, 𝑏 = 𝛼 ∗ 𝑚 𝑛 𝑡 − 1, 𝑏 + 1.0 − 𝛼 ∗ 𝑚 𝑥 𝑡, 𝑏 . • We use IMCRA (Cohen 2003) to estimate the speech probability 𝑝 𝑡, 𝑏 in each time-filterbank bin. • I. Cohen, "Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging," IEEE Trans. Speech and Audio Processing, Vol. 11, No. 5, pp. 466-475, 2003.
  • 20. Improving CMMSE spectrum calculation cleaned filter-bank spectrum Input signal Mel filtering noise reduction VAD (IMCRA) filter-bank gain estimation noise spectrum estimation (IMCRA)
  • 21. Refined Prior SNR Estimation • We do further gain estimation to get a converged gain • 𝜉 𝑡, 𝑏 = 𝐺 𝑡, 𝑏 ∗ 𝛾 𝑡, 𝑏 • 𝑣 𝑡, 𝑏 = 𝜉(𝑡,𝑏) 1+ 𝜉(𝑡,𝑏) 𝛾 𝑡, 𝑏 • 𝐺 𝑡, 𝑏 = 𝜉(𝑡,𝑏) 1+ 𝜉(𝑡,𝑏) exp 1 2 𝑣 𝑡,𝑏 ∞ 𝑒−𝜏 𝜏 𝑑𝜏
  • 22. Improving CMMSE spectrum calculation cleaned filter-bank spectrum Input signal Mel filtering noise reduction VAD (IMCRA) filter-bank gain estimation noise spectrum estimation (IMCRA)
  • 23. OMLSA • Optimally-modified log-spectral amplitude (OMLSA) speech estimator (Cohen and Berdugo, 2001) is used to modify time-filterbank gain: 𝐺 𝑡, 𝑏 = 𝐺 𝑡, 𝑏 𝑝 𝑡,𝑏 𝐺0 1−𝑝 𝑡,𝑏 • I. Cohen and B. Berdugo, "Speech enhancement for non-stationary noise environments," Signal Processing, Vol. 81, No. 11, pp. 2403-2418, 2001.
  • 24. Gain Smoothing • It is better to use OMLSA when the speech probability estimation is reliable. • We do cross filterbank gain smoothing in this stage to partially address the weak independent assumption of Mel-filterbanks. 𝐺 𝑡, 𝑏 = ( 𝐺 𝑡, 𝑏 − 1 + 𝐺 𝑡, 𝑏 + 𝐺(𝑡, 𝑏 + 1))/3
  • 25. Improving CMMSE spectrum calculation cleaned filter-bank spectrum Input signal Mel filtering noise reduction filter-bank gain smoothing VAD (IMCRA) filter-bank gain estimation noise spectrum estimation (IMCRA)
  • 26. Two-stage Processing • The noise reduction process is not perfect due to the factors such as imperfect noise estimation. • A second stage noise reduction can be used to further reduce the noise. • We use OMLSA together with gain smoothing for the gain modification in the second stage because the residual noise has less impact to speech probability estimation after the first stage noise reduction
  • 27. Improved Cepstra Minimum Mean Square Error (ICMMSE) spectrum calculation 1st stage cleaned filter-bank spectrum Input signal Mel filtering noise reduction filter-bank gain smoothing VAD (IMCRA) filter-bank gain estimation noise spectrum estimation (IMCRA) noise reduction filter-bank gain OMLSA+smoothing noise spectrum estimation (IMCRA) VAD (IMCRA) filter-bank gain estimation final cleaned filter-bank spectrum
  • 28. Experiments Task Training Data Acoustic Model Aurora 2 8440 utterances GMM
  • 29. Experiments Task Training Data Acoustic Model Aurora 2 8440 utterances GMM Chime 3 1600 real utterances + 7138 simulated utterances feed-forward DNN
  • 30. Experiments Task Training Data Acoustic Model Aurora 2 8440 utterances GMM Chime 3 1600 real utterances + 7138 simulated utterances feed-forward DNN Cortana 3400hr Live data LSTM-RNN
  • 31. Aurora 2 Baseline CMMSE 1-stage ICMMSE 2-stage ICMMSE Clean 1.39 1.48 1.20 1.10 20db 2.69 2.21 1.99 1.85 15db 3.6 3.21 2.91 2.71 10db 6.04 5.65 5.30 4.97 5db 14.38 13.01 12.59 11.57 0db 43.41 38.36 34.02 32.06 -5db 75.93 72.8 68.47 67.45 Avg. (0-20 db) 13.67 12.17 10.87 10.19 -10.00 -5.00 0.00 5.00 10.00 15.00 20.00 25.00 30.00 35.00 Clean 20db 15db 10db 5db 0db -5db Avg. (0-20 db) Relative WER reduction from baseline CMMSE 1-stage ICMMSE 2-stage ICMMSE
  • 32. Improvement Breakdown Method Avg. WER Baseline 13.67 CMMSE 12.17
  • 33. Improvement Breakdown Method Avg. WER Baseline 13.67 CMMSE 12.17 + IMCRA 11.66
  • 34. Improvement Breakdown Method Avg. WER Baseline 13.67 CMMSE 12.17 + IMCRA 11.66 +OMLSA 10.99
  • 35. Improvement Breakdown Method Avg. WER Baseline 13.67 CMMSE 12.17 + IMCRA 11.66 +OMLSA 10.99 +refined prior SNR 10.87
  • 36. Improvement Breakdown Method Avg. WER Baseline 13.67 CMMSE 12.17 + IMCRA 11.66 +OMLSA 10.99 +refined prior SNR 10.87 -OMLSA +gain smoothing (1-stage ICMMSE) 10.74
  • 37. Improvement Breakdown Method Avg. WER Baseline 13.67 CMMSE 12.17 + IMCRA 11.66 +OMLSA 10.99 +refined prior SNR 10.87 -OMLSA +gain smoothing (1-stage ICMMSE) 10.74 +2nd stage processing (2-stage ICMMSE) 10.19
  • 38. Chime 3 Model FE Test Real Simulate Baseline Clean 7.56 N/A CMMSE Clean 7.64 N/A ICMMSE Clean 7.41 N/A Baseline Noisy 18.95 18.18 CMMSE Noisy 17.32 16.76 ICMMSE Noisy 16.73 15.79 -2.00 0.00 2.00 4.00 6.00 8.00 10.00 12.00 14.00 Clean Real Noisy Simu Noisy Relative WER reduction from Baseline CMMSE ICMMSE
  • 39. Chime 3 WER Breakdown with Real and Simu. Noisy Test Set
  • 40. Cortana WER Baseline CMMSE ICMMSE 20db above 13.17 12.64 12.71 10-20db 20.8 19.83 18.51 0-10db 27.03 26 24.71 0.00 2.00 4.00 6.00 8.00 10.00 12.00 20db above 10-20db 0-10db Relative WER reduction from Baseline CMMSE ICMMSE
  • 41. Conclusion • A new robust front-end called ICMMSE is proposed to improve the previous CMMSE front-end with several advanced components • The IMCRA algorithm helps to generate more accurate speech probability. • The refined prior SNR estimation helps to get a converged gain. • Either cross filterbank gain smoothing or OMLSA is helpful to further modify the gain function. • The two-stage processing helps to reduce the residual noise after the first-stage processing. • ICMMSE is superior regardless of the underlying models and evaluation tasks
  • 42. Result Summary Task Training Data Acoustic Model Relative WER reduction Aurora 2 8440 utterances GMM 25.46% Chime 3 1600 real utterances + 7138 simulated utterances feed-forward DNN 11.98% Cortana 3400hr Live data LSTM-RNN 11.01%