Statistical Model Based Speech Enhancement without Musical Noise


This document summarizes a research talk on statistical-model-based speech enhancement techniques that aim to reduce noise without generating musical noise artifacts. The talk outlines conventional enhancement methods, such as spectral subtraction and Wiener filtering, that often cause musical noise. It then proposes a biased minimum mean-square error estimator (biased MOSIE) that can reach a musical-noise-free state by introducing a bias parameter ε. Analysis and experiments show that this method can reduce noise while keeping the kurtosis ratio fixed at 1.0, which prevents musical noise, and that it yields lower speech (cepstral) distortion than the compared methods. A strong speech prior is found to prevent a musical-noise-free state from being reached, so the prior must be selected carefully.


Statistical-Model-Based Speech Enhancement
with Musical-Noise-Free Properties
Hiroshi Saruwatari
(The University of Tokyo, JAPAN)
IEEE DSP2015 Invited Talk

Outline
1. Research background
2. What is musical-noise-free?
3. Conventional statistical-model-based speech enhancement
4. Proposed method and analysis
5. Experimental evaluation
6. Conclusion

Research Background and Goal
- Single-channel speech enhancement
  - Spectral subtraction (SS) [Boll, 1979], Wiener filtering, the Bayesian minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator [Ephraim, 1984], the MAP estimator [Lotter, 2005], etc.
  - Harmful distortion caused by musical noise generation
- Musical-noise-free speech enhancement [Miyazaki, Saruwatari et al., IEEE Trans. ASLP 2012]
  - Noise reduction without any musical noise
  - We have found that SS (a maximum-likelihood amplitude estimator) has a musical-noise-free state.
  - Does the generalized Bayesian MMSE-STSA estimator also have a musical-noise-free state?

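Relation between Musical Noise and Kurtosis
Human perception of musical noise (musical noise score) is proportional to the log kurtosis ratio [Saruwatari, 2008].

What is Musical-Noise-Free?
A musical-noise-free state is one in which noise is reduced while the kurtosis ratio (kurtosis of the noise power spectrum after processing divided by that before processing) stays at 1.0.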
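The kurtosis ratio can be estimated directly from noise-only power-spectral samples. The sketch below assumes the definition used in this line of work, kurtosis = μ4/μ2² with raw (non-central) moments of the power-spectral values, and takes the ratio as processed over original. The exponential noise model and the spectral-subtraction parameters (beta = 2.0, eta = 0.1) are illustrative assumptions, not values from the talk.

```python
import math
import random


def kurtosis(power):
    """Kurtosis of power-spectral values: mu4 / mu2**2 using raw (non-central) moments."""
    n = len(power)
    m2 = sum(p ** 2 for p in power) / n
    m4 = sum(p ** 4 for p in power) / n
    return m4 / m2 ** 2


def log_kurtosis_ratio(noise_in, noise_out):
    """log(kurt(processed) / kurt(original)); 0.0 corresponds to a ratio of 1.0."""
    return math.log(kurtosis(noise_out) / kurtosis(noise_in))


def power_spectral_subtraction(power, beta, eta):
    """Subtract beta * mean noise power; floor to eta**2 * input where the result would be <= 0."""
    lam = sum(power) / len(power)  # oracle noise PSD: the input is assumed to be noise only
    return [p - beta * lam if p > beta * lam else eta ** 2 * p for p in power]


rng = random.Random(0)
# Power of complex Gaussian noise follows an exponential distribution (kurtosis 6).
noise = [rng.expovariate(1.0) for _ in range(200_000)]

print(round(kurtosis(noise), 2))                                # close to 6
print(log_kurtosis_ratio(noise, [0.1 * p for p in noise]))      # ~0: a uniform gain
print(log_kurtosis_ratio(noise, power_spectral_subtraction(noise, 2.0, 0.1)) > 0)  # True
```

A uniform gain leaves the kurtosis untouched, whereas subtraction with flooring leaves a few isolated large components in a bed of near-zero values, which raises the kurtosis; this is the statistical signature that the log kurtosis ratio tracks.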

Musical-Noise-Free Speech Enhancement
- Iterative noise reduction procedure with a musical-noise-free condition [Miyazaki, Saruwatari, et al., IEEE Trans. ASLP 2012]
…
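The iterative idea can be sketched as a loop that applies a weak noise-reduction step repeatedly, re-estimating the noise PSD on each pass, until a target NRR is reached, while logging the kurtosis ratio at each iteration. This illustrates the monitoring loop only. The weak step here is plain power spectral subtraction with hypothetical parameters beta and eta, the noise PSD is an oracle (the input is noise only), and the parameter choice that actually satisfies the musical-noise-free condition of Miyazaki et al. is not reproduced.

```python
import math
import random


def kurtosis(power):
    """mu4 / mu2**2 with raw moments of the power-spectral values."""
    n = len(power)
    m2 = sum(p * p for p in power) / n
    m4 = sum(p ** 4 for p in power) / n
    return m4 / (m2 * m2)


def weak_ss(power, noise_psd, beta, eta):
    """One pass of power spectral subtraction with flooring (requires 0 < eta < 1)."""
    return [p - beta * noise_psd if p > beta * noise_psd else eta * eta * p for p in power]


def iterate_to_target_nrr(noise_power, beta, eta, target_nrr_db, max_iter=100):
    """Repeat the weak step; return a list of (NRR in dB, kurtosis ratio) per iteration."""
    mean_in = sum(noise_power) / len(noise_power)
    kurt_in = kurtosis(noise_power)
    power = list(noise_power)
    history = []
    for _ in range(max_iter):
        noise_psd = sum(power) / len(power)  # oracle re-estimate: the signal is noise only
        power = weak_ss(power, noise_psd, beta, eta)
        nrr_db = 10.0 * math.log10(mean_in / (sum(power) / len(power)))
        history.append((nrr_db, kurtosis(power) / kurt_in))
        if nrr_db >= target_nrr_db:
            break
    return history


rng = random.Random(1)
noise = [rng.expovariate(1.0) for _ in range(20_000)]
hist = iterate_to_target_nrr(noise, beta=0.5, eta=0.9, target_nrr_db=16.0)
for it, (nrr, kr) in enumerate(hist, 1):
    print(f"iter {it:2d}: NRR = {nrr:5.2f} dB, kurtosis ratio = {kr:.3f}")
```

Because every output value is smaller than its input, the NRR grows at every iteration; whether the kurtosis ratio stays at 1.0 depends entirely on the weak step, which is what the musical-noise-free condition constrains.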

MOSIE (Generalized MMSE-STSA) Estimator
Statistical speech amplitude estimator with a parametric speech prior [Breithaupt, et al., IEEE Trans. 2011]
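The MOSIE gain itself is not spelled out in this text. As a hedged sketch, assume the closed form usually given for the parameterized MMSE amplitude estimator of Breithaupt et al., Â = [Γ(μ+β/2)/Γ(μ) · M(1-μ-β/2; 1; -ν) / M(1-μ; 1; -ν)]^(1/β) · sqrt(ξ/(μ+ξ)) · σ_N with ν = ξγ/(μ+ξ), where M is Kummer's confluent hypergeometric function, μ is the prior shape and β the compression exponent (μ = β = 1 gives the Ephraim-Malah MMSE-STSA estimator). We assume the slides' ρ plays the role of μ and their b is β. The bias ε of the proposed method is not included, because its form is not given here.

```python
import math


def log_kummer_m(a, z):
    """log M(a; 1; z) for a > 0 and z >= 0, summing the power series in the log domain."""
    if z == 0.0:
        return 0.0
    n_terms = int(z + 10.0 * math.sqrt(z) + 50)  # terms peak near k ~ z, then decay
    logs = [math.lgamma(a + k) - math.lgamma(a) + k * math.log(z) - 2.0 * math.lgamma(k + 1)
            for k in range(n_terms)]
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs))


def mosie_gain(xi, gamma, mu, beta):
    """Gain A_hat / |Y| of an assumed parameterized (mu, beta) MMSE amplitude estimator.

    xi: a priori SNR, gamma: a posteriori SNR, mu: prior shape (1 = Gaussian speech),
    beta: amplitude compression (1 = MMSE-STSA, near 0 = log-amplitude).
    """
    nu = xi * gamma / (mu + xi)
    # Kummer's transformation turns M(1-mu-beta/2; 1; -nu) / M(1-mu; 1; -nu)
    # into M(mu+beta/2; 1; nu) / M(mu; 1; nu), whose series terms are all positive.
    log_ratio = log_kummer_m(mu + beta / 2.0, nu) - log_kummer_m(mu, nu)
    log_gamma_ratio = math.lgamma(mu + beta / 2.0) - math.lgamma(mu)
    amp_over_sigma = math.exp((log_gamma_ratio + log_ratio) / beta) * math.sqrt(xi / (mu + xi))
    return amp_over_sigma / math.sqrt(gamma)  # |Y| = sqrt(gamma) * sigma_N


for xi in (0.1, 1.0, 10.0):
    print(xi, round(mosie_gain(xi, gamma=2.0, mu=0.5, beta=0.001), 4))
```

Working in the log domain avoids overflow of M for large ν and keeps the 1/β exponent stable for small β. For μ = 1 and β = 2 the gain reduces to sqrt((1 + ν)ξ / ((1 + ξ)γ)), which is a convenient sanity check.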

How to Generate a Musical-Noise-Free State?
Unfortunately, we cannot find any musical-noise-free state in the conventional MOSIE estimator.
[Figure annotations: "No intersection!"; forgetting factor α increasing]
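The forgetting factor α presumably refers to the decision-directed (DD) a priori SNR estimator listed in the experiment settings (0.98). A minimal sketch of the standard DD recursion for one frequency bin follows. The Wiener gain is only a stand-in for the estimator's gain, and the floor xi_min and the zero initialization are hypothetical choices, not values from the talk.

```python
def decision_directed_xi(prev_amp, prev_noise_psd, gamma, alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR for one frequency bin.

    prev_amp: enhanced amplitude of the previous frame; gamma: current a posteriori SNR.
    xi_min is a hypothetical lower bound.
    """
    ml_estimate = max(gamma - 1.0, 0.0)  # instantaneous (maximum-likelihood) SNR estimate
    xi = alpha * prev_amp ** 2 / prev_noise_psd + (1.0 - alpha) * ml_estimate
    return max(xi, xi_min)


def wiener_gain(xi, gamma):
    """Illustrative stand-in for the estimator's gain (Wiener)."""
    return xi / (1.0 + xi)


def enhance_bin(noisy_mag, noise_psd, gain_fn, alpha=0.98):
    """Run DD plus a gain function over successive frames of one bin (noise PSD fixed here)."""
    prev_amp = 0.0  # hypothetical initialization: no previous estimate
    out = []
    for mag in noisy_mag:
        gamma = mag ** 2 / noise_psd
        xi = decision_directed_xi(prev_amp, noise_psd, gamma, alpha)
        prev_amp = gain_fn(xi, gamma) * mag
        out.append(prev_amp)
    return out


print(enhance_bin([1.0, 3.0, 0.5, 0.5], noise_psd=1.0, gain_fn=wiener_gain))
```

With α close to 1, ξ leans mostly on the previous frame's enhanced amplitude, which smooths the SNR track over time.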

Analysis Strategy

Calculation of Moment for Biased MOSIE (1/4)
1. Derivation of p.d.f.

Calculation of Moment for Biased MOSIE (2/4)
2. Calculation of moment for

Calculation of Moment for Biased MOSIE (3/4)
3. Moment-cumulant transformation for
4. Cumulant of noise power spectrum
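For step 4, a standard reference point, assuming the noise spectral component is zero-mean complex Gaussian (as for the white Gaussian noise used in the experiments), so that its power x is exponentially distributed with mean λ. This covers the unprocessed noise only; the slides' expressions for the processed signal are not reproduced.

```latex
p(x) = \frac{1}{\lambda} e^{-x/\lambda}\ (x \ge 0), \qquad
m_n = \mathrm{E}[x^n] = n!\,\lambda^n, \qquad
\kappa_n = (n-1)!\,\lambda^n, \qquad
\mathrm{kurt} = \frac{m_4}{m_2^2} = \frac{24\lambda^4}{(2\lambda^2)^2} = 6 .
```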

Calculation of Moment for Biased MOSIE (4/4)
5. Cumulant-moment transformation for
m1 is used to compute the NRR, and m2 and m4 are used to compute the kurtosis; all of them are functions of the bias ε.
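Steps 3 to 5 rely on the standard conversions between raw moments and cumulants. The sketch below shows those conversions for the first four orders and how m1 gives the NRR while m2 and m4 give the kurtosis. The exponential-noise cumulants in the example assume Gaussian noise, and the biased-MOSIE-specific expressions from the slides are not reproduced.

```python
import math


def moments_to_cumulants(m1, m2, m3, m4):
    """Raw moments -> first four cumulants (standard moment-cumulant relations)."""
    k1 = m1
    k2 = m2 - m1 ** 2
    k3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    k4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4
    return k1, k2, k3, k4


def cumulants_to_moments(k1, k2, k3, k4):
    """First four cumulants -> raw moments (inverse of the relations above)."""
    m1 = k1
    m2 = k2 + k1 ** 2
    m3 = k3 + 3 * k2 * k1 + k1 ** 3
    m4 = k4 + 4 * k3 * k1 + 3 * k2 ** 2 + 6 * k2 * k1 ** 2 + k1 ** 4
    return m1, m2, m3, m4


def kurtosis_from_moments(m2, m4):
    """Kurtosis of the power spectrum from raw moments."""
    return m4 / m2 ** 2


def nrr_db(m1_in, m1_out):
    """Noise reduction rate from the mean noise power before and after processing."""
    return 10.0 * math.log10(m1_in / m1_out)


lam = 2.0
# Exponentially distributed noise power: kappa_n = (n - 1)! * lam**n.
noise_cumulants = tuple(math.factorial(n - 1) * lam ** n for n in range(1, 5))
m = cumulants_to_moments(*noise_cumulants)
print(m)                                            # (2.0, 8.0, 48.0, 384.0)
print(kurtosis_from_moments(m[1], m[3]))            # 6.0
print(round(nrr_db(m[0], m[0] * 10 ** (-1.6)), 6))  # 16.0, the Experiment 3 target NRR
```

A 16-dB NRR corresponds to reducing the mean noise power m1 to 10^(-1.6), about 2.5% of its input value.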

Calculation of Moment for Biased MOSIE (4/4)
[Figure annotation: bias ε large]

Experiment 1: Existence of a Musical-Noise-Free State
Noise: white Gaussian noise at 0-dB SNR
Speech prior: Gaussian model (ρ = 1)
Forgetting factor in decision-directed (DD) estimation: 0.98
Noise PSD estimation: minimum statistics method [Martin, 1994]
[Figure: theoretical analysis and experimental results; curves run from bias ε = 0 to large ε]
By introducing the bias ε, we find a musical-noise-free state in the statistical-model-based estimator.

Experiment 2: Existence of a Musical-Noise-Free State
Noise: white Gaussian noise at 0-dB SNR
Speech prior: super-Gaussian model (ρ = 0.5)
Forgetting factor in DD estimation: 0.98
Noise PSD estimation: minimum statistics method [Martin, 1994]
[Figure: theoretical analysis and experimental results; curves run from bias ε = 0 to large ε]
A strong speech prior (small ρ) leaves almost no musical-noise-free state in real processing.

Experiment 3: Comparison with Other Methods
Speech: 10 utterances
Noise: white Gaussian noise at 0-dB SNR
Speech prior: super-Gaussian model (ρ = 0.5, β = 0.001)
Forgetting factor in DD estimation: 0.98
Noise PSD estimation: minimum statistics method [Martin, 1994]
Target NRR: 16 dB
[Figure annotations: "Large musical noise methods"; "No musical noise methods"; "Lowest speech distortion"; "Richer speech prior"]
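The speech distortion in this comparison is reported as cepstral distortion (CD) in the conclusion. A minimal sketch of one common definition, CD = (20 / ln 10) · mean over frames of sqrt(2 · Σ_{q=1..Q} (c_out(q) - c_ref(q))²), assuming real cepstra computed from one-sided log-magnitude spectra and a cepstral order of Q = 22. These choices are assumptions, not the talk's stated settings.

```python
import math


def real_cepstrum(half_mag, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of a one-sided magnitude spectrum (length N/2 + 1).

    Inverse DFT of the even-symmetric log-magnitude spectrum, written as a cosine sum.
    """
    n_fft = 2 * (len(half_mag) - 1)
    logs = [math.log(max(m, 1e-12)) for m in half_mag]  # floor avoids log(0)
    ceps = []
    for q in range(1, n_ceps + 1):
        acc = logs[0] + logs[-1] * (-1) ** q
        acc += 2.0 * sum(logs[k] * math.cos(2.0 * math.pi * k * q / n_fft)
                         for k in range(1, len(logs) - 1))
        ceps.append(acc / n_fft)
    return ceps


def cepstral_distortion_db(ref_frames, out_frames, n_ceps=22):
    """Frame-averaged CD in dB between reference and processed magnitude spectra."""
    total = 0.0
    for ref, out in zip(ref_frames, out_frames):
        c_ref = real_cepstrum(ref, n_ceps)
        c_out = real_cepstrum(out, n_ceps)
        total += math.sqrt(2.0 * sum((a - b) ** 2 for a, b in zip(c_ref, c_out)))
    return (20.0 / math.log(10.0)) * total / len(ref_frames)


# Hypothetical toy spectra: 5 frames of 33 bins (N = 64).
ref = [[1.0 + 0.5 * math.cos(0.3 * k + t) for k in range(33)] for t in range(5)]
print(cepstral_distortion_db(ref, [[2.0 * v for v in f] for f in ref]))      # ~0
print(cepstral_distortion_db(ref, [[v + 0.3 for v in f] for f in ref]) > 0)  # True
```

A uniform gain only changes c[0], which is excluded from the sum, so CD measures spectral shape rather than overall level.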

Conclusion
- By introducing the bias ε, we find a musical-noise-free state in the Bayesian estimator.
- The proposed biased MOSIE estimator achieves better (lower) cepstral distortion while its kurtosis ratio is fixed exactly at 1.0.
- A strong speech prior (small ρ) leaves almost no musical-noise-free state, so the prior must be selected carefully to maintain the quality of both the speech and the residual noise.
Thank you for your attention!

Similar to Statistical Model Based Speech Enhancement without Musical Noise

Une18apsipa, by Yuki Saito

International Journal of Computational Engineering Research (IJCER), by ijceronline

Robust FIR System Identification for Super-Gaussian Noise Based on Hyperbolic..., by Hiroki_Tanji

ANALYSIS OF MMSE SPEECH ESTIMATION IMPACT IN WEST SUMATRA'S NOISES, by sipij

F010334548, by IOSR Journals

A new methodology for sp noise removal in digital image processing, by ijfcstjournal

Emotion Recognition from Speech with Acoustic, Non-Linear and Wavelet-based F..., by Juan Camilo Vasquez

A NOVEL APPROACH TO CHANNEL DECORRELATION FOR STEREO ACOUSTIC ECHO CANCELLATI..., by a3labdsp

A Novel Uncertainty Parameter SR (Signal to Residual Spectrum Ratio) Evalua..., by sipij

Adaptive noise estimation algorithm for speech enhancement, by Harshal Ladhe

Audio Noise Removal – The State of the Art, by ijceronline

The past, present and future of singing synthesis, by Eji Warp

Improvement of minimum tracking in Minimum Statistics noise estimation method, by CSCJournals

A fast and effective impulse noise filter, by IJRES Journal

The Short-Time Silence of Speech Signal as Signal-To-Noise Ratio Estimator, by IJERA Editor

20150211 NAB paper - Audio Loudness Range - John Kean, by Jeremy Adams

Analysis PSNR of High Density Salt and Pepper Impulse Noise Using Median Filter, by ijtsrd

A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ..., by IOSR Journals
