
Dereverberation in the STFT and log mel-frequency feature domains

2,097 views

Slides explaining a few algorithms for dereverberation, that is, removing the effect of reverberation from far-field recordings.

Published in: Technology

  1. Dereverberation in the STFT and log mel-frequency feature domains. Takuya Yoshioka. 1 April 2012.
  2. “Dereverberation is necessary for many speech applications”
  3. ASR (connected digit recognition). [Figure: word error rate in % versus T60 in seconds, for T60 from 0.2 to 0.6 s.]
  4. ASR (LVCSR using WSJ-20K). [Figure: word error rate in % for clean training, +MLLR, and multi-style training.]
  5. Source separation. [Figure: SNR in dB for T60 = 0.3 s and T60 = 0.5 s.]
  6. And others… • Source localization • Adaptive beamforming • VAD
  7. “Dereverberation is necessary for many speech applications”
  8. Acoustic feature extraction process: Microphone → STFT → $|\cdot|^2$ → Mel FB → Log compression → DCT → Δ, ΔΔ → Decoder
  9. Acoustic feature extraction process: Microphone → STFT → $|\cdot|^2$ → Mel FB → Log compression → DCT → Δ, ΔΔ → Decoder. STFT coefficients: fully benefit from the use of microphone arrays.
  10. Acoustic feature extraction process: Microphone → STFT → $|\cdot|^2$ → Mel FB → Log compression → DCT → Δ, ΔΔ → Decoder. Power spectra: easy to combine with noise suppressors.
  11. Acoustic feature extraction process: Microphone → STFT → $|\cdot|^2$ → Mel FB → Log compression → DCT → Δ, ΔΔ → Decoder. Log mel-frequency features: efficient for reducing the acoustic mismatch between observations and training data.
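  12. Notations. $n$: frame index; $y_n$: corrupted vector; $x_n$: clean vector; $\hat{x}_n$: estimate of $x_n$.
  13. Optimal estimation in the MMSE sense: $\hat{x}_n = \int x_n \, p_{X|Y,Y_{\mathrm{past}}}(x_n \mid y_n, \ldots, y_1) \, dx_n$
  14. Generative approach (using Bayes rule): $\hat{x}_n = \int x_n \, p_{X|Y,Y_{\mathrm{past}}}(x_n \mid y_n, \ldots, y_1) \, dx_n$, with $p_{X|Y,Y_{\mathrm{past}}}(x_n \mid y_n, \ldots, y_1) \propto p_{Y|X,Y_{\mathrm{past}}}(y_n \mid x_n, y_{n-1}, \ldots, y_1) \times p_X(x_n)$. The first factor is the reverberation model; the second is the clean speech model.
  15. Generative approach (using Bayes rule): $\hat{x}_n = \int x_n \, p_{X|Y,Y_{\mathrm{past}}}(x_n \mid y_n, \ldots, y_1) \, dx_n$, with $p_{X|Y,Y_{\mathrm{past}}}(x_n \mid y_n, \ldots, y_1) \propto p_{Y|X,Y_{\mathrm{past}}}(y_n \mid x_n, y_{n-1}, \ldots, y_1) \times p_X(x_n)$. The first factor is the reverberation model; the second is the clean speech model.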
  16. STFT domain (linear prediction): clean speech model; reverberation model; posterior distribution; parameter estimation. Log mel-frequency feature domain (VTS): clean speech model; reverberation model; posterior distribution; parameter estimation.
  17. STFT domain: clean speech model; reverberation model; posterior distribution; parameter estimation. Log mel-frequency feature domain: clean speech model; reverberation model; posterior distribution; parameter estimation.
  18. Notations. $n$: frame index; $y_n$: corrupted complex-valued spectrum (consisting of 257 bins); $x_n$: clean complex-valued spectrum; $\hat{x}_n$: estimate of $x_n$.
  19. Clean STFT coefficients: normally distributed, $p_X(x_n; \Lambda_{X,n}) = \prod_j f_{CN}(x_{n,j}; 0, \lambda^X_{n,j})$. Clean PSD models (model; form; parameters): No model; no constraint on the form; $\lambda^X_{n,1}, \ldots, \lambda^X_{n,J}$. All-pole model; $\lambda^X_{n,j} = \sigma^X_n / \left| 1 - \sum_p a^X_{n,p} e^{-i \omega_j p} \right|^2$; $(a^X_{n,p})_{p=1,\ldots,P}$, $\sigma^X_n$.
  20. STFT domain: clean speech model; reverberation model; posterior distribution; parameter estimation. Log mel-frequency feature domain: clean speech model; reverberation model; posterior distribution; parameter estimation.
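  21. 1-source 1-microphone case: multi-step LP. $y_{n,j} = x_{n,j} + \sum_{p \ge \Delta} g^*_{p,j} \, y_{n-p,j}$, relating the sequences $(y_{n,j})_{n=1,2,\ldots}$ and $(x_{n,j})_{n=1,2,\ldots}$.
  22. 1-source 1-microphone case: multi-step LP. $y_{n,j} = x_{n,j} + \sum_{p \ge \Delta} g^*_{p,j} \, y_{n-p,j}$, relating the sequences $(y_{n,j})_{n=1,2,\ldots}$ and $(x_{n,j})_{n=1,2,\ldots}$.
  23. $p_{Y|X,Y_{\mathrm{past}}}(y_{n,j} \mid x_{n,j}, y_{n-1,j}, \ldots, y_{1,j}; \Lambda_R) = \delta\left( y_{n,j} - \sum_p g^*_{p,j} \, y_{n-p,j} - x_{n,j} \right)$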
  24. STFT domain: clean speech model; reverberation model; posterior distribution; parameter estimation. Log mel-frequency feature domain: clean speech model; reverberation model; posterior distribution; parameter estimation.
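  25. When model parameters are known: $p_{X|Y,Y_{\mathrm{past}}}(x_{n,j} \mid y_{n,j}, \ldots, y_{1,j}; \hat{\Lambda}_X, \hat{\Lambda}_R) = \delta\left( x_{n,j} - y_{n,j} + \sum_p \hat{g}^*_{p,j} \, y_{n-p,j} \right)$, so that $\hat{x}_{n,j} = y_{n,j} - \sum_p \hat{g}^*_{p,j} \, y_{n-p,j}$ (inverse filtering).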
  26. STFT domain: clean speech model; reverberation model; posterior distribution; parameter estimation. Log mel-frequency feature domain: clean speech model; reverberation model; posterior distribution; parameter estimation.
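  27. ML for parameter estimation: $L(\Lambda_X, \Lambda_R) = \sum_j \sum_n \log p_{Y|Y_{\mathrm{past}}}(y_{n,j} \mid y_{n-1,j}, \ldots, y_{1,j}; \Lambda_X, \Lambda_R)$
  28. ML for parameter estimation: $L(\Lambda_X, \Lambda_R) = \sum_j \sum_n \log p_{Y|Y_{\mathrm{past}}}(y_{n,j} \mid y_{n-1,j}, \ldots, y_{1,j}; \Lambda_X, \Lambda_R)$, where each conditional density is the integral over $x_{n,j}$ of the reverberation model $p_{Y|X,Y_{\mathrm{past}}}(y_{n,j} \mid x_{n,j}, y_{n-1,j}, \ldots, y_{1,j}; \Lambda_R) = \delta\left( y_{n,j} - \sum_p g^*_{p,j} \, y_{n-p,j} - x_{n,j} \right)$ times the clean speech model $p_X(x_n; \Lambda_{X,n}) = \prod_j f_{CN}(x_{n,j}; 0, \lambda^X_{n,j})$.
  29. ML for parameter estimation: $L(\Lambda_X, \Lambda_R) = \sum_j \sum_n \log p_{Y|Y_{\mathrm{past}}}(y_{n,j} \mid y_{n-1,j}, \ldots, y_{1,j}; \Lambda_X, \Lambda_R) = -\sum_j \sum_n \left( \log \lambda^X_{n,j} + \frac{\left| y_{n,j} - \sum_p g^*_{p,j} \, y_{n-p,j} \right|^2}{\lambda^X_{n,j}} \right)$ (up to an additive constant).
  30. ML for parameter estimation: $L(\Lambda_X, \Lambda_R) = -\sum_j \sum_n \left( \log \lambda^X_{n,j} + \frac{\left| y_{n,j} - \sum_p g^*_{p,j} \, y_{n-p,j} \right|^2}{\lambda^X_{n,j}} \right)$ (up to an additive constant). If $\hat{\lambda}^X_{n,j}$ is known: $\hat{\Lambda}_{R,j} = \arg\min_{\Lambda_{R,j}} \sum_n \frac{\left| y_{n,j} - \sum_p g^*_{p,j} \, y_{n-p,j} \right|^2}{\hat{\lambda}^X_{n,j}}$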
  31. Iterative optimization: initialize $\hat{\Lambda}_R$ → inverse filtering → update $\hat{\Lambda}_X$ → update $\hat{\Lambda}_R$ → convergent? If not, repeat from the inverse filtering step.
  32. Why an LP model for reverberation? The chain rule is applicable to derive the likelihood function.
  33. Drawback: “Non-minimum phase terms cannot be accurately modeled.” Solution: using extra microphones.
  34. Extensions • Integration with source separation • Integration with additive noise reduction • Adaptive inverse filtering – Using an RLS-like algorithm • Application to music signals – Using a clean source model accounting for strong harmonic structures • Exploiting prior knowledge on room properties
  35. STFT domain: clean speech model; reverberation model; posterior distribution; parameter estimation. Log mel-frequency feature domain: clean speech model; reverberation model; posterior distribution; parameter estimation.
  36. Notations. $n$: frame index; $y_n$: corrupted log mel-frequency feature (consisting of 24 coefficients); $x_n$: clean log mel-frequency feature; $\hat{x}_n$: estimate of $x_n$.
  37. Clean features: pre-trained GMM, $p_X(x_n; \Lambda_X) = \sum_k \pi_k \, f_N(x_n; \mu^X_k, \Sigma^X_k)$, where the component density $f_N(x_n; \mu^X_k, \Sigma^X_k)$ is denoted by $p_{X|K}(x_n \mid k; \Lambda_X)$.
  38. STFT domain: clean speech model; reverberation model; posterior distribution; parameter estimation. Log mel-frequency feature domain: clean speech model; reverberation model; posterior distribution; parameter estimation.
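  39. Reverberation model. Components of the room response: direct sound, early reflections, late reverberation.
  40. Reverberation model. Components of the room response: direct sound, early reflections, late reverberation. $Y_n = H \cdot X_n + R_n$
  41. Reverberation model. Components of the room response: direct sound, early reflections, late reverberation. $Y_n = H \cdot X_n + R_n$, where the late reverberation $R_n$ is the clean speech convolved (*) with the part of the RIR beyond 50 ms.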
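  42. Reverberation model (direct sound, early reflections, late reverberation): $y_n = x_n + h + \log(1 + \exp(r_n - x_n - h)) = g(x_n, r_n, h)$, so that $p_{Y|X,R}(y_n \mid x_n, r_n; \Lambda_R) = \delta(y_n - g(x_n, r_n, h))$.
  43. Reverberation model: the product of $p_{Y|X,R}(y_n \mid x_n, r_n; \Lambda_R) = \delta(y_n - g(x_n, r_n, h))$ and $p_{R|Y_{\mathrm{past}}}(r_n \mid y_{n-1}, \ldots, y_1; \Lambda_R) = f_N(r_n; y_{n-\Delta} + \beta^R, \Sigma^R)$, integrated over $r_n$.
  44. Reverberation model: $p_{Y|X,Y_{\mathrm{past}},K}(y_n \mid x_n, y_{n-1}, \ldots, y_1, k; \Lambda_R) \approx f_N(y_n; \mu^{Y|X}_{n,k}, \Sigma^{Y|X}_{n,k})$, with $\mu^{Y|X}_{n,k} = g(\mu^X_k, y_{n-\Delta} + \beta^R, h) + G(\mu^X_k, y_{n-\Delta} + \beta^R, h) \, (x_n - \mu^X_k)$ and $\Sigma^{Y|X}_{n,k} = \left( I - G(\mu^X_k, y_{n-\Delta} + \beta^R, h) \right)^2 \Sigma^R$.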
  45. STFT domain: clean speech model; reverberation model; posterior distribution; parameter estimation. Log mel-frequency feature domain: clean speech model; reverberation model; posterior distribution; parameter estimation.
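  46. Relationship among pdfs. [Diagram relating $\pi_k$, $p_{X|K}$, $p_{R|Y_{\mathrm{past}}}$, $g$, $p_{Y|R,K}$, $p_{Y|X,Y_{\mathrm{past}},K}$, $p_{Y|Y_{\mathrm{past}},K}$, $p_{Y|Y_{\mathrm{past}}}$, $p_{K|Y,Y_{\mathrm{past}}}$, $p_{X|Y,Y_{\mathrm{past}},K}$, $p_{R|Y,Y_{\mathrm{past}},K}$, and $p_{X|Y,Y_{\mathrm{past}}}$.]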
  47. Connected digit recognition • 1024-component GMM for VTS • Clean complex back-end defined in Aurora2 • Evaluation data set consisting of 4004 reverberant utterances – Simulated data – Impulse responses measured in a varechoic room – Speaker-microphone distance = 3.5 m – T60 = 0.2 to 0.6 s
  48. [Figure: word error rate in % versus T60 in seconds, for T60 from 0.2 to 0.6 s, comparing unprocessed, dereverberated, and dereverberated (lower bound) speech.]
  49. Concluding remarks • Dereverberation can be performed in different domains • The reverberation model must account for the strong statistical dependencies between consecutive observation frames
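
The STFT-domain procedure on slides 21 to 31 (multi-step LP model, inverse filtering, alternating parameter updates) can be sketched for a single frequency bin roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names, the default `delay`, `order`, `n_iter`, `context`, and `floor` values are hypothetical, and the clean PSD is approximated by a short moving average of the power of the current estimate, a stand-in for the "no model" and all-pole options listed on slide 19.

```python
import numpy as np


def delayed_frames(y, delay, order):
    """Row n holds y[n-delay], ..., y[n-delay-order+1] (zeros before the start)."""
    n_frames = y.shape[0]
    Y = np.zeros((n_frames, order), dtype=complex)
    for k in range(order):
        shift = delay + k
        Y[shift:, k] = y[: n_frames - shift]
    return Y


def dereverb_bin(y, delay=2, order=3, n_iter=5, context=3, floor=1e-8):
    """Iterative multi-step LP dereverberation of one frequency bin.

    y: 1-D complex array of STFT coefficients y_{n,j} for a fixed bin j.
    Returns (x_hat, g): dereverberated coefficients and prediction taps.
    """
    y = np.asarray(y, dtype=complex)
    if y.shape[0] <= max(delay + order, 2 * context + 1):
        raise ValueError("too few frames for the chosen delay, order and context")
    Y = delayed_frames(y, delay, order)
    kernel = np.ones(2 * context + 1) / (2 * context + 1)
    x_hat = y.copy()  # all-zero filter at initialization: no dereverberation yet
    g = np.zeros(order, dtype=complex)
    for _ in range(n_iter):
        # Update the clean PSD: smoothed power of the current estimate
        # (assumption of this sketch, see the lead-in).
        lam = np.maximum(np.convolve(np.abs(x_hat) ** 2, kernel, mode="same"), floor)
        # Update the reverberation parameters by weighted least squares:
        # minimize sum_n |y_n - sum_p conj(g_p) y_{n-p}|^2 / lam_n.
        R = Y.T @ (Y.conj() / lam[:, None])
        r = Y.T @ (y.conj() / lam)
        g = np.linalg.solve(R, r)
        # Inverse filtering: x_hat_n = y_n - sum_p conj(g_p) y_{n-p}.
        x_hat = y - Y @ g.conj()
    return x_hat, g


def simulate_reverb(x, g, delay):
    """Toy data generator: y_n = x_n + sum_p conj(g_p) y_{n-p}, p = delay, delay+1, ..."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = x[n]
        for k, gk in enumerate(g):
            p = delay + k
            if n - p >= 0:
                acc += np.conj(gk) * y[n - p]
        y[n] = acc
    return y
```

In a full system the same routine would be applied independently to each of the frequency bins of the STFT; `simulate_reverb` is only there to generate toy data that follows the model on slide 21.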

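
The log mel-frequency reverberation model on slides 42 to 44 can be illustrated with a small sketch of the mismatch function $g(x_n, r_n, h)$, its derivative $G$ with respect to $x_n$, and the first-order (VTS) linearization around a GMM component mean. The function names are hypothetical, features are plain Python lists with one entry per mel channel, and diagonal covariances are assumed so that everything is computed per channel; none of these choices comes from the slides.

```python
import math


def log_add_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log1p(math.exp(-abs(a - b)))


def mismatch(x, r, h):
    """g(x, r, h) = x + h + log(1 + exp(r - x - h)), evaluated per channel."""
    return [log_add_exp(xi + hi, ri) for xi, ri, hi in zip(x, r, h)]


def mismatch_jacobian_diag(x, r, h):
    """Diagonal of G = dg/dx = 1 / (1 + exp(r - x - h)), per channel."""
    # 0.5 * (1 + tanh(z / 2)) is a stable way to evaluate the sigmoid of z.
    return [0.5 * (1.0 + math.tanh(0.5 * (xi + hi - ri)))
            for xi, ri, hi in zip(x, r, h)]


def vts_mean(x, mu, r_mean, h):
    """Linearized mean: g(mu, r_mean, h) + G(mu, r_mean, h) * (x - mu)."""
    g0 = mismatch(mu, r_mean, h)
    G = mismatch_jacobian_diag(mu, r_mean, h)
    return [g0i + Gi * (xi - mui) for g0i, Gi, xi, mui in zip(g0, G, x, mu)]


def vts_var_diag(mu, r_mean, h, var_r):
    """Linearized variance: (1 - G)^2 * var_r per channel (diagonal case)."""
    G = mismatch_jacobian_diag(mu, r_mean, h)
    return [(1.0 - Gi) ** 2 * vi for Gi, vi in zip(G, var_r)]
```

Here `r_mean` plays the role of $y_{n-\Delta} + \beta^R$ and `var_r` the diagonal of $\Sigma^R$ from slide 43. When the late reverberation is far below the direct and early part, $G$ approaches 1 and the linearized variance vanishes, which matches the form $(I - G)^2 \Sigma^R$ on slide 44.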