Blind source separation based on independent low-rank matrix analysis and its extensions

Daichi Kitamura, "Blind source separation based on independent low-rank matrix analysis and its extensions," Ohio State University, Invited Lecture, December 15th, 2017.

Published in: Science

  1. 1. Blind source separation based on independent low-rank matrix analysis and its extensions Ohio State University Visiting December 15th, 2017 The University of Tokyo, Japan Project Research Associate Daichi Kitamura
  2. 2. Self introduction • Name: Daichi Kitamura • Age: 27 (born in 1990) – Born in Kagawa, Japan • Background: – NAIST, Japan • Master's degree (received in 2014) – SOKENDAI, Japan • Ph.D. degree (received in 2017) – The University of Tokyo, Japan • Project Research Associate • Research topics – Acoustic signal processing, statistical signal processing, audio source separation, etc. [Map: Japan, marking Kagawa (place of birth) and Tokyo (Univ. Tokyo)]
  3. 3. Contents • Background – Blind source separation (BSS) for audio signals – Motivation • Preliminaries – Frequency-domain independent component analysis (FDICA) – Independent vector analysis (IVA) – Itakura–Saito nonnegative matrix factorization (ISNMF) • Independent Low-Rank Matrix Analysis (ILRMA) – Employ low-rank TF structures of each source in BSS – Gaussian source model with TF-varying variance – Relationship between ILRMA and multichannel NMF – Theoretical extension of ILRMA for better optimization • Conclusion 3
  5. 5. Background • Blind source separation (BSS) for audio signals – separates the original audio sources from a recorded mixture – does not require prior information about the recording conditions • locations of mics and sources, room geometry, timbres, etc. – is applicable to many audio applications • Consider only the “determined” situation (# of mics = # of sources) [Figure: sources → mixing system → observed mixture → demixing system (BSS) → estimated signals, e.g., separated guitar]
  6. 6. • Basic theories and their evolution History of BSS for audio signals 6 1994 1998 2013 1999 2012 Year Many permutation solvers for FDICA Apply NMF to many tasks Generative models in NMF Many extensions of NMF Independent component analysis (ICA) Frequency-domain ICA (FDICA) Itakura–Saito NMF (ISNMF) Independent vector analysis (IVA) Multichannel NMF Independent low-rank matrix analysis (ILRMA) *Depicting only popular methods 2016 2009 2006 2011 Auxiliary-function-based IVA (AuxIVA) Time-varying Gaussian IVA Nonnegative matrix factorization (NMF)
  7. 7. Motivation of ILRMA • Conventional BSS techniques based on ICA –  Minimum distortion (linear demixing) –  Relatively fast and stable optimization • FastICA [A. Hyvarinen, 1999], natural gradient [S. Amari, 1996], and auxiliary function technique [N. Ono+, 2010], [N. Ono, 2011] –  Could not use “specific” assumption of sources • Only assumes non-Gaussian p.d.f. for sources –  Permutation problem is crucial and still difficult to solve • IVA often fails causing a “block permutation problem” [Y. Liang+, 2012] • Better to use a “specific source model” in TF domain – Independent low-rank matrix analysis (ILRMA) employs a low-rank property 7 : frequency bins Observed signal Source signalsFrequency-wise mixing matrix : time frames Estimated signal Frequency-wise demixing matrix
  9. 9. • Independent component analysis (ICA)[P. Comon, 1994] – estimates without knowing – Source model (scalar) • is non-Gaussian and mutually independent – Spatial model • Mixing system is a time-invariant matrix • Mixing system in audio signals – Convolutive mixture with room reverberation Related methods: ICA 9 Mixing matrix Demixing matrix Source model Sources Observed Estimated Spatial model
  10. 10. • Frequency-domain ICA (FDICA) [P. Smaragdis, 1998] – estimates frequency-wise demixing matrix – Source model (scalar) • is complex-valued, non-Gaussian, and mutually independent – Spatial model • Frequency-wise mixing matrix is time-invariant – Instantaneous mixture in each frequency band – A.k.a. rank-1 spatial model [N.Q.K. Duong, 2010] • Permutation problem? – Order of estimated signals cannot be determined by ICA – Alignment of frequency-wise estimated signals is required • Many permutation solvers were proposed Related methods: FDICA 10 Spectrograms ICA1 … Frequencybin Time frame … ICA2 ICA I
  11. 11. Permutation problem • FDICA requires signal alignment across all frequency bins – The order of the estimated signals cannot be determined by ICA* [Figure: Source 1 and Source 2 are mixed into Observed 1 and Observed 2; ICA is applied to all frequency components, and a permutation solver aligns them over time into Estimated signal 1 and Estimated signal 2] *The signal scale must also be restored by applying a back-projection technique
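
The scale restoration mentioned in the footnote can be sketched as follows. This is a minimal NumPy illustration, not code from the lecture: the function name `back_project`, the array layout (frequency bins × time frames × channels), and the choice of a reference microphone are assumptions made for the example. Each separated signal is multiplied by the corresponding entry of the inverse demixing matrix, which removes the arbitrary per-frequency scale left by ICA.

```python
import numpy as np

def back_project(Y, W, ref_mic=0):
    """Rescale separated signals to their images at a reference microphone.

    Y: (I, J, N) separated signals (I freq. bins, J frames, N sources).
    W: (I, N, M) frequency-wise demixing matrices, with M == N.
    """
    A_hat = np.linalg.inv(W)                 # (I, M, N) estimated mixing matrices
    # Scale source n in bin i by A_hat[i, ref_mic, n]; broadcast over frames.
    return Y * A_hat[:, ref_mic, None, :]

# Toy check with random data (hypothetical sizes).
rng = np.random.default_rng(0)
I, J, N = 4, 50, 2
X = rng.standard_normal((I, J, N)) + 1j * rng.standard_normal((I, J, N))
W = rng.standard_normal((I, N, N)) + 1j * rng.standard_normal((I, N, N))
Y = np.einsum('inm,ijm->ijn', W, X)          # y = W x in every bin and frame
Z = back_project(Y, W)
```

After back-projection, the source images sum to the observed reference-microphone signal, and the result no longer depends on how each row of the demixing matrix was scaled.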
  12. 12. Related methods: IVA • Independent vector analysis (IVA)[A. Hiroe, 2006], [T. Kim, 2006] – extends ICA to multivariate probabilistic model to consider sourcewise frequency vector as a vector variable – Source model (vector) • is multivariate, spherical, complex-valued, non-Gaussian, and mutually independent – Spatial model • Mixing system is a time-invariant matrix (rank-1 spatial model) 12 … … Mixing matrix … … … Observed vector Demixing matrix Estimated vector Multivariate non- Gaussian dist. Have higher-order correlations Permutation-free estimation of is achieved! Source vector
  13. 13. • Spherical multivariate distribution[T. Kim+, 2007] • Why spherical distribution? – Frequency bands that have similar activations will be merged together as one source avoid permutation problem Higher-order correlation assumed in IVA 13 x1 and x2 are mutually independent Spherical Laplace dist. Mutually independent two Laplace dist.s x1 and x2 have higher-order correlation Probability depends on only the norm
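
The two densities contrasted on this slide can be written out for the bivariate case. This is a sketch with normalization and scale constants omitted; the exact parametrization used on the slide is not recoverable from the transcript.

```latex
% Spherical Laplace: depends only on the norm of (x_1, x_2),
% so x_1 and x_2 are uncorrelated but have higher-order correlation.
p_{\mathrm{spherical}}(x_1, x_2) \propto \exp\!\left(-\sqrt{|x_1|^2 + |x_2|^2}\right)

% Product of two independent Laplace distributions:
% factorizes, so x_1 and x_2 are mutually independent.
p_{\mathrm{independent}}(x_1, x_2) \propto \exp\!\left(-|x_1|\right)\exp\!\left(-|x_2|\right)
```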
  14. 14. Comparison of source models • Frequency-domain ICA (FDICA) [P. Smaragdis, 1998] – Scalar r.v.s: after the STFT, the demixing matrix (separation filter) is updated so that the current empirical distribution of the estimated signals obeys the assumed non-Gaussian source distribution • Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006] – Vector (multivariate) r.v.s: the demixing matrix is updated so that the estimated signals obey the assumed non-Gaussian spherical source distribution • In both models the sources obey non-Gaussian distributions and are mutually independent, whereas the mixture is close to a Gaussian signal because of the CLT
  15. 15. Related method: NMF • Nonnegative matrix factorization (NMF) [D. D. Lee, 1999] – Low-rank decomposition with nonnegative constraint • Limited number of nonnegative bases and their coefficients – Spectrogram is decomposed in acoustic signal processing • Frequently appearing spectral patterns and their activations 15 Amplitude Amplitude Nonnegative matrix (power spectrogram) Basis matrix (spectral patterns) Activation matrix (time-varying gains) Time : # of freq. bins : # of time frames : # of bases Time Frequency Frequency
  16. 16. • ISNMF[C. Févotte, 2009] – can be decomposed using “stable property” of • If we define , Related method: ISNMF 16 Equivalent Circularly symmetric complex Gaussian dist. Complex-valued observed signal Nonnegative variance Variance is also decomposed!
  17. 17. • Power spectrogram corresponds to variances in TF plane Related method: ISNMF 17 Frequencybin Time frame : Power spectrogram Small value of power Large value of power Complex Gaussian distribution with TF-varying variance If we marginalize in terms of time or frequency, the distribution becomes non-Gaussian even though each TF grid is defined in Gaussian distribution Grayscale shows the value of variance
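
The marginalization remark can be checked numerically. The sketch below is only an illustration under assumed toy settings (matrix sizes, exponential random bases and activations, and the helper names are all hypothetical): it draws complex Gaussian samples whose variance is a low-rank TF matrix, pools them over the TF plane, and compares the normalized fourth moment with that of a stationary complex Gaussian, for which the value is 2.

```python
import numpy as np

def normalized_fourth_moment(x):
    """E|x|^4 / (E|x|^2)^2; equals 2 for a circular complex Gaussian."""
    p = np.abs(x) ** 2
    return float(np.mean(p ** 2) / np.mean(p) ** 2)

def sample_complex_gaussian(variance, rng):
    """One zero-mean circular complex Gaussian sample per entry of `variance`."""
    z = rng.standard_normal(variance.shape) + 1j * rng.standard_normal(variance.shape)
    return np.sqrt(variance / 2.0) * z

rng = np.random.default_rng(0)
n_freq, n_frames, n_bases = 64, 200, 2           # hypothetical sizes
T = rng.exponential(1.0, (n_freq, n_bases))      # spectral patterns
V = rng.exponential(1.0, (n_bases, n_frames))    # time-varying gains
tf_varying = sample_complex_gaussian(T @ V, rng)
stationary = sample_complex_gaussian(np.ones((n_freq, n_frames)), rng)

m_varying = normalized_fourth_moment(tf_varying)
m_stationary = normalized_fourth_moment(stationary)
```

A value of `m_varying` clearly above `m_stationary` indicates heavier tails than a Gaussian, which is the sense in which the pooled distribution becomes non-Gaussian.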
  18. 18. Comparison of low-rankness 18 Drums Guitar Vocals Speech
  19. 19. • Low-rankness (simplicity of a matrix) – can be measured by a cumulative singular value (CSV) – Drums and guitar are quite low-rank • Also, vocals and speech are to some extent low-rank – Music spectrogram can be modeled by only few patterns Comparison of low-rankness 19 95% line 7 29 Around 90 Number of bases when CSV reaches 95% (Spectrogram size is 1025 x1883)
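
A low-rankness measure of this kind can be sketched in a few lines. The transcript does not give the exact definition of the CSV, so the code assumes it is the cumulative sum of singular values normalized by their total; the function name and the toy matrices are hypothetical.

```python
import numpy as np

def bases_at_csv(matrix, threshold=0.95):
    """Number of singular values needed for the normalized cumulative
    singular value (CSV) to reach `threshold`."""
    s = np.linalg.svd(matrix, compute_uv=False)   # sorted in descending order
    csv = np.cumsum(s) / np.sum(s)
    k = int(np.searchsorted(csv, threshold)) + 1  # first index with csv >= threshold
    return min(k, s.size)

# Toy comparison: an exactly rank-3 nonnegative matrix versus the same
# matrix with dense noise added.
rng = np.random.default_rng(0)
low_rank = rng.uniform(size=(64, 3)) @ rng.uniform(size=(3, 100))
noisy = low_rank + rng.uniform(size=(64, 100))
k_low = bases_at_csv(low_rank)
k_noisy = bases_at_csv(noisy)
```

Applied to a power spectrogram, a small count suggests that a few spectral patterns explain most of the matrix.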
  21. 21. Extension of source model in IVA • Source model in IVA – has a frequency-uniform scale • Spherical multivariate Laplace • Higher-order correlation among frequency – Equivalent to NMF with one flat basis • Source model in ISNMF[C. Févotte+, 2009] – NMF with arbitrary number of bases • can represent complicated TF structures – can learn “co-occurrence” structure in TF domain for each source • Low-rank co-occurrence is captured as the variance – The source-wise structure can be estimated by ISNMF 21 Frequency Time Frequency Time Replace the source model assumed in ICA or IVA
  22. 22. • Source model in IVA • Source model in ISNMF[C. Févotte+, 2009] 22 Frequency-uniform scale Extension of source model in IVA Zero-mean complex Gaussian in each TF bin Low-rank decomposition with NMF Spherical Laplace dist. (bivariate case) Frequency vector (I-dimension) Time-frequency-varying variance Time-frequency matrix (IJ-dimensional) Replace the source model assumed in ICA or IVA
  23. 23. Cost function in ILRMA and partitioning function • Negative log-likelihood in ILRMA – The IVA source model is replaced with the ISNMF model – One part is the cost function in ICA (estimates the demixing matrix), optimized with the update rules in ICA – The other part is the cost function in ISNMF (estimates the low-rank source model of the estimated signal), optimized with the update rules in ISNMF – All the variables can easily be optimized by alternating updates
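
The equation on this slide did not survive extraction. For orientation, the ILRMA cost without the partitioning function is commonly written as below; the symbol names (x for the observed vector, w for a demixing filter, t and v for the NMF basis and activation entries) follow the usual convention and are assumptions here, not a transcription of the slide.

```latex
\mathcal{L} = \sum_{i,j} \left[ \sum_{n}
    \left( \frac{|y_{ij,n}|^{2}}{r_{ij,n}} + \log r_{ij,n} \right)
    - 2 \log \left| \det \mathbf{W}_{i} \right| \right] + \mathrm{const.},
\qquad
y_{ij,n} = \mathbf{w}_{i,n}^{\mathsf{H}} \mathbf{x}_{ij},
\qquad
r_{ij,n} = \sum_{k} t_{ik,n} v_{kj,n}
```

The log-determinant term is the part shared with ICA and IVA; the ratio and log terms in the inner sum form the Itakura–Saito-type fit between the power of each estimated signal and its low-rank variance model.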
  24. 24. Update rules of ILRMA • ML-based iterative update rules – The update rule for the demixing matrix (spatial model) is based on iterative projection [N. Ono, 2011] – The update rules for the NMF variables (source model) are based on the MM algorithm – Pseudo code is available at • http://d-kitamura.net/pdf/misc/AlgorithmsForIndependentLowRankMatrixAnalysis.pdf [Equations: update rules for the spatial model (demixing matrix) and the NMF source model, using a one-hot vector; shown as images on the slide]
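
Since the update equations are images on the slide, the following NumPy sketch shows one way the alternating scheme can look, assuming the commonly published form of the rules: MM multiplicative updates (exponent 1/2) for the NMF variables and iterative projection for each demixing filter. All names, array layouts, and the toy mixture are assumptions for illustration; normalization and back-projection are omitted, and the author's pseudo code linked above remains the reference.

```python
import numpy as np

EPS = 1e-12

def ilrma_cost(X, W, T, V):
    """Negative log-likelihood up to a constant (no partitioning function)."""
    J = X.shape[1]
    Y = np.einsum('inm,ijm->ijn', W, X)
    P = np.abs(Y).transpose(2, 0, 1) ** 2          # (N, I, J) power of estimates
    R = T @ V                                      # (N, I, J) low-rank variances
    logdet = np.log(np.abs(np.linalg.det(W)))
    return float(np.sum(P / R + np.log(R)) - 2.0 * J * np.sum(logdet))

def ilrma(X, n_bases=2, n_iter=30, seed=0):
    """X: (I, J, M) multichannel STFT; determined case (N sources == M mics)."""
    I, J, M = X.shape
    N = M
    rng = np.random.default_rng(seed)
    W = np.tile(np.eye(N, dtype=complex), (I, 1, 1))   # rows are w_n^H
    T = rng.uniform(0.1, 1.0, (N, I, n_bases))
    V = rng.uniform(0.1, 1.0, (N, n_bases, J))
    costs = [ilrma_cost(X, W, T, V)]
    for _ in range(n_iter):
        Y = np.einsum('inm,ijm->ijn', W, X)
        P = np.abs(Y).transpose(2, 0, 1) ** 2
        # Source model: MM updates of bases T, then activations V.
        R = T @ V
        Vt = V.transpose(0, 2, 1)
        T = np.maximum(T * np.sqrt(((P / R**2) @ Vt) / ((1.0 / R) @ Vt)), EPS)
        R = T @ V
        Tt = T.transpose(0, 2, 1)
        V = np.maximum(V * np.sqrt((Tt @ (P / R**2)) / (Tt @ (1.0 / R))), EPS)
        R = T @ V
        # Spatial model: iterative projection, one source at a time.
        for n in range(N):
            U = np.einsum('ijm,ijl,ij->iml', X, X.conj(), 1.0 / R[n]) / J
            e_n = np.zeros((I, M, 1), dtype=complex)
            e_n[:, n, 0] = 1.0                         # one-hot vector
            w = np.linalg.solve(W @ U, e_n)[:, :, 0]   # (W U)^{-1} e_n
            scale = np.sqrt(np.einsum('im,iml,il->i', w.conj(), U, w).real)
            W[:, n, :] = (w / scale[:, None]).conj()
        costs.append(ilrma_cost(X, W, T, V))
    return W, T, V, costs

def toy_mixture(I=8, J=120, N=2, seed=1):
    """Synthetic determined mixture with rank-1 source variances."""
    rng = np.random.default_rng(seed)
    var = rng.uniform(0.1, 1.0, (N, I, 1)) @ rng.uniform(0.1, 1.0, (N, 1, J))
    S = np.sqrt(var / 2.0) * (rng.standard_normal((N, I, J))
                              + 1j * rng.standard_normal((N, I, J)))
    A = rng.standard_normal((I, N, N)) + 1j * rng.standard_normal((I, N, N))
    return np.einsum('imn,nij->ijm', A, S)
```

Calling `ilrma(toy_mixture())` returns the demixing matrices, the NMF variables, and the cost after each iteration; with these rules the cost should not increase from one iteration to the next, which is a convenient sanity check.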
  25. 25. Optimization process in ILRMA • Demixing matrix and source model are alternately updated – Precise modeling of the low-rank TF structures will improve the estimation accuracy of the demixing matrix [Figure: loop in which the mixture is separated by the estimated demixing matrix, NMF variables are estimated from each separated signal, and the resulting source model is used to update the demixing matrix]
  26. 26. Comparison of source models 26 FDICA source model Non-Gaussian scalar variable IVA source model Non-Gaussian vector variable with higher-order correlation ILRMA source model Non-Gaussian matrix variable with low-rank time-frequency structure Rank of TF matrix of mixture Rank of TF matrix of each source
  27. 27. Multichannel extension of NMF • Multichannel NMF [A. Ozerov+, 2010], [H. Sawada+, 2013] – Observed multichannel signal: the multichannel vector in each time-frequency slot is represented by its instantaneous spatial covariance – Spatial model: spatial covariances of each source (spatial property of each source) – Source model: basis matrix (spectral patterns) and activation matrix (gains), i.e., timbre patterns of all sources – Partitioning function
  28. 28. Relationship b/w ILRMA and multichannel NMF • Difference b/w ILRMA and multichannel NMF? – Source distribution: complex Gaussian distribution (same) – ILRMA assumes a rank-1 spatial covariance for each source – Multichannel NMF assumes full-rank spatial covariance • Assumption: rank-1 spatial model – Spatial covariance of each source is a rank-1 matrix (outer product of the sourcewise steering vector) – Equivalent to the instantaneous mixing assumption in each frequency bin
  29. 29. Relationship b/w ILRMA and multichannel NMF • Multichannel NMF with rank-1 spatial model 30 Substitute into the cost function Transform the variables as
  30. 30. Relationship b/w MNMF, IVA, and ILRMA • From multichannel NMF side, – Rank-1 spatial model is introduced, transform the problem from the estimation of mixing system to that of demixing matrix • From IVA side, – Increase the number of spectral bases in source model 31 Source model Spatialmodel FlexibleLimited FlexibleLimited IVA Multichannel NMF ILRMA NMF source model Rank-1 spatial model
  31. 31. Experimental evaluation • Conditions – Source signals: music signals obtained from SiSEC, convolved with impulse responses (two microphones and two sources) – Window length: 512 ms (Hamming window) – Shift length: 128 ms (1/4 shift) – Number of bases: 30 per source – Evaluation score: improvement of signal-to-distortion ratio (SDR) – Impulse response: E2A (reverberation time: 300 ms) [Figure: recording layout; labels: Source 1, Source 2, 2 m, 5.66 cm, 50, 50]
  32. 32. Result example • Ultimate NZ tour (Guitar and Synthesizer, 14s) [Figure: SDR improvement [dB] (0 to 20, higher is better) for Guitar and Synth., comparing IVA, Multichannel NMF, and ILRMA]
  33. 33. Results: bearlin-roads • Ultimate NZ tour (Guitar and Synthesizer, 14s) [Figure: SDR improvement [dB] (-2 to 12, higher is better) versus iteration steps (0 to 400) for IVA, MNMF, ILRMA without Z, and ILRMA with Z; times shown in the figure: 11.5 s, 15.1 s, 60.7 s, 7647.3 s]
  34. 34. • Thurston’s pairwise comparison – Speech separation and music separation tasks – 10 males and 4 females Subjective evaluation 35 1.6 1.2 0.8 0.4 0.0 -0.4 -0.8 -1.2 Subjectivescore IVA Multichannel NMF ILRMA Speech signals Music signals
  35. 35. Demonstration: music source separation • Music source separation 36 Guitar Vocal Keyboard Guitar Vocal Keyboard Source separation Pay attention to listen three parts in the mixture Another demo is available at http://d-kitamura.net/en/index_en.html
  36. 36. Best optimization balance? • “Alternating update” of spatial model (ICA) and source model (NMF) is used in ILRMA – Sometimes the optimization in ILRMA is trapped in a poor solution (local minimum) • There may exist a best optimization balance b/w the ICA and NMF models that avoids local minima [Figure: optimization path starting from the “Identity and Randomized” initialization and alternating NMF updates and ICA updates; axes: ICA (demixing matrix) and NMF (low-rank source model)]
  37. 37. Controlling optimization speed • How to control the optimization speed ensuring the convergence of algorithm? – Parametric majorization-equalization (ME) algorithm – Apply parametric ME to NMF optimization to find the best balance between ICA and NMF • Find the best balance of optimization speeds between NMF and ICA 38 Identity and Randomized NMF update ICA update Becomes controllable by parametric ME
  38. 38. Majorization-based optimization algorithm • NMF optimization is based on a majorizer-based algorithm (a.k.a. auxiliary function technique) – Majorization-minimization (MM) algorithm [D. R. Hunter+, 2000] 39
  40. 40. Majorization-based optimization algorithm • NMF optimization is based on a majorizer-based algorithm (a.k.a. auxiliary function technique) – Majorization-equalization (ME) algorithm [C. Févotte+, 2011] 41
  41. 41. Majorization-based optimization algorithm • NMF optimization is based on a majorizer-based algorithm (a.k.a. auxiliary function technique) – Majorization-equalization (ME) algorithm [C. Févotte+, 2011] 42 Fast Slow
  42. 42. Majorization-based optimization algorithm • NMF optimization is based on a majorizer-based algorithm (a.k.a. auxiliary function technique) – Parametric ME algorithm [Y. Mitsui+, 2017] 43
  43. 43. Parametric-ME-based NMF optimization • Comparison of NMF update rules – Update rules of basis matrix – Only the exponent is different – Optimization speed of NMF model can be controlled by 44 MM algorithm ME algorithm Parametric ME algorithm
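
The role of the exponent can be illustrated with Itakura–Saito NMF. In the sketch below, which is an illustration rather than the slide's equations, the multiplicative factor of the update is raised to an exponent `p`: `p = 0.5` gives the standard MM update and `p = 1.0` gives the ME update for the IS divergence. Treating `p` as a free value in between is an assumption standing in for the parametric ME rule, whose exact parametrization is not recoverable from the transcript. The function names and toy data are hypothetical.

```python
import numpy as np

EPS = 1e-12

def is_divergence(X, R):
    """Itakura-Saito divergence between nonnegative matrices X and R."""
    ratio = X / R
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def isnmf_step(X, T, V, p):
    """One multiplicative update of bases T and activations V with exponent p."""
    R = T @ V
    T = np.maximum(T * (((X / R**2) @ V.T) / ((1.0 / R) @ V.T)) ** p, EPS)
    R = T @ V
    V = np.maximum(V * ((T.T @ (X / R**2)) / (T.T @ (1.0 / R))) ** p, EPS)
    return T, V

def run_isnmf(X, n_bases, p, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    T = rng.uniform(0.1, 1.0, (X.shape[0], n_bases))
    V = rng.uniform(0.1, 1.0, (n_bases, X.shape[1]))
    costs = [is_divergence(X, T @ V)]
    for _ in range(n_iter):
        T, V = isnmf_step(X, T, V, p)
        costs.append(is_divergence(X, T @ V))
    return T, V, costs

# Toy power spectrogram drawn from a rank-3 variance model (hypothetical sizes).
rng = np.random.default_rng(1)
variance = rng.uniform(0.1, 1.0, (40, 3)) @ rng.uniform(0.1, 1.0, (3, 60))
X_pow = rng.exponential(variance) + EPS

costs_mm = run_isnmf(X_pow, n_bases=3, p=0.5)[2]
costs_me = run_isnmf(X_pow, n_bases=3, p=1.0)[2]
```

Comparing `costs_mm` and `costs_me` iteration by iteration shows how the exponent changes the step size while the divergence stays non-increasing in both cases.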
  44. 44. Parametric-ME-based ILRMA • ILRMA of 2000 trials with various random seeds 45 ultimate_nz_tour FastSlow
  45. 45. Parametric-ME-based ILRMA • ILRMA of 2000 trials with various random seeds 46 another_dreamer-the_ones_we_love FastSlow
  46. 46. Parametric-ME-based ILRMA • Slower NMF optimization (small value of ) tends to provide better results in ILRMA – But, why? We don’t know! • Conjecture – In the beginning of ILRMA, NMF model is “random” • Not believable – The demixing matrix can be updated without source model to some extent (because even IVA works well) • Statistical independence between sources is very powerful 47 Independence- based separation Initialization Precise modeling of source structure Improved separation Updated Updated Updated Slowly updated Slowly updated Updated
  48. 48. Conclusion • Independent low-rank matrix analysis (ILRMA) – Permutation-free ICA-based blind source separation – Assumptions • Statistical independence between sources • Low-rank time-frequency structure of each source – Equivalent to multichannel NMF • when the rank-1 spatial model (instantaneous mixing in each frequency bin) is valid • Ongoing work – Relaxation of rank-1 spatial model – Extension of source generative model – Semi/fully supervised ILRMA, user-guided ILRMA – and combination with deep neural networks • Independent deeply learned matrix analysis (IDLMA) • Maybe submitted to the next EUSIPCO…?
  49. 49. Conclusion • Independent low-rank matrix analysis (ILRMA) – will be published by Springer in March 2018! Audio Source Separation (Signals and Communication Technology), 1st ed. 2018, Shoji Makino (Editor): Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, and Hiroshi Saruwatari, "Determined blind source separation with independent low-rank matrix analysis." Search on Amazon.com!
  50. 50. Conclusion • Independent low-rank matrix analysis (ILRMA) – will be presented in ICASSP 2018 as a tutorial session! • Title (tentative): Blind Audio Source Separation on Tensor Representation – Presenters: Hiroshi Sawada, Nobutaka Ono, Hirokazu Kameoka, Daichi Kitamura 51 Thank you so much for your attention!
