
Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis

Daichi Kitamura, Nobutaka Ono, "Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis," The 15th International Workshop on Acoustic Signal Enhancement (IWAENC 2016), Xi'an, China, September 2016.


  1. Daichi Kitamura (SOKENDAI, Japan) and Nobutaka Ono (NII/SOKENDAI, Japan), "Efficient initialization for NMF based on nonnegative ICA," IWAENC 2016, Sept. 16, 08:30–10:30, Session SPS-II (Student paper competition 2), SPC-II-04
  2. Research background: what is NMF? • Nonnegative matrix factorization (NMF) [Lee, 1999] – Dimensionality reduction with a nonnegativity constraint – Unsupervised learning that extracts meaningful features – (Implicitly) sparse decomposition [Figure: an input power spectrogram (frequency × time) is approximated by the product of a basis matrix of spectral patterns and an activation matrix of time-varying gains; the inner dimension is the number of bases.]
  3. Research background: how to optimize? • Optimization in NMF – Define a cost function (data fidelity) and minimize it – No closed-form solution for the basis and activation matrices – Efficient iterative optimization: multiplicative update rules obtained with the auxiliary-function technique [Lee, 2001] (here, for the squared Euclidean distance) – Initial values for all the variables are required; a sketch of these updates is shown below.
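The multiplicative updates referenced on this slide are the standard Euclidean-distance rules of [Lee, 2001]; below is a minimal Python/NumPy sketch. The function name, the small constant eps, and the random fallback initialization are illustrative choices, not taken from the slides.

```python
import numpy as np

def nmf_euclidean(X, K, n_iter=1000, W0=None, H0=None, eps=1e-12, seed=0):
    """Euclidean-distance NMF via multiplicative updates [Lee, 2001].

    X      : (F, T) nonnegative input matrix (e.g., a power spectrogram)
    K      : number of bases
    W0, H0 : optional initial basis/activation matrices (this is where an
             initialization method such as the proposed one plugs in);
             nonnegative random values are used if they are not given
    """
    F, T = X.shape
    rng = np.random.default_rng(seed)
    W = rng.random((F, K)) if W0 is None else W0.copy()
    H = rng.random((K, T)) if H0 is None else H0.copy()
    for _ in range(n_iter):
        # Updates derived with the auxiliary-function technique; they keep
        # W and H nonnegative and monotonically decrease ||X - W H||_F^2.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```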
  4. Problem and motivation • The results of all applications using NMF depend on the initialization of the basis and activation matrices. – Ex.: source separation via full-supervised NMF [Smaragdis, 2007] • Motivation: an initialization method that consistently gives good performance is desired. [Figure: SDR improvement in dB for ten random seeds (Rand1–Rand10); the gap between the best and worst seeds exceeds 1 dB.]
  5. Conventional NMF initialization techniques • With random values (not focused on here) – Directly use random values – Search for good values via a genetic algorithm [Stadlthanner, 2006], [Janecek, 2011] – Clustering-based initialization [Zheng, 2007], [Xue, 2008], [Rezaei, 2011]: cluster the input data and set the centroid vectors as the initial basis vectors. • Without random values – PCA-based initialization [Zhao, 2014]: apply PCA to the input data, extract orthogonal bases and coefficients, and set their absolute values as the initial bases and activations. – SVD-based initialization [Boutsidis, 2008]: apply a special SVD (nonnegative double SVD) to the input data and set the nonnegative left and right singular vectors as the initial values.
  6. Basis orthogonality? • Are orthogonal bases really better for NMF? – PCA and SVD are orthogonal decompositions. – A geometric interpretation of NMF [Donoho, 2003]: the optimal bases in NMF lie "along the edges of a convex cone" that includes all the observed data points. – Orthogonality might therefore not be a good property for initial NMF values. [Figure: data points inside a convex cone; the optimal bases lie along its edges, orthogonal bases risk representing meaningless areas outside the data, and overly tight bases cannot represent all the data points.]
  7. Proposed method: utilization of ICA • What can we do with only the input data? – Independent component analysis (ICA) [Comon, 1994] – ICA extracts non-orthogonal bases that maximize the statistical independence between sources. – ICA estimates sparse sources when a super-Gaussian prior is assumed. • Proposal: use the ICA bases and estimated sources as initial NMF values – Objectives: 1. deeper minimization, 2. faster convergence, 3. better performance [Figure: NMF cost value vs. number of update iterations, illustrating deeper minimization and faster convergence.]
  8. Proposed method: concept • The input data matrix is modeled as a mixture of sources: the sources are mixed via a mixing matrix and observed as the input data matrix, and ICA can estimate a demixing matrix and the mutually independent sources. • PCA is used only for dimensionality reduction. • Nonnegative ICA (NICA) takes nonnegativity into account. • Nonnegativization ensures complete nonnegativity. [Diagram: input data matrix → PCA (dimensionality reduction) → NICA (ICA bases and sources) → nonnegativization → initial values for NMF.]
  9. Nonnegative-constrained ICA • Nonnegative ICA (NICA) [Plumbley, 2003] – estimates a demixing matrix so that all of the separated sources become nonnegative. – finds a rotation matrix for the pre-whitened mixtures (whitening is performed without centering). – The rotation matrix is estimated by steepest gradient descent on a cost function that penalizes the negative parts of the separated sources; a sketch is given below.
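The following is a rough sketch in the spirit of [Plumbley, 2003], not the exact algorithm from the slides: the cost is taken as the squared negative part of the separated outputs, and the rotation constraint is enforced here by re-projecting onto the nearest orthogonal matrix after each gradient step (the original method uses geodesic updates on the rotation group). Function names and the step size mu are illustrative.

```python
import numpy as np

def nonnegative_ica(X, n_iter=2000, mu=0.1):
    """Illustrative nonnegative ICA: whitening without centering, then
    steepest gradient descent on a rotation (demixing) matrix so that the
    separated sources become (approximately) nonnegative.

    X : (M, T) observed nonnegative mixtures
    Returns (W, V, Y): rotation matrix, whitening matrix, separated sources.
    """
    M, T = X.shape
    # Whitening WITHOUT centering (mean removal would destroy nonnegativity)
    C = X @ X.T / T
    d, E = np.linalg.eigh(C)
    V = np.diag(1.0 / np.sqrt(d)) @ E.T          # whitening matrix
    Z = V @ X                                     # pre-whitened mixtures
    W = np.eye(M)                                 # rotation (demixing) matrix
    for _ in range(n_iter):
        Y = W @ Z
        Y_neg = np.minimum(Y, 0.0)                # negative parts of the outputs
        G = Y_neg @ Z.T / T                       # gradient of 0.5*||Y_neg||^2/T
        W = W - mu * G                            # steepest-descent step
        U, _, Vt = np.linalg.svd(W)               # re-project onto orthogonal matrices
        W = U @ Vt
    return W, V, W @ Z
```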
  10. Combining PCA for dimensionality reduction • Dimensionality reduction via PCA: the rows of the PCA matrix are the eigenvectors of the input data associated with the largest eigenvalues, keeping as many as the number of bases. • NMF variables obtained from the NICA estimates: the ICA bases, mapped back through the PCA matrix and the rotation matrix estimated by NICA, give the candidate basis matrix, and the separated sources give the candidate activation matrix (see the sketch below).
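One plausible reading of this construction, reusing the nonnegative_ica sketch above, is given below; the exact scaling and sign conventions in the paper may differ, and the function name is illustrative.

```python
import numpy as np

def nica_candidates(X, K, n_iter_nica=2000):
    """Sketch: PCA dimensionality reduction, NICA on the reduced data, then a
    mapping back to the original space.  Returns candidate (not yet
    nonnegative) bases B and sources S with X roughly approximated by B @ S.
    """
    F, T = X.shape
    # PCA matrix: rows are the eigenvectors of X X^T / T with the K largest
    # eigenvalues (no centering, consistent with NICA)
    d, E = np.linalg.eigh(X @ X.T / T)
    P = E[:, np.argsort(d)[::-1][:K]].T           # (K, F) PCA matrix
    Xr = P @ X                                    # reduced data (K, T)
    W_rot, V, S = nonnegative_ica(Xr, n_iter=n_iter_nica)
    # ICA bases in the original F-dimensional space: undo the whitening and
    # the rotation, then go back through the PCA matrix
    B = P.T @ np.linalg.inv(V) @ W_rot.T          # (F, K) candidate bases
    return B, S                                   # S: (K, T) candidate sources
```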
  11. Nonnegativization • Even if we use NICA, there is no guarantee of nonnegativity: – the obtained sources may not become completely nonnegative because of the dimensionality reduction by PCA, and – nonnegativity of the obtained ICA bases is not assumed in NICA. • A "nonnegativization" is therefore applied to the obtained bases and sources – Method 1, Method 2, and Method 3 are compared, where scale-fitting coefficients that depend on the divergence of the following NMF are set based on the correlations between the obtained variables and the input data (a generic sketch follows below).
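The exact formulas of Methods 1–3 are not preserved in this transcript, so the sketch below only shows two generic nonnegativization choices (elementwise absolute value and half-wave rectification) as placeholders; the function name and the small floor eps are illustrative, and the paper's scale-fitting coefficients are not implemented here.

```python
import numpy as np

def nonnegativize(B, S, method="abs", eps=1e-12):
    """Generic nonnegativization of the NICA candidates (illustrative only;
    the paper's Methods 1-3 additionally apply scale-fitting coefficients
    that depend on the divergence of the subsequent NMF)."""
    if method == "abs":          # elementwise absolute value
        W0, H0 = np.abs(B), np.abs(S)
    elif method == "rect":       # half-wave rectification (clip negatives)
        W0, H0 = np.maximum(B, 0.0), np.maximum(S, 0.0)
    else:
        raise ValueError(f"unknown method: {method}")
    # A small positive floor avoids exact zeros, which multiplicative NMF
    # updates can never move away from
    return W0 + eps, H0 + eps
```

The resulting matrices would then be passed as W0 and H0 to an NMF routine such as the nmf_euclidean sketch above.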
  12. Experiment: conditions • Power spectrogram of a mixture of vocals (Vo.) and guitar (Gt.) – Song: "Actions – One Minute Smile" from SiSEC2015 – Size of the power spectrogram: 2049 × 1290 (60 s) – Number of bases: [Figure: power spectrogram, frequency (kHz) vs. time (s).]
  13. Experiment: results of NICA • Convergence of the cost function in NICA [Figure: value of the NICA cost function vs. number of iterations (up to 2000) for the steepest gradient descent.]
  14. Experiment: results of Euclidean NMF • Convergence of EU-NMF [Figure: convergence curves of the EU-NMF cost for the proposed and conventional initializations; Rand1–10 are random initializations with different seeds.] • Processing time for initialization: NICA 4.36 s, PCA 0.98 s, SVD 2.40 s; EU-NMF itself: 12.78 s (for 1000 iterations).
  15. Experiment: results of Kullback-Leibler NMF • Convergence of KL-NMF [Figure: convergence curves of the KL-NMF cost for the proposed methods and the conventional initializations (PCA, SVD, and Rand1–10 with different random seeds).] • Processing time for initialization: NICA 4.36 s, PCA 0.98 s, SVD 2.40 s; KL-NMF itself: 48.07 s (for 1000 iterations).
  16. Experiment: results of Itakura-Saito NMF • Convergence of IS-NMF [Figure: convergence curves of the IS-NMF cost (values on the order of 10^6) for the proposed methods and the conventional initializations (PCA, SVD, and Rand1–10 with different random seeds).] • Processing time for initialization: NICA 4.36 s, PCA 0.98 s, SVD 2.40 s; IS-NMF itself: 214.26 s (for 1000 iterations).
  17. Experiment: full-supervised source separation • Full-supervised NMF [Smaragdis, 2007] – Simply use pre-trained sourcewise bases for separation. – Training stage: a sourcewise NMF is trained for each source, initialized by a conventional or the proposed method. – Separation stage: the pre-trained bases are fixed, and the activations are initialized based on the correlations between the mixture and the trained bases, then updated (a sketch of this stage follows below).
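A minimal sketch of the separation stage is shown below, using Euclidean multiplicative updates for simplicity (the experiments also use other divergences). The correlation-based activation initialization is one simple reading of the slide, and the function name and eps are illustrative.

```python
import numpy as np

def supervised_separation(X_mix, W1, W2, n_iter=200, eps=1e-12):
    """Sketch of full-supervised NMF separation [Smaragdis, 2007]:
    pre-trained bases W1 and W2 are fixed; only the activations are updated.

    X_mix  : (F, T) mixture power spectrogram
    W1, W2 : (F, K1) and (F, K2) pre-trained sourcewise bases
    Returns the two sourcewise reconstructions W1 @ H1 and W2 @ H2.
    """
    W = np.concatenate([W1, W2], axis=1)             # fixed bases (F, K1+K2)
    K1 = W1.shape[1]
    H = W.T @ X_mix + eps                             # correlation-based initialization
    for _ in range(n_iter):
        H *= (W.T @ X_mix) / (W.T @ W @ H + eps)      # update activations only
    return W1 @ H[:K1], W2 @ H[K1:]
```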
  18. Experiment: results of separation • Two-source separation using full-supervised NMF – SiSEC2015 MUS dataset (professionally recorded music) – Averaged SDR improvements over 15 songs [Figure: SDR improvement in dB for source 1 and source 2, comparing the proposed initializations (NICA1, NICA2, NICA3) with the conventional ones (PCA, SVD, and Rand1–Rand10).]
  19. Conclusion • Proposed an efficient initialization method for NMF • Statistical independence is utilized to obtain non-orthogonal bases and sources – Orthogonality may not be preferable for NMF. • The proposed initialization gives – deeper minimization, – faster convergence, and – better performance in full-supervised source separation. Thank you for your attention!
