- 1. Efficient initialization for NMF based on nonnegative ICA
  - Daichi Kitamura (SOKENDAI, Japan), Nobutaka Ono (NII/SOKENDAI, Japan)
  - IWAENC 2016, Sept. 16, 08:30-10:30, Session SPS-II: Student paper competition 2, SPC-II-04
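- 2. Research background: what is NMF?
  - Nonnegative matrix factorization (NMF) [Lee, 1999]
    - Dimensionality reduction with a nonnegativity constraint
    - Unsupervised learning that extracts meaningful features
    - Sparse decomposition (implicitly)
  - Figure: $\mathbf{X} \approx \mathbf{T}\mathbf{V}$, where $\mathbf{X} \in \mathbb{R}_{\ge 0}^{I \times J}$ is the input data matrix (power spectrogram, frequency × time), $\mathbf{T} \in \mathbb{R}_{\ge 0}^{I \times K}$ is the basis matrix (spectral patterns), and $\mathbf{V} \in \mathbb{R}_{\ge 0}^{K \times J}$ is the activation matrix (time-varying gains); $I$: number of rows, $J$: number of columns, $K$: number of bases.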
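- 3. Research background: how to optimize?
  - Optimization in NMF
    - Define a cost function (data fidelity), e.g., the squared Euclidean distance $\|\mathbf{X} - \mathbf{T}\mathbf{V}\|_F^2$, and minimize it.
    - There is no closed-form solution for $\mathbf{T}$ and $\mathbf{V}$.
    - Efficient iterative optimization: multiplicative update rules (auxiliary function technique) [Lee, 2001]. For the squared Euclidean distance, $\mathbf{T} \leftarrow \mathbf{T} \odot \frac{\mathbf{X}\mathbf{V}^\top}{\mathbf{T}\mathbf{V}\mathbf{V}^\top}$ and $\mathbf{V} \leftarrow \mathbf{V} \odot \frac{\mathbf{T}^\top\mathbf{X}}{\mathbf{T}^\top\mathbf{T}\mathbf{V}}$ (element-wise product and division).
    - Initial values for all the variables are required.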
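
The multiplicative updates for the squared Euclidean distance fit in a few lines of NumPy. This is a minimal sketch of the standard Lee-Seung rules, not the authors' implementation: the function name `eu_nmf`, the small `eps` guarding against division by zero, and the toy matrix sizes are illustrative assumptions. Note that `T0` and `V0` must be supplied by the caller, which is exactly where the initialization method enters.

```python
import numpy as np


def eu_nmf(X, T, V, n_iter=100, eps=1e-12):
    """Euclidean NMF with Lee-Seung multiplicative updates (illustrative sketch).

    X: (I, J) nonnegative data, T: (I, K) initial bases, V: (K, J) initial
    activations. Returns the updated T, V and the cost after each iteration.
    """
    T, V = T.copy(), V.copy()
    costs = []
    for _ in range(n_iter):
        # Basis update: T <- T * (X V^T) / (T V V^T)
        T *= (X @ V.T) / (T @ V @ V.T + eps)
        # Activation update: V <- V * (T^T X) / (T^T T V)
        V *= (T.T @ X) / (T.T @ T @ V + eps)
        costs.append(np.sum((X - T @ V) ** 2))
    return T, V, np.array(costs)


rng = np.random.default_rng(0)
X = rng.random((20, 30))   # toy nonnegative "spectrogram"
T0 = rng.random((20, 4))   # random initial bases (K = 4)
V0 = rng.random((4, 30))   # random initial activations
T, V, costs = eu_nmf(X, T0, V0)
print(costs[0], costs[-1])
```

Because the updates only multiply by nonnegative ratios, `T` and `V` stay nonnegative, and the cost is non-increasing from whatever starting point is given; the final value still depends on that starting point.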
- 4. Problem and motivation
  - The results of all applications using NMF always depend on the initialization of $\mathbf{T}$ and $\mathbf{V}$.
    - Example: source separation via full-supervised NMF [Smaragdis, 2007]. The SDR improvement differs by more than 1 dB across ten random seeds (Rand1 to Rand10).
  - Motivation: an initialization method that consistently gives good performance is desired.
  - Figure: SDR improvement [dB] for different random seeds (Rand1 to Rand10), ranging from poor to good.
- 5. Conventional NMF initialization techniques
  - With random values (not focused on here)
    - Directly use random values
    - Search for good values via a genetic algorithm [Stadlthanner, 2006], [Janecek, 2011]
    - Clustering-based initialization [Zheng, 2007], [Xue, 2008], [Rezaei, 2011]: cluster the input data $\mathbf{X}$ into $K$ clusters and set the centroid vectors as the initial basis vectors.
  - Without random values
    - PCA-based initialization [Zhao, 2014]: apply PCA to the input data $\mathbf{X}$, extract orthogonal bases and coefficients, and set their absolute values as the initial bases and activations.
    - SVD-based initialization [Boutsidis, 2008]: apply a special SVD (nonnegative double SVD) to the input data $\mathbf{X}$ and set the nonnegative left and right singular vectors as the initial values.
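
The two random-free initializations can be sketched as follows. Both functions are illustrative assumptions: `abs_svd_init` approximates the PCA-based method with a truncated SVD of the uncentered data and splits the singular values evenly between bases and activations ([Zhao, 2014] may differ in centering and scaling), while `nndsvd_init` follows the basic nonnegative double SVD procedure of [Boutsidis, 2008] without its zero-filling variants.

```python
import numpy as np


def abs_svd_init(X, K):
    """PCA/SVD-style init: absolute values of a rank-K truncated SVD (sketch)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s[:K])
    T = np.abs(U[:, :K] * root)         # (I, K) orthogonal bases -> |.|
    V = np.abs(root[:, None] * Vt[:K])  # (K, J) coefficients -> |.|
    return T, V


def nndsvd_init(X, K):
    """Basic NNDSVD initialization (sketch after Boutsidis & Gallopoulos, 2008)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    I, J = X.shape
    T, V = np.zeros((I, K)), np.zeros((K, J))
    # The leading singular pair of a nonnegative matrix can be taken nonnegative.
    T[:, 0] = np.sqrt(s[0]) * np.abs(U[:, 0])
    V[0, :] = np.sqrt(s[0]) * np.abs(Vt[0, :])
    for k in range(1, K):
        u, v = U[:, k], Vt[k, :]
        up, un = np.maximum(u, 0), np.maximum(-u, 0)  # positive / negative parts
        vp, vn = np.maximum(v, 0), np.maximum(-v, 0)
        n_pos = np.linalg.norm(up) * np.linalg.norm(vp)
        n_neg = np.linalg.norm(un) * np.linalg.norm(vn)
        # Keep whichever rank-1 section (positive or negative parts) is larger.
        if n_pos >= n_neg:
            x, y, sigma = up, vp, n_pos
        else:
            x, y, sigma = un, vn, n_neg
        if sigma > 0:  # degenerate triplets leave the column/row at zero
            scale = np.sqrt(s[k] * sigma)
            T[:, k] = scale * x / np.linalg.norm(x)
            V[k, :] = scale * y / np.linalg.norm(y)
    return T, V
```

Either function returns a nonnegative `T` (I × K) and `V` (K × J) that can be passed directly to an NMF routine; both are deterministic, so repeated runs give the same starting point.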
- 6. Bases orthogonality?
  - Are orthogonal bases really better for NMF?
    - PCA and SVD are orthogonal decompositions.
    - A geometric interpretation of NMF [Donoho, 2003]: the optimal bases in NMF lie "along the edges of a convex cone" that includes all the observed data points.
    - Orthogonal bases might therefore not be good initial values for NMF.
  - Figure (data points inside a convex cone): tight bases along the cone edges (the optimal bases) are satisfactory for representing all the data points; orthogonal bases risk representing meaningless areas outside the cone; bases inside the cone cannot represent all the data points.
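
A toy 2-D check makes the geometric argument concrete. The cone edges `a` and `b` and the three basis sets are hypothetical numbers chosen for illustration; a point counts as representable when its coefficients in a square 2 × 2 basis are all nonnegative.

```python
import numpy as np

# Hypothetical data: x = c1*a + c2*b with c1, c2 >= 0 lies in the cone with edges a, b.
a = np.array([1.0, 0.2])
b = np.array([0.3, 1.0])
rng = np.random.default_rng(0)
X = np.column_stack([a, b]) @ rng.random((2, 50))  # 50 points inside the cone


def representable(bases, X):
    """True if every column of X has nonnegative coefficients in the 2x2 basis."""
    coef = np.linalg.solve(bases, X)
    return bool(np.all(coef >= -1e-12))


edges = np.column_stack([a, b])                      # tight bases on the cone edges
orth = np.eye(2)                                     # orthogonal bases
inner = np.column_stack([a + 0.5 * b, b + 0.5 * a])  # bases strictly inside the cone
for name, B in [("edges", edges), ("orthogonal", orth), ("inner", inner)]:
    print(name, representable(B, X))
```

The edge bases and the orthogonal bases both represent every point, but the orthogonal cone is the whole positive quadrant: with these numbers the data cone spans directions from about 11.3° to 73.3°, so roughly 28° of the 90° quadrant is a meaningless area. The inner bases fail for points near the edges, which need negative coefficients.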
- 7. Proposed method: utilization of ICA
  - What can we obtain from the input data $\mathbf{X}$ alone?
    - Independent component analysis (ICA) [Comon, 1994]
    - ICA extracts non-orthogonal bases that maximize the statistical independence between sources.
    - ICA estimates sparse sources when a super-Gaussian prior is assumed.
  - We propose using the ICA bases and the estimated sources as the initial NMF values.
    - Objectives: 1. deeper minimization, 2. faster convergence, 3. better performance
  - Figure: value of the NMF cost function versus the number of NMF update iterations, illustrating deeper minimization and faster convergence.
- 8. Proposed method: concept
  - The input data matrix $\mathbf{X}$ is a mixture of some sources.
    - The sources in $\mathbf{S}$ are mixed via the mixing matrix $\mathbf{A}$ and then observed as $\mathbf{X} = \mathbf{A}\mathbf{S}$.
    - ICA can estimate a demixing matrix $\mathbf{W}$ and the mutually independent sources $\mathbf{S}$.
  - Pipeline: input data matrix $\mathbf{X}$ → PCA → nonnegative ICA (NICA) → nonnegativization → initial values for NMF
    - PCA only for the dimensionality reduction in NMF
    - Nonnegative ICA for taking nonnegativity into account
    - Nonnegativization for ensuring complete nonnegativity
  - Figure labels: input data matrix, mixing matrix, source matrix (mutually independent rows), PCA matrix for dimensionality reduction, ICA bases.
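- 9. Nonnegative constrained ICA
  - Nonnegative ICA (NICA) [Plumbley, 2003]
    - estimates the demixing matrix so that all of the separated sources become nonnegative;
    - finds a rotation matrix $\mathbf{R}$ for the pre-whitened mixtures $\mathbf{Z} = \mathbf{P}\mathbf{X}$, so that the demixing matrix is $\mathbf{W} = \mathbf{R}\mathbf{P}$;
    - uses steepest gradient descent to estimate $\mathbf{R}$.
  - Cost function: $\mathcal{J}(\mathbf{R}) = \frac{1}{2}\mathbb{E}\big[\|\mathbf{z} - \mathbf{R}^\top\mathbf{y}^{+}\|_2^2\big]$, where $\mathbf{y} = \mathbf{R}\mathbf{z}$ and $\mathbf{y}^{+} = \max(\mathbf{y}, \mathbf{0})$ element-wise.
  - Signal flow: observed $\mathbf{X}$ → whitening without centering → pre-whitened $\mathbf{Z}$ → rotation (demixing) → separated $\mathbf{Y}$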
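
A compact NumPy sketch of NICA under stated assumptions: the whitening matrix is computed from the covariance but applied to the uncentered data, and the rotation is updated by steepest descent on the cost above. Since $\mathbf{R}$ is orthogonal, $\|\mathbf{z} - \mathbf{R}^\top\mathbf{y}^{+}\| = \|\mathbf{y} - \mathbf{y}^{+}\|$, so the cost is half the mean energy of the negative parts. To keep `R` exactly orthogonal without SciPy, each step uses a Cayley transform, a substitute for the geodesic (matrix-exponential) update in [Plumbley, 2003]; the function names, step size `mu`, and iteration count are hypothetical.

```python
import numpy as np


def whiten_without_centering(X, K):
    """Return the PCA/whitening matrix P (K x I) and Z = P X.

    P is built from the covariance of X (mean removed), but it is applied to the
    uncentered X so that the nonnegativity information is kept.
    """
    lam, E = np.linalg.eigh(np.cov(X))           # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:K]              # indices of the top-K eigenvalues
    P = np.diag(lam[idx] ** -0.5) @ E[:, idx].T  # Lambda_K^{-1/2} E_K^T
    return P, P @ X


def nica(Z, n_iter=2000, mu=0.1, seed=0):
    """Steepest descent for the NICA cost over K x K rotation matrices R.

    Cost: 0.5 * mean_j ||z_j - R^T max(R z_j, 0)||^2, which equals
    0.5 * mean_j ||min(R z_j, 0)||^2 when R is orthogonal.
    """
    K, J = Z.shape
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((K, K)))  # random initial rotation
    eye = np.eye(K)
    costs = []
    for _ in range(n_iter):
        Y = R @ Z                   # current separated signals
        Y_neg = np.minimum(Y, 0.0)  # negative parts y^-
        costs.append(0.5 * np.sum(Y_neg ** 2) / J)
        M = Y_neg @ Y.T / J         # E[y^- y^T]
        G = M - M.T                 # skew-symmetric descent direction
        # Cayley step: R <- (I + mu/2 G)^{-1} (I - mu/2 G) R, approx. (I - mu G) R
        R = np.linalg.solve(eye + 0.5 * mu * G, (eye - 0.5 * mu * G) @ R)
    return R, np.array(costs)


# Toy check: nonnegative, unit-variance sources mixed by a nonnegative matrix
rng = np.random.default_rng(42)
S = rng.exponential(size=(3, 2000))
A = rng.random((6, 3))
X = A @ S
P, Z = whiten_without_centering(X, K=3)
R, costs = nica(Z)
Y = R @ Z  # estimated sources (up to permutation)
print(costs[0], costs[-1])
```

Exponential sources are used because NICA relies on nonnegative sources with a nonzero density near zero and unit variance; with such data the cost decreases from its random-rotation starting value.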
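- 10. Combine PCA for dimensionality reduction
  - Dimensionality reduction via PCA: the rows of $\mathbf{E}^\top$ are the eigenvectors of the covariance matrix of $\mathbf{X}$, sorted by eigenvalue from high to low; $\mathbf{E}_K \in \mathbb{R}^{I \times K}$ holds the top-$K$ eigenvectors and $\mathbf{\Lambda}_K$ the corresponding eigenvalues, giving the PCA (whitening) matrix $\mathbf{P} = \mathbf{\Lambda}_K^{-1/2}\mathbf{E}_K^\top$.
  - NMF variables obtained from the NICA estimates: suppose that $\mathbf{Y} = \mathbf{R}\mathbf{P}\mathbf{X}$, where $\mathbf{R}$ is the rotation matrix estimated by NICA. Then
    $\mathbf{X} \approx \mathbf{E}_K\mathbf{E}_K^\top\mathbf{X} = \mathbf{P}^{+}\mathbf{R}^\top\mathbf{Y}$ with $\mathbf{P}^{+} = \mathbf{E}_K\mathbf{\Lambda}_K^{1/2}$,
    so the basis matrix is $\mathbf{T} = \mathbf{P}^{+}\mathbf{R}^\top$ (ICA bases) and the activation matrix is $\mathbf{V} = \mathbf{Y}$ (sources); the $I-K$ discarded low-eigenvalue components appear as a zero matrix in the slide's figure.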
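
The algebra above can be checked numerically. In this sketch a random orthogonal matrix stands in for the NICA rotation (an assumption for brevity), and the variable names are illustrative. The check confirms that the initial values reproduce $\mathbf{X}$ only up to the rank-$K$ PCA truncation, and that $\mathbf{T}$ and $\mathbf{V}$ are not constrained to be nonnegative.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 8, 100, 3
X = rng.random((I, J))  # toy nonnegative data

# PCA (whitening) matrix P = Lambda_K^{-1/2} E_K^T from the covariance of X
lam, E = np.linalg.eigh(np.cov(X))
idx = np.argsort(lam)[::-1][:K]
E_K, lam_K = E[:, idx], lam[idx]
P = np.diag(lam_K ** -0.5) @ E_K.T       # (K, I)
P_pinv = E_K @ np.diag(lam_K ** 0.5)     # (I, K), pseudo-inverse of P

# Hypothetical stand-in for the NICA rotation: any orthogonal K x K matrix
R, _ = np.linalg.qr(rng.standard_normal((K, K)))

V = R @ P @ X     # activation matrix = separated sources Y
T = P_pinv @ R.T  # basis matrix = ICA bases

# T V equals the projection of X onto the top-K principal subspace
print(np.allclose(T @ V, E_K @ E_K.T @ X))
```

The printed `True` follows from $\mathbf{R}^\top\mathbf{R} = \mathbf{I}$ and $\mathbf{P}^{+}\mathbf{P} = \mathbf{E}_K\mathbf{E}_K^\top$.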
- 11. Nonnegativization
  - Even if we use NICA, there is no guarantee that
    - the obtained $\mathbf{V}$ (sources) becomes completely nonnegative, because of the dimensionality reduction by PCA;
    - the obtained $\mathbf{T}$ (ICA bases) is nonnegative, because NICA does not assume nonnegativity of the bases.
  - We therefore apply a "nonnegativization" to the obtained $\mathbf{T}$ and $\mathbf{V}$, comparing three variants (Method 1, Method 2, Method 3), where $\alpha$ and $\beta$ are scale-fitting coefficients that depend on the divergence of the subsequent NMF. Two of the terms are annotated as the correlation between … and ….
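
As one hypothetical nonnegativization (not necessarily any of Methods 1-3), the sketch below takes absolute values and then fits a single scale to the Euclidean cost of the subsequent NMF. The closed-form scale comes from minimizing a one-dimensional quadratic: $c = \langle \mathbf{X}, \mathbf{Y}\rangle / \|\mathbf{Y}\|_F^2$ with $\mathbf{Y} = |\mathbf{T}||\mathbf{V}|$. The function name and the even split of $c$ between bases and activations are assumptions.

```python
import numpy as np


def abs_nonnegativize_eu(X, T, V):
    """Hypothetical nonnegativization: absolute values plus one fitted scale.

    The scale c minimizes ||X - c |T||V| ||_F^2 (Euclidean cost). This only
    illustrates divergence-dependent scale fitting.
    """
    T_abs, V_abs = np.abs(T), np.abs(V)
    Y = T_abs @ V_abs
    c = np.sum(X * Y) / np.sum(Y * Y)  # least-squares optimal scale (c >= 0 here)
    # Split the scale evenly between bases and activations
    return np.sqrt(c) * T_abs, np.sqrt(c) * V_abs
```

For the generalized Kullback-Leibler divergence, the same one-dimensional fit instead gives $c = \sum \mathbf{X} / \sum \mathbf{Y}$, which shows why the scale-fitting coefficients depend on the divergence of the subsequent NMF.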
- 12. Experiment: conditions
  - Input: power spectrogram of a mixture of vocals (Vo.) and guitar (Gt.)
    - Song: "Actions - One Minute Smile" from SiSEC2015
    - Size of power spectrogram: 2049 × 1290 (60 s)
    - Number of bases: …
  - Figure: input spectrogram (frequency [kHz] versus time [s])
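
The spectrogram size is consistent with, for example, a 44.1 kHz signal analyzed with a 4096-sample window and a 2048-sample hop: 4096/2 + 1 = 2049 frequency bins, and 1 + ⌊(60 · 44100 − 4096)/2048⌋ = 1290 frames. These STFT settings are assumptions (the slide does not state them), and a random signal stands in for the song.

```python
import numpy as np

# Assumed STFT settings: 44.1 kHz sampling, 4096-sample Hann window,
# 2048-sample hop, no padding.
fs, n_fft, hop = 44100, 4096, 2048
signal = np.random.default_rng(0).standard_normal(60 * fs)  # 60 s placeholder

n_frames = 1 + (len(signal) - n_fft) // hop
window = np.hanning(n_fft)
frames = np.stack([signal[t * hop : t * hop + n_fft] * window
                   for t in range(n_frames)], axis=1)  # (n_fft, n_frames)
X = np.abs(np.fft.rfft(frames, axis=0)) ** 2           # power spectrogram
print(X.shape)  # (2049, 1290)
```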
- 13. Experiment: results of NICA
  - Figure: convergence of the NICA cost function under steepest gradient descent (y-axis: value of the cost function, 0.0 to 0.6; x-axis: number of iterations, 0 to 2000)
- 14. Experiment: results of Euclidean NMF
  - Figure: convergence of EU-NMF; Rand1 to Rand10 use random initialization with different seeds.
  - Processing time for initialization: NICA 4.36 s, PCA 0.98 s, SVD 2.40 s
  - Processing time of EU-NMF: 12.78 s (for 1000 iterations)
- 15. Experiment: results of Kullback-Leibler NMF
  - Figure: convergence of KL-NMF, with the PCA and proposed-method curves labeled; Rand1 to Rand10 use random initialization with different seeds.
  - Processing time for initialization: NICA 4.36 s, PCA 0.98 s, SVD 2.40 s
  - Processing time of KL-NMF: 48.07 s (for 1000 iterations)
- 16. Experiment: results of Itakura-Saito NMF
  - Figure: convergence of IS-NMF (cost axis scaled by $10^6$), with the PCA and proposed-method curves labeled; Rand1 to Rand10 use random initialization with different seeds.
  - Processing time for initialization: NICA 4.36 s, PCA 0.98 s, SVD 2.40 s
  - Processing time of IS-NMF: 214.26 s (for 1000 iterations)
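
EU-, KL-, and IS-NMF can all be run from one routine via the β-divergence family (β = 2, 1, 0). The sketch below uses the majorization-minimization multiplicative updates of Févotte and Idier (2011), whose exponent γ(β) keeps the cost non-increasing (γ = 1/2 for IS). This is an assumption about how such NMF variants can be implemented, not a description of the authors' code; the function names are hypothetical, and the data must be strictly positive for KL and IS.

```python
import numpy as np


def beta_div(X, Y, beta):
    """Summed beta-divergence: 2 = half squared Euclidean, 1 = KL, 0 = IS."""
    if beta == 0:
        Q = X / Y
        return np.sum(Q - np.log(Q) - 1)
    if beta == 1:
        return np.sum(X * np.log(X / Y) - X + Y)
    return np.sum((X ** beta + (beta - 1) * Y ** beta
                   - beta * X * Y ** (beta - 1)) / (beta * (beta - 1)))


def gamma(beta):
    """Exponent that makes the multiplicative updates monotone (MM derivation)."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta <= 2:
        return 1.0
    return 1.0 / (beta - 1.0)


def beta_nmf(X, T, V, beta, n_iter=200):
    T, V = T.copy(), V.copy()
    g = gamma(beta)
    costs = []
    for _ in range(n_iter):
        Y = T @ V
        T *= (((X * Y ** (beta - 2)) @ V.T) / (Y ** (beta - 1) @ V.T)) ** g
        Y = T @ V
        V *= ((T.T @ (X * Y ** (beta - 2))) / (T.T @ Y ** (beta - 1))) ** g
        costs.append(beta_div(X, T @ V, beta))
    return T, V, np.array(costs)


rng = np.random.default_rng(0)
X = rng.random((20, 30)) + 0.1  # strictly positive toy data
T0, V0 = rng.random((20, 4)) + 0.1, rng.random((4, 30)) + 0.1
for beta in (2, 1, 0):          # EU, KL, IS
    _, _, c = beta_nmf(X, T0, V0, beta, n_iter=100)
    print(beta, c[0], c[-1])
```

Passing the same `T0` and `V0` to all three variants mirrors the experiments, where each initialization method is compared under every divergence.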
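- 17. Experiment: full-supervised source separation
  - Full-supervised NMF [Smaragdis, 2007]: simply use pre-trained sourcewise bases for separation.
  - Training stage: for each source $n$, minimize the cost function $\mathcal{D}(\mathbf{X}_n \mid \mathbf{T}_n\mathbf{V}_n)$ on training data; $\mathbf{T}_n$ and $\mathbf{V}_n$ are initialized by a conventional method or the proposed method.
  - Separation stage: with the pre-trained bases fixed, minimize the cost function $\mathcal{D}(\mathbf{X} \mid [\mathbf{T}_1\ \mathbf{T}_2]\mathbf{V})$ with respect to $\mathbf{V}$, which is initialized based on the correlations between … and … or ….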
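
A minimal sketch of the separation stage, assuming KL-NMF for the activation updates and Wiener-like soft masks applied directly to the power spectrogram (both assumptions; masks are often applied to the complex STFT instead). The pre-trained bases `T1` and `T2` would come from the training stage; here `V` is randomly initialized, a hypothetical stand-in for the correlation-based initialization on the slide.

```python
import numpy as np


def kl_mu_activations(X, T, V, n_iter=200, eps=1e-12):
    """KL-NMF multiplicative updates for V only; the bases T stay fixed."""
    V = V.copy()
    ones = np.ones_like(X)
    for _ in range(n_iter):
        V *= (T.T @ (X / (T @ V + eps))) / (T.T @ ones + eps)
    return V


def supervised_separation(X, T1, T2, n_iter=200, seed=0):
    """Separate X using fixed pre-trained bases T1 (source 1) and T2 (source 2)."""
    T = np.hstack([T1, T2])  # (I, K1 + K2), not updated
    rng = np.random.default_rng(seed)
    V = rng.random((T.shape[1], X.shape[1])) + 0.1  # hypothetical random init
    V = kl_mu_activations(X, T, V, n_iter)
    K1 = T1.shape[1]
    Y1, Y2 = T1 @ V[:K1], T2 @ V[K1:]  # sourcewise model spectrograms
    mask1 = Y1 / (Y1 + Y2 + 1e-12)     # Wiener-like soft mask for source 1
    return mask1 * X, (1.0 - mask1) * X
```

Because the two masks sum to one, the separated spectrograms always add back up to the mixture; the quality of the split depends on how well the fixed bases were trained, which is where the initialization enters.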
- 18. Experiment: results of separation
  - Two-source separation using full-supervised NMF
    - SiSEC2015 MUS dataset (professionally recorded music)
    - Averaged SDR improvements over 15 songs
  - Figure: SDR improvement [dB] for source 1 (axis 0 to 12 dB) and source 2 (axis 0 to 5 dB), comparing the proposed methods (NICA1, NICA2, NICA3) with the conventional initializations (PCA, SVD, Rand1 to Rand10).
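
The separation scores are SDR improvements: the SDR of the estimate minus the SDR obtained by using the mixture itself as the estimate. The sketch below uses a simplified SDR, $10\log_{10}(\|s\|^2 / \|s - \hat{s}\|^2)$, which omits the distortion-filter projection of BSS Eval, so its values are not directly comparable with SiSEC results; a full implementation is available, e.g., as `mir_eval.separation.bss_eval_sources`. The function names and test signals are illustrative.

```python
import numpy as np


def sdr(reference, estimate):
    """Simplified SDR in dB (no BSS Eval projection; illustrative only)."""
    return 10.0 * np.log10(np.sum(reference ** 2)
                           / np.sum((reference - estimate) ** 2))


def sdr_improvement(reference, estimate, mixture):
    """SDR gain of the estimate over using the unprocessed mixture."""
    return sdr(reference, estimate) - sdr(reference, mixture)


rng = np.random.default_rng(0)
s1 = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)  # source 1: a tone
s2 = rng.standard_normal(44100)                          # source 2: noise
print(sdr_improvement(s1, s1 + 0.1 * s2, s1 + s2))
```

Here the residual interference drops from `s2` to `0.1 * s2`, so the improvement is $10\log_{10} 100 = 20$ dB up to floating-point rounding, independent of the actual signals.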
- 19. Conclusion
  - We proposed an efficient initialization method for NMF.
  - It utilizes statistical independence to obtain non-orthogonal bases and sources.
    - Orthogonality may not be preferable for NMF.
  - The proposed initialization gives
    - deeper minimization,
    - faster convergence,
    - better performance for full-supervised source separation.
  - Thank you for your attention!