
Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing


Presented at IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano, Yu Takahashi, Kazunobu Kondo, "Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing," Proceedings of IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013), pp.392-397, Athens, Greece, December 2013.



  1. 1. Robust Music Signal Separation Based on Supervised Nonnegative Matrix Factorization with Prevention of Basis Sharing Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano (Nara Institute of Science and Technology, Japan) Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan) IEEE International Symposium on Signal Processing and Information Technology December 12-15, 2013 - Athens, Greece Session T.B3: Speech – Audio - Music
  2. 2. Outline • 1. Research background • 2. Conventional method – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Problem of conventional method: basis sharing • 3. Proposed method – Penalized supervised nonnegative matrix factorization • Orthogonality penalty • Maximum-divergence penalty • 4. Experiments – Two-source case – Four-source case • 5. Conclusions 2
  3. 3. Outline • 1. Research background • 2. Conventional method – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Problem of conventional method: basis sharing • 3. Proposed method – Penalized supervised nonnegative matrix factorization • Orthogonality penalty • Maximum-divergence penalty • 4. Experiments – Two-source case – Four-source case • 5. Conclusions 3
  4. 4. Background • Sound signal separation – decomposes the target source from an observed mixed signal – e.g., speech and noise, a specific instrumental sound, etc. • Typical methods for sound signal separation – operate in the time-frequency domain. [Figure: spectrogram with a first and a second tone; separation extracts the target tone from the mixture]
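As a concrete illustration (not part of the original slides), the mixture is usually moved into the time-frequency domain with a short-time Fourier transform, and its magnitude spectrogram is the nonnegative matrix that the NMF methods below factorize. A minimal sketch, assuming a hypothetical mono file name and using only scipy:

```python
# Minimal sketch: build the nonnegative magnitude spectrogram that NMF factorizes.
# "mixture.wav" is a hypothetical file name chosen for illustration.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

fs, x = wavfile.read("mixture.wav")      # sampling rate, waveform
x = x.astype(np.float64)
if x.ndim > 1:                           # fold stereo to mono
    x = x.mean(axis=1)

f, t, X = stft(x, fs=fs, nperseg=2048, noverlap=1024)
Y = np.abs(X)                            # Omega x T nonnegative spectrogram
print(Y.shape)                           # (frequency bins, time frames)
```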
  5. 5. Outline • 1. Research background • 2. Conventional method – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Problem of conventional method: basis sharing • 3. Proposed method – Penalized supervised nonnegative matrix factorization • Orthogonality penalty • Maximum-divergence penalty • 4. Experiments – Two-source case – Four-source case • 5. Conclusions 5
  6. 6. Nonnegative matrix factorization [Lee, et al., 2012] • Nonnegative matrix factorization (NMF) – is a sparse representation algorithm. – can extract significant spectral features (bases) from the observed matrix. • However, it is difficult to cluster the bases into specific sources. [Figure: the observed matrix (spectrogram, Ω frequency bins × T time frames) is factorized into a basis matrix of K spectral patterns and an activation matrix of time-varying gains]
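To make the factorization concrete, here is a minimal NumPy sketch of plain NMF with the classical multiplicative updates for the KL (β = 1) divergence; the function and variable names are chosen for illustration and are not from the slides.

```python
import numpy as np

def nmf_kl(Y, K, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix Y (Omega x T) into a basis matrix W (Omega x K)
    and an activation matrix H (K x T) with the classical KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    omega, T = Y.shape
    W = rng.random((omega, K)) + eps
    H = rng.random((K, T)) + eps
    ones = np.ones((omega, T))
    for _ in range(n_iter):
        V = W @ H + eps
        W *= (Y / V) @ H.T / (ones @ H.T + eps)   # update spectral patterns
        V = W @ H + eps
        H *= W.T @ (Y / V) / (W.T @ ones + eps)   # update time-varying gains
    return W, H

# Usage (sketch): W, H = nmf_kl(np.abs(spectrogram), K=30)
```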
  7. 7. Supervised NMF (SNMF) [Smaragdis, et al., 2007] • SNMF utilizes some sample sounds of the target. – Training process: construct the trained (supervised) basis matrix of the target sound from the sample sounds (e.g., a musical scale) and keep it fixed. – Separation process: decompose the mixed signal into the target signal and the other signal, optimizing only the remaining matrices. [Figure: training builds the supervised basis matrix (spectral dictionary); separation splits the mixed signal into target and other components]
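The two-stage flow of SNMF can be sketched as follows, assuming the supervised basis matrix F has already been trained on the sample sounds (for instance with the plain-NMF sketch above). This is a simplified KL-divergence sketch of the Y ≈ FG + HU model with F held fixed, without any penalty term yet.

```python
import numpy as np

def snmf_separate(Y_mix, F, L=50, n_iter=200, eps=1e-12, seed=0):
    """Supervised NMF sketch: approximate Y_mix ~ F G + H U with the trained basis F
    kept fixed (KL updates). Returns the target estimate F G and the residual H U."""
    rng = np.random.default_rng(seed)
    omega, T = Y_mix.shape
    K = F.shape[1]
    G = rng.random((K, T)) + eps       # activations of the supervised bases
    H = rng.random((omega, L)) + eps   # other (unsupervised) bases
    U = rng.random((L, T)) + eps       # activations of the other bases
    ones = np.ones((omega, T))
    for _ in range(n_iter):
        V = F @ G + H @ U + eps
        G *= F.T @ (Y_mix / V) / (F.T @ ones + eps)
        V = F @ G + H @ U + eps
        H *= (Y_mix / V) @ U.T / (ones @ U.T + eps)
        V = F @ G + H @ U + eps
        U *= H.T @ (Y_mix / V) / (H.T @ ones + eps)
    return F @ G, H @ U

# Training stage (sketch): F, _ = nmf_kl(np.abs(sample_sound_spectrogram), K=100)
# Separation stage:        target, other = snmf_separate(np.abs(mixture_spectrogram), F)
```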
  8. 8. Problem of SNMF • Basis sharing problem in SNMF – There is no constraint between the supervised basis matrix F and the other basis matrix H. – The other bases H may also contain the target spectral patterns. – If H also has the target bases, the estimated target signal loses some of the target components. – The cost function is only defined as the distance between the observed spectrogram and the model FG + HU. [Figure: target signal, estimated target signal, and estimated other signals]
  9. 9. Basis sharing problem: example of SNMF [Figure: spectrograms of the mixed signal, the signal separated by SNMF, and the target-only (oracle) signal]
  10. 10. Basis sharing problem: example of SNMF [Figure: same comparison – target-only (oracle) signal, signal separated by SNMF, and mixed signal]
  11. 11. Basis sharing problem: example of SNMF [Figure: separated (estimated) signal] The estimated signal loses some of the target components because of the basis sharing problem.
  12. 12. Outline • 1. Research background • 2. Conventional method – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Problem of conventional method: basis sharing • 3. Proposed method – Penalized supervised nonnegative matrix factorization • Orthogonality penalty • Maximum-divergence penalty • 4. Experiments – Two-source case – Four-source case • 5. Conclusions 12
  13. 13. Proposed method: Penalized SNMF (PSNMF) • In SNMF, the other basis matrix H may have the same spectral patterns as the supervised basis matrix F (basis sharing problem). • We propose to make H as different as possible from F by introducing a penalty term into the cost function. [Figure: the mixed signal is decomposed into the target signal (fixed supervised bases F) and the other signal (optimized bases H, forced to differ from F)]
  14. 14. Decomposition model and cost function • Decomposition model: Y ≃ FG + HU, where F is the supervised basis matrix (fixed). • Cost function in SNMF: J_SNMF = D_β(Y | FG + HU). • Generalized divergence function: β-divergence [Eguchi, et al., 2001].
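For reference, the decomposition model and the standard form of the β-divergence (the generalized divergence the slide names) can be written out as below; the scalar divergence is summed over all entries of the matrices.

```latex
% Decomposition model: supervised part FG plus the remaining part HU
Y \simeq FG + HU, \qquad
J_{\mathrm{SNMF}} = \mathcal{D}_\beta\!\left(Y \,\middle\|\, FG + HU\right)

% Standard beta-divergence between scalars y and x
d_\beta(y \mid x) =
\begin{cases}
\dfrac{y^{\beta} + (\beta-1)\,x^{\beta} - \beta\, y\, x^{\beta-1}}{\beta(\beta-1)}, & \beta \in \mathbb{R}\setminus\{0,1\},\\[2mm]
y \log \dfrac{y}{x} - y + x, & \beta = 1 \ \text{(KL divergence)},\\[2mm]
\dfrac{y}{x} - \log \dfrac{y}{x} - 1, & \beta = 0 \ \text{(Itakura--Saito divergence)}.
\end{cases}
```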
  15. 15. Decomposition model and cost function • Decomposition model: Y ≃ FG + HU, with the supervised basis matrix F fixed. • Cost function in SNMF: J_SNMF = D_β(Y | FG + HU). • Cost function in PSNMF: introduce a penalty term P(H), giving J_PSNMF = D_β(Y | FG + HU) + μ P(H). • We propose two types of penalty terms.
  16. 16. Orthogonality penalty • The orthogonality penalty optimizes H so as to minimize the inner product of the matrices F and H. – If H includes a basis similar to one in F, the penalty becomes larger. • All bases are normalized to unit norm. • A weighting parameter μ is introduced.
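The slide's exact expression is not reproduced in this transcript; assuming the common squared-Frobenius-norm form of an orthogonality penalty, the penalized cost would read:

```latex
% Assumed form: small when the columns of H are orthogonal to the
% (normalized) supervised bases F, large when they overlap.
\mathcal{P}_{\mathrm{O}}(H) = \bigl\lVert F^{\mathsf T} H \bigr\rVert_{F}^{2},
\qquad
J_{\mathrm{PSNMF}} = \mathcal{D}_\beta\!\left(Y \,\middle\|\, FG + HU\right)
  + \mu\, \mathcal{P}_{\mathrm{O}}(H)
```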
  17. 17. Maximum-divergence penalty • The maximum-divergence penalty optimizes H so as to maximize the divergence between F and H. – If H includes a basis similar to one in F, the divergence becomes smaller. • All bases are normalized to unit norm. • A weighting parameter μ and a sensitivity parameter are introduced.
  18. 18. Derivation of optimal variables in PSNMF • Derive the optimal variables G, H, and U. • Auxiliary function method – An optimization scheme that uses an upper-bound function. – Design auxiliary functions for the cost function with each penalty term. – Minimize the original cost functions indirectly by minimizing the auxiliary functions.
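The auxiliary function (majorization-minimization) principle referred to here can be stated generically as follows.

```latex
% J^{+} is an auxiliary (upper-bound) function for J if
J(\theta) = \min_{\bar{\theta}} J^{+}(\theta, \bar{\theta}),
\qquad
J(\theta) \le J^{+}(\theta, \bar{\theta}) \ \ \text{for all } \bar{\theta}.

% Alternating the two minimizations never increases the original cost:
\bar{\theta}^{(t)} = \arg\min_{\bar{\theta}} J^{+}\!\bigl(\theta^{(t)}, \bar{\theta}\bigr),
\qquad
\theta^{(t+1)} = \arg\min_{\theta} J^{+}\!\bigl(\theta, \bar{\theta}^{(t)}\bigr)
\ \Longrightarrow\
J\!\bigl(\theta^{(t+1)}\bigr) \le J\!\bigl(\theta^{(t)}\bigr).
```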
  19. 19. Derivation of optimal variables in PSNMF • The second and third terms of the divergence become convex or concave functions depending on the value of β. – Convex terms are bounded using Jensen's inequality. – Concave terms are bounded using the tangent-line inequality.
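In standard β-NMF derivations, these two bounds take the following forms when applied to the model value x = Σ_k w_k h_k inside the divergence; the weights λ_k and the expansion point α are the auxiliary variables. (The slide's own notation is not reproduced in this transcript.)

```latex
% Jensen's inequality for a convex f, with \lambda_k \ge 0, \ \sum_k \lambda_k = 1:
f\!\left(\sum_{k} w_k h_k\right) \le \sum_{k} \lambda_k\, f\!\left(\frac{w_k h_k}{\lambda_k}\right)

% Tangent-line inequality for a concave g, with expansion point \alpha > 0:
g\!\left(\sum_{k} w_k h_k\right) \le g(\alpha) + g'(\alpha)\!\left(\sum_{k} w_k h_k - \alpha\right)
```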
  20. 20. Derivation of optimal variables in PSNMF • This term always becomes a convex function, so Jensen's inequality is again applied, introducing an auxiliary variable.
  21. 21. Derivation of optimal variables in PSNMF • Auxiliary functions are designed for the cost function with each penalty term. • The update rules for optimization are obtained by setting the partial derivatives of the auxiliary functions with respect to G, H, U, and the auxiliary variables to zero.
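Generically, setting those partial derivatives to zero yields multiplicative updates of the following form, with the gradient split into its nonnegative parts; this is the generic pattern only, not the paper's specific rules.

```latex
% With \nabla_{\theta} J = \nabla^{+}_{\theta} J - \nabla^{-}_{\theta} J (both parts nonnegative):
\frac{\partial J^{+}}{\partial \theta_{ij}} = 0
\quad\Longrightarrow\quad
\theta_{ij} \leftarrow \theta_{ij}\,
\frac{\bigl[\nabla^{-}_{\theta} J\bigr]_{ij}}{\bigl[\nabla^{+}_{\theta} J\bigr]_{ij}}
```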
  22. 22. Update rules for optimization of PSNMF • Update rules with the orthogonality penalty [the multiplicative update equations shown on the slide are not reproduced in this transcript]
  23. 23. Update rules for optimization of PSNMF • Update rules with the maximum-divergence penalty [the multiplicative update equations shown on the slide are not reproduced in this transcript]
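The derived update equations appear on the original slides but are lost in this transcript. As a rough stand-in only, the sketch below applies the generic multiplicative-update pattern to the assumed orthogonality penalty μ‖FᵀH‖²_F: its gradient with respect to H, 2μFFᵀH, is nonnegative and therefore enters the denominator of the H update. This is an illustrative sketch, not the paper's derived rule.

```python
import numpy as np

def psnmf_ortho_step(Y, F, G, H, U, mu=1.0, eps=1e-12):
    """One heuristic multiplicative-update sweep for Y ~ F G + H U (KL divergence)
    with an assumed orthogonality penalty mu * ||F^T H||_F^2 on the other bases H."""
    omega, T = Y.shape
    ones = np.ones((omega, T))

    V = F @ G + H @ U + eps
    G *= F.T @ (Y / V) / (F.T @ ones + eps)

    V = F @ G + H @ U + eps
    # Penalty gradient 2*mu*F@F.T@H is nonnegative, so it is added to the denominator.
    H *= (Y / V) @ U.T / (ones @ U.T + 2.0 * mu * (F @ (F.T @ H)) + eps)
    scale = H.sum(axis=0, keepdims=True) + eps   # renormalize the bases of H
    H /= scale
    U *= scale.T                                 # rescale U so H @ U is unchanged

    V = F @ G + H @ U + eps
    U *= H.T @ (Y / V) / (H.T @ ones + eps)
    return G, H, U
```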
  24. 24. Outline • 1. Research background • 2. Conventional method – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Problem of conventional method: basis sharing • 3. Proposed method – Penalized supervised nonnegative matrix factorization • Orthogonality penalty • Maximum-divergence penalty • 4. Experiments – Two-source case – Four-source case • 5. Conclusions 24
  25. 25. Experimental conditions • Produced four melodies using a MIDI synthesizer. • Used the same MIDI sounds of the target instruments, containing two octave notes, as the supervision sound. • Evaluated a two-source case and a four-source case. – There are 12 combinations in the two-source case and 4 patterns in the four-source case. [Figure: training sound – two octave notes that cover all the notes of the target signal]
  26. 26. Experimental conditions • Evaluation score [Vincent, 2006]: source-to-distortion ratio (SDR), which indicates the total quality of the separated signal.
  Observed signal: 2 or 4 signals mixed at the same power
  Training signal: the same MIDI sounds as the target signal, containing two octave notes
  Divergence criteria: all combinations of the β values
  Number of bases: 100 supervised bases, 50 other bases
  Parameters: experimentally determined
  Methods: conventional SNMF, proposed PSNMF
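The SDR score cited here comes from the BSS Eval framework [Vincent, 2006]; one common way to compute it in Python is the mir_eval package, as sketched below with hypothetical placeholder signals.

```python
import numpy as np
import mir_eval

# Hypothetical placeholders: rows are sources, columns are time samples.
rng = np.random.default_rng(0)
references = rng.standard_normal((2, 44100))                     # true source waveforms
estimates = references + 0.1 * rng.standard_normal((2, 44100))   # separated outputs

sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(references, estimates)
print("SDR per source [dB]:", sdr)
```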
  27. 27. Experimental results: two-source case • Average SDR scores over the 12 combinations – Conventional SNMF cannot achieve high separation accuracy because of the basis sharing problem. – The proposed method outperforms conventional SNMF. [Figure: bar charts of SDR [dB] comparing Conv. SNMF, PSNMF (Ortho.), and PSNMF (Max.) in three panels labeled 0, 1, 2]
  28. 28. Experimental results: four-source case • Average SDR scores over the 4 combinations – PSNMF outperforms the conventional method. [Figure: bar charts of SDR [dB] comparing Conv. SNMF, PSNMF (Ortho.), and PSNMF (Max.) in three panels labeled 0, 1, 2]
  29. 29. Example of separation (Cello & Oboe) [Figure: spectrograms of the mixed signal, the cello signal, the signal separated by SNMF, and the signal separated by PSNMF (Ortho.)]
  30. 30. Conclusions • Conventional supervised NMF suffers from a basis sharing problem that degrades the separation performance. • We propose adding a penalty term to the cost function that forces the other bases to become uncorrelated with the supervised bases (penalized supervised NMF). • Penalized supervised NMF achieves high separation accuracy. Thank you for your attention!
