Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration

Presented at Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014, international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014), Siem Reap, Cambodia, December 2014 (invited paper).

  1. Hybrid Multichannel Signal Separation Using Supervised Nonnegative Matrix Factorization
     Daichi Kitamura (The Graduate University for Advanced Studies, Japan)
     Hiroshi Saruwatari (The University of Tokyo, Japan)
     Satoshi Nakamura (Nara Institute of Science and Technology, Japan)
     Yu Takahashi (Yamaha Corporation, Japan)
     Kazunobu Kondo (Yamaha Corporation, Japan)
     Hirokazu Kameoka (The University of Tokyo, Japan)
     Asia-Pacific Signal and Information Processing Association ASC 2014
     Special session – Recent Advances in Audio and Acoustic Signal Processing
  2. Outline
     • 1. Research background
     • 2. Conventional methods
       – Nonnegative matrix factorization
       – Supervised nonnegative matrix factorization
       – Multichannel NMF
     • 3. Proposed method
       – SNMF with spectrogram restoration and its hybrid method
     • 4. Experiments
       – Closed data experiment
       – Open data experiment
     • 5. Conclusions
  4. Research background
     • Signal separation has received much attention.
       – Applications: automatic music transcription, 3D audio systems, etc.
     • Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area.
     • Supervised NMF (SNMF) achieves the highest separation performance.
     • To improve its performance, an SNMF-based multichannel signal separation method is required.
     • Goal: separate the target signal from multichannel signals with high accuracy.
  6. NMF [Lee, et al., 2001]
     • NMF can extract significant spectral patterns.
       – The basis matrix holds the spectral patterns that appear frequently in the observed matrix.
     • The observed matrix (spectrogram, Ω × 𝑇) is approximated by the product of a basis matrix (spectral patterns, Ω × 𝐾) and an activation matrix (time-varying gains, 𝐾 × 𝑇).
     • Ω: number of frequency bins, 𝑇: number of time frames, 𝐾: number of bases
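The factorization on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the generalized KL divergence as the cost and the standard multiplicative updates; the names `nmf_kl`, `kl_div`, `F`, and `G` are chosen here for illustration and are not taken from the slides.

```python
import numpy as np

def nmf_kl(Y, K, n_iter=200, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix Y (Omega x T) by F @ G with K bases."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, K)) + eps    # basis matrix (spectral patterns)
    G = rng.random((K, n_frames)) + eps  # activation matrix (time-varying gains)
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        F *= ((Y / (F @ G + eps)) @ G.T) / (ones @ G.T + eps)
        G *= (F.T @ (Y / (F @ G + eps))) / (F.T @ ones + eps)
    return F, G

def kl_div(Y, X, eps=1e-12):
    """Generalized KL divergence between Y and X, summed over all entries."""
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))

# Toy "spectrogram" built from two spectral patterns (16 bins, 40 frames).
rng = np.random.default_rng(1)
Y = rng.random((16, 2)) @ rng.random((2, 40))
F, G = nmf_kl(Y, K=2)
print(kl_div(Y, F @ G))
```

Each column of `F` plays the role of one spectral pattern, and the matching row of `G` says how strongly that pattern is active in each time frame.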
  7. Supervised NMF [Smaragdis, et al., 2007]
     • SNMF is a supervised spectral separation method.
     • Training process: a supervised basis matrix (spectral dictionary) is learned from sample sounds of the target signal.
     • Separation process: the supervised basis matrix is fixed, and the remaining matrices are optimized to split the mixed signal into the target signal and the other signal.
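The two processes on this slide might look as follows in NumPy. This is a sketch under assumptions: the KL divergence is used as the cost, and the names `train_bases`, `separate`, `H`, and `U` (bases and activations for the non-target part) are hypothetical labels introduced here, not the authors' notation.

```python
import numpy as np

EPS = 1e-12

def train_bases(Y_train, K, n_iter=200, seed=0):
    """Training process: learn the supervised basis matrix F from target samples."""
    rng = np.random.default_rng(seed)
    F = rng.random((Y_train.shape[0], K)) + EPS
    G = rng.random((K, Y_train.shape[1])) + EPS
    ones = np.ones_like(Y_train)
    for _ in range(n_iter):
        F *= ((Y_train / (F @ G + EPS)) @ G.T) / (ones @ G.T + EPS)
        G *= (F.T @ (Y_train / (F @ G + EPS))) / (F.T @ ones + EPS)
    return F

def separate(Y_mix, F, L, n_iter=200, seed=0):
    """Separation process: F stays fixed; G, H, and U are optimized."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y_mix.shape[1])) + EPS
    H = rng.random((Y_mix.shape[0], L)) + EPS   # bases for the other signal
    U = rng.random((L, Y_mix.shape[1])) + EPS   # activations for the other signal
    ones = np.ones_like(Y_mix)
    for _ in range(n_iter):
        R = Y_mix / (F @ G + H @ U + EPS)
        G *= (F.T @ R) / (F.T @ ones + EPS)
        R = Y_mix / (F @ G + H @ U + EPS)
        H *= (R @ U.T) / (ones @ U.T + EPS)
        R = Y_mix / (F @ G + H @ U + EPS)
        U *= (H.T @ R) / (H.T @ ones + EPS)
    return F @ G, H @ U  # estimates of the target and of everything else
```

Only `F` carries prior knowledge about the target; `H` and `U` are free to absorb whatever the dictionary cannot explain.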
  8. Problems of SNMF
     • SNMF handles only single-channel signals.
       – For a multichannel signal, SNMF cannot use the information between channels.
     • When many interference sources exist, the separation performance of SNMF degrades markedly.
       – Residual components remain in the separated signal.
  9. Multichannel NMF [Sawada, et al., 2013]
     • Multichannel NMF
       – is a natural extension of NMF to multichannel signals (microphone array).
       – uses spatial information to cluster the bases, which achieves unsupervised separation.
     • Problems: multichannel NMF depends strongly on the initial values and lacks robustness.
  11. Motivation and strategy
      • Sawada's multichannel NMF
        – is a unified method that solves spatial and spectral separation together.
        – maximizes a likelihood that involves the observed spectrograms, the spatial direction of the target signal, and the source components of all signals (target and other).
        – In the supervised situation, the target spectral patterns are given.
        – The problem is too difficult to solve (lacks robustness).
        – It is computationally inefficient (long computation time).
  12. Motivation and strategy
      • The proposed hybrid method approximates the unified problem by dividing it into two parts (divide and conquer):
        – Unsupervised spatial separation: classical D.O.A. estimation
        – Supervised spectral separation: SNMF-based method
      • The spatial separation should be carried out with classical D.O.A. estimation methods.
        – These methods are very efficient and stable.
  13. Directional clustering [Araki, et al., 2007]
      • Directional clustering
        – Unsupervised spatial separation method
        – k-means clustering (fast and stable)
      • Each time-frequency bin of the stereo input (L, R) is assigned to a direction (left, center, or right). The separated signal is the entry-wise product of the spectrogram and the binary mask of the chosen direction.
      • Problem
        – Artificial distortion arises owing to the binary masking.
      • [Figure: spectrogram bins labeled L, C, or R over time and frequency, and the binary mask (1 = kept, 0 = removed) for one direction]
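The clustering step on this slide can be illustrated with a deliberately simplified sketch. It assumes a stereo input, uses only the amplitude ratio between the channels as the direction feature, and runs a one-dimensional k-means; the feature and the function name `directional_clustering` are assumptions made here and may differ from the method of Araki et al.

```python
import numpy as np

def directional_clustering(L_spec, R_spec, n_clusters=3, n_iter=50):
    """Label each time-frequency bin with a direction cluster (1-D k-means).

    L_spec, R_spec: spectrograms (complex or magnitude) of the two channels.
    Returns integer labels: 0 is the leftmost cluster, n_clusters - 1 the rightmost.
    """
    # Panning angle per bin: 0 means "only in L", pi/2 means "only in R".
    theta = np.arctan2(np.abs(R_spec), np.abs(L_spec))
    centers = np.linspace(0.0, np.pi / 2, n_clusters)
    for _ in range(n_iter):
        # Assign every bin to its nearest center, then move the centers.
        labels = np.argmin(np.abs(theta[..., None] - centers), axis=-1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = theta[labels == c].mean()
    return labels
```

With three clusters, `labels == 1` gives the binary mask for the center direction, and multiplying that mask entry-wise with the spectrogram yields the separated cluster, including the holes that binary masking leaves behind.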
  14. Proposed method: hybrid separation
      • Input stereo signal (L, R) → spatial separation method (directional clustering) → SNMF-based separation method (SNMF with spectrogram restoration) → separated signal
  15. SNMF with spectrogram restoration
      • The separated cluster contains spectral holes (lost components).
      • The proposed SNMF treats these holes as unseen observations.
      • The best-fitting supervised bases (dictionary of the target signal) are extrapolated to fix up the holes.
  16. SNMF with spectrogram restoration
      • [Figure: frequency of source components versus direction (left, center, right): (a) input signal with the target, (b) after directional clustering (binary masking), (c) after super-resolution-based SNMF, with extrapolated components]
      • Observed spectrogram (target and interference) → directional clustering → separated cluster → SNMF with spectrogram restoration (supervised spectral bases, extrapolation) → reconstructed data
  17. Decomposition model and cost function
      • The divergence is defined at all grids except for the holes by using the binary mask matrix obtained from directional clustering.
      • Decomposition model: the supervised bases are fixed.
      • Cost function:
        – a divergence term with a binary index that excludes the holes
        – a regularization term
        – a penalty term [Kitamura, et al., 2014]
      • Notation: entries of the matrices, weighting parameters, the binary complement of the mask, and the Frobenius norm.
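The equations themselves did not survive in this transcript. One form that is consistent with the labels on the slide is written out here as a hedged reconstruction. All symbols are assumptions introduced for illustration: Y is the masked spectrogram, F the fixed supervised bases, G their activations, H and U the bases and activations of the other components, i the binary mask entry and ī its complement, λ and μ the weighting parameters. The exact shape of the regularization and penalty terms is also an assumption.

```latex
\begin{align}
\boldsymbol{Y} &\simeq \boldsymbol{F}\boldsymbol{G} + \boldsymbol{H}\boldsymbol{U}, \\
\mathcal{J} &= \sum_{\omega,t} i_{\omega,t}\,
  \mathcal{D}_{\beta}\!\Bigl( y_{\omega,t} \,\Big\|\, \sum_{k} f_{\omega,k}\, g_{k,t} + \sum_{l} h_{\omega,l}\, u_{l,t} \Bigr)
  + \lambda \sum_{\omega,t} \bar{i}_{\omega,t}\,
  \mathcal{D}_{\beta_{\mathrm{reg}}}\!\Bigl( 0 \,\Big\|\, \sum_{k} f_{\omega,k}\, g_{k,t} \Bigr)
  + \mu \bigl\| \boldsymbol{F}^{\mathsf{T}} \boldsymbol{H} \bigr\|_{\mathrm{Fr}}^{2}
\end{align}
```

Read this way, the first term fits the model only where the mask kept data, the second term keeps the extrapolated values inside the holes from growing without bound, and the third discourages the free bases from duplicating the supervised ones.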
  21. Generalized divergence: β-divergence [Eguchi, et al., 2001]
      • Special cases
        – EUC-distance (β = 2)
        – KL-divergence (β = 1): the best criterion for signal separation [Kitamura, et al., 2014]
        – IS-divergence (β = 0)
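The three special cases can be checked numerically with a small function. This sketch uses the common textbook definition of the β-divergence, in which β = 2 gives half the squared Euclidean distance; the function name `beta_divergence` is chosen here, and the scaling convention is an assumption since the slide's formula is not preserved.

```python
import numpy as np

def beta_divergence(y, x, beta):
    """Sum of the entry-wise beta-divergence D_beta(y || x) for positive y and x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 1:    # KL-divergence (limit beta -> 1)
        d = y * np.log(y / x) - y + x
    elif beta == 0:  # IS-divergence (limit beta -> 0)
        d = y / x - np.log(y / x) - 1.0
    else:            # general case; beta = 2 is half the squared EUC-distance
        d = (y**beta + (beta - 1.0) * x**beta
             - beta * y * x**(beta - 1.0)) / (beta * (beta - 1.0))
    return float(np.sum(d))

for b in (0, 1, 2):
    print(b, beta_divergence([2.0, 1.0], [1.0, 1.0], b))
```

Because the general formula converges to the KL and IS expressions as β approaches 1 and 0, a single parameter sweeps continuously across all three criteria.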
  22. Decomposition model and cost function
      • We used two β-divergences, one for the main cost and one for the regularization cost.
      • The supervised bases are fixed.
  23. Update rules
      • We can obtain the update rules for optimizing the three variable matrices.
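The update rules are not preserved in this transcript, so the following is only a sketch of what masked multiplicative updates can look like, not the authors' rules. It assumes the KL divergence (β = 1) for both the main cost and the regularization, takes the regularization as `lam` times the sum of the restored values inside the holes, and omits the penalty term. The names `snmf_restore`, `G`, `H`, `U`, and `lam` are hypothetical.

```python
import numpy as np

EPS = 1e-12

def snmf_restore(Y, mask, F, L=2, lam=0.1, n_iter=200, seed=0):
    """Fit G, H, U on the bins kept by `mask` (1 = observed, 0 = hole); F is fixed.

    Returns (F @ G, H @ U). F @ G is also defined inside the holes, which is
    where the restored (extrapolated) target components come from.
    """
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + EPS
    H = rng.random((Y.shape[0], L)) + EPS
    U = rng.random((L, Y.shape[1])) + EPS
    hole = 1.0 - mask  # binary complement of the mask
    for _ in range(n_iter):
        # Holes contribute nothing to the data-fit ratio R.
        R = mask * Y / (F @ G + H @ U + EPS)
        # The lam term shrinks the restored values inside the holes.
        G *= (F.T @ R) / (F.T @ mask + lam * (F.T @ hole) + EPS)
        R = mask * Y / (F @ G + H @ U + EPS)
        H *= (R @ U.T) / (mask @ U.T + EPS)
        R = mask * Y / (F @ G + H @ U + EPS)
        U *= (H.T @ R) / (H.T @ mask + EPS)
    return F @ G, H @ U
```

Since the mask multiplies both the numerator and the denominator of each update, a hole never pulls the model toward zero; the supervised bases fill it in from the bins that were observed in the same frame.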
  25. Experimental condition
      • The mixed signal includes four melodies (sources).
      • Three compositions of instruments
        – We evaluated the average score of 36 patterns.
      • Supervision signal: 24 notes that cover all the notes in the target melody
      • [Figure: source positions 1 to 4 arranged from left to right around the center, with the target source marked]

      Dataset   Melody 1   Melody 2   Midrange      Bass
      No. 1     Oboe       Flute      Piano         Trombone
      No. 2     Trumpet    Violin     Harpsichord   Fagotto
      No. 3     Horn       Clarinet   Piano         Cello
  26. Experimental result: closed data
      • Signal-to-distortion ratio (SDR)
        – Total quality of the separation, which includes the degree of separation and the absence of artificial distortion (higher is better).
      • [Figure: SDR [dB] (0 to 14) versus β of NMF (0 to 4) for directional clustering, supervised multichannel NMF [Sawada], conventional SNMF (single-channel SNMF), and the proposed hybrid method; the β values for KL-divergence and EUC-distance are marked]
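As a rough guide to reading the dB scale, a simplified SDR can be computed as the energy of the reference signal over the energy of the estimation error. This is an assumption-laden sketch: evaluation toolkits typically decompose the error into several components before forming the ratio, and the slides do not say which implementation was used. The name `simple_sdr` is hypothetical.

```python
import numpy as np

def simple_sdr(reference, estimate):
    """Simplified signal-to-distortion ratio in dB (higher is better)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = estimate - reference  # everything that is not the reference
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(distortion**2))

ref = np.array([1.0, 0.0, -1.0, 0.0])
print(simple_sdr(ref, 1.1 * ref))  # 10 % amplitude error, about 20 dB
```

Under this simplified definition, every 10 dB of SDR corresponds to a tenfold drop in error energy relative to the reference.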
  27. SNMF with spectrogram restoration
      • SNMF with spectrogram restoration has two tasks: source separation and basis extrapolation.
      • The optimal divergence for source separation is KL-divergence (β = 1).
      • In contrast, a divergence with a higher β value is suitable for the basis extrapolation.
  28. Trade-off: separation and restoration
      • The optimal divergence for SNMF with spectrogram restoration and its hybrid method depends on the trade-off between separation and restoration abilities.
      • [Figure: two spectra, amplitude [dB] versus frequency [kHz], one with strong sparseness and one with weak sparseness]
      • [Figure: separation performance, restoration performance, and total performance of the hybrid method versus β (0 to 4)]
  29. Experimental condition
      • Open data experiment
        – Used different tone generators for the training and test signals.
        – Supervision signal (24 notes that cover all the notes in the target melody): provided by tone generator A
        – Test signal: provided by tone generator B (more realistic sound) plus background noise (SNR = 10 dB)
      • [Figure: source positions 1 to 4 arranged from left to right around the center, with the target source marked]
  30. Experimental result: open data
      • Signal-to-distortion ratio (SDR)
        – Total quality of the separation, which includes the degree of separation and the absence of artificial distortion (higher is better).
      • [Figure: SDR [dB] (-4 to 10) versus β of NMF (0 to 4) for directional clustering, supervised multichannel NMF [Sawada], conventional SNMF (single-channel SNMF), and the proposed hybrid method; the β values for KL-divergence and EUC-distance are marked]
  31. Conclusions
      • We proposed a hybrid multichannel signal separation method that combines directional clustering and SNMF with spectrogram restoration.
      • There is a trade-off between separation and restoration abilities.
      Thank you for your attention! You can listen to a demonstration on my homepage.
