
Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation

Presented at 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation," Proceedings of 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014), pp.92-96, Nancy, France, May 2014.

Published in: Engineering

  1. 1. Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan) Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan) Hirokazu Kameoka (The University of Tokyo, Japan) 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays Oral session 2 – Microphone array processing
  2. 2. Outline • 1. Research background • 2. Conventional methods – Directional clustering – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions 2
  4. 4. Research background • Signal separation has received much attention. Applications include automatic music transcription and 3D audio systems. • Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area. • Supervised NMF (SNMF) achieves the highest separation performance. • To improve performance further, an SNMF-based separation method for multichannel signals is required. • We have proposed a new SNMF and its hybrid separation method for multichannel signals.
  5. 5. Research background • Our proposed hybrid method: input stereo signal (L, R) → spatial separation method (directional clustering) → SNMF-based separation method (SNMF with spectrogram restoration) → separated signal
  6. 6. Research background • The divergence criterion in SNMF strongly affects separation performance. – Euclidean distance (EUC-distance) – Kullback-Leibler divergence (KL-divergence) – Itakura-Saito divergence (IS-divergence) • The optimal divergence for SNMF with spectrogram restoration has not been established. • We extend our new SNMF to a more generalized form. • We give a theoretical analysis for the optimization of the divergence.
  7. 7. Outline • 1. Research background • 2. Conventional methods – Directional clustering – NMF – Supervised NMF – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions [Figure: hybrid method pipeline: stereo signal → spatial separation → spectral separation → separated signal.]
  8. 8. Directional clustering [Araki, et al., 2007] • Directional clustering – Unsupervised spatial separation method • Problems – Cannot separate sources in the same direction – Artificial distortion arises owing to the binary masking. [Figure: each time-frequency bin of the stereo input spectrogram is labeled L (left), C (center), or R (right); the separated signal for one direction is the entry-wise product of the spectrogram and a binary mask that is 1 where the label matches and 0 elsewhere.]
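
The binary-masking step described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `binary_mask_separate`, the toy spectrogram, and the per-bin direction labels are all assumptions made for the example, and the clustering that produces the labels is not shown.

```python
import numpy as np

def binary_mask_separate(spectrogram, labels, target):
    """Keep only the time-frequency bins whose direction label equals `target`.

    `labels` is assumed to come from a directional clustering step (not shown).
    """
    mask = (labels == target).astype(spectrogram.dtype)  # 1 where the bin belongs to `target`
    return mask * spectrogram, mask                       # entry-wise product

# Toy 2x3 magnitude spectrogram and hypothetical per-bin direction labels.
spec = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
labels = np.array([["C", "L", "R"],
                   ["C", "C", "L"]])

center, mask = binary_mask_separate(spec, labels, "C")
```

Bins assigned to other directions become zeros ("holes") in the separated spectrogram, which is the source of the artificial distortion listed under Problems.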
  9. 9. NMF [Lee, et al., 2001] • NMF can extract significant spectral patterns. – The basis matrix contains the spectral patterns that appear frequently in the observed matrix. • The observed matrix (spectrogram, Ω × 𝑇) is approximated by the product of the basis matrix (spectral patterns, Ω × 𝐾) and the activation matrix (time-varying gains, 𝐾 × 𝑇). Ω: number of frequency bins, 𝑇: number of time frames, 𝐾: number of bases.
  10. 10. Divergence criterion in NMF • Cost function in NMF – Euclidean distance (EUC-distance) – Kullback-Leibler divergence (KL-divergence) – Itakura-Saito divergence (IS-divergence) • Each divergence is evaluated entry by entry between the observed matrix and its NMF model and summed over all entries.
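
The slide's formulas did not survive in the transcript. For reference, the standard entry-wise definitions of the three criteria are given below. The symbols are assumptions for this note: y is an entry of the observed matrix and x the corresponding entry of the model.

```latex
\begin{align}
D_{\mathrm{EUC}}(y \,\|\, x) &= (y - x)^2, \\
D_{\mathrm{KL}}(y \,\|\, x)  &= y \log\frac{y}{x} - y + x, \\
D_{\mathrm{IS}}(y \,\|\, x)  &= \frac{y}{x} - \log\frac{y}{x} - 1.
\end{align}
```

All three are nonnegative and equal zero only when y = x.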
  11. 11. Supervised NMF [Smaragdis, et al., 2007] • SNMF – Supervised spectral separation method • Training process: a supervised basis matrix (spectral dictionary) is trained from sample sounds of the target signal. • Separation process: the mixed signal is decomposed into the target signal and the other signal; the supervised basis matrix is fixed and the remaining matrices are optimized.
  12. 12. Hybrid method [Kitamura, et al., 2013] • We have proposed a new SNMF called SNMF with spectrogram restoration and its hybrid method. • Hybrid method: stereo input (L, R) → spatial separation (directional clustering) → spectral separation (SNMF with spectrogram restoration)
  13. 13. SNMF with spectrogram restoration • SNMF with spectrogram restoration can separate the target and restore the spectrogram simultaneously. [Figure: the spectrogram after directional clustering contains target components, non-target components, and holes; after SNMF with spectrogram restoration, the supervised bases (dictionary of the target) are used to extract the target and fill the holes.]
  14. 14. Decomposition model and cost function • The divergence is defined at all grids except for the holes by using the binary mask matrix obtained from directional clustering. • Decomposition model: [equation missing in the transcript]; the supervised bases are fixed. • Cost function: [equation missing in the transcript]; it consists of the masked divergence, a regularization term, and a penalty term, each with a weighting parameter, and is written with the binary complement of the mask and the Frobenius norm.
  15. 15. Outline • 1. Research background • 2. Conventional methods – Directional clustering – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions 15
  16. 16. Generalized divergence: β-divergence [Eguchi, et al., 2001] • The β-divergence includes the following criteria as special cases: – EUC-distance (β = 2) – KL-divergence (β = 1) – IS-divergence (β = 0)
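
As a concrete reference, the β-divergence can be sketched in a few lines of NumPy. The function name `beta_divergence` is an assumption for this note, and the formula is the common textbook parameterization, which may differ from the slide by a constant factor. Under it, β = 2 gives half the squared Euclidean distance.

```python
import numpy as np

def beta_divergence(y, x, beta):
    """Sum of entry-wise beta-divergences between positive arrays y (data) and x (model).

    beta = 0 and beta = 1 are handled as the limiting cases (IS and KL).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 0:                      # Itakura-Saito divergence
        r = y / x
        return np.sum(r - np.log(r) - 1.0)
    if beta == 1:                      # generalized Kullback-Leibler divergence
        return np.sum(y * np.log(y / x) - y + x)
    return np.sum(y**beta / (beta * (beta - 1.0))
                  + x**beta / beta
                  - y * x**(beta - 1.0) / (beta - 1.0))

y = [1.0, 2.0, 3.0]
x = [2.0, 2.0, 1.0]
d_euc = beta_divergence(y, x, 2)   # half the squared Euclidean distance
d_kl = beta_divergence(y, x, 1)
d_is = beta_divergence(y, x, 0)
```

Larger β puts more weight on large-amplitude entries, which is why the choice of β changes how sparse the estimated decomposition tends to be.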
  17. 17. Decomposition model and cost function • We introduced the β-divergence to extend the cost function to a generalized form. • Decomposition model: [equation missing in the transcript]; the supervised bases are fixed. • Cost function: [equation missing in the transcript]
  18. 18. Update rules • We can obtain update rules for optimizing the variable matrices. • Update rules: [equations missing in the transcript]
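
The update rules for the proposed cost function are not recoverable from the transcript. To show the general form such rules take, here is a sketch of the standard multiplicative updates for plain NMF under the β-divergence. It is not the authors' algorithm: the mask, the regularization term, and the penalty term are omitted, and the names `beta_nmf`, `F`, `G`, and `phi` are assumptions for this example. The exponent `phi` is the usual choice that keeps the cost from increasing for any β.

```python
import numpy as np

def beta_nmf(Y, n_bases, beta, n_iter=100, seed=0, eps=1e-12):
    """Approximate Y (nonnegative) by F @ G with multiplicative beta-divergence updates."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    F = rng.random((n_freq, n_bases)) + eps   # basis matrix
    G = rng.random((n_bases, n_time)) + eps   # activation matrix

    # Exponent that guarantees a non-increasing cost for every beta.
    if beta < 1:
        phi = 1.0 / (2.0 - beta)
    elif beta <= 2:
        phi = 1.0
    else:
        phi = 1.0 / (beta - 1.0)

    for _ in range(n_iter):
        X = F @ G + eps
        F *= (((Y * X**(beta - 2.0)) @ G.T) / (X**(beta - 1.0) @ G.T + eps)) ** phi
        X = F @ G + eps
        G *= ((F.T @ (Y * X**(beta - 2.0))) / (F.T @ X**(beta - 1.0) + eps)) ** phi
    return F, G

# Toy usage on random positive data.
Y_demo = np.random.default_rng(1).random((6, 8)) + 0.1
F_hat, G_hat = beta_nmf(Y_demo, n_bases=3, beta=2.0)
```

Because every factor in the update is nonnegative, F and G stay nonnegative without any explicit projection step.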
  19. 19. SNMF with spectrogram restoration • This SNMF has two tasks: source separation and basis extrapolation. • The optimal divergence for source separation has been investigated. – KL-divergence (β = 1) is suitable for source separation. • The optimal divergence for basis extrapolation has not been investigated. • We analyze the optimal divergence for basis extrapolation based on a generation model in NMF.
  20. 20. Analysis of extrapolation ability • NMF decomposition is equivalent to maximum likelihood estimation that implicitly assumes a generation model for the input data. • Cost function in NMF and the corresponding generation model: – IS-divergence: exponential distribution – KL-divergence: Poisson distribution – EUC-distance: Gaussian distribution
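
The correspondence on this slide can be checked by writing out the negative log-likelihood of one entry. The notation is an assumption for this note: y is an observed entry, x is the model entry used as the distribution's mean, and σ² is a fixed Gaussian variance. In each case the negative log-likelihood equals the divergence plus terms that do not depend on x, so minimizing one minimizes the other.

```latex
\begin{align}
\text{Gaussian:}\quad
  -\log p(y \mid x) &= \frac{(y - x)^2}{2\sigma^2} + \tfrac{1}{2}\log(2\pi\sigma^2)
    = \frac{1}{2\sigma^2} D_{\mathrm{EUC}}(y \,\|\, x) + \text{const}, \\
\text{Poisson:}\quad
  -\log p(y \mid x) &= x - y \log x + \log y!
    = D_{\mathrm{KL}}(y \,\|\, x) + \bigl(y - y\log y + \log y!\bigr), \\
\text{Exponential:}\quad
  -\log p(y \mid x) &= \log x + \frac{y}{x}
    = D_{\mathrm{IS}}(y \,\|\, x) + \bigl(\log y + 1\bigr).
\end{align}
```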
  21. 21. Analysis of extrapolation ability • To compare the net extrapolation ability, we generate random data that obey each generation model. • In training, 100 bases are created from the random data. • We also prepare binary-masked random data and attempt to restore them.
  22. 22. Analysis of extrapolation ability • The binary masks were randomly generated. – We generate two types of binary mask, whose densities of holes are 75% and 98%. • SAR [dB] indicates the accuracy of restoration. • Procedure: input random data → binary masking → binary-masked data → restoration → restored data. [The SAR formula is missing in the transcript.]
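
The restoration experiment above can be imitated on a small scale. The sketch below is an illustration under several assumptions, not the authors' setup: it uses 3 known bases instead of 100 trained ones, noise-free data, a plain masked β-divergence without the regularization and penalty terms, and a simple error ratio in dB (`restoration_db`) in place of the SAR measure, whose definition is not recoverable from the transcript. All function and variable names are hypothetical.

```python
import numpy as np

def restore_with_fixed_bases(Y, mask, F, beta=2.0, n_iter=200, seed=0, eps=1e-12):
    """Estimate activations G from the observed bins only (mask == 1), then return F @ G."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + eps
    if beta < 1:
        phi = 1.0 / (2.0 - beta)
    elif beta <= 2:
        phi = 1.0
    else:
        phi = 1.0 / (beta - 1.0)
    for _ in range(n_iter):
        X = F @ G + eps
        num = F.T @ (mask * Y * X**(beta - 2.0))   # holes contribute nothing
        den = F.T @ (mask * X**(beta - 1.0)) + eps
        G *= (num / den) ** phi
    return F @ G                                    # model values also fill the holes

def restoration_db(Y, X):
    """Hypothetical accuracy measure: energy of Y over energy of the error, in dB."""
    return 10.0 * np.log10(np.sum(Y**2) / np.sum((Y - X)**2))

rng = np.random.default_rng(0)
F_true = rng.random((20, 3))                        # known (supervised) bases
Y_full = F_true @ rng.random((3, 30))               # complete data
mask = (rng.random(Y_full.shape) > 0.75).astype(float)   # about 75% holes
X_rest = restore_with_fixed_bases(Y_full * mask, mask, F_true)
score = restoration_db(Y_full, X_rest)
```

Running the same script with different values of `beta` and hole densities gives a rough feel for how the criterion affects extrapolation into the holes, although it does not reproduce the slide's results.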
  23. 23. Results of restoration analysis • Simulated result of the restoration ability • The optimal divergence for the basis extrapolation (restoration) is around β = 3. [Figure: two panels, 75%-binary-masked and 98%-binary-masked data; SAR [dB] (higher is better) versus β_NMF from 0 to 4, with one curve each for β_reg = 0, 1, 2, and 3; the optimal divergence for source separation (KL-divergence) is marked for comparison.]
  24. 24. Trade-off between separation and restoration • The optimal divergence for SNMF with spectrogram restoration and its hybrid method is determined by the trade-off between the separation and restoration abilities. [Figure: two example spectra, amplitude [dB] versus frequency [kHz] from 0 to 5, one with strong sparseness and one with weak sparseness; a schematic plots separation performance, restoration performance, and the total performance of the hybrid method against the divergence parameter from 0 to 4.]
  25. 25. Outline • 1. Research background • 2. Conventional methods – Directional clustering – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions 25
  26. 26. Experimental condition • The mixed signal includes four melodies (sources). • Three compositions of instruments (Melody 1, Melody 2, Midrange, Bass): – Dataset No. 1: Oboe, Flute, Piano, Trombone – Dataset No. 2: Trumpet, Violin, Harpsichord, Fagotto – Dataset No. 3: Horn, Clarinet, Piano, Cello • We evaluated the average score of 36 patterns. • Supervision signal: 24 notes that cover all the notes in the target melody. [Figure: sources 1 to 4 are placed at the left, center, and right; one of them is the target source.]
  27. 27. Experimental result • Signal-to-distortion ratio (SDR) – Total quality of the separation, which includes the degree of separation and the absence of artificial distortion. [Figure: SDR [dB] (0 to 14, higher is better) versus β_NMF from 0 to 4 for the supervised methods (conventional SNMF and the proposed hybrid method) and the unsupervised methods (directional clustering and multichannel NMF [Sawada]); KL-divergence and EUC-distance are marked on the β_NMF axis. Multichannel NMF is an integrated method.]
  28. 28. Experiment for real-recorded signal • We recorded a binaural signal using a dummy head. • Reverberation time: – 200 ms • The other conditions are the same as those in the previous experiment with instantaneous mixtures. [Figure: recording layout with sources 1 to 4 placed to the left, center, and right of the dummy head, including the target signal; labeled distances are 1.5 m and 2.5 m.]
  29. 29. Experimental result • Result for real-recorded signals [Figure: SDR [dB] (0 to 14, higher is better) versus β_NMF from 0 to 4 for the supervised methods (conventional SNMF and the proposed hybrid method) and the unsupervised methods (directional clustering and multichannel NMF [Sawada]); KL-divergence and EUC-distance are marked on the β_NMF axis. Multichannel NMF is an integrated method.]
  30. 30. Conclusions • Restoration requires an anti-sparse criterion (β = 3). • There is a trade-off between separation and restoration abilities. • The optimal divergence for SNMF with spectrogram restoration is the EUC-distance, whereas the KL-divergence is the best for conventional SNMF. Thank you for your attention!
