- 1. Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan) Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan) Hirokazu Kameoka (The University of Tokyo, Japan) 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays Oral session 2 – Microphone array processing
- 2. Outline • 1. Research background • 2. Conventional methods – Directional clustering – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions 2
- 4. Research background • Signal separation has received much attention. – Applications: automatic music transcription, 3D audio systems, etc. • Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area. • Supervised NMF (SNMF) achieves the highest separation performance. • To improve its performance, an SNMF-based multichannel signal separation method is required. • We have proposed a new SNMF and its hybrid separation method for multichannel signals.
- 5. Research background • Our proposed hybrid method: input stereo signal (L, R) → spatial separation method (directional clustering) → SNMF-based separation method (SNMF with spectrogram restoration) → separated signal
- 6. Research background • Divergence criterion in SNMF strongly affects separation performance. – Euclidean distance (EUC-distance) – Kullback-Leibler divergence (KL-divergence) – Itakura-Saito divergence (IS-divergence) • The optimal divergence for SNMF with spectrogram restoration is not apparent. • We extend our new SNMF to a more generalized form. • We give a theoretical analysis for the optimization of the divergence.
- 7. Outline • 1. Research background • 2. Conventional methods – Directional clustering – NMF – Supervised NMF – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions 7 Stereo signal Spatial separation Spectral separation Separated signal Hybrid method
- 8. Directional clustering [Araki, et al., 2007] • Directional clustering – Unsupervised spatial separation method • Problems – Cannot separate sources in the same direction – Artificial distortion arises owing to the binary masking. [Figure: the input stereo signal (L, R) is clustered into left, center, and right directions; each time-frequency bin of the spectrogram is labeled L, C, or R, and the entry-wise product of the spectrogram with a binary mask (1 for the selected direction, 0 otherwise) gives the separated signal.]
- 9. NMF [Lee, et al., 2001] • NMF can extract significant spectral patterns. – The basis matrix F has frequently appearing spectral patterns in the observed matrix Y. • Decomposition: Y ≈ FG, where Y (Ω × 𝑇) is the observed matrix (spectrogram), F (Ω × 𝐾) is the basis matrix (spectral patterns), and G (𝐾 × 𝑇) is the activation matrix (time-varying gains). Ω: Number of frequency bins 𝑇: Number of time frames 𝐾: Number of bases
- 10. Divergence criterion in NMF • Cost function in NMF: a distance or divergence between the observed matrix Y and the product FG, minimized with respect to the entries of the variable matrices F and G. • Commonly used criteria: – Euclidean distance (EUC-distance) – Kullback-Leibler divergence (KL-divergence) – Itakura-Saito divergence (IS-divergence)
- 11. Supervised NMF [Smaragdis, et al., 2007] • SNMF – Supervised spectral separation method • Training process: sample sounds of the target signal are decomposed to obtain the supervised basis matrix F (spectral dictionary). • Separation process: the mixed signal is decomposed as FG + HU with F fixed and G, H, and U optimized; FG gives the target signal and HU gives the other signal.
- 12. Hybrid method [Kitamura, et al., 2013] • We have proposed a new SNMF called SNMF with spectrogram restoration and its hybrid method. [Figure: the stereo input (L, R) passes through directional clustering (spatial separation) and then SNMF with spectrogram restoration (spectral separation); together they form the hybrid method.]
- 13. SNMF with spectrogram restoration • SNMF with spectrogram restoration can separate the target and restore the spectrogram simultaneously. [Figure: the spectrogram after directional clustering (time versus frequency) contains target components, non-target components, and holes; after SNMF with spectrogram restoration, which uses the supervised bases (dictionary of the target), the target is separated from the non-target and the holes are restored.]
- 14. Decomposition model and cost function • The divergence is defined at all grids except for the holes by using the binary mask matrix I. • Decomposition model: Y ≈ FG + HU, with the supervised bases F fixed. • Cost function: the divergence over the observed grids, plus a regularization term and a penalty term, each with a weighting parameter. • I: binary masking matrix obtained from directional clustering; the cost function also uses the binary complement of I and the Frobenius norm.
- 15. Outline • 1. Research background • 2. Conventional methods – Directional clustering – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions 15
- 16. Generalized divergence: β-divergence • β-divergence [Eguchi, et al., 2001] – β = 2: EUC-distance – β = 1: KL-divergence – β = 0: IS-divergence
- 17. Decomposition model and cost function • We introduced β-divergence to extend the cost function to a generalized form. • Decomposition model: Y ≈ FG + HU, with the supervised bases F fixed.
- 18. Update rules • We can obtain the update rules for the optimization of the variable matrices G, H, and U.
- 19. SNMF with spectrogram restoration • This SNMF has two tasks: source separation and basis extrapolation. • The optimal divergence for source separation has been investigated. – KL-divergence (β = 1) is suitable for source separation. • No one has investigated the optimal divergence for basis extrapolation. • We analyze the optimal divergence for basis extrapolation based on a generation model in NMF.
- 20. Analysis of extrapolation ability • The decomposition of NMF is equivalent to a maximum likelihood estimation, which implicitly assumes the generation model of the input data Y. • Cost function in NMF and the assumed generation model: – IS-divergence: exponential distribution – KL-divergence: Poisson distribution – EUC-distance: Gaussian distribution [Figure legend: maximum of data]
- 21. Analysis of extrapolation ability • To compare the net extrapolation ability, we generate random data Y that obey each generation model. • We also prepare the binary-masked random data YI and attempt to restore them. • Training: 100 bases are created from Y. • Restoration: the binary-masked data are restored using the trained bases.
- 22. Analysis of extrapolation ability • The binary mask was randomly generated. – We generate two types of binary masks, whose densities of holes are 75% and 98%. • SAR [dB] indicates the accuracy of restoration. [Figure: input random data → binary masking → binary-masked data → restoration → restored data; the SAR formula uses entry-wise squares.]
- 23. Results of restoration analysis • Simulated result of the restoration ability • The optimal divergence for the basis extrapolation (restoration) is around β = 3! [Figure: SAR [dB] (0 to 25; higher is better) versus βNMF (0 to 4) for βreg = 0, 1, 2, 3; left panel: 75%-binary-masked data, right panel: 98%-binary-masked data; βNMF = 1 (KL-divergence) is marked as the optimal divergence for source separation.]
- 24. Trade-off between separation and restoration • The optimal divergence for SNMF with spectrogram restoration and its hybrid method is based on the trade-off between separation and restoration abilities. [Figure: two example spectra, amplitude [dB] (−10 to 0) versus frequency [kHz] (0 to 5), labeled "Sparseness: strong" and "Sparseness: weak"; conceptual curves of separation performance, restoration performance, and total performance of the hybrid method over the range 0 to 4.]
- 25. Outline • 1. Research background • 2. Conventional methods – Directional clustering – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions 25
- 26. Experimental condition • Mixed signal includes four melodies (sources). • Three compositions of instruments – We evaluated the average score of 36 patterns. • Supervision signal: 24 notes that cover all the notes in the target melody • Datasets (Melody 1 / Melody 2 / Midrange / Bass): – No. 1: Oboe / Flute / Piano / Trombone – No. 2: Trumpet / Violin / Harpsichord / Fagotto – No. 3: Horn / Clarinet / Piano / Cello [Figure: sources 1 to 4 arranged in the left, center, and right directions, with the target source in the center.]
- 27. Experimental result • Signal-to-distortion ratio (SDR) – total quality of the separation, which includes the degree of separation and absence of artificial distortion. • Multichannel NMF is an integrated method. [Figure: SDR [dB] (0 to 14; higher is better) versus βNMF (0 to 4) for the supervised methods (conventional SNMF and the proposed hybrid method) and the unsupervised methods (directional clustering and multichannel NMF [Sawada]); KL-divergence and EUC-distance are marked on the βNMF axis.]
- 28. Experiment for real-recorded signal • We recorded a binaural signal using a dummy head. • Reverberation time: 200 ms • The other conditions are the same as those for the previous instantaneous mixture signals. [Figure: sources 1 to 4 are placed to the left, center, and right of the dummy head, with the target signal in the center; the distances marked are 1.5 m, 1.5 m, 1.5 m, and 2.5 m.]
- 29. Experimental result • Result for real-recorded signals • Multichannel NMF is an integrated method. [Figure: SDR [dB] (0 to 14; higher is better) versus βNMF (0 to 4) for the supervised methods (conventional SNMF and the proposed hybrid method) and the unsupervised methods (directional clustering and multichannel NMF [Sawada]); KL-divergence and EUC-distance are marked on the βNMF axis.]
- 30. Conclusions • Restoration requires an anti-sparse criterion (β = 3). • There is a trade-off between separation and restoration abilities. • The optimal divergence for SNMF with spectrogram restoration is EUC-distance, whereas KL-divergence is the best for conventional SNMF. Thank you for your attention!

- This is the outline of my talk.
- First, I will talk about the research background.
- Recently, signal separation technologies have received much attention. These technologies are useful for many applications, such as automatic music transcription and 3D audio systems. Music signal separation based on nonnegative matrix factorization, NMF for short, has been a very active research area. In particular, supervised NMF (SNMF) achieves the highest separation performance. However, SNMF can be applied only to single-channel signals. If we could use multichannel information, we could improve the performance further. Therefore, an SNMF-based multichannel signal separation method is required.
- Our proposed hybrid method concatenates a spatial separation method called directional clustering and an SNMF-based separation method called SNMF with spectrogram restoration. In this hybrid method, the target direction is first separated by directional clustering. Then, the target signal is separated by this SNMF.
- In previous studies, we confirmed that the divergence criterion in SNMF strongly affects separation performance. For source separation, the KL-divergence criterion is often used and achieves the highest separation performance. However, the optimal divergence for our new SNMF with spectrogram restoration is not apparent. Therefore, in this presentation, we extend this method to a more generalized form. In addition, we give a theoretical analysis for optimizing the divergence to achieve the highest separation performance.
- Next, I will talk about conventional methods.
- Directional clustering is an unsupervised spatial separation method. This method uses the differences between the left and right channels as a clustering cue, so we can separate the sources direction-wise. This is equivalent to binary masking in the spectrogram domain: we obtain a binary mask from the clustering result and take its entry-wise product with the spectrogram to obtain the separated signal. However, this method cannot separate sources in the same direction, like this. In addition, the separated signal has artificial distortion owing to the binary masking.
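The binary-masking step described above can be sketched as follows. This is only an illustration: instead of a real clustering algorithm, it assumes a hypothetical rule that assigns a time-frequency bin to the center when the normalized level difference between the channels is small. The function name `center_binary_mask` and the threshold `width` are assumptions, not part of the original method.

```python
import numpy as np

def center_binary_mask(left_mag, right_mag, width=0.2):
    """Return a 0/1 mask that selects bins panned near the center.

    Hypothetical simplification: a fixed threshold on the normalized
    level difference stands in for the clustering step.
    """
    eps = 1e-12
    # Normalized level difference in [-1, 1]: -1 is hard left, +1 is hard right.
    pan = (right_mag - left_mag) / (left_mag + right_mag + eps)
    return (np.abs(pan) <= width).astype(float)

# Toy magnitude spectrograms (2 frequency bins x 3 time frames).
left = np.array([[1.0, 0.9, 0.1], [0.5, 0.0, 1.0]])
right = np.array([[1.0, 0.1, 0.9], [0.5, 1.0, 1.1]])

mask = center_binary_mask(left, right)
masked_left = mask * left  # entry-wise product; zeros are the spectral holes
```

Bins that belong to other directions become exact zeros in `masked_left`, which is where the spectral holes discussed later come from.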
- The next method is NMF. NMF is a powerful method for extracting significant features from a spectrogram. NMF decomposes the input spectrogram Y into a product of a basis matrix F and an activation matrix G, where the basis matrix F has frequently appearing spectral patterns as basis vectors, like this, and the activation matrix G has the time-varying gains of each basis vector.
- In NMF decomposition, the cost function is defined as a distance or a divergence between the input matrix Y and the decomposed matrix FG. This equation is the cost function in NMF, and we minimize it to find F and G. These three criteria are often used in NMF decomposition, and KL-divergence is the best one for acoustic signal separation.
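As a rough illustration of minimizing such a cost function, the sketch below factorizes a nonnegative matrix with the standard multiplicative updates for KL-divergence. The talk does not state which optimizer was used for plain NMF, so the update rules, the function names `nmf_kl` and `kl_divergence`, and the toy data are assumptions.

```python
import numpy as np

def kl_divergence(Y, X, eps=1e-12):
    """Generalized KL-divergence between nonnegative matrices Y and X."""
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))

def nmf_kl(Y, n_bases, n_iter=200, seed=0):
    """Approximate Y (freq x time) as F @ G using multiplicative updates."""
    rng = np.random.default_rng(seed)
    eps = 1e-12
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, n_bases)) + eps   # basis matrix (spectral patterns)
    G = rng.random((n_bases, n_frames)) + eps  # activation matrix (gains)
    for _ in range(n_iter):
        # Each update keeps the entries nonnegative by construction.
        F *= ((Y / (F @ G + eps)) @ G.T) / (G.sum(axis=1) + eps)
        G *= (F.T @ (Y / (F @ G + eps))) / (F.sum(axis=0)[:, None] + eps)
    return F, G

# Toy "spectrogram" built from two nonnegative patterns.
rng = np.random.default_rng(0)
Y = rng.random((20, 2)) @ rng.random((2, 30)) + 1e-3
F, G = nmf_kl(Y, n_bases=2)
cost = kl_divergence(Y, F @ G)
```

Because the updates are multiplicative, a nonnegative initialization stays nonnegative without any explicit projection step.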
- To separate the target signal using NMF, supervised NMF has been proposed. SNMF is a supervised spectral separation method. In SNMF, we first train on a sample sound of the target signal, which is like a musical scale, and construct the supervised basis F. This is a spectral dictionary of the target sound. Next, we separate the mixed signal using the supervised basis F, as FG+HU. The target signal is obtained as FG, and the other signal is reconstructed by HU. This method can separate the target signal well, but it can be used only for single-channel signals.
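The separation process can be sketched as below: F is held fixed while G, H, and U are updated. The sketch assumes KL-divergence with standard multiplicative updates and random toy data; the function name `snmf_separate` and all parameter values are hypothetical, and no penalty term is included.

```python
import numpy as np

def snmf_separate(Y, F, n_other_bases, n_iter=200, seed=0):
    """Fit Y ~= F @ G + H @ U with the supervised bases F fixed (KL updates)."""
    rng = np.random.default_rng(seed)
    eps = 1e-12
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_freq, n_other_bases)) + eps
    U = rng.random((n_other_bases, n_frames)) + eps
    for _ in range(n_iter):
        ratio = Y / (F @ G + H @ U + eps)
        G *= (F.T @ ratio) / (F.sum(axis=0)[:, None] + eps)
        ratio = Y / (F @ G + H @ U + eps)
        H *= (ratio @ U.T) / (U.sum(axis=1) + eps)
        ratio = Y / (F @ G + H @ U + eps)
        U *= (H.T @ ratio) / (H.sum(axis=0)[:, None] + eps)
    return G, H, U

# Toy mixture: a target part spanned by F plus an unrelated part.
rng = np.random.default_rng(1)
F = rng.random((30, 4))  # stands in for bases trained on sample sounds
Y = F @ rng.random((4, 40)) + rng.random((30, 2)) @ rng.random((2, 40))

G, H, U = snmf_separate(Y, F, n_other_bases=2)
target_estimate = F @ G  # separated target spectrogram
other_estimate = H @ U   # everything the dictionary does not explain
```

In practice F would come from a training run on the supervision sound rather than from random numbers.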
- To apply the SNMF-based method to multichannel signals, we have proposed a new SNMF called "SNMF with spectrogram restoration" and its hybrid method. In this hybrid method, directional clustering is first applied to the input stereo signal to separate the target direction. Then, the target signal is separated by this new SNMF.
- Here, the spectrogram separated by directional clustering has many spectral holes, like this. This is due to the binary masking in directional clustering. However, our new SNMF can restore such a damaged spectrogram by using a spectral dictionary of the target sound; that is, this SNMF can extrapolate the supervised basis F. Simultaneously, the non-target signal is separated.
- This is the decomposition model of SNMF with spectrogram restoration, and this equation is the cost function. In this cost function, the divergence is defined at all spectrogram grids except for the spectral holes by using the binary mask I. For the grids of the holes, we impose a regularization term to avoid extrapolation error. In previous studies, we used EUC-distance and KL-divergence in this cost function. In this presentation, we introduce a generalized divergence into it and extend the method.
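The structure of such a cost function can be sketched as follows. The exact formula is not recoverable from this transcript, so three things are assumptions: the regularizer is taken to be a divergence between zero and the target estimate FG at the holes, the penalty is taken to be the squared Frobenius norm of FᵀH, and the names `restoration_cost`, `lam`, `mu`, and `beta_reg` are hypothetical.

```python
import numpy as np

def beta_div(y, x, beta):
    """Element-wise beta-divergence for beta = 1 (KL) or beta not in {0, 1}."""
    if beta == 1:
        safe_y = np.where(y > 0, y, 1.0)  # treats 0 * log(0) as 0
        return y * np.log(safe_y / x) - y + x
    return (y**beta + (beta - 1) * x**beta
            - beta * y * x**(beta - 1)) / (beta * (beta - 1))

def restoration_cost(Y, mask, F, G, H, U,
                     beta=1.0, beta_reg=1.0, lam=0.1, mu=0.1):
    """Masked divergence + regularization at holes + basis penalty (assumed form)."""
    target = F @ G
    model = target + H @ U
    # Data term: only observed grids (mask == 1) contribute.
    data_term = np.sum(mask * beta_div(Y, model, beta))
    # Assumed regularizer: keeps the extrapolated target small at the holes.
    reg_term = np.sum((1 - mask) * beta_div(np.zeros_like(target), target, beta_reg))
    # Assumed penalty: discourages H from duplicating the supervised bases F.
    penalty_term = np.sum((F.T @ H) ** 2)
    return float(data_term + lam * reg_term + mu * penalty_term)

rng = np.random.default_rng(0)
F, G = rng.random((6, 2)) + 0.1, rng.random((2, 5)) + 0.1
H, U = rng.random((6, 1)) + 0.1, rng.random((1, 5)) + 0.1
Y = F @ G + H @ U
full_mask = np.ones_like(Y)
holey_mask = full_mask.copy()
holey_mask[::2, ::2] = 0.0  # knock out some grids to act as holes

cost_full = restoration_cost(Y, full_mask, F, G, H, U, mu=0.0)
cost_holes = restoration_cost(Y, holey_mask, F, G, H, U, mu=0.0)
```

With no holes and an exact model the data term vanishes, so any remaining cost comes from the regularization and penalty terms.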
- Next, I will talk about the analysis of restoration ability.
- In the extension, we introduce a generalized divergence function called beta-divergence. This function has a parameter beta, and includes EUC-distance, KL-divergence, and IS-divergence when beta equals 2, 1, and 0 respectively.
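A minimal sketch of the beta-divergence and its three special cases is given below. It assumes the common parameterization in which beta = 2 gives half the squared Euclidean distance; the constant factor used in the talk is not shown in this transcript, and the function name `beta_divergence` is an assumption.

```python
import numpy as np

def beta_divergence(y, x, beta):
    """Sum of element-wise beta-divergences between positive arrays y and x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 0:  # Itakura-Saito divergence (limit beta -> 0)
        ratio = y / x
        return float(np.sum(ratio - np.log(ratio) - 1))
    if beta == 1:  # Kullback-Leibler divergence (limit beta -> 1)
        return float(np.sum(y * np.log(y / x) - y + x))
    # General case; beta = 2 reduces to 0.5 * (y - x)**2.
    return float(np.sum((y**beta + (beta - 1) * x**beta
                         - beta * y * x**(beta - 1)) / (beta * (beta - 1))))

y = np.array([1.0, 2.0, 3.0])
x = np.array([2.0, 2.0, 1.0])
euc = beta_divergence(y, x, 2)  # EUC-distance (up to the factor 1/2)
kl = beta_divergence(y, x, 1)   # KL-divergence
is_ = beta_divergence(y, x, 0)  # IS-divergence
```

The two explicit branches are needed because the general formula divides by beta * (beta - 1); the KL and IS cases are its limits.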
- By using beta-divergence, we can extend the cost function to a more generalized form. This cost function includes EUC-distance, KL-divergence, and so on.
- From the minimization of the cost function, we can obtain the update rules for the optimization of the variable matrices G, H, and U.
- This SNMF has two tasks, namely, separation of the target signal and basis extrapolation for the restoration of the spectrogram. The optimal divergence for source separation has been investigated by many researchers, and it has been clarified that KL-divergence is suitable for source separation. However, nobody has investigated the optimal divergence for basis extrapolation. So, we analyze the optimal one based on a generation model in NMF.
- The decomposition of NMF is equivalent to a maximum likelihood estimation, which implicitly assumes the generation model of the input data Y. If we select the parameter beta, the assumed generation model is fixed. In other words, the parameter beta defines the generation model of the input data.
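The equivalence can be checked for the three special cases by writing the negative log-likelihood of one entry y with model value x. The notation here (y, x, σ²) is an assumption for illustration, and "const" collects terms that do not depend on x.

```latex
\begin{align*}
\beta = 2:\quad & y \sim \mathcal{N}(x, \sigma^2), &
  -\log p(y \mid x) &= \frac{(y - x)^2}{2\sigma^2} + \text{const}, \\
\beta = 1:\quad & y \sim \mathrm{Poisson}(x), &
  -\log p(y \mid x) &= x - y \log x + \log y!
    = \Bigl( y \log \frac{y}{x} - y + x \Bigr) + \text{const}, \\
\beta = 0:\quad & y \sim \mathrm{Exponential}(\text{mean } x), &
  -\log p(y \mid x) &= \frac{y}{x} + \log x
    = \Bigl( \frac{y}{x} - \log \frac{y}{x} - 1 \Bigr) + \text{const}.
\end{align*}
```

In each row the bracketed expression is the EUC-distance, KL-divergence, or IS-divergence, so minimizing the divergence over x is the same as maximizing the corresponding likelihood.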
- In this analysis, to compare the net extrapolation ability, we generated random input data Y that obey each generation model. We also prepared the binary-masked random data YI and attempted to restore them. In the training process, we construct the supervised basis F using the random data Y. Then we attempt to restore the binary-masked data using the trained basis F.
- The binary mask I was generated uniformly at random, and we generated two types of binary masks, whose densities of holes are 75% and 98%. By calculating the similarity between the input data Y and the restored data FG, we can evaluate the extrapolation ability, that is, the accuracy of restoration. SAR indicates this accuracy of restoration.
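A toy version of this restoration experiment is sketched below. Several parts are assumptions: the data are noiseless and exactly representable by F rather than drawn from a generation model, the activation update is the commonly used heuristic multiplicative rule for masked beta-divergence without any regularization term, and `restoration_ratio_db` is a hypothetical stand-in for SAR, whose exact definition is not given in this transcript.

```python
import numpy as np

def restore_activations(Y, mask, F, beta=2.0, n_iter=300, seed=0):
    """Estimate G from the observed grids only (mask == 1), with F fixed."""
    rng = np.random.default_rng(seed)
    eps = 1e-12
    G = rng.random((F.shape[1], Y.shape[1])) + eps
    for _ in range(n_iter):
        X = F @ G + eps
        numerator = F.T @ (mask * Y * X ** (beta - 2))
        denominator = F.T @ (mask * X ** (beta - 1)) + eps
        G *= numerator / denominator
    return G

def restoration_ratio_db(Y, X):
    """Hypothetical accuracy measure: signal power over error power, in dB."""
    return float(10 * np.log10(np.sum(Y ** 2) / np.sum((Y - X) ** 2)))

rng = np.random.default_rng(1)
F = rng.random((60, 3))               # stands in for the trained bases
Y = F @ rng.random((3, 50))           # complete data before masking
mask = (rng.random(Y.shape) > 0.75).astype(float)  # about 75% holes

G = restore_activations(Y, mask, F)
restored = F @ G                      # extrapolated into the holes as well
score = restoration_ratio_db(Y, restored)
```

The score compares the restored matrix with the complete data, including the masked grids, so it measures extrapolation rather than the fit to the observed grids alone.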
- These are the results of the analysis. The left one is the result for the 75%-binary-masked data, and the right one is for the 98%-binary-masked data. Beta equals 1, which means KL-divergence, is the optimal divergence for source separation. Surprisingly, however, the optimal divergence for restoration is around beta equals 3.
- Therefore, the optimal divergence for the hybrid method is around EUC-distance because of the trade-off between separation and restoration abilities, as in this figure. This is because a sparse basis is not suitable for extrapolation using only the observable data.
- Next, I will talk about the experiments.
- This is the experimental condition. The mixed signal includes four melodies. Each sound source is located as in this figure. The target source is always located in the center direction together with another interfering source. We prepared three compositions of instruments and evaluated the average score of 36 patterns. In addition, the supervision signal has 24 notes like this score, which cover all the notes in the target melody.
- This is the result of the experiment. We show the average SDR score, where SDR indicates the total quality of the separation. Directional clustering cannot separate sources in the same direction, so its result was not good. Multichannel NMF is an integrated method proposed by Sawada. This method uses an integrated cost function, which includes spatial and spectral separation simultaneously. However, this is a quite difficult optimization problem because many variables must be optimized using only one cost function. So this method strongly depends on the initial values, and the average score becomes poor. Conventional SNMF achieves the highest score when beta equals 1, that is, KL-divergence. However, the optimal divergence for our hybrid method was beta equals 2 because of the trade-off between separation and restoration abilities.
- We also conducted an experiment using real-recorded signals. In this experiment, the binaural mixed signal was recorded in a real environment. The other conditions are the same as those in the previous experiment.
- This is the result of the experiment using real-recorded signals. From this result, we can confirm that the optimal divergence for the hybrid method is EUC-distance.
- These are the conclusions of my talk. Thank you for your attention.
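- The other experimental conditions are as shown here. We compare the evaluation scores of all combinations when the divergence criterion of NMF, βNMF, is varied from 0 to 4. For the divergence criterion of the regularization, we show only βreg=1, which gave the highest performance. We use SDR as the evaluation score. SDR is the total separation accuracy, which includes the degree of separation and the absence of artificial distortion.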
- Supervised methods have an inherent problem: we cannot obtain a perfect supervision sound for the target signal. Even if the supervision sounds come from the same type of instrument as the target sound, these sounds differ according to various conditions, for example, individual playing styles and the timbre individuality of each instrument. When we want to separate this piano sound from the mixed signal, we may only be able to prepare a similar piano sound whose timbre is slightly different. In that case, supervised NMF cannot separate the target well because of the spectral difference between the supervision sound and the target sound.
- To solve this problem, we have proposed a new supervised method that adapts the supervised bases to the target spectra by a basis deformation. This is the decomposition model in this method. We introduce a deformable term, which has both positive and negative values like this. Then we optimize the matrices D, G, H, and U. This figure indicates the spectral difference between the real sound and the artificial sound.
- This figure shows the directional distribution of the input stereo signal. The target source is in the center direction, and the other interfering sources are distributed like this. After directional clustering, the left and right source components leak into the center cluster, and the center sources lose some of their components. These lost components correspond to the spectral chasms in the spectrogram domain. After SNMF with spectrogram restoration, the target components are separated and restored using the supervised bases of the target sound trained in advance. In other words, the resolution of the target spectrogram is recovered, as in superresolution, by the supervised basis extrapolation.
- As another means of addressing multichannel signal separation, multichannel NMF has also been proposed by Ozerov and Sawada. This method is a natural extension of NMF and uses spectral and spatial cues. However, this unified method is a mathematically very difficult optimization problem because many variables must be optimized by one cost function. So this method strongly depends on the initial values.
- This SNMF is for single-channel signals. Therefore, we cannot use the information about the correlation between channels. However, almost all music signals are in stereo format, so we should extend SNMF to multichannel signals. In addition, when many interfering sources exist, the separation performance of SNMF markedly degrades. This is because many spectral patterns similar to the target sound arise.
- Nonnegative matrix factorization is a very powerful and useful method for extracting significant features from the input matrix. NMF decomposes the input nonnegative matrix Y into two matrices F and G like this, where F and G cannot have negative entries. Therefore, all the entries in Y, F, and G are nonnegative. In addition, K is usually set to a smaller value than Ω and T, so this is a kind of low-rank approximation. As a result of this nonnegative constraint and dimensional reduction, the basis matrix has the distinctive components of the observed matrix.
- The optimization of the variables F and G in NMF is based on the minimization of the cost function. The cost function is defined as the divergence between the observed spectrogram Y and the reconstructed spectrogram FG. This minimization is an inequality-constrained optimization problem.
- This spectrum is obtained by directional clustering. There are many spectral chasms owing to the binary masking. SNMF with spectrogram restoration treats these chasms as unseen observations like this, and extrapolates the fittest target basis from the supervised bases F. As a result, the lost components are restored by the supervised basis extrapolation.
- SDR is the total evaluation score of the separation performance.