Presented at the 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014), an international conference. Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Online divergence switching for superresolution-based nonnegative matrix factorization," Proceedings of 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014), pp.485-488, Hawaii, USA, March 2014 (Student Paper Award).

- 1. Online Divergence Switching for Superresolution-Based Nonnegative Matrix Factorization. Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan); Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan); Hirokazu Kameoka (The University of Tokyo, Japan). 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing, Speech Analysis (2), 2PM2-2.
- 2. Outline: 1. Research background; 2. Conventional methods (nonnegative matrix factorization, supervised nonnegative matrix factorization, directional clustering, hybrid method); 3. Proposed method (online divergence switching for hybrid method); 4. Experiments; 5. Conclusions.
- 3. Outline (current section: 1. Research background).
- 4. Research background: Music signal separation technologies have received much attention; applications include automatic music transcription and 3D audio systems. Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area. However, the separation performance of supervised NMF (SNMF) degrades markedly when many sources are mixed. We have proposed a new hybrid separation method for stereo music signals.
- 5. Research background: Our proposed hybrid method. [Figure: the input stereo signal (L, R) passes through a spatial separation method (directional clustering) and then an SNMF-based separation method (superresolution-based SNMF) to give the separated signal.]
- 6. Research background: The optimal divergence criterion in superresolution-based SNMF depends on the spatial conditions of the input signal. Our aim in this presentation: we propose a new optimal separation scheme for this hybrid method that separates the target signal with high accuracy under any spatial condition.
- 7. Outline (current section: 2. Conventional methods).
- 8. NMF [Lee, et al., 2001]: NMF is a sparse representation algorithm that can extract significant features from the observed matrix. [Figure: the observed matrix Y (spectrogram, frequency × time) is approximated by the product of the basis matrix F (spectral patterns) and the activation matrix G (time-varying gains). Ω: number of frequency bins; T: number of time frames; K: number of bases.]
- 9. Optimization in NMF: The variable matrices F and G are optimized by minimizing the divergence between Y and FG. Euclidean distance (EUC-distance) and Kullback-Leibler divergence (KL-divergence) are often used as the divergence in the cost function. In NMF-based separation, the KL-divergence-based cost function achieves high separation performance. Cost function: minimize the divergence between Y and FG subject to all entries of F and G being nonnegative.
- 10. SNMF [Smaragdis, et al., 2007]: SNMF utilizes sample sounds of the target. It constructs the trained basis matrix of the target sound and then decomposes the mixture into the target signal and the other signal. [Figure: in the training process, sample sounds of the target signal (e.g., a musical scale) yield the supervised basis matrix (spectral dictionary); in the separation process, the mixed signal is decomposed with the supervised basis matrix fixed and the remaining matrices optimized, giving the target signal and the other signal.]
- 11. Problem of SNMF: The separation performance of SNMF degrades markedly when many interference sources exist. [Figure: separation results for a two-source case and a five-source case; residual components remain after separation.]
- 12. Directional clustering [Araki, et al., 2007]: Directional clustering utilizes differences between channels as a separation cue and is equivalent to binary masking in the spectrogram domain. Problems: it cannot separate sources in the same direction, and artificial distortion arises owing to the binary masking. [Figure: each time-frequency grid of the stereo input spectrogram is labeled L, C, or R; the corresponding binary mask (entries 1 or 0) is applied to the spectrogram by entry-wise product to obtain the separated signal.]
- 13. Hybrid method [D. Kitamura, et al., 2013]: We have proposed a new SNMF called superresolution-based SNMF and its hybrid method. The hybrid method consists of directional clustering (spatial separation) followed by superresolution-based SNMF (spectral separation), applied to the stereo input (L, R).
- 14. Superresolution-based SNMF: This SNMF reconstructs the spectrogram obtained from directional clustering using supervised basis extrapolation. [Figure: directional clustering splits the input spectrogram into the target direction and the other directions, leaving chasms in the separated cluster; superresolution-based SNMF then produces the reconstructed spectrogram.]
- 15. Superresolution-based SNMF: Spectral chasms arise owing to directional clustering. The method treats these chasms as unseen observations and extrapolates the fittest bases from the supervised bases. [Figure: separated cluster with chasms, and the supervised bases used for extrapolation.]
- 16. Superresolution-based SNMF: [Figure: frequency of source components versus direction (left, center, right) for (a) the input signal, (b) after directional clustering (binary masking), and (c) after superresolution-based SNMF, with the extrapolated components marked. A second row shows the observed spectrogram (target and interference), the separated cluster after directional clustering, and the reconstructed data obtained by extrapolation with the supervised spectral bases.]
- 17. Decomposition model and cost function: The divergence is defined at all grids except for the chasms by using the index matrix I obtained from directional clustering. Decomposition model: Y ≈ FG + HU, where F contains the supervised bases (fixed). [Equation: the cost function, consisting of the divergence on the non-chasm grids, a regularization term, and a penalty term. Its legend lists the entries of the matrices, the weighting parameters, the binary complement, and the Frobenius norm.]
- 18. Update rules: We can obtain the update rules for the optimization of the variable matrices G, H, and U. [Equation: update rules.]
- 19. Outline (current section: 3. Proposed method).
- 20. Consideration for optimal divergence: In conventional SNMF, KL-divergence achieves higher separation performance than EUC-distance. However, in superresolution-based SNMF, the optimal divergence depends on the amount of spectral chasms, so it is not clear in advance whether KL-divergence or EUC-distance is better.
- 21. Consideration for optimal divergence: Superresolution-based SNMF has two tasks, signal separation and basis extrapolation. Abilities of each divergence: KL-divergence is very good at signal separation but poor at basis extrapolation; EUC-distance is good at signal separation and good at basis extrapolation.
- 22. Consideration for optimal divergence: A spectrum decomposed by NMF with KL-divergence tends to be sparser than one decomposed by NMF with EUC-distance. A sparse basis is not suitable for extrapolation from the observable data. [Figure: two example spectra, amplitude [dB] versus frequency [kHz] from 0 to 5 kHz, one for KL-divergence and one for EUC-distance.]
- 23. Consideration for optimal divergence: The optimal divergence for superresolution-based SNMF depends on the amount of spectral chasms because of the trade-off between the separation and extrapolation abilities. [Figure: separation performance, extrapolation performance, and total performance plotted along an axis from sparse to anti-sparse, with example spectra (amplitude [dB] versus frequency [kHz]) for KL-divergence (strong sparseness) and EUC-distance (weak sparseness).]
- 24. Consideration for optimal divergence: The optimal divergence for superresolution-based SNMF depends on the amount of spectral chasms. If there are many chasms, the extrapolation ability is required, so EUC-distance should be used. If there are few chasms, the separation ability is required, so KL-divergence should be used.
- 25. Hybrid method for online input data: Consider applying the hybrid method to online input data. [Figure: directional clustering applies a binary mask to the observed spectrogram, giving an online binary-masked spectrogram.]
- 26. Hybrid method for online input data: We divide the online spectrogram into blocks along the time axis. [Figure: superresolution-based SNMF is applied to each block in parallel.]
- 27. Online divergence switching: We calculate the rate of chasms in each block and compare it with a threshold value. If a block has few chasms, superresolution-based SNMF with KL-divergence is applied. If a block has many chasms, superresolution-based SNMF with EUC-distance is applied.
- 28. Procedure of proposed method.
- 29. Outline (current section: 4. Experiments).
- 30. Experimental conditions: We used stereo-panning signals, each a mixture of four instruments generated by a MIDI synthesizer. We used the same type of MIDI sounds of the target instruments as supervision for the training process. [Figure: four sources (1 to 4) panned between left, center, and right, with the target source marked; the supervision sound consists of two octaves of notes that cover all the notes of the target signal.]
- 31. Experimental conditions: We compared three methods: the hybrid method using only EUC-distance-based SNMF (conventional method 1), the hybrid method using only KL-divergence-based SNMF (conventional method 2), and the proposed hybrid method that switches the divergence to the optimal one (proposed method). We used the signal-to-distortion ratio (SDR) as the evaluation score. SDR indicates the total separation accuracy, which includes both the quality of the separated target signal and the degree of separation.
- 32. Experimental result: Average SDR scores for each method, where the four instruments are shuffled in 12 combinations. The proposed method outperforms the other methods. [Figure: bar chart of SDR [dB] (axis from 8.0 to 10.0, higher is better) for conventional method 1, conventional method 2, and the proposed method.]
- 33. Conclusions: We propose a new divergence switching scheme for superresolution-based SNMF. The method separates online input signals using the optimal divergence in NMF. The proposed method can be used under any spatial condition of the sources and separates the target signal with high accuracy. Thank you for your attention!
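
The optimization summarized on slide 9 can be illustrated with a short sketch. It assumes the standard multiplicative update rules of Lee and Seung for the EUC-distance and the generalized KL-divergence; the function names (`nmf`, `update_basis`, `divergence_value`), the random initialization, and the iteration count are illustrative choices that do not come from the slides. Matrices are stored as nested lists so that the example needs only the standard library.

```python
import math
import random

EPS = 1e-12  # guards against division by zero and log(0)


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def divergence_value(y, x, divergence):
    """EUC-distance ("euc") or generalized KL-divergence ("kl") between y and x."""
    total = 0.0
    for y_row, x_row in zip(y, x):
        for yv, xv in zip(y_row, x_row):
            if divergence == "euc":
                total += (yv - xv) ** 2
            else:
                xv = max(xv, EPS)
                total += (yv * math.log(yv / xv) if yv > 0 else 0.0) - yv + xv
    return total


def update_basis(y, f, g, divergence):
    """One multiplicative update of f for the model y ~ f g, with g held fixed."""
    x = matmul(f, g)
    g_t = transpose(g)
    if divergence == "euc":
        num = matmul(y, g_t)
        den = matmul(x, g_t)
    elif divergence == "kl":
        ratio = [[yv / max(xv, EPS) for yv, xv in zip(yr, xr)] for yr, xr in zip(y, x)]
        num = matmul(ratio, g_t)
        gain_sums = [sum(row) for row in g]  # sum over time for each basis
        den = [gain_sums for _ in f]
    else:
        raise ValueError("divergence must be 'euc' or 'kl'")
    return [[fv * nv / max(dv, EPS) for fv, nv, dv in zip(fr, nr, dr)]
            for fr, nr, dr in zip(f, num, den)]


def nmf(y, n_bases, divergence="kl", n_iter=200, seed=0):
    """Factorize a nonnegative matrix y into a basis f and an activation g."""
    rng = random.Random(seed)
    f = [[rng.random() + 0.1 for _ in range(n_bases)] for _ in y]
    g = [[rng.random() + 0.1 for _ in y[0]] for _ in range(n_bases)]
    for _ in range(n_iter):
        f = update_basis(y, f, g, divergence)
        # Updating g is the same problem transposed: y^T ~ g^T f^T.
        g = transpose(update_basis(transpose(y), transpose(g), transpose(f), divergence))
    return f, g
```

Calling `nmf(y, n_bases=2, divergence="euc")` or `nmf(y, n_bases=2, divergence="kl")` on a small nonnegative matrix returns nonnegative factors whose cost does not increase from one iteration to the next. A practical implementation would use an array library rather than nested lists.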

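The block-wise switching rule of slides 26 and 27 can be sketched as follows. This is a minimal illustration under stated assumptions: the rate of chasms is taken to be the fraction of zero entries of the binary index matrix within a block, blocks are cut along the time axis with a fixed length, and the threshold value and the strict comparison `rate > tau` are hypothetical choices. The slides do not specify these details, and the function names are illustrative.

```python
def split_into_blocks(matrix, block_len):
    """Split a frequency x time matrix into consecutive blocks along time."""
    n_time = len(matrix[0])
    return [[row[start:start + block_len] for row in matrix]
            for start in range(0, n_time, block_len)]


def chasm_rate(index_block):
    """Fraction of grids in the block that the binary mask set to zero (assumed definition)."""
    total = sum(len(row) for row in index_block)
    zeros = sum(1 for row in index_block for value in row if value == 0)
    return zeros / total


def choose_divergences(index_matrix, block_len, tau):
    """Pick "euc" for blocks with many chasms and "kl" for blocks with few."""
    return ["euc" if chasm_rate(block) > tau else "kl"
            for block in split_into_blocks(index_matrix, block_len)]


# Toy index matrix: 2 frequency bins x 4 time frames (1 = observed, 0 = chasm).
index_matrix = [[1, 1, 0, 0],
                [1, 0, 0, 1]]
print(choose_divergences(index_matrix, block_len=2, tau=0.5))
```

In this toy case the first block has one chasm out of four grids and the second has three out of four, so with `tau=0.5` the first block is assigned KL-divergence and the second EUC-distance. Each block would then be passed to superresolution-based SNMF with the supervised bases trained for the chosen divergence.
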
- Good afternoon, everyone. I’m Daichi Kitamura from Nara Institute of Science and Technology, Japan. Today I’d like to talk about online divergence switching for superresolution-based nonnegative matrix factorization.
- This is the outline of my talk.
- First, I will talk about the research background.
- Recently, music signal separation technologies have received much attention. These technologies are useful for many applications, such as automatic music transcription and 3D audio systems. Music signal separation based on nonnegative matrix factorization, NMF for short, has been a very active research area. In particular, supervised NMF, SNMF for short, can separate the target signal with high accuracy. However, when many sources are mixed, as in more realistic musical tunes, the separation performance degrades markedly. To solve this problem, we have proposed a new hybrid separation method for stereo music signals.
- Our proposed hybrid method concatenates a spatial separation method called directional clustering and an SNMF-based separation method called superresolution-based SNMF. In this hybrid method, the target direction is first separated by directional clustering. Then the target signal is separated by superresolution-based SNMF.
- In previous studies, we confirmed that the optimal divergence criterion in superresolution-based SNMF depends on the spatial conditions of the input signal. In this presentation, we propose a new optimal separation scheme for this hybrid method that separates the target signal with high performance under any spatial condition.
- Next, I will talk about the conventional methods.
- NMF has been proposed as a means of extracting features from the spectrogram. It is a sparse representation algorithm that can extract the significant features from the observed matrix. NMF approximately decomposes the observed spectrogram Y into two nonnegative matrices, F and G. Here, the first decomposed matrix F has frequently appearing spectral patterns as its bases, and the other decomposed matrix G has the time-varying gains of each spectral pattern. So the matrix F is called the ‘basis matrix,’ and the matrix G is called the ‘activation matrix.’
- In NMF decomposition, the variable matrices F and G are optimized by minimizing the divergence between the input data Y and the reconstructed data FG. This is the cost function in NMF. Here, Euclidean distance and KL-divergence are often used as the divergence in the cost function. In NMF-based signal separation, the KL-divergence-based cost function achieves high separation performance because of the sparseness of music spectrograms.
- To separate the target signal using NMF, SNMF has been proposed. SNMF utilizes sample sounds of the target signal as a supervision signal. For example, if we want to separate the piano signal from this mixed signal, a musical scale played on the same piano should be used as the supervision. This sample sound is decomposed by simple NMF, and the supervised basis matrix F is constructed in the training process. Then, in the separation process, the mixed signal is decomposed as FG + HU using the supervised bases F. The matrix F is fixed, and the other matrices G, H, and U are optimized. Finally, the target piano signal is separated as FG, and the other signals are separated as HU.
- SNMF can extract the target signal when the number of mixed sources is small. However, when many interfering sources exist, the separation performance degrades markedly.
- Next, I will explain the directional clustering method. This method utilizes differences between the left and right channels as a separation cue, and it is equivalent to binary masking in the spectrogram domain. However, this method cannot separate sources in the same direction, like this. In addition, the separated signal has artificial distortion owing to the binary masking.
- To solve these problems of SNMF and directional clustering, we have proposed a new SNMF called “superresolution-based SNMF” and its hybrid method. This hybrid method consists of two techniques, namely, directional clustering and superresolution-based SNMF. First, directional clustering is applied to the input stereo signal to separate the target direction. Then the target signal is separated by superresolution-based SNMF.
- Here, the spectrogram separated by directional clustering has many spectral chasms, like this. This is due to the binary masking in directional clustering. However, our superresolution-based SNMF can reconstruct such a damaged spectrogram using supervised basis extrapolation.
- This spectrum is obtained by directional clustering. There are many spectral chasms owing to the binary masking. Superresolution-based SNMF treats these chasms as unseen observations, like this, and extrapolates the fittest target bases from the supervised bases F. As a result, the lost components are reconstructed by the supervised basis extrapolation.
- This figure shows the directional distribution of the input stereo signal. The target source is in the center direction, and the other interfering sources are distributed like this. After directional clustering, left and right source components leak into the center cluster, and the center sources lose some of their components. These lost components correspond to the spectral chasms in the spectrogram domain. After superresolution-based SNMF, the target components are separated and restored using the supervised bases of the target sound trained in advance. In other words, the resolution of the target spectrogram is recovered by the supervised basis extrapolation, which is why we call it superresolution.
- This is the decomposition model of superresolution-based SNMF. It is the same as that in the conventional SNMF. This equation is the cost function. In this cost function, the divergence is defined at all spectrogram grids except for the spectral chasms by using the index matrix I obtained from directional clustering. For the grids of the chasms, we impose a regularization term for superresolution.
- From the minimization of the cost function, we can obtain the update rules for the optimization of the variable matrices G, H, and U.
- Next, I will talk about the proposed method.
- In conventional SNMF, KL-divergence-based SNMF always achieves higher separation performance than Euclidean-distance-based SNMF. However, in superresolution-based SNMF, the optimal divergence depends on the amount of spectral chasms.
- This is because superresolution-based SNMF has two tasks, namely, signal separation and basis extrapolation for the superresolution of the damaged spectrogram. KL-divergence can separate signals with high accuracy, but it is not suitable for basis extrapolation. On the other hand, Euclidean distance is good for basis extrapolation.
- The spectrum decomposed by NMF with KL-divergence tends to be sparser than that decomposed by NMF with EUC-distance. Such a sparse basis is not suitable for extrapolation from the observable data.
- Therefore, the optimal divergence for superresolution-based SNMF depends on the amount of spectral chasms because of the trade-off between the separation and extrapolation abilities.
- From these properties, if there are many chasms, EUC-distance should be used because the extrapolation ability is required. On the other hand, if there are few chasms, KL-divergence should be used because the separation ability is required.
- When we consider applying our hybrid method to online input data, we can obtain an online binary-masked spectrogram from directional clustering.
- We propose to divide this online spectrogram into blocks. Then superresolution-based SNMF is applied to each block of the spectrogram in parallel.
- Here, we can calculate the rate of chasms r in each block and decide the divergence using the threshold value tau. For example, this block Y(1) does not have many chasms, so KL-divergence is suitable because the separation ability is required. The next block Y(2) has many spectral chasms, so Euclidean distance is suitable because we have to reconstruct this damaged spectrogram by superresolution.
- This is the procedure of the proposed divergence switching method, where the supervised bases for both Euclidean distance and KL-divergence, F(EUC) and F(KL), should be prepared in advance using the supervision sound of the target signal.
- Next, I will talk about the experiments.
- To confirm the effectiveness of the proposed divergence switching method, we conducted an evaluation experiment. In this experiment, we used stereo-panning music signals. The stereo signal has four instrumental sources, and the target source is always located in the center direction. The left and right interfering sources are located at 15 degrees in the first half, and these sources are moved to the center direction, where theta equals 0, in the last half. Therefore, more chasms were produced by directional clustering in the first half than in the last half. The signal contains four instruments, namely, oboe, flute, trombone, and piano, generated by a MIDI synthesizer. These sources are mixed with equal power. In addition, we used the same type of MIDI sounds of the target instruments as the supervision sound, like this (pointing to the supervision score). This supervision sound consists of two octaves of notes that cover all notes of the target signal.
- In this experiment, we compared three methods, namely, the hybrid method using only EUC-distance-based SNMF, the hybrid method using only KL-divergence-based SNMF, and the proposed hybrid method that switches the divergence to the optimal one. In addition, we used the signal-to-distortion ratio, SDR, as the evaluation score. SDR indicates the total separation accuracy, which includes both the quality of the separated target signal and the degree of separation.
- This result is the average of the evaluation scores for all combinations of the input signals. From this result, the proposed hybrid method outperforms the other methods. This shows the efficacy of the optimal divergence switching.
- These are the conclusions of my talk. Thank you for your attention.
- The conventional hybrid method is a simple method that concatenates normal SNMF and directional clustering. So this method cannot reconstruct the lost components, that is, the spectral chasms. This proposed method, the red line, uses a fixed divergence. We already confirmed in the previous result that the divergence-switching method achieves a better result than this red line.
- Directional clustering utilizes a clustering method such as K-means clustering. The feature used for the clustering is the amplitude difference between channels, namely, the direction of the sources. From the clustering result, we can obtain a binary mask matrix. The separation is then achieved by the entry-wise product of the input spectrogram and this mask.
- As another means of addressing multichannel signal separation, multichannel NMF has also been proposed by Ozerov and Sawada. This method is a natural extension of NMF and uses both spectral and spatial cues. However, this unified method is a very difficult optimization problem mathematically because many variables must be optimized with one cost function. So this method depends strongly on the initial values.
- If the number of sources in the same direction as the target instrument increases, the separation performance of supervised NMF degrades markedly. This is because several similar bases arise in both the target and the other instruments.
- If the left and right sources are close to the center direction, the separation becomes difficult because directional clustering cannot separate them well. In addition, basis extrapolation also becomes difficult because the number of chasms in the separated cluster increases in this case. In contrast, if theta becomes larger, the separation becomes easy.
- This is the signal flow of the proposed hybrid method. In our experiment, superresolution-based supervised NMF is applied only to the center direction because the target source is located in the center direction. However, if the target source is located on the left or right side, we should apply this NMF to the direction that contains the target source, whether or not there is another source in that direction.
- The optimization of the variables F and G in NMF is based on the minimization of the cost function. The cost function is defined as the divergence between the observed spectrogram Y and the reconstructed spectrogram FG. This minimization is an inequality-constrained optimization problem.
- SDR is the total evaluation score of the separation performance.
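
Two points from the notes above, the binary mask applied by entry-wise product and the divergence evaluated only on the grids that are not chasms, can be illustrated together. The following is a minimal sketch, assuming the index matrix holds 1 for observed grids and 0 for chasms. The function names are illustrative, and the regularization and penalty terms of the full cost function are deliberately left out because their exact form is not given in this transcript.

```python
import math


def apply_binary_mask(spectrogram, index):
    """Entry-wise product of a spectrogram and a binary index matrix."""
    return [[s * i for s, i in zip(s_row, i_row)]
            for s_row, i_row in zip(spectrogram, index)]


def masked_divergence(y, x, index, divergence="kl", eps=1e-12):
    """Sum the divergence between y and x over grids where index == 1 only."""
    if divergence not in ("euc", "kl"):
        raise ValueError("divergence must be 'euc' or 'kl'")
    total = 0.0
    for y_row, x_row, i_row in zip(y, x, index):
        for yv, xv, iv in zip(y_row, x_row, i_row):
            if iv == 0:
                continue  # chasm: no divergence is defined on this grid
            if divergence == "euc":
                total += (yv - xv) ** 2
            else:
                xv = max(xv, eps)
                total += (yv * math.log(yv / xv) if yv > 0 else 0.0) - yv + xv
    return total
```

For example, with `y = [[1.0, 2.0], [3.0, 4.0]]`, `index = [[1, 0], [0, 1]]`, and a model `x = [[1.0, 9.0], [9.0, 2.0]]`, the masked EUC-distance counts only the two observed grids, so the large errors at the chasm positions do not contribute to the cost.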