Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano, Yu Takahashi, Kazunobu Kondo, "Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing," Proceedings of IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013), pp.392-397, Athens, Greece, December 2013.


Today, I’d like to talk about Robust Music Signal Separation Based on Supervised Nonnegative Matrix Factorization with Prevention of Basis Sharing.

For example, speech and noise separation, specific instrumental sound extraction like this, and so on.

Sound signal separation is typically performed in the time-frequency domain, namely, in the spectrogram domain.

There are two tones in this spectrogram. So, if we can separate these tones like this, sound separation is achieved.

Nonnegative matrix factorization, NMF, is a sparse representation algorithm, and this method can extract significant features from the observed matrix.

NMF approximately decomposes the observed spectrogram Y into two nonnegative matrices, F and G.

Here, the first decomposed matrix F contains frequently appearing spectral patterns as bases.

And the other decomposed matrix G contains the time-varying gains of each spectral pattern.

So, the matrix F is called the ‘basis matrix,’ and the matrix G is called the ‘activation matrix.’
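The decomposition just described can be sketched numerically. Below is a minimal NMF using the classic Euclidean-distance multiplicative updates (Lee & Seung); the talk’s β-divergence formulation generalizes this. Matrix names follow the slides (Y ≈ FG); the toy data, iteration count, and seed are illustrative assumptions.

```python
import numpy as np

def nmf(Y, K, n_iter=200, eps=1e-12, seed=0):
    """Minimal NMF sketch: Y (Omega x T) ~= F (Omega x K) @ G (K x T),
    using Euclidean-distance multiplicative updates."""
    rng = np.random.default_rng(seed)
    omega, T = Y.shape
    F = rng.random((omega, K)) + eps   # basis matrix (spectral patterns)
    G = rng.random((K, T)) + eps       # activation matrix (time-varying gains)
    for _ in range(n_iter):
        G *= (F.T @ Y) / (F.T @ F @ G + eps)   # update activations
        F *= (Y @ G.T) / (F @ G @ G.T + eps)   # update bases
    return F, G

# Toy spectrogram built from two "tones"; a rank-2 factorization fits it.
Y = (np.outer([1, 0, 2, 0], [1, 1, 0, 0])
     + np.outer([0, 3, 0, 1], [0, 0, 1, 1])).astype(float)
F, G = nmf(Y, K=2)
print(np.linalg.norm(Y - F @ G))  # small reconstruction error
```

Each update keeps every entry nonnegative because it only multiplies by ratios of nonnegative quantities, which is the point of the multiplicative-update scheme.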

Therefore, if we knew which bases correspond to the target signal, we could reconstruct a target spectrogram that contains only the target sound.

However, it is very difficult to cluster these bases into specific sources.

SNMF utilizes some sample sounds of the target signal as a supervision signal.

For example, if we want to separate the piano signal from this mixed signal, the musical scale sound of the same piano should be used as supervision.

This sample sound is decomposed by simple NMF, and the supervised basis matrix F is constructed in the training process.

Then, the mixed signal is decomposed in the separation process using the supervised bases F, as FG+HU.

The matrix F is fixed, and the other matrices G, H, and U are optimized.

Finally, the target piano signal is separated as FG, and the other signals, such as saxophone and bass, are separated as HU.
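As a rough numerical sketch of this separation process (not the paper’s exact update rules), the same Euclidean-style multiplicative updates can be run with F held fixed; the toy data, the number of other bases L, and all constants are illustrative assumptions.

```python
import numpy as np

def snmf_separate(Y, F, L, n_iter=300, eps=1e-12, seed=0):
    """SNMF separation sketch: Y ~= F G + H U, with the supervised basis
    matrix F fixed and G, H, U optimized (Euclidean multiplicative updates).
    Returns the target estimate F G and the other-signal estimate H U."""
    rng = np.random.default_rng(seed)
    omega, T = Y.shape
    G = rng.random((F.shape[1], T)) + eps
    H = rng.random((omega, L)) + eps
    U = rng.random((L, T)) + eps
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ Y) / (F.T @ X + eps)   # F is fixed; only G adapts
        X = F @ G + H @ U
        H *= (Y @ U.T) / (X @ U.T + eps)   # other bases
        X = F @ G + H @ U
        U *= (H.T @ Y) / (H.T @ X + eps)   # other activations
    return F @ G, H @ U

# Toy mixture generated from the model itself, so an exact fit exists.
rng = np.random.default_rng(1)
F = rng.random((6, 2))                     # "trained" supervised bases
Y = F @ rng.random((2, 8)) + rng.random((6, 1)) @ rng.random((1, 8))
target, other = snmf_separate(Y, F, L=1)
print(np.linalg.norm(Y - (target + other)) / np.linalg.norm(Y))  # small
```

Note that nothing here stops H from learning the target’s spectral patterns too, which is exactly the basis sharing problem discussed next.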

In SNMF, there is no constraint between the supervised matrix F and the other matrix H.

Therefore, the other bases in H may also contain the target spectral patterns.

For example, the target signal is represented by these bases and activations.

The supervised matrix F contains this target basis because it is a dictionary of the target signal.

If H also has this target basis, the activation is split between G and U like this.

So, part of the target signal is captured by HU, and the estimated signal loses some of the target components.

This is because the cost function is defined only as the distance between Y and FG+HU.

Even if the target components are split like this, the value of the cost function does not change.

And the lower-left one is a spectrogram that contains only the target signal.

As you can see,

If we separate this signal using SNMF,

To solve this problem, we propose to make the other basis matrix H as different as possible from the supervised basis matrix F by introducing a penalty term into the cost function.

We call this method Penalized SNMF, or PSNMF for short.

The cost function in the conventional SNMF is defined as the divergence between Y and FG+HU, as in this equation, where Dβ denotes the generalized divergence function, which includes the Euclidean distance, the Kullback-Leibler divergence, and the Itakura-Saito divergence.
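For reference, the generalized β-divergence (applied element-wise) can be written as follows; the β = 0, 1, and 2 cases are exactly the three distances just mentioned:

```latex
\mathcal{D}_{\beta}(y \mid x) =
\begin{cases}
\dfrac{y}{x} - \log\dfrac{y}{x} - 1 & (\beta = 0:\ \text{Itakura-Saito})\\[4pt]
y \log\dfrac{y}{x} - y + x & (\beta = 1:\ \text{Kullback-Leibler})\\[4pt]
\dfrac{y^{\beta} + (\beta - 1)\,x^{\beta} - \beta\, y\, x^{\beta - 1}}{\beta(\beta - 1)} & (\text{otherwise};\ \beta = 2 \text{ gives } \tfrac{1}{2}(y - x)^2)
\end{cases}
```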

These equations, J1 and J2, are the cost functions in our proposed PSNMF.

I will explain these penalty terms in the following slides.

This is the optimization of H that minimizes the inner product of the supervised bases F and the other bases H, like this.

If H includes a basis similar to one in F, this penalty term becomes larger.

So we can make H as different as possible from F by minimizing this term.

This minimization corresponds to maximizing the orthogonality between F and H.

And all the bases are normalized to one to avoid an arbitrariness of scale.

In addition, we introduce a weighting parameter μ1.
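As a numerical sketch of this idea (the paper’s exact functional form may differ; the squared-Frobenius-norm form and the value of mu1 here are illustrative assumptions), the penalty normalizes every basis column and then measures the inner products between F and H:

```python
import numpy as np

def orthogonality_penalty(F, H, mu1=1.0, eps=1e-12):
    """Sketch of the orthogonality penalty: with every basis (column)
    normalized to unit norm, penalize inner products between the
    supervised bases F and the other bases H."""
    Fn = F / (np.linalg.norm(F, axis=0, keepdims=True) + eps)
    Hn = H / (np.linalg.norm(H, axis=0, keepdims=True) + eps)
    # large when some column of H points in the same direction as one of F
    return mu1 * np.sum((Fn.T @ Hn) ** 2)

F = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
p_shared = orthogonality_penalty(F, np.array([[1.0], [0.0], [0.0]]))  # ~1
p_ortho  = orthogonality_penalty(F, np.array([[0.0], [0.0], [1.0]]))  # ~0
print(p_shared, p_ortho)
```

When H duplicates a supervised basis the penalty is maximal, and it vanishes when H is orthogonal to F, matching the intuition above.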

This penalty is the optimization of H that maximizes the divergence between F and H, where we use the β-divergence in this penalty.

If H includes a basis similar to one in F, the value of the divergence becomes smaller.

So we can make H as different as possible from F by maximizing this term.

Similarly, all the bases are normalized to one.

In addition, to treat this penalty as a minimization problem, we invert the sign and introduce an exponential function like this.
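The sign inversion via an exponential can be sketched as follows. This follows the paper only loosely: the choice of βm = 1 (KL-type divergence), the sum-normalization, and the parameter names mu2 and lam are all illustrative assumptions, not the published formulation.

```python
import numpy as np

def max_divergence_penalty(F, H, mu2=1.0, lam=1.0, eps=1e-12):
    """Sketch of the maximum-divergence penalty: maximizing the divergence
    between normalized bases F and H is recast as minimizing an
    exponentiated, sign-inverted term."""
    Fn = F / (np.sum(F, axis=0, keepdims=True) + eps)   # normalize bases
    Hn = H / (np.sum(H, axis=0, keepdims=True) + eps)
    # sum of pairwise KL divergences between supervised and other bases
    div = 0.0
    for k in range(Fn.shape[1]):
        f = Fn[:, k:k + 1] + eps
        h = Hn + eps
        div += np.sum(f * np.log(f / h) - f + h)
    # small when F and H differ strongly; large when a basis is shared
    return mu2 * np.exp(-lam * div)

F = np.array([[1.0], [0.0]])
p_same = max_divergence_penalty(F, np.array([[1.0], [0.0]]))  # ~1, shared
p_diff = max_divergence_penalty(F, np.array([[0.0], [1.0]]))  # ~0, distinct
print(p_same, p_diff)
```

Minimizing this term therefore pushes H away from F, just as minimizing the orthogonality penalty does.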

However, it is quite difficult to differentiate these functions directly, so we use an auxiliary function method.

This method is an optimization scheme that uses an upper-bound function as the auxiliary function.

In this method, we design the auxiliary functions for the cost functions J1 and J2 as J1+ and J2+.

Then we can minimize the original cost functions by minimizing the auxiliary functions indirectly.

To design the auxiliary functions, we have to derive upper bounds for Dβ and the orthogonality penalty term.

This divergence function is described like this.

The second and third terms become convex or concave functions depending on the value of β.

For the convex function, Jensen’s inequality can be used to derive the upper bound.

On the other hand, for the concave function, we can use the tangent line inequality.

The upper-bound function JSNMF+ takes quite a complex form, so please refer to our paper.

This term is always convex, so we can derive the upper bound using Jensen’s inequality, as P+.

The update rules for optimization are obtained by these differentials.

This term corresponds to the orthogonality penalty.

Similarly, this term corresponds to the maximum-divergence penalty.

The instruments were clarinet, oboe, piano, and cello.

And we used the same MIDI sounds of the target instruments, containing two-octave notes, as supervision sounds.

In addition, we evaluated a two-source case and a four-source case.

In the two-source case, the observed signal was produced by mixing two sources selected from the four instruments with equal power.

In the four-source case, we produced an observed signal consisting of all four instruments with equal power.

There are 12 combinations in the two-source case, and 4 patterns in the four-source case.

The evaluation scores are averaged in each case.

The divergence criterion β affects the separation accuracy.

So we used these values of β and βm, where βm is a criterion for the maximum-divergence penalty.

(These values correspond to the Itakura-Saito divergence, the Kullback-Leibler divergence, and the Euclidean distance, respectively.)

(The number of supervised bases K was 100, and the number of the other bases was 50. )

We compare the conventional SNMF and our proposed PSNMF.

In addition, we used the SDR value as the evaluation score.

SDR is the source-to-distortion ratio, which indicates the total quality of the separated signal.
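A simplified, scale-aware SDR can be computed as below. The full BSS Eval SDR additionally decomposes the error into interference and artifact terms, so this sketch is only an approximation of the score used in the experiments; the test signals are illustrative.

```python
import numpy as np

def sdr(reference, estimate, eps=1e-12):
    """Simplified source-to-distortion ratio in dB: project the estimate
    onto the reference to get the target component, and compare its power
    to the power of the remaining distortion."""
    alpha = np.dot(reference, estimate) / (np.dot(reference, reference) + eps)
    target = alpha * reference          # scaled target component
    distortion = estimate - target      # everything that is not the target
    return 10 * np.log10(np.sum(target ** 2) / (np.sum(distortion ** 2) + eps))

rng = np.random.default_rng(0)
s = rng.standard_normal(1000)
n = rng.standard_normal(1000)
val_clean = sdr(s, s)            # near-perfect estimate: very high SDR
val_noisy = sdr(s, s + 0.1 * n)  # roughly 20 dB for 1% noise power
print(val_clean, val_noisy)
```

Higher SDR means the separated signal is closer to the reference, which is why the bar charts on the following slides report it in dB.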

We show the results for β = 0, 1, and 2, respectively.

The blue bars show the conventional SNMF, the red ones the PSNMF with the orthogonality penalty, and the green ones the PSNMF with the maximum-divergence penalty for various βm.

From this result, we can confirm that the conventional SNMF cannot achieve high separation accuracy because of the basis sharing problem.

But our proposed methods consistently outperform the conventional SNMF.

Both the orthogonality penalty and the maximum-divergence penalty can avoid the basis sharing problem.

(The total scores decrease because the number of interfering signals increases in this case.)

However, PSNMF still outperforms the conventional method.

The cello sound separated by the conventional SNMF loses some of the target components because of the basis sharing problem.

But our proposed PSNMF can separate it with high performance.

Finally, I’ll show the sounds.

This is a mixed signal of cello and oboe sound.

Next one is only cello sound.

And, this is the separated cello sound by conventional SNMF.

And this is our proposed method.

Thank you for your attention.

That is, we cannot obtain a perfect supervision sound of the target signal.

Even if the supervision sounds come from the same type of instrument as the target sound, these sounds differ according to various conditions.

For example, individual playing styles, the timbre individuality of each instrument, and so on.

When we want to separate this piano sound from the mixed signal, we may only be able to prepare a similar piano sound whose timbre is slightly different.

However, supervised NMF cannot separate it because of the difference between the spectra of the supervision and the target sound.

This is the decomposition model in this method.

We introduce a deformable term, which can take both positive and negative values, like this.

Then we optimize the matrices D, G, H, and U.

This figure shows the spectral difference between the real sound and the artificial sound.

So, the parameter λ controls the sensitivity of the divergence penalty term.

The cost function is defined as the divergence between observed spectrogram Y / and reconstructed spectrogram FG.

This minimization is an inequality constrained optimization problem.

Unsupervised NMF-based source separation is a very difficult inverse problem, and no robustly working method has yet been proposed.

Supervised NMF (SNMF) works robustly because it uses supervision information about the target sound, but it introduces a new problem called “basis sharing.” I will explain this in detail.

- 1. Robust Music Signal Separation Based on Supervised Nonnegative Matrix Factorization with Prevention of Basis Sharing Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano （Nara Institute of Science and Technology, Japan） Yu Takahashi, Kazunobu Kondo （Yamaha Corporation, Japan） IEEE International Symposium on Signal Processing and Information Technology December 12-15, 2013 - Athens, Greece Session T.B3: Speech – Audio - Music
- 2. Outline • 1. Research background • 2. Conventional method – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Problem of conventional method: basis sharing • 3. Proposed method – Penalized supervised nonnegative matrix factorization • Orthogonality penalty • Maximum-divergence penalty • 4. Experiments – Two-source case – Four-source case • 5. Conclusions 2
- 3. Outline • 1. Research background • 2. Conventional method – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Problem of conventional method: basis sharing • 3. Proposed method – Penalized supervised nonnegative matrix factorization • Orthogonality penalty • Maximum-divergence penalty • 4. Experiments – Two-source case – Four-source case • 5. Conclusions 3
- 4. • Sound signal separation – decomposes target source from an observed mixed signal. – Speech and noise, specific instrumental sound, etc. • Typical method for sound signal separation – is treated in the time-frequency domain. Background Extract! Time Frequency Spectrogram First tone Second tone Separation 4
- 5. Outline • 1. Research background • 2. Conventional method – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Problem of conventional method: basis sharing • 3. Proposed method – Penalized supervised nonnegative matrix factorization • Orthogonality penalty • Maximum-divergence penalty • 4. Experiments – Two-source case – Four-source case • 5. Conclusions 5
- 6. • Nonnegative matrix factorization (NMF) – is a sparse representation algorithm. – can extract significant features from the observed matrix. • It is difficult to cluster the bases as specific sources. Nonnegative matrix factorization [Lee, et al., 2012] Amplitude Amplitude Observed matrix (spectrogram) Basis matrix (spectral patterns) Activation matrix (Time-varying gain) Time Ω: Number of frequency bins 𝑇: Number of time frames 𝐾: Number of bases Time Frequency Frequency 6 Basis
- 7. • SNMF utilizes some sample sounds of the target. – Construct the trained basis matrix of the target sound. – Decompose into the target signal and other signal. Supervised NMF (SNMF) [Smaragdis, et al., 2007] Separation process Optimize Training process Supervised basis matrix (spectral dictionary) Sample sounds of target signal 7Fixed Ex. Musical scale Target signal Other signalMixed signal
- 8. Problem of SNMF • Basis sharing problem in SNMF – There is no constraint between F and H. – Other bases H may also have the target spectral patterns. – The estimated target signal loses some of the target signal. – The cost function is only defined as the distance between Y and FG+HU. 8 Estimated target signal Estimated other signals Target signal If H also has the target basis…
- 9. Basis sharing problem: example of SNMF 9 Separated by SNMF Mixed signal Only the target signal (oracle)
- 10. Basis sharing problem: example of SNMF 10 Only the target signal (oracle) Separated by SNMF Mixed signal
- 11. Basis sharing problem: example of SNMF 11 Separated by SNMF Separated signal (estimated) The estimated signal loses some of the target components because of the basis sharing problem.
- 12. Outline • 1. Research background • 2. Conventional method – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Problem of conventional method: basis sharing • 3. Proposed method – Penalized supervised nonnegative matrix factorization • Orthogonality penalty • Maximum-divergence penalty • 4. Experiments – Two-source case – Four-source case • 5. Conclusions 12
- 13. Proposed method • In SNMF, the other basis matrix H may have the same spectral patterns as the supervised basis matrix F. • Propose to make H as different as possible from F by introducing a penalty term in the cost function. 13 Target signal Other signal Mixed signal Fixed Optimize H as different as possible from F. Basis sharing problem Penalized SNMF (PSNMF)
- 14. Decomposition model and cost function 14 Decomposition model: Cost function in SNMF: Generalized divergence function: -divergence [Eguchi, et al., 2001] Supervised basis matrix (fixed)
- 15. Decomposition model and cost function 15 Introduce a penalty term We propose two types of penalty terms. Cost function in PSNMF: Decomposition model: Cost function in SNMF: Supervised basis matrix (fixed)
- 16. Orthogonality penalty • Orthogonality penalty is the optimization of H that minimizes the inner product of matrices F and H. – If H includes a basis similar to one in F, the penalty becomes larger. • All the bases are normalized to one. • Introduce a weighting parameter μ1. 16
- 17. Maximum-divergence penalty • Maximum-divergence penalty is the optimization of H that maximizes the divergence between F and H. – If H includes a basis similar to one in F, the divergence becomes smaller. • All the bases are normalized to one. • Introduce a weighting parameter and a sensitivity parameter. 17
- 18. Derivation of optimal variables in PSNMF • Derive the optimal variables G, H, and U. • Auxiliary function method – Optimization scheme that uses the upper-bound function. – Design the auxiliary functions for J1 and J2 as J1+ and J2+. – Minimize the original cost functions by minimizing the auxiliary functions indirectly. 18
- 19. Derivation of optimal variables in PSNMF • The second and third terms become convex or concave function w.r.t. value. – Convex: Jensen’s inequality – Concave: tangent line inequality 19 where
- 20. Derivation of optimal variables in PSNMF • Always becomes the convex function – Convex: Jensen’s inequality 20 : auxiliary variable
- 21. Derivation of optimal variables in PSNMF • Auxiliary functions J1+ and J2+ are designed as shown. • The update rules for optimization are obtained by differentiating them with respect to G, H, and U. 21
- 22. Update rules for optimization of PSNMF • Update rules with orthogonality penalty 22 where,
- 23. Update rules for optimization of PSNMF • Update rules with maximum-divergence penalty 23 where,
- 24. Outline • 1. Research background • 2. Conventional method – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Problem of conventional method: basis sharing • 3. Proposed method – Penalized supervised nonnegative matrix factorization • Orthogonality penalty • Maximum-divergence penalty • 4. Experiments – Two-source case – Four-source case • 5. Conclusions 24
- 25. • Produced four melodies using a MIDI synthesizer. • Used the same MIDI sounds of the target instruments containing two octave notes as a supervision sound. • Evaluation in two-source case and four-source case. – There are 12 combinations in the two-source case, and 4 patterns in the four-source case. Experimental conditions 25 Training sound Two octave notes that cover all the notes of the target signal.
- 26. • Evaluation scores [Vincent, 2006] – Source-to-distortion ratio (SDR) – SDR indicates the total quality of separated signal. Experimental conditions Observed signal Mixed 2 or 4 signals as the same power Training signal The same MIDI sounds of the target signal containing two octave notes Divergence criteria All combinations of Number of bases Supervised bases : 100 Other bases : 50 Parameters Experimentally determined Methods Conventional SNMF, Proposed PSNMF 26
- 27. 0 2 4 6 8 10 12 14 16 SDR[dB] 0 2 4 6 8 10 12 14 16 SDR[dB] 0 2 4 6 8 10 12 14 16 SDR[dB] • Average scores of 12 combinations – Conventional SNMF cannot achieve high separation accuracy because of the basis sharing problem. – Proposed method outperforms conventional SNMF. Experimental results: two-source-case 27 Conv. SNMF PSNMF (Ortho.) PSNMF (Max.) PSNMF (Ortho.) PSNMF (Max.) PSNMF (Ortho.) PSNMF (Max.) 0 1 2 0 1 2 0 1 2 Conv. SNMF Conv. SNMF
- 28. • Average scores of 4 combinations – PSNMF outperforms the conventional method. 0 2 4 6 8 10 12 14 SDR[dB] 0 2 4 6 8 10 12 14 SDR[dB] 0 2 4 6 8 10 12 14 SDR[dB] Experimental results: four-source-case 28 PSNMF (Ortho.) PSNMF (Max.) PSNMF (Ortho.) PSNMF (Max.) PSNMF (Ortho.) PSNMF (Max.) 0 1 2 0 1 2 0 1 2 Conv. SNMF Conv. SNMF Conv. SNMF
- 29. Example of separation (Cello & Oboe) 29 Separated by SNMF Cello signal Mixed signal Separated by PSNMF (Ortho.)
- 30. Conclusions • Conventional supervised NMF has a basis sharing problem that degrades the separation performance. • We propose to add a penalty term, which forces the other bases to become uncorrelated with supervised bases, in the cost function. • Penalized supervised NMF can achieve the high separation accuracy. 30 Penalized supervised NMF Thank you for your attention!
