Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing

Robust Music Signal Separation Based on
Supervised Nonnegative Matrix Factorization
with Prevention of Basis Sharing
Daichi Kitamura, Hiroshi Saruwatari,
Kosuke Yagi, Kiyohiro Shikano
(Nara Institute of Science and Technology, Japan)
Yu Takahashi, Kazunobu Kondo
(Yamaha Corporation, Japan)
IEEE International Symposium on Signal Processing and Information Technology
December 12-15, 2013 - Athens, Greece
Session T.B3: Speech – Audio - Music

Outline
• 1. Research background
• 2. Conventional method
– Nonnegative matrix factorization
– Supervised nonnegative matrix factorization
– Problem of conventional method: basis sharing
• 3. Proposed method
– Penalized supervised nonnegative matrix factorization
• Orthogonality penalty
• Maximum-divergence penalty
• 4. Experiments
– Two-source case
– Four-source case
• 5. Conclusions

Background
• Sound signal separation
– extracts a target source from an observed mixed signal.
– Examples: speech and noise, a specific instrumental sound, etc.
• Typical method for sound signal separation
– The separation is performed in the time-frequency domain (on a spectrogram).
[Figure: spectrogram (time vs. frequency) of a mixture; separation extracts the first tone and the second tone.]

Nonnegative matrix factorization [Lee, et al., 2001]
• Nonnegative matrix factorization (NMF)
– is a sparse representation algorithm.
– can extract significant features from the observed matrix.
• However, it is difficult to cluster the extracted bases into specific sources.
[Figure: the observed matrix (amplitude spectrogram, Ω × 𝑇) is approximated by the product of a basis matrix (spectral patterns, Ω × 𝐾) and an activation matrix (time-varying gains, 𝐾 × 𝑇).]
Ω: Number of frequency bins
𝑇: Number of time frames
𝐾: Number of bases
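The factorization described on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the squared Euclidean distance as the cost and the standard multiplicative update rules; the names `Y`, `F`, `G`, and `nmf` are chosen here for illustration and are not taken from the slides.

```python
import numpy as np

def nmf(Y, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix Y (frequency x time) as F @ G."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, n_bases)) + eps    # basis matrix (spectral patterns)
    G = rng.random((n_bases, n_frames)) + eps  # activation matrix (time-varying gains)
    for _ in range(n_iter):
        # Multiplicative updates for the squared Euclidean distance;
        # they keep F and G nonnegative at every iteration.
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
    return F, G

# Toy "spectrogram" built from two spectral patterns with random gains.
rng = np.random.default_rng(1)
Y = rng.random((64, 2)) @ rng.random((2, 100))
F, G = nmf(Y, n_bases=2)
rel_err = np.linalg.norm(Y - F @ G) / np.linalg.norm(Y)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Nothing in this procedure ties a given column of `F` to a particular source, which is why clustering the bases afterwards is difficult.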

Supervised NMF (SNMF) [Smaragdis, et al., 2007]
• SNMF uses sample sounds of the target source as supervision.
– Training process: construct the supervised basis matrix (spectral dictionary) from sample sounds of the target signal (e.g., a musical scale).
– Separation process: keep the supervised basis matrix fixed and optimize the remaining matrices, decomposing the mixed signal into the target signal and the other signal.
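The two-stage procedure (training, then separation with the trained bases fixed) might be sketched as follows. This is an illustrative sketch under stated assumptions: the matrix symbols were lost from the slide text, so the names `F` (supervised bases), `G` (target activations), `H` (other bases), and `U` (other activations) are assumed here, and the squared Euclidean distance is used instead of the generalized divergence discussed later.

```python
import numpy as np

EPS = 1e-12

def train_basis(Y_train, n_bases, n_iter=200, seed=0):
    """Training process: learn a spectral dictionary F from target samples."""
    rng = np.random.default_rng(seed)
    F = rng.random((Y_train.shape[0], n_bases)) + EPS
    G = rng.random((n_bases, Y_train.shape[1])) + EPS
    for _ in range(n_iter):
        F *= (Y_train @ G.T) / (F @ G @ G.T + EPS)
        G *= (F.T @ Y_train) / (F.T @ F @ G + EPS)
    return F

def separate(Y_mix, F, n_other, n_iter=200, seed=0):
    """Separation process: F stays fixed; only G, H, and U are optimized."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y_mix.shape[1])) + EPS
    H = rng.random((Y_mix.shape[0], n_other)) + EPS
    U = rng.random((n_other, Y_mix.shape[1])) + EPS
    for _ in range(n_iter):
        X = F @ G + H @ U                      # current model of the mixture
        G *= (F.T @ Y_mix) / (F.T @ X + EPS)
        X = F @ G + H @ U
        H *= (Y_mix @ U.T) / (X @ U.T + EPS)
        X = F @ G + H @ U
        U *= (H.T @ Y_mix) / (H.T @ X + EPS)
    return F @ G, H @ U                        # estimated target, estimated other

# Toy data: a 3-basis "target" and a 2-basis "other" source.
rng = np.random.default_rng(1)
F_true, H_true = rng.random((32, 3)), rng.random((32, 2))
Y_train = F_true @ rng.random((3, 60))
Y_mix = F_true @ rng.random((3, 80)) + H_true @ rng.random((2, 80))
F = train_basis(Y_train, n_bases=3)
target_est, other_est = separate(Y_mix, F, n_other=2)
rel_err = np.linalg.norm(Y_mix - target_est - other_est) / np.linalg.norm(Y_mix)
```

Note that the cost only measures how well the sum `F @ G + H @ U` fits the mixture; nothing prevents `H` from learning target-like spectral patterns.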

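Problem of SNMF
• Basis sharing problem in SNMF
– The cost function is defined only as the distance between the observed mixed signal and the sum of the estimated target signal and the estimated other signals.
– There is no constraint between the supervised basis matrix and the other basis matrix.
– The other bases may also learn the target spectral patterns.
– As a result, the estimated target signal loses some components of the target signal.
[Figure: the target signal is split between the estimated target signal and the estimated other signals when the other basis matrix also contains target bases.]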

Basis sharing problem: example of SNMF
[Figure: spectrograms of the mixed signal, the target signal alone (oracle), and the signal separated by SNMF (estimated).]
The estimated signal loses some of the target components because of the basis sharing problem.

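Proposed method
• In SNMF, the other basis matrix may contain the same spectral patterns as the supervised basis matrix (the basis sharing problem).
• We propose to make the other basis matrix as different as possible from the supervised basis matrix by introducing a penalty term in the cost function: penalized SNMF (PSNMF).
[Figure: the mixed signal is modeled as the target signal (supervised basis matrix, fixed) plus the other signal (other basis matrix, optimized to be as different as possible from the supervised basis matrix).]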

Decomposition model and cost function
• Decomposition model: the mixed signal is modeled as the sum of a target component, built on the supervised basis matrix (fixed), and an other-signal component.
• Cost function in SNMF: a generalized divergence function, the β-divergence [Eguchi, et al., 2001], between the observed matrix and the model.
• Cost function in PSNMF: the SNMF cost function with an additional penalty term.
• We propose two types of penalty terms.
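For reference, the β-divergence named on this slide can be computed as below. The exact expression shown on the slide was not preserved, so this sketch assumes the commonly used definition, in which β = 2 gives half the squared Euclidean distance, β = 1 the generalized Kullback-Leibler divergence, and β = 0 the Itakura-Saito divergence; the function name and the `eps` guard are illustrative.

```python
import numpy as np

def beta_divergence(Y, X, beta, eps=1e-12):
    """Sum of element-wise beta-divergences between nonnegative arrays Y and X."""
    Y = np.asarray(Y, dtype=float) + eps  # eps avoids log(0) and division by zero
    X = np.asarray(X, dtype=float) + eps
    if beta == 0:   # Itakura-Saito divergence
        R = Y / X
        return np.sum(R - np.log(R) - 1.0)
    if beta == 1:   # generalized Kullback-Leibler divergence
        return np.sum(Y * np.log(Y / X) - Y + X)
    # General case; beta = 2 reduces to 0.5 * (Y - X)**2
    return np.sum(
        Y**beta / (beta * (beta - 1.0))
        + X**beta / beta
        - Y * X**(beta - 1.0) / (beta - 1.0)
    )

d_euc = beta_divergence([3.0], [1.0], beta=2)  # 0.5 * (3 - 1)**2
d_kl = beta_divergence([2.0], [1.0], beta=1)   # 2*log(2) - 2 + 1
d_is = beta_divergence([2.0], [1.0], beta=0)   # 2 - log(2) - 1
```

In all three cases the divergence is zero when the two arguments are equal and positive otherwise, which is what makes it usable as a distance-like cost.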

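Orthogonality penalty
• The orthogonality penalty optimizes the other basis matrix so that its inner product with the supervised basis matrix is minimized.
– If the other basis matrix includes bases similar to the supervised bases, the penalty becomes larger.
• All the bases are normalized to one.
• A weighting parameter is introduced to control the strength of the penalty.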
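A penalty of this kind could be evaluated as in the sketch below. The slide's formula was not preserved, so two things are assumed here: the penalty is taken as the squared Frobenius norm of the product of the transposed supervised basis matrix and the other basis matrix, and "normalized to one" is read as unit Euclidean norm per basis. The names `F`, `H`, and `mu` are illustrative.

```python
import numpy as np

def normalize_bases(B, eps=1e-12):
    """Scale every column (basis) to unit Euclidean norm (assumed normalization)."""
    return B / (np.linalg.norm(B, axis=0, keepdims=True) + eps)

def orthogonality_penalty(F, H, mu=1.0):
    """mu * ||F^T H||_F^2: large when bases in H resemble bases in F."""
    F, H = normalize_bases(F), normalize_bases(H)
    return mu * np.sum((F.T @ H) ** 2)

F = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])                  # supervised bases (fixed)
H_shared = np.array([[1.0], [0.0], [0.0]])  # duplicates the first supervised basis
H_distinct = np.array([[0.0], [0.0], [1.0]])  # orthogonal to both supervised bases

p_shared = orthogonality_penalty(F, H_shared)
p_distinct = orthogonality_penalty(F, H_distinct)
```

In this toy case the shared basis is penalized and the distinct one is not, which is the behavior the slide describes.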

Maximum-divergence penalty
• The maximum-divergence penalty optimizes the other basis matrix so that the divergence between the supervised basis matrix and the other basis matrix is maximized.
– If the other basis matrix includes bases similar to the supervised bases, the divergence becomes smaller.
• All the bases are normalized to one.
• A weighting parameter and a sensitivity parameter are introduced.

Derivation of optimal variables in PSNMF
• Derive the optimal values of the variables to be optimized.
• Auxiliary function method
– An optimization scheme that uses an upper-bound function.
– Design auxiliary functions (upper bounds) for the divergence term and the penalty term of the cost function.
– Minimize the original cost function indirectly by minimizing the auxiliary functions.

• The second and third terms become convex or concave functions depending on the value of β.
– Convex: Jensen's inequality
– Concave: tangent line inequality
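For reference, the two inequalities named above take the following generic forms. This is the textbook statement, not the slide's own notation, which was not preserved: $f$ is any convex function, $g$ any differentiable concave function, and $\lambda_k$ and $x_0$ play the role of auxiliary variables.

```latex
% Jensen's inequality (f convex, \lambda_k > 0, \sum_k \lambda_k = 1):
f\Bigl(\sum_k x_k\Bigr)
  = f\Bigl(\sum_k \lambda_k \frac{x_k}{\lambda_k}\Bigr)
  \le \sum_k \lambda_k \, f\Bigl(\frac{x_k}{\lambda_k}\Bigr)

% Tangent line inequality (g concave and differentiable, any x_0):
g(x) \le g(x_0) + g'(x_0)\,(x - x_0)
```

Both bounds hold with equality for a suitable choice of the auxiliary variables, so minimizing the upper bound alternately over the original and the auxiliary variables never increases the original cost.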

• The penalty term always becomes a convex function.
– Convex: Jensen's inequality (with an auxiliary variable)

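• The auxiliary functions are designed by applying these inequalities.
• The update rules for optimization are obtained by setting the derivatives of the auxiliary functions with respect to each variable to zero.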

Update rules for optimization of PSNMF
• Update rules with orthogonality penalty
• Update rules with maximum-divergence penalty
[Equations: update rules for each penalty type.]

Outline
• 4. Experiments
– Two-source case

Experimental conditions
• We produced four melodies using a MIDI synthesizer.
• As the supervision sound, we used the same MIDI sounds as the target instrument, containing notes over two octaves.
• We evaluated a two-source case and a four-source case.
– There are 12 combinations in the two-source case and 4 patterns in the four-source case.
• Training sound: notes over two octaves that cover all the notes of the target signal.

Experimental conditions
• Evaluation score [Vincent, 2006]
– Source-to-distortion ratio (SDR)
– SDR indicates the total quality of the separated signal.
Observed signal: mixture of 2 or 4 signals at equal power
Training signal: the same MIDI sounds as the target signal, containing notes over two octaves
Divergence criteria: all combinations of the tested divergence parameters
Number of bases: supervised bases: 100; other bases: 50
Parameters: experimentally determined
Methods: conventional SNMF, proposed PSNMF
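To give a feel for the scale of the score, the sketch below computes a simplified signal-to-distortion ratio in decibels. It is not the full SDR of [Vincent, 2006], which first decomposes the estimate into target, interference, and artifact components; here everything that differs from the reference is counted as distortion, and the function name is illustrative.

```python
import numpy as np

def simple_sdr_db(reference, estimate, eps=1e-12):
    """Simplified SDR: reference energy over the energy of (estimate - reference)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = estimate - reference
    return 10.0 * np.log10(
        (np.sum(reference**2) + eps) / (np.sum(distortion**2) + eps)
    )

reference = np.array([1.0, 0.0, -1.0, 0.0])
estimate = reference + 0.1      # constant error of 0.1 on every sample
sdr = simple_sdr_db(reference, estimate)
print(f"simplified SDR: {sdr:.2f} dB")
```

A higher value means the separated signal is closer to the reference; each additional 10 dB corresponds to a tenfold reduction in distortion energy.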

Experimental results: two-source case
• Average scores of 12 combinations
– Conventional SNMF cannot achieve high separation accuracy because of the basis sharing problem.
– The proposed method outperforms conventional SNMF.
[Figure: three bar-chart panels of SDR [dB] (vertical axis, 0 to 16) at horizontal-axis values 0, 1, and 2, comparing Conv. SNMF, PSNMF (Ortho.), and PSNMF (Max.).]

Experimental results: four-source case
• Average scores of 4 combinations
– PSNMF outperforms the conventional method.
[Figure: three bar-chart panels of SDR [dB] (vertical axis, 0 to 14) at horizontal-axis values 0, 1, and 2, comparing Conv. SNMF, PSNMF (Ortho.), and PSNMF (Max.).]

Example of separation (Cello & Oboe)
[Figure: spectrograms of the mixed signal, the cello signal, the signal separated by SNMF, and the signal separated by PSNMF (Ortho.).]

Conclusions
• Conventional supervised NMF has a basis sharing
problem that degrades the separation performance.
• We propose adding a penalty term to the cost function that forces the other bases to become uncorrelated with the supervised bases.
• Penalized supervised NMF can achieve high separation accuracy.
Thank you for your attention!
