Online Divergence Switching for Superresolution-Based Nonnegative Matrix Factorization

2014 RISP International Workshop on Nonlinear Circuits,
Communications and Signal Processing
Speech Analysis (2), 2PM2-2

Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura
(Nara Institute of Science and Technology, Japan)

Yu Takahashi, Kazunobu Kondo
(Yamaha Corporation, Japan)

Hirokazu Kameoka
(The University of Tokyo, Japan)

Outline
• 1. Research background
• 2. Conventional methods
– Nonnegative matrix factorization
– Supervised nonnegative matrix factorization
– Directional clustering
– Hybrid method
• 3. Proposed method
– Online divergence switching for hybrid method
• 4. Experiments
• 5. Conclusions


Research background
• Music signal separation technologies have received
much attention.
Applications
• Automatic music transcription
• 3D audio system, etc.

• Music signal separation based on nonnegative matrix
factorization (NMF) is a very active research area.
• The separation performance of supervised NMF
(SNMF) degrades markedly when the mixture contains
many sources.

We have proposed a new hybrid
separation method for stereo music signals.

Research background
• Our proposed hybrid method
Input stereo signal
→ Spatial separation method (Directional clustering)
→ SNMF-based separation method (Superresolution-based SNMF)
→ Separated signal

Research background
• The optimal divergence criterion in superresolution-based
SNMF depends on the spatial conditions of the input
signal.

• Our aim in this presentation
We propose a new optimal separation scheme for this
hybrid method that separates the target signal with high
accuracy under any spatial condition.


NMF [Lee, et al., 2001]
• NMF
– is a sparse representation algorithm.
– can extract significant features from the observed matrix.

[Figure: the observed matrix (spectrogram, Ω × 𝑇) is approximated by the product of the basis matrix (spectral patterns, Ω × 𝐾) and the activation matrix (time-varying gains, 𝐾 × 𝑇).]
Ω: Number of frequency bins
𝑇: Number of time frames
𝐾: Number of bases

Optimization in NMF
• The basis matrix and the activation matrix are optimized by
minimizing the divergence between the observed matrix and
their product.
Cost function: the sum, over all time-frequency entries, of the
divergence between each observed entry and the corresponding
entry of the product of the two variable matrices.

• Euclidean distance (EUC-distance) and Kullback-Leibler
divergence (KL-divergence) are often used as the divergence
in the cost function.
• In NMF-based separation, the KL-divergence-based cost
function achieves high separation performance.
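The optimization above can be sketched with the well-known multiplicative update rules of Lee and Seung for both divergences. This is a minimal NumPy sketch that assumes a nonnegative magnitude spectrogram `Y` (Ω × 𝑇). The function names `nmf` and `cost`, the random initialization, and the iteration count are illustrative choices and do not come from the slides.

```python
import numpy as np

def nmf(Y, K, divergence="kl", n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix Y (freq x time) as F @ G.

    divergence: "euc" (squared Euclidean distance) or "kl" (generalized KL divergence).
    """
    if divergence not in ("euc", "kl"):
        raise ValueError("divergence must be 'euc' or 'kl'")
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    F = rng.random((n_freq, K)) + eps   # basis matrix (spectral patterns)
    G = rng.random((K, n_time)) + eps   # activation matrix (time-varying gains)
    for _ in range(n_iter):
        if divergence == "euc":
            # multiplicative updates for the squared Euclidean distance
            G *= (F.T @ Y) / (F.T @ F @ G + eps)
            F *= (Y @ G.T) / (F @ G @ G.T + eps)
        else:
            # multiplicative updates for the generalized KL divergence
            G *= (F.T @ (Y / (F @ G + eps))) / (F.sum(axis=0)[:, None] + eps)
            F *= ((Y / (F @ G + eps)) @ G.T) / (G.sum(axis=1)[None, :] + eps)
    return F, G

def cost(Y, X, divergence, eps=1e-12):
    """Value of the NMF cost function for the approximation X = F @ G."""
    if divergence == "euc":
        return float(np.sum((Y - X) ** 2))
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))
```

Both update rules keep every entry nonnegative and do not increase their respective cost, so the only design decision here is the choice of divergence.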

SNMF [Smaragdis, et al., 2007]
• SNMF uses sample sounds of the target.
– Train a basis matrix of the target sound from the samples.
– Decompose the mixture into the target signal and the other signals using the trained bases.
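The two SNMF steps can be sketched as follows, assuming KL-divergence and a model in which the mixture is approximated by the trained bases (kept fixed) with their activations, plus additional free bases with their activations for the other sounds. The names `train_supervised_bases` and `snmf` and the number of other bases `n_other` are hypothetical. This is a minimal illustration of the idea, not the authors' implementation.

```python
import numpy as np

def kl_cost(Y, X, eps=1e-12):
    """Generalized KL divergence summed over all entries."""
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))

def train_supervised_bases(S, K, n_iter=200, eps=1e-12, seed=0):
    """KL-NMF on a sample-sound spectrogram S; only the bases are kept."""
    rng = np.random.default_rng(seed)
    F = rng.random((S.shape[0], K)) + eps
    A = rng.random((K, S.shape[1])) + eps
    for _ in range(n_iter):
        A *= (F.T @ (S / (F @ A + eps))) / (F.sum(axis=0)[:, None] + eps)
        F *= ((S / (F @ A + eps)) @ A.T) / (A.sum(axis=1)[None, :] + eps)
    return F

def snmf(Y, F, n_other, n_iter=200, eps=1e-12, seed=0):
    """Y ~ F @ G + H @ U with the trained bases F kept fixed (KL updates)."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    G = rng.random((F.shape[1], n_time)) + eps   # activations of the target bases
    H = rng.random((n_freq, n_other)) + eps      # bases of the other sounds
    U = rng.random((n_other, n_time)) + eps      # activations of the other bases
    for _ in range(n_iter):
        G *= (F.T @ (Y / (F @ G + H @ U + eps))) / (F.sum(axis=0)[:, None] + eps)
        H *= ((Y / (F @ G + H @ U + eps)) @ U.T) / (U.sum(axis=1)[None, :] + eps)
        U *= (H.T @ (Y / (F @ G + H @ U + eps))) / (H.sum(axis=0)[:, None] + eps)
    return G, H, U
```

In this sketch the target estimate is `F @ G` and the remaining signal is `H @ U`. `F` is never updated inside `snmf`.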

Problem of SNMF
• The separation performance of SNMF markedly
degrades when many interference sources exist.


Directional clustering [Araki, et al., 2007]
• Directional clustering
– utilizes differences between channels as a separation cue.
– is equivalent to binary masking in the spectrogram domain.
[Figure: each time-frequency bin of the stereo input spectrogram (left and right channels) is assigned to a direction cluster: Left (L), Center (C), or Right (R). The binary mask for the target direction (Center) is 1 where a bin belongs to C and 0 elsewhere. Its entry-wise product with the spectrogram gives the separated signal.]
• Problems
– Cannot separate sources in the same direction
– Artificial distortion arises owing to the binary masking.
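Araki et al. cluster normalized observation vectors. The sketch below is a simplified stand-in that assumes real-valued amplitude panning. It runs a 1-D k-means on a per-bin panning angle computed from the left and right magnitude spectrograms `XL` and `XR`, then builds the binary mask for the target direction. The function names, the angle feature, the three-cluster initialization, and the use of the mean channel magnitude as the masked spectrogram are all illustrative assumptions.

```python
import numpy as np

def directional_clustering(XL, XR, n_iter=50, eps=1e-12):
    """Cluster each time-frequency bin into Left (0), Center (1), or Right (2).

    Simplified stand-in: 1-D k-means on a panning angle derived from the
    channel magnitudes (0 = hard left, pi/2 = hard right).
    """
    theta = np.arctan2(np.abs(XR), np.abs(XL) + eps)
    centroids = np.array([0.0, np.pi / 4, np.pi / 2])  # initial L, C, R guesses
    for _ in range(n_iter):
        # assign every bin to the nearest direction centroid, then update centroids
        labels = np.argmin(np.abs(theta[..., None] - centroids), axis=-1)
        for c in range(3):
            if np.any(labels == c):
                centroids[c] = theta[labels == c].mean()
    labels = np.argmin(np.abs(theta[..., None] - centroids), axis=-1)
    return labels, centroids

def binary_mask_separation(XL, XR, target=1):
    """Keep only the bins clustered into the target direction (default: Center)."""
    labels, _ = directional_clustering(XL, XR)
    mask = (labels == target).astype(float)    # 1 = target direction, 0 = chasm
    X = 0.5 * (np.abs(XL) + np.abs(XR))        # assumed mono magnitude spectrogram
    return mask * X, mask                       # entry-wise product with the mask
```

The zeros in `mask` are the bins that become the spectral chasms discussed later. Any bin of a center source that is dominated by a side source is lost here, which is where the binary-masking distortion comes from.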


Hybrid method [D. Kitamura, et al., 2013]
• We previously proposed a new SNMF, called
superresolution-based SNMF, and a hybrid method based on it.
• The hybrid method consists of directional clustering
followed by superresolution-based SNMF.

Superresolution-based SNMF
• This SNMF reconstructs the spectrogram obtained
from directional clustering using supervised basis
extrapolation.

[Figure: directional clustering splits the input spectrogram into a cluster for the target direction and clusters for other directions. The separated target cluster contains chasms, which superresolution-based SNMF fills to give the reconstructed spectrogram.]

• Spectral chasms caused by directional clustering
– Treat these chasms as unseen observations.
– Extrapolate them using the supervised bases that best fit the observed bins.
[Figure: separated-cluster spectrogram with chasms, and the supervised bases used for extrapolation.]

[Figure: frequency of source components versus direction (Left, Center, Right) for (a) the input signal, (b) the signal after directional clustering, and (c) the signal after superresolution-based SNMF, in which the missing components of the target are extrapolated.]

Decomposition model and cost function
Decomposition model: supervised bases (fixed) with their
activations, plus other bases with their activations.

Cost function: divergence term + penalty term + regularization term
Quantities in the cost function: the index matrix obtained from
directional clustering; its binary complement (marking the chasms);
the weighting parameters of the penalty and regularization terms,
respectively; and the Frobenius norm.

• Using the index matrix, the divergence is evaluated at all
time-frequency bins except the chasms.
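The role of the index matrix can be sketched with a mask-weighted SNMF. Bins where the mask is 0 (the chasms) drop out of the divergence, so the fixed supervised bases are fitted only to the observed bins, and their product with the activations fills the chasms. This is a minimal sketch under the following assumptions:
- `I` is 1 at observed bins and 0 at chasms.
- The updates are the standard weighted multiplicative updates for EUC-distance and KL-divergence.
- The penalty and regularization terms of the actual cost function are omitted.
- All names are hypothetical.

```python
import numpy as np

def masked_cost(Y, X, I, divergence, eps=1e-12):
    """Divergence summed only over observed bins (I == 1)."""
    if divergence == "euc":
        return float(np.sum(I * (Y - X) ** 2))
    return float(np.sum(I * (Y * np.log((Y + eps) / (X + eps)) - Y + X)))

def superres_snmf(Y, I, F, n_other, divergence="kl", n_iter=300, eps=1e-12, seed=0):
    """Fit Y ~ F @ G + H @ U on observed bins only; F (supervised bases) stays fixed.

    NOTE: the penalty and regularization terms of the original cost are NOT included.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    G = rng.random((F.shape[1], n_time)) + eps   # activations of supervised bases
    H = rng.random((n_freq, n_other)) + eps      # other bases
    U = rng.random((n_other, n_time)) + eps      # activations of other bases
    IY = I * Y
    for _ in range(n_iter):
        for name in ("G", "H", "U"):
            X = F @ G + H @ U + eps
            if divergence == "euc":
                num, den = IY, I * X             # weighted Euclidean updates
            else:
                num, den = IY / X, I             # weighted KL updates
            if name == "G":
                G *= (F.T @ num) / (F.T @ den + eps)
            elif name == "H":
                H *= (num @ U.T) / (den @ U.T + eps)
            else:
                U *= (H.T @ num) / (H.T @ den + eps)
    return G, H, U
```

Here the target estimate, including the extrapolated chasms, is taken as `F @ G`. This choice is also an assumption of the sketch.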

Update rules
• Update rules can be derived for optimizing the three
variable matrices of this model.


Consideration for optimal divergence
• Separation performance of conventional SNMF:
KL-divergence > EUC-distance
However…
• Superresolution-based SNMF:
KL-divergence ? EUC-distance
– The optimal divergence depends on the amount of
spectral chasms.

• Superresolution-based SNMF has two tasks:
signal separation and basis extrapolation.
• Abilities of each divergence
– Signal separation: KL-divergence very good, EUC-distance good
– Basis extrapolation: KL-divergence poor, EUC-distance good

• Spectra decomposed by NMF with KL-divergence tend to be
sparser than those decomposed by NMF with EUC-distance.

• A sparse basis is not well suited to extrapolating the
chasms from the observable data.
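The slides do not say how sparseness is measured. One common choice, assumed here, is Hoyer's sparseness measure. It is 0 for a perfectly flat vector and 1 for a vector with a single nonzero entry. The sketch below averages it over the columns (bases) of a basis matrix. The function names are hypothetical.

```python
import numpy as np

def hoyer_sparsity(x, eps=1e-12):
    """Hoyer's sparseness: 0 for a flat vector, 1 for a single nonzero entry."""
    x = np.abs(np.asarray(x, dtype=float)).ravel()
    n = x.size
    l1 = x.sum()
    l2 = np.sqrt(np.sum(x ** 2))
    return float((np.sqrt(n) - l1 / (l2 + eps)) / (np.sqrt(n) - 1))

def mean_basis_sparsity(F):
    """Average sparseness over the columns (bases) of a basis matrix F."""
    return float(np.mean([hoyer_sparsity(F[:, k]) for k in range(F.shape[1])]))
```

Applying `mean_basis_sparsity` to bases learned with each divergence is one way to check the tendency described above on your own data.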


• The optimal divergence for superresolution-based
SNMF depends on the amount of spectral chasms
because of the trade-off between separation and
extrapolation abilities.
[Figure: total performance as a trade-off between separation and extrapolation. KL-divergence yields strong sparseness (sparse), which favors separation, whereas EUC-distance yields weak sparseness (anti-sparse), which favors extrapolation.]

• The optimal divergence for superresolution-based
SNMF depends on the amount of spectral chasms.
– If there are few chasms, the separation ability is
required, so KL-divergence should be used.
– If there are many chasms, the extrapolation ability is
required, so EUC-distance should be used.
[Figure: example spectrograms (time-frequency) with almost no chasms and with many chasms.]

Hybrid method for online input data
• Consider applying the hybrid method to online input data.
[Figure: the observed spectrogram arrives over time and is multiplied by the binary mask, giving an online binary-masked spectrogram.]

Hybrid method for online input data

• We divide the online spectrogram into blocks and apply
superresolution-based SNMF to the blocks in parallel.
[Figure: online spectrogram divided along the time axis into blocks, each processed by its own superresolution-based SNMF.]
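The block-wise processing can be sketched as splitting the spectrogram along the time axis and mapping a per-block function over the blocks. The block length, the thread pool, and the placeholder `process_block` (standing in for one superresolution-based SNMF run) are assumptions for illustration. The slides only say that the blocks are processed in parallel.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(S, block_len):
    """Split a (frequency x time) spectrogram into consecutive time blocks.

    The last block is shorter when the number of frames is not a multiple of block_len.
    """
    n_time = S.shape[1]
    return [S[:, t:t + block_len] for t in range(0, n_time, block_len)]

def process_blocks_in_parallel(blocks, process_block, max_workers=4):
    """Apply process_block (e.g. one SNMF run) to every block concurrently.

    pool.map preserves the input order, so the results can be concatenated back.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_block, blocks))
```

A thread pool keeps the sketch simple. For CPU-heavy work that does not release the GIL, a process pool is the more usual choice.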

Online divergence switching
• We calculate the rate of chasms in each block.
– Rate below the threshold value (few chasms):
superresolution-based SNMF with KL-divergence
– Rate above the threshold value (many chasms):
superresolution-based SNMF with EUC-distance
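The switching rule can be sketched as follows. The code assumes the chasm rate is the fraction of bins in a block whose index-matrix entry is 0, and it uses a hypothetical threshold value. A rate exactly equal to the threshold is sent to KL-divergence here, because the slides do not specify that case. The function names are illustrative.

```python
import numpy as np

def chasm_rate(mask_block):
    """Fraction of time-frequency bins in a block that are chasms (mask == 0)."""
    return 1.0 - float(np.mean(mask_block))

def choose_divergence(mask_block, threshold):
    """Few chasms -> KL-divergence (separation); many chasms -> EUC-distance (extrapolation)."""
    return "euc" if chasm_rate(mask_block) > threshold else "kl"

def switch_per_block(mask, block_len, threshold):
    """Divergence chosen for each consecutive time block of the binary mask."""
    n_time = mask.shape[1]
    return [choose_divergence(mask[:, t:t + block_len], threshold)
            for t in range(0, n_time, block_len)]
```

Each entry of the returned list would select the divergence for the SNMF run on the corresponding block.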

Procedure of proposed method



Experimental conditions
• We used stereo-panning signals.
• Each mixture consists of four instruments generated by a MIDI synthesizer.
• As supervision for training, we used MIDI sounds of the same type as the target instruments.
• Supervision sound: notes spanning two octaves that cover all the notes of the target signal.
[Figure: positions of sources 1, 2, 3, and 4, including the target source, panned between Left, Center, and Right.]
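The slides do not specify the panning law or the source positions. The following minimal sketch generates stereo-panning mixtures under the assumption of constant-power (cosine/sine) panning, with hypothetical pan positions in [0, 1], where 0 is hard left, 0.5 is center, and 1 is hard right. The function names are illustrative.

```python
import numpy as np

def pan_stereo(source, pan):
    """Constant-power panning of a mono signal; pan = 0 (left), 0.5 (center), 1 (right)."""
    angle = pan * np.pi / 2
    source = np.asarray(source, dtype=float)
    return np.stack([np.cos(angle) * source, np.sin(angle) * source])  # shape (2, n)

def mix_stereo(sources, pans):
    """Sum of panned sources, e.g. four instruments placed at different positions."""
    return sum(pan_stereo(s, p) for s, p in zip(sources, pans))
```

With constant-power panning, the combined left and right energy of each source does not depend on its position. Only the inter-channel level ratio changes, and that ratio is the cue directional clustering relies on.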

Experimental conditions
• We compared three methods.
– Hybrid method using only EUC-distance-based SNMF
(Conventional method 1)
– Hybrid method using only KL-divergence-based SNMF
(Conventional method 2)
– Proposed hybrid method that switches the divergence to
the optimal one (Proposed method)

• We used the signal-to-distortion ratio (SDR) as the
evaluation score.
– SDR indicates the overall separation accuracy, reflecting
both the quality of the separated target signal and the
degree of separation.
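SDR as defined in the BSS Eval framework of Vincent et al. allows a time-invariant distortion filter on the reference. The sketch below uses a simpler relative, the scale-invariant SDR (SI-SDR), which allows only a scalar gain. It is meant only to illustrate SDR as a ratio of target energy to distortion energy and will not reproduce BSS Eval scores. The function name is illustrative.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB: only a scalar gain on the reference is allowed."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    gain = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = gain * reference     # part of the estimate explained by the reference
    error = estimate - target     # everything else counts as distortion
    return float(10 * np.log10((np.sum(target ** 2) + eps) / (np.sum(error ** 2) + eps)))
```

For a stereo output, the score can be computed per channel and averaged, or computed on the flattened signals.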

Experimental result
• Average SDR scores of each method over 12 combinations
obtained by shuffling the four instruments.
[Figure: bar chart of average SDR [dB] (axis from 8.0 to 10.0; higher is better) for Conventional method 1, Conventional method 2, and the proposed method.]
• The proposed method outperforms the conventional methods.

Conclusions
• We proposed a new divergence switching scheme for
superresolution-based SNMF.
• The scheme targets online input signals and separates each
block using the optimal divergence in NMF.
• The proposed method can be used under any spatial condition
of the sources and separates the target signal with high accuracy.

Thank you for your attention!
