Online Divergence Switching for Superresolution-Based Nonnegative Matrix Factorization
1. 2014 RISP International Workshop on Nonlinear Circuits,
Communications and Signal Processing
Speech Analysis(2),2PM2-2
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura
(Nara Institute of Science and Technology, Japan)
Yu Takahashi, Kazunobu Kondo
(Yamaha Corporation, Japan)
Hirokazu Kameoka
(The University of Tokyo, Japan)
4. Research background
• Music signal separation technologies have received
much attention.
Applications
• Automatic music transcription
• 3D audio system, etc.
• Music signal separation based on nonnegative matrix
factorization (NMF) is a very active research area.
• The separation performance of supervised NMF
(SNMF) markedly degrades for mixtures of many
sources.
We have proposed a new hybrid
separation method for stereo music signals.
5. Research background
• Our proposed hybrid method
Input stereo signal
→ Spatial separation method (directional clustering)
→ SNMF-based separation method (superresolution-based SNMF)
→ Separated signal
6. Research background
• The optimal divergence criterion in superresolution-based
SNMF depends on the spatial conditions of the input
signal.
• Our aim in this presentation
We propose a new optimal separation scheme for this
hybrid method that separates the target signal with high
accuracy under any spatial condition.
8. NMF [Lee, et al., 2001]
• NMF
– is a sparse representation algorithm.
– can extract significant features from the observed matrix.
[Figure: the observed matrix (spectrogram; frequency × time) is approximated by the product of the basis matrix (spectral patterns; frequency × basis) and the activation matrix (time-varying gains; basis × time).]
Ω: Number of frequency bins
𝑇: Number of time frames
𝐾: Number of bases
9. Optimization in NMF
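• The variable matrices (the basis matrix and the
activation matrix) are optimized by minimizing the
divergence between the observed matrix and the
product of the basis and activation matrices.
Cost function: the divergence between each entry of the
observed matrix and the corresponding entry of the
product of the variable matrices, summed over all
frequency bins and time frames.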
• Euclidean distance (EUC-distance) and Kullback-Leibler
divergence (KL-divergence) are often used
for the divergence in the cost function.
• In NMF-based separation, a KL-divergence-based cost
function achieves high separation performance.
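As an illustration of the optimization described on this slide, the following is a minimal sketch of NMF with the standard multiplicative update rules for the two divergences. The names `Y`, `F`, `G`, `nmf`, `euc_distance`, and `kl_divergence`, the iteration count, and the random initialization are assumptions made for this example and are not taken from the slides.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero and log(0)

def euc_distance(Y, X):
    """Squared Euclidean distance summed over all entries."""
    return float(np.sum((Y - X) ** 2))

def kl_divergence(Y, X):
    """Generalized Kullback-Leibler divergence summed over all entries."""
    return float(np.sum(Y * np.log((Y + EPS) / (X + EPS)) - Y + X))

def nmf(Y, n_bases, divergence="kl", n_iter=200, seed=0):
    """Factorize a nonnegative matrix Y (frequency x time) as F @ G."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    F = rng.random((n_freq, n_bases)) + EPS   # basis matrix
    G = rng.random((n_bases, n_time)) + EPS   # activation matrix
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        if divergence == "euc":
            F *= (Y @ G.T) / (F @ G @ G.T + EPS)
            G *= (F.T @ Y) / (F.T @ F @ G + EPS)
        elif divergence == "kl":
            F *= ((Y / (F @ G + EPS)) @ G.T) / (ones @ G.T + EPS)
            G *= (F.T @ (Y / (F @ G + EPS))) / (F.T @ ones + EPS)
        else:
            raise ValueError("divergence must be 'euc' or 'kl'")
    return F, G
```

Calling `nmf(Y, n_bases=4, divergence="euc")` or `nmf(Y, n_bases=4, divergence="kl")` on a magnitude spectrogram returns nonnegative factors; the multiplicative form keeps every entry nonnegative without an explicit projection step.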
10. SNMF [Smaragdis, et al., 2007]
• SNMF utilizes some sample sounds of the target.
– Construct the trained basis matrix of the target sound
– Decompose the mixture into the target signal and the other signals
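The two steps above can be sketched roughly as follows, assuming KL-divergence multiplicative updates. The function names `train_basis` and `snmf_separate`, the symbols `S`, `F`, `G`, `H`, `U`, and the number of non-target bases are hypothetical choices for this example; the slides do not specify them.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero

def train_basis(S, n_bases, n_iter=200, seed=0):
    """Learn a basis matrix F from a sample spectrogram S of the target."""
    rng = np.random.default_rng(seed)
    F = rng.random((S.shape[0], n_bases)) + EPS
    G = rng.random((n_bases, S.shape[1])) + EPS
    ones = np.ones_like(S)
    for _ in range(n_iter):
        F *= ((S / (F @ G + EPS)) @ G.T) / (ones @ G.T + EPS)
        G *= (F.T @ (S / (F @ G + EPS))) / (F.T @ ones + EPS)
    return F

def snmf_separate(Y, F, n_other, n_iter=200, seed=0):
    """Approximate Y by F @ G + H @ U with the trained F kept fixed."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + EPS  # target activations
    H = rng.random((Y.shape[0], n_other)) + EPS     # bases for other sounds
    U = rng.random((n_other, Y.shape[1])) + EPS     # their activations
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        R = Y / (F @ G + H @ U + EPS)
        G *= (F.T @ R) / (F.T @ ones + EPS)
        R = Y / (F @ G + H @ U + EPS)
        H *= (R @ U.T) / (ones @ U.T + EPS)
        R = Y / (F @ G + H @ U + EPS)
        U *= (H.T @ R) / (H.T @ ones + EPS)
    return F @ G, H @ U  # target estimate, estimate of the other signals
```

Only `G`, `H`, and `U` are updated during separation, so the target estimate `F @ G` is built exclusively from the spectral patterns learned from the sample sounds.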
11. Problem of SNMF
• The separation performance of SNMF markedly
degrades when many interference sources exist.
12. Directional clustering [Araki, et al., 2007]
• Directional clustering
– utilizes differences between channels as a separation cue.
– is equivalent to binary masking in the spectrogram domain.
[Figure: each time-frequency bin of the stereo input spectrogram is assigned to the left (L), center (C), or right (R) direction. A binary mask (1 for center bins, 0 otherwise) is applied to the spectrogram by entry-wise product to obtain the separated center signal.]
• Problems
– Cannot separate sources in the same direction
– Artificial distortion arises owing to the binary masking.
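The binary-masking view of directional clustering can be illustrated with a small sketch. It assumes a simple panning-ratio rule with fixed bounds `lo` and `hi` in place of an actual clustering algorithm; the function name and both bounds are hypothetical and do not come from the slides or from the cited method.

```python
import numpy as np

def directional_binary_mask(left_mag, right_mag, lo=0.4, hi=0.6):
    """Return a binary mask that is 1 where a bin is assigned to the center.

    left_mag and right_mag are magnitude spectrograms of the two channels.
    """
    left_mag = np.asarray(left_mag, dtype=float)
    right_mag = np.asarray(right_mag, dtype=float)
    total = left_mag + right_mag
    # Panning ratio in [0, 1]: 0 = fully left, 0.5 = center, 1 = fully right.
    # Silent bins (total == 0) are given the neutral value 0.5.
    ratio = np.divide(right_mag, total,
                      out=np.full_like(total, 0.5), where=total > 0)
    return ((ratio >= lo) & (ratio <= hi)).astype(float)

left = np.array([[1.0, 4.0], [0.0, 2.0]])
right = np.array([[1.0, 0.0], [3.0, 2.0]])
mask = directional_binary_mask(left, right)
separated_left = mask * left    # entry-wise product, as in binary masking
separated_right = mask * right
```

Bins assigned to other directions become zero in the separated signal, which is the origin of both problems listed above: sources sharing a direction stay mixed, and the zeroed bins distort the target.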
13. Hybrid method [D. Kitamura, et al., 2013]
• We have proposed a new SNMF, called
superresolution-based SNMF, and a hybrid method based on it.
• The hybrid method consists of directional clustering and
superresolution-based SNMF.
14. Superresolution-based SNMF
• This SNMF reconstructs the spectrogram obtained
from directional clustering using supervised basis
extrapolation.
[Figure: directional clustering of the input spectrogram (frequency × time) yields the separated cluster of the target direction, in which the bins assigned to other directions remain as chasms. Superresolution-based SNMF then produces the reconstructed spectrogram.]
15. Superresolution-based SNMF
• Spectral chasms owing to directional clustering
[Figure: the separated cluster (frequency × time) contains chasms.]
– Treat these chasms as unseen observations.
– Extrapolate the fittest bases from the supervised bases.
16. Superresolution-based SNMF
[Figure: frequency of source components versus direction (left, center, right), with the target marked: (a) input signal, (b) after directional clustering, (c) after superresolution-based SNMF, showing the extrapolated components.]
17. Decomposition model and cost function
Decomposition model: the supervised bases are fixed.
Cost function: includes a penalty term and a regularization term
in addition to the divergence.
Symbols used in the cost function:
– the index matrix obtained from directional clustering, and its binary complement
– the entries of the matrices
– the weighting parameters
– the Frobenius norm
• The divergence is defined at all grid points except the
chasms by using the index matrix.
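To make the role of the index matrix concrete, here is a minimal sketch of a divergence that is evaluated only outside the chasms. It covers the divergence term alone, not the penalty or regularization terms, and it assumes that the index matrix holds 1 at observed grid points and 0 at chasms. The names `masked_divergence`, `Y`, `X`, and `index` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # avoids log(0) and division by zero

def masked_divergence(Y, X, index, divergence="kl"):
    """Sum the divergence between Y and the model X where index == 1."""
    if divergence == "euc":
        d = (Y - X) ** 2
    elif divergence == "kl":
        d = Y * np.log((Y + EPS) / (X + EPS)) - Y + X
    else:
        raise ValueError("divergence must be 'euc' or 'kl'")
    # Entries at chasms (index == 0) contribute nothing to the cost.
    return float(np.sum(index * d))

Y = np.array([[1.0, 2.0], [3.0, 4.0]])      # observed values
X = np.array([[1.0, 0.0], [0.0, 4.0]])      # model values
index = np.array([[1.0, 0.0], [0.0, 1.0]])  # 0 marks a chasm
complement = 1.0 - index                     # binary complement: 1 at chasms

cost_outside_chasms = masked_divergence(Y, X, index, "euc")
cost_everywhere = masked_divergence(Y, X, np.ones_like(Y), "euc")
```

In this toy case the model matches the observation wherever data exist, so the masked cost is zero, while the unmasked cost would penalize the model at the chasms, where no observation is available.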
18. Update rules
• Update rules for optimizing the three variable
matrices can be derived from this cost function.
20. Consideration for optimal divergence
• Separation performance of conventional SNMF:
KL-divergence outperforms EUC-distance.
However…
• Superresolution-based SNMF:
KL-divergence or EUC-distance?
– The optimal divergence depends on the amount of
spectral chasms.
21. Consideration for optimal divergence
• Superresolution-based SNMF has two tasks.
– Signal separation
– Basis extrapolation
• Abilities of each divergence
                       KL-divergence   EUC-distance
  Signal separation    Very good       Good
  Basis extrapolation  Poor            Good
22. Consideration for optimal divergence
• A spectrum decomposed by NMF with KL-divergence
tends to be sparser than one
decomposed by NMF with EUC-distance.
• A sparse basis is not suitable for extrapolation from
the observable data.
23. Consideration for optimal divergence
• The optimal divergence for superresolution-based
SNMF depends on the amount of spectral chasms
because of the trade-off between separation and
extrapolation abilities.
[Figure: separation, extrapolation, and total performance plotted against sparseness, from strong (sparse, KL-divergence) to weak (anti-sparse, EUC-distance).]
24. Consideration for optimal divergence
• The optimal divergence for superresolution-based
SNMF depends on the amount of spectral chasms.
– If there are no chasms, the separation ability is
required: KL-divergence should be used.
– If there are many chasms, the extrapolation ability is
required: EUC-distance should be used.
25. Hybrid method for online input data
• When we consider applying the hybrid method to
online input data…
[Figure: directional clustering applies a binary mask to the observed spectrogram (frequency × time), producing an online binary-masked spectrogram.]
26. Hybrid method for online input data
• We divide the online spectrogram into several blocks
along the time axis.
• Superresolution-based SNMF is applied to the blocks in parallel.
27. Online divergence switching
• We calculate the rate of chasms in each block.
– If the rate is below the threshold value (there are
not many chasms): superresolution-based SNMF with
KL-divergence
– If the rate is above the threshold value (there are
many chasms): superresolution-based SNMF with
EUC-distance
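The switching rule can be sketched in a few lines. This example assumes that the rate of chasms is the fraction of zero entries of the index matrix (1 = observed, 0 = chasm) within a block; the block length, the threshold of 0.5, and the function names are hypothetical values chosen for illustration, since the slides do not state them.

```python
import numpy as np

def split_into_blocks(index, block_len):
    """Split an index matrix (frequency x time) into consecutive time blocks."""
    n_time = index.shape[1]
    return [index[:, t:t + block_len] for t in range(0, n_time, block_len)]

def select_divergence(index_block, threshold):
    """Choose the divergence for one block from its rate of chasms."""
    chasm_rate = 1.0 - float(np.mean(index_block))
    # Many chasms: extrapolation matters, so use EUC-distance.
    # Few chasms: separation matters, so use KL-divergence.
    return "euc" if chasm_rate > threshold else "kl"

index = np.ones((4, 6))
index[:, 3:] = 0.0  # the second half of the time axis consists of chasms
blocks = split_into_blocks(index, block_len=3)
choices = [select_divergence(b, threshold=0.5) for b in blocks]
```

Each block is then passed to superresolution-based SNMF with the selected divergence; because the decision uses only the block's own index matrix, the blocks can be processed independently.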
30. Experimental conditions
• We used stereo-panning signals.
• Mixture of four instruments generated by a MIDI synthesizer
• We used the same type of MIDI sounds as the target
instruments as supervision for the training process.
[Figure: stereo panning layout of the four sources (1 to 4) across left, center, and right, with the target source marked.]
• Supervision sound: two octaves of notes that cover all the notes of the target signal
31. Experimental conditions
• We compared three methods.
– Hybrid method using only EUC-distance-based SNMF
(Conventional method 1)
– Hybrid method using only KL-divergence-based SNMF
(Conventional method 2)
– Proposed hybrid method that switches the divergence to
the optimal one (Proposed method)
• We used signal-to-distortion ratio (SDR) as an
evaluation score.
– SDR indicates the total separation accuracy, which includes
both the quality of the separated target signal and the degree of
separation.
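As a rough illustration of what an SDR score measures, the sketch below computes a simplified ratio in decibels between the part of the estimate explained by the reference signal and everything else. It is an assumption-laden simplification (a single scalar projection onto the reference) and not the full evaluation procedure used in the experiments; the function name `simple_sdr` is hypothetical.

```python
import numpy as np

def simple_sdr(reference, estimate):
    """Simplified signal-to-distortion ratio in dB for 1-D signals."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Project the estimate onto the reference to get its target component.
    scale = np.dot(estimate, reference) / np.dot(reference, reference)
    target = scale * reference
    # Everything not explained by the reference counts as distortion.
    distortion = estimate - target
    return float(10.0 * np.log10(np.sum(target ** 2) / np.sum(distortion ** 2)))

reference = np.array([1.0, 0.0, 1.0, 0.0])
noisy = np.array([1.0, 1.0, 1.0, 1.0])     # distortion as strong as the target
cleaner = np.array([1.0, 0.1, 1.0, 0.1])   # distortion reduced by a factor of 10
```

Here `simple_sdr(reference, noisy)` is 0 dB and `simple_sdr(reference, cleaner)` is 20 dB, so a higher score means less distortion relative to the target.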
32. Experimental result
• Average SDR scores for each method, where the
four instruments are shuffled with 12 combinations.
[Figure: bar chart of the average SDR [dB] (axis from 8.0 to 10.0; higher is better) for conventional method 1, conventional method 2, and the proposed method.]
• The proposed method outperforms the other methods.
33. Conclusions
• We propose a new divergence switching scheme for
superresolution-based SNMF.
• This method separates online input signals
using the optimal divergence in NMF.
• The proposed method can be used under any
spatial condition of the sources, and separates the
target signal with high accuracy.
Thank you for your attention!