Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation

Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura
(Nara Institute of Science and Technology, Japan)
Yu Takahashi, Kazunobu Kondo
(Yamaha Corporation, Japan)
Hirokazu Kameoka
(The University of Tokyo, Japan)
4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays
Oral session 2 – Microphone array processing

Outline
• 1. Research background
• 2. Conventional methods
– Directional clustering
– Nonnegative matrix factorization
– Supervised nonnegative matrix factorization
– Hybrid method
• 3. Analysis of restoration ability
– Generalized cost function
– Analysis based on generation model
• 4. Experiments
• 5. Conclusions

Research background
• Signal separation has received much attention.
– Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area.
• Supervised NMF (SNMF) achieves the highest separation performance.
• To improve its performance further, an SNMF-based multichannel signal separation method is required.
We have proposed a new SNMF and its hybrid separation method for multichannel signals.

Research background
• Our proposed hybrid method
– Input stereo signal (L, R) → spatial separation method (directional clustering) → SNMF-based separation method (SNMF with spectrogram restoration) → separated signal

Research background
• The divergence criterion in SNMF strongly affects separation performance.
– Euclidean distance (EUC-distance)
– Kullback-Leibler divergence (KL-divergence)
– Itakura-Saito divergence (IS-divergence)
• The optimal divergence for SNMF with spectrogram restoration is not apparent.
We extend our new SNMF to a more generalized form.
We give a theoretical analysis for the optimization of the divergence.

Directional clustering [Araki, et al., 2007]
• Directional clustering
– Unsupervised spatial separation method
• Problems
– Cannot separate sources in the same direction
– Artificial distortion arises owing to the binary masking.
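[Figure: the stereo input signal (L, R) contains sources in the left, center, and right directions; the separated signal keeps only the center direction.]
[Figure: binary masking. Each time-frequency bin of the spectrogram is clustered as center (C), left (L), or right (R). The binary mask is 1 at the bins of the target direction (here C) and 0 elsewhere, and the separated spectrogram is the entry-wise product of the spectrogram and the binary mask.]
Clustered spectrogram (rows: frequency, columns: time):
C C C R L R
C L L L R R
C C C C R R
C R R L L L
C C C C C C
Binary mask for the center direction:
1 1 1 0 0 0
1 0 0 0 0 0
1 1 1 1 0 0
1 0 0 0 0 0
1 1 1 1 1 1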
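
As a minimal sketch of the binary masking step, the following Python (NumPy) snippet builds the mask for the center direction from the cluster labels of the example grid and applies it by entry-wise product. The spectrogram values are random placeholders and all variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Direction label of each time-frequency bin (rows: frequency, columns: time),
# copied from the example grid: C = center, L = left, R = right.
labels = np.array([
    list("CCCRLR"),
    list("CLLLRR"),
    list("CCCCRR"),
    list("CRRLLL"),
    list("CCCCCC"),
])

# Hypothetical nonnegative spectrogram of the same shape (random placeholder values).
rng = np.random.default_rng(0)
spectrogram = rng.random(labels.shape)

# Binary mask for the center direction: 1 where the bin was clustered as "C", else 0.
mask = (labels == "C").astype(int)

# The entry-wise product keeps the center bins and zeroes the others,
# which leaves "holes" in the separated spectrogram.
separated = mask * spectrogram
```

Every zero in `mask` becomes a hole in `separated`, which is the source of the artificial distortion mentioned in the problems list.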

NMF [Lee, et al., 2001]
• NMF can extract significant spectral patterns.
– The basis matrix holds the spectral patterns that appear frequently in the observed matrix.
• Decomposition: observed matrix (spectrogram, Ω × 𝑇) ≈ basis matrix (spectral patterns, Ω × 𝐾) × activation matrix (time-varying gains, 𝐾 × 𝑇)
Ω: Number of frequency bins
𝑇: Number of time frames
𝐾: Number of bases
[Figure: the observed spectrogram (frequency vs. time, amplitude) decomposed into the bases (frequency vs. amplitude) and their activations (amplitude vs. time).]
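
The decomposition can be sketched with the multiplicative update rules of Lee and Seung for the Euclidean-distance cost. The function name `nmf_euc`, the symbols `F` (basis matrix) and `G` (activation matrix), the iteration count, and the toy data are assumptions made for illustration.

```python
import numpy as np

def nmf_euc(Y, K, n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative matrix Y (Omega x T) into F (Omega x K) and G (K x T)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, K)) + eps   # basis matrix (spectral patterns)
    G = rng.random((K, n_frames)) + eps # activation matrix (time-varying gains)
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
    return F, G

# Toy "spectrogram" built from two spectral patterns.
rng = np.random.default_rng(1)
Y = rng.random((8, 2)) @ rng.random((2, 20))
F, G = nmf_euc(Y, K=2)
relative_error = np.linalg.norm(Y - F @ G) / np.linalg.norm(Y)
```

Each column of `F` plays the role of one basis (spectral pattern) and each row of `G` is its time-varying gain.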

Divergence criterion in NMF
• Cost function in NMF
– Euclidean distance (EUC-distance)
– Kullback-Leibler divergence (KL-divergence)
– Itakura-Saito divergence (IS-divergence)
• Each criterion is evaluated entry-wise between the two variable matrices and summed over all entries.
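
For reference, the three criteria are commonly written as follows for a pair of positive entries. The symbols y (an entry of the observed matrix) and x (the corresponding entry of the model) are names chosen here for illustration.

```latex
\begin{align}
D_{\mathrm{EUC}}(y \,\|\, x) &= (y - x)^2 \\
D_{\mathrm{KL}}(y \,\|\, x)  &= y \log\frac{y}{x} - y + x \\
D_{\mathrm{IS}}(y \,\|\, x)  &= \frac{y}{x} - \log\frac{y}{x} - 1
\end{align}
```

All three are zero when y = x and positive otherwise; the NMF cost is their sum over all entries.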

Supervised NMF [Smaragdis, et al., 2007]
• SNMF
– Supervised spectral separation method
• Training process
– A supervised basis matrix (spectral dictionary) is trained from sample sounds of the target signal.
• Separation process
– The mixed signal is decomposed into the target signal and the other signal.
– The supervised basis matrix is fixed and the other matrices are optimized.
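
The two processes can be sketched as follows, assuming a Euclidean-distance cost and multiplicative updates. The names `train_basis` and `separate`, the symbols `F`, `G`, `H`, `U`, and the toy data are hypothetical; the sketch illustrates the general SNMF idea rather than the exact method on the slide.

```python
import numpy as np

EPS = 1e-12

def train_basis(Y_train, K, n_iter=200, seed=0):
    """Training process: learn supervised bases F from sample sounds of the target."""
    rng = np.random.default_rng(seed)
    F = rng.random((Y_train.shape[0], K)) + EPS
    G = rng.random((K, Y_train.shape[1])) + EPS
    for _ in range(n_iter):
        F *= (Y_train @ G.T) / (F @ G @ G.T + EPS)
        G *= (F.T @ Y_train) / (F.T @ F @ G + EPS)
    return F

def separate(Y_mix, F, L, n_iter=200, seed=1):
    """Separation process: F is fixed; G, H, U are optimized so that Y_mix ~ F G + H U."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y_mix.shape[1])) + EPS
    H = rng.random((Y_mix.shape[0], L)) + EPS
    U = rng.random((L, Y_mix.shape[1])) + EPS
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ Y_mix) / (F.T @ X + EPS)
        X = F @ G + H @ U
        H *= (Y_mix @ U.T) / (X @ U.T + EPS)
        X = F @ G + H @ U
        U *= (H.T @ Y_mix) / (H.T @ X + EPS)
    return F @ G, H @ U  # estimates of the target signal and of the other signal

# Toy data: the target shares its spectral patterns with the training samples.
rng = np.random.default_rng(2)
F_true = rng.random((16, 3))
Y_train = F_true @ rng.random((3, 30))
Y_mix = F_true @ rng.random((3, 40)) + rng.random((16, 2)) @ rng.random((2, 40))

F = train_basis(Y_train, K=3)
target_est, other_est = separate(Y_mix, F, L=2)
```

Only `G`, `H`, and `U` change during separation, so whatever the fixed dictionary `F` can explain is attributed to the target.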

Hybrid method [Kitamura, et al., 2013]
• We have proposed a new SNMF called SNMF with
spectrogram restoration and its hybrid method.
[Figure: hybrid method. Input stereo signal (L, R) → directional clustering (spatial separation) → SNMF with spectrogram restoration (spectral separation).]

SNMF with spectrogram restoration
• SNMF with spectrogram restoration can separate the
target and restore the spectrogram simultaneously.
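[Figure: the spectrogram after directional clustering (frequency vs. time) contains target components, non-target components, and holes. After SNMF with spectrogram restoration, the target is separated from the non-target and the holes are restored by the supervised bases (dictionary of the target).]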

Decomposition model and cost function
• The divergence is defined at all grids except for the holes by using the binary mask matrix.
– The binary masking matrix is obtained from directional clustering.
[Equations: the decomposition model, in which the supervised bases are fixed, and the cost function, which adds a regularization term and a penalty term to the masked divergence. The cost function is written with the entries of the matrices, weighting parameters, the binary complement of the mask, and the Frobenius norm.]
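
A cost of this shape can be sketched as below. The concrete forms are assumptions: the sketch uses the KL-divergence for the masked fit, a squared-magnitude regularizer on the target part inside the holes, and the squared Frobenius norm of `F.T @ H` as the penalty. The names `masked_cost`, `lam`, and `mu` are hypothetical.

```python
import numpy as np

def kl_div(y, x, eps=1e-12):
    """Entry-wise generalized KL-divergence D(y || x)."""
    return y * np.log((y + eps) / (x + eps)) - y + x

def masked_cost(Y, I, F, G, H, U, lam=0.1, mu=0.1):
    """Masked divergence + regularization on the holes + penalty (assumed forms)."""
    X_target = F @ G                 # part explained by the fixed supervised bases
    X = X_target + H @ U             # full model of the observed spectrogram
    I_bar = 1.0 - I                  # binary complement: 1 at the holes
    fit = np.sum(I * kl_div(Y, X))                    # ignores the holes
    reg = lam * np.sum(I_bar * X_target**2)           # assumed regularization term
    pen = mu * np.linalg.norm(F.T @ H, "fro")**2      # assumed penalty term
    return fit + reg + pen

rng = np.random.default_rng(0)
Y = rng.random((6, 8)) + 0.1
I = (rng.random((6, 8)) > 0.5).astype(float)   # 1 = observed grid, 0 = hole
F, G = rng.random((6, 3)), rng.random((3, 8))
H, U = rng.random((6, 2)), rng.random((2, 8))
cost = masked_cost(Y, I, F, G, H, U)
```

Because the fit term is multiplied by `I`, the values of `Y` inside the holes have no influence on the cost.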

Generalized divergence: β-divergence
• β-divergence [Eguchi, et al., 2001]
– β = 2: EUC-distance
– β = 1: KL-divergence
– β = 0: IS-divergence
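
A small sketch of the β-divergence in its commonly used form, with the KL and IS cases handled as limits. The function name `beta_divergence` and the argument names are assumptions; with this normalization the β = 2 case equals half the squared Euclidean distance.

```python
import numpy as np

def beta_divergence(y, x, beta):
    """Entry-wise beta-divergence D_beta(y || x) for positive y and x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 0:   # limit beta -> 0: IS-divergence
        return y / x - np.log(y / x) - 1
    if beta == 1:   # limit beta -> 1: KL-divergence
        return y * np.log(y / x) - y + x
    return (y**beta / (beta * (beta - 1))
            + x**beta / beta
            - y * x**(beta - 1) / (beta - 1))

y, x = 2.0, 3.0
d_is = beta_divergence(y, x, 0)
d_kl = beta_divergence(y, x, 1)
d_euc = beta_divergence(y, x, 2)   # equals 0.5 * (y - x)**2
```

A single parameter therefore sweeps continuously through the IS, KL, and EUC criteria and beyond (for example β = 3).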

Decomposition model and cost function
• We introduced the β-divergence to extend the cost function to a generalized form.
[Equations: the decomposition model, in which the supervised bases are fixed, and the generalized cost function.]

Update rules
• We can obtain the update rules for the optimization of the three variable matrices.
[Equations: update rules.]
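
To give a feel for what such update rules look like, here is a sketch of the widely used heuristic multiplicative updates for plain β-divergence NMF. It is not the update rule of the slide: it has no binary mask, no fixed supervised bases, and no regularization or penalty term. The names `beta_nmf` and `beta_div_sum` are hypothetical.

```python
import numpy as np

def beta_div_sum(Y, X, beta):
    """Sum of the entry-wise beta-divergence D_beta(Y || X); Y and X must be positive."""
    if beta == 0:
        return np.sum(Y / X - np.log(Y / X) - 1)
    if beta == 1:
        return np.sum(Y * np.log(Y / X) - Y + X)
    return np.sum(Y**beta / (beta * (beta - 1)) + X**beta / beta
                  - Y * X**(beta - 1) / (beta - 1))

def beta_nmf(Y, K, beta, n_iter=100, seed=0, eps=1e-12):
    """Plain NMF (Y ~ F G) with heuristic multiplicative updates for the beta-divergence."""
    rng = np.random.default_rng(seed)
    F = rng.random((Y.shape[0], K)) + 0.1
    G = rng.random((K, Y.shape[1])) + 0.1
    costs = [beta_div_sum(Y, F @ G, beta)]
    for _ in range(n_iter):
        X = F @ G + eps
        F *= ((Y * X**(beta - 2)) @ G.T) / (X**(beta - 1) @ G.T + eps)
        X = F @ G + eps
        G *= (F.T @ (Y * X**(beta - 2))) / (F.T @ X**(beta - 1) + eps)
        costs.append(beta_div_sum(Y, F @ G, beta))
    return F, G, costs

rng = np.random.default_rng(3)
Y = rng.random((10, 30)) + 0.1
results = {beta: beta_nmf(Y, K=4, beta=beta) for beta in (1, 2)}
```

In this heuristic form the cost is known to be non-increasing for β between 1 and 2; outside that range a decrease is usually observed but not guaranteed.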

SNMF with spectrogram restoration
• This SNMF has two tasks: source separation and basis extrapolation.
• The optimal divergence for source separation has been investigated.
– KL-divergence (β = 1) is suitable for source separation.
• No one has investigated the optimal divergence for basis extrapolation.
• We analyze the optimal divergence for basis extrapolation based on a generation model in NMF.

Analysis of extrapolation ability
• The decomposition of NMF is equivalent to a maximum likelihood estimation that implicitly assumes a generation model of the input data.
• Cost function in NMF and the generation model it assumes:
– IS-divergence: exponential distribution
– KL-divergence: Poisson distribution
– EUC-distance: Gaussian distribution
[Figure: the three distributions; one symbol in the figure denotes the maximum of the data.]
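
The correspondence can be checked by writing the negative log-likelihood of one observed entry y given the model value x. The parameterizations below are assumptions chosen for this check: an exponential distribution with mean x, a Poisson distribution with mean x, and a Gaussian distribution with mean x and fixed variance σ².

```latex
\begin{align}
\text{Exponential: } -\log p(y \mid x) &= \frac{y}{x} + \log x
  = D_{\mathrm{IS}}(y \,\|\, x) + \log y + 1 \\
\text{Poisson: } -\log p(y \mid x) &= x - y \log x + \log y!
  = D_{\mathrm{KL}}(y \,\|\, x) - y \log y + y + \log y! \\
\text{Gaussian: } -\log p(y \mid x) &= \frac{(y - x)^2}{2\sigma^2} + \tfrac{1}{2}\log(2\pi\sigma^2)
  = \frac{1}{2\sigma^2} D_{\mathrm{EUC}}(y \,\|\, x) + \tfrac{1}{2}\log(2\pi\sigma^2)
\end{align}
```

In each line the terms other than the divergence do not depend on x, so maximizing the likelihood over x is the same as minimizing the corresponding divergence.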

• To compare the net extrapolation ability, we generate random data that obey each generation model.
• We also prepare binary-masked random data and attempt to restore them.
• 100 bases are created in training and used for the restoration.

• The binary mask was randomly generated.
– We generate two types of binary mask whose hole densities are 75% and 98%.
• SAR [dB] indicates the accuracy of restoration.
[Figure: input random data → binary masking → binary-masked data → restoration → restored data. The SAR formula uses entry-wise squares.]
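
The masking step and the evaluation can be sketched as follows. The SAR formula used here (energy of the reference over energy of the restoration error, in dB) is an assumed form, and the function names, the data size, and the exponential example draw are hypothetical.

```python
import numpy as np

def random_binary_mask(shape, hole_density, rng):
    """Return a 0/1 mask in which a fraction `hole_density` of the entries are holes (0)."""
    n_total = int(np.prod(shape))
    n_holes = int(round(hole_density * n_total))
    flat = np.ones(n_total)
    flat[rng.permutation(n_total)[:n_holes]] = 0.0
    return flat.reshape(shape)

def sar_db(reference, restored):
    """Assumed form: energy of the reference over energy of the restoration error, in dB."""
    error = reference - restored
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(error**2))

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=(64, 200))  # example draw; size and scale are arbitrary

# Two masks with 75% and 98% holes, applied by entry-wise product.
masks = {d: random_binary_mask(data.shape, d, rng) for d in (0.75, 0.98)}
masked = {d: m * data for d, m in masks.items()}
```

With 98% holes only 2% of the entries remain observed, so almost the entire matrix has to be extrapolated from the trained bases.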

Results of restoration analysis
• Simulated result of the restoration ability
• The optimal divergence for the basis extrapolation (restoration) is around β = 3!
[Figure: SAR [dB] (0 to 25, higher is better) versus β_NMF (0 to 4) for β_reg = 0, 1, 2, 3, in two panels: 75%-binary-masked data and 98%-binary-masked data. The optimal divergence for source separation (KL-divergence) is marked for comparison.]

Trade-off between separation and restoration
• The optimal divergence for SNMF with spectrogram restoration and its hybrid method is based on the trade-off between separation and restoration abilities.
[Figure: two example spectra, amplitude [dB] (-10 to 0) versus frequency [kHz] (0 to 5), one with strong sparseness and one with weak sparseness.]
[Figure: performance versus the divergence parameter (0 to 4), showing the separation ability, the restoration ability, and the total performance of the hybrid method.]

Experimental condition
• The mixed signal includes four melodies (sources).
• Three compositions of instruments
– We evaluated the average score of 36 patterns.
• Supervision signal: 24 notes that cover all the notes in the target melody
[Figure: source layout. Sources 1 to 4 are placed across the left, center, and right directions; one of them is the target source.]
Dataset  Melody 1  Melody 2  Midrange     Bass
No. 1    Oboe      Flute     Piano        Trombone
No. 2    Trumpet   Violin    Harpsichord  Fagotto
No. 3    Horn      Clarinet  Piano        Cello

Experimental result
• Signal-to-distortion ratio (SDR)
– Total quality of the separation, which includes the degree of separation and the absence of artificial distortion.
[Figure: SDR [dB] (0 to 14, higher is better) versus β_NMF (0 to 4) for the supervised methods (conventional SNMF and the proposed hybrid method), with KL-divergence and EUC-distance marked on the axis, compared with the unsupervised methods (directional clustering and multichannel NMF [Sawada]).]
• Multichannel NMF is an integrated method.

Experiment for real-recorded signal
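• We recorded a binaural signal using a dummy head.
• Reverberation time: 200 ms
• The other conditions are the same as those in the previous experiment with the instantaneous mixture signal.
[Figure: recording layout. Sources 1 to 4 are placed to the left, center, and right of the dummy head; one of them is the target signal. Labeled distances: 1.5 m (three times) and 2.5 m.]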

Experimental result
• Result for real-recorded signals
[Figure: SDR [dB] (0 to 14, higher is better) versus β_NMF (0 to 4) for the supervised methods (conventional SNMF and the proposed hybrid method), with KL-divergence and EUC-distance marked on the axis, compared with the unsupervised methods (directional clustering and multichannel NMF [Sawada]).]
• Multichannel NMF is an integrated method.

Conclusions
• Restoration requires an anti-sparse criterion (β = 3).
• There is a trade-off between separation and restoration abilities.
• The optimal divergence is EUC-distance (β = 2) for SNMF with spectrogram restoration,
– whereas KL-divergence (β = 1) is the best for conventional SNMF.
Thank you for your attention!
