The document proposes a new supervised nonnegative matrix factorization (SNMF) method and hybrid method for multichannel signal separation. It analyzes the optimal divergence criterion for the SNMF with spectrogram restoration ability. The key points are:
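1. A generalized cost function based on the β-divergence is introduced so that the divergence criterion of the SNMF can be optimized.
2. Theoretical analysis based on a data generation model finds that the optimal divergence for basis extrapolation in spectrogram restoration is around β = 3, whereas KL-divergence (β = 1) suits source separation; the trade-off between the two puts the optimum for the overall method around the Euclidean distance (β = 2).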
3. Experiments show the proposed hybrid method using Euclidean distance outperforms other methods for both instantaneous mixtures and real recordings, achieving the best separation quality measured by signal-to-distortion ratio.
Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation
1. Divergence optimization in nonnegative matrix
factorization with spectrogram restoration for
multichannel signal separation
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura
(Nara Institute of Science and Technology, Japan)
Yu Takahashi, Kazunobu Kondo
(Yamaha Corporation, Japan)
Hirokazu Kameoka
(The University of Tokyo, Japan)
4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays
Oral session 2 – Microphone array processing
2. Outline
• 1. Research background
• 2. Conventional methods
– Directional clustering
– Nonnegative matrix factorization
– Supervised nonnegative matrix factorization
– Hybrid method
• 3. Analysis of restoration ability
– Generalized cost function
– Analysis based on generation model
• 4. Experiments
• 5. Conclusions
4. Research background
• Signal separation has received much attention.
• Music signal separation based on nonnegative matrix
factorization (NMF) is a very active research area.
• Supervised NMF (SNMF) achieves the highest
separation performance.
• To improve its performance, an SNMF-based
multichannel signal separation method is required.
• Applications: automatic music transcription, 3D audio systems, etc.
• We have proposed a new SNMF and its hybrid
separation method for multichannel signals.
5. Research background
• Our proposed hybrid method
– Input stereo signal (L, R) → spatial separation method (directional clustering) → SNMF-based separation method (SNMF with spectrogram restoration) → separated signal
6. Research background
• Divergence criterion in SNMF strongly affects
separation performance.
– Euclidean distance (EUC-distance)
– Kullback-Leibler divergence (KL-divergence)
– Itakura-Saito divergence (IS-divergence)
• The optimal divergence for SNMF with spectrogram
restoration is not apparent.
• We extend our new SNMF to a more generalized form.
• We give a theoretical analysis for the optimization of
the divergence.
7. Outline
• 1. Research background
• 2. Conventional methods
– Directional clustering
– NMF
– Supervised NMF
– Hybrid method
• 3. Analysis of restoration ability
– Generalized cost function
– Analysis based on generation model
• 4. Experiments
• 5. Conclusions
[Figure: hybrid method: stereo signal → spatial separation → spectral separation → separated signal]
8. Directional clustering [Araki, et al., 2007]
• Directional clustering
– Unsupervised spatial separation method
• Problems
– Cannot separate sources in the same direction
– Artificial distortion arises owing to the binary masking.
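[Figure: the input stereo signal contains sources in the left, center, and right directions; the separated signal keeps only the center direction.]
Binary masking: the separated spectrogram is the entry-wise product of the input spectrogram and a binary mask.
Cluster labels of the spectrogram (rows: frequency, columns: time; C = center, L = left, R = right):
C C C R L R
C L L L R R
C C C C R R
C R R L L L
C C C C C C
Binary mask for the center direction:
1 1 1 0 0 0
1 0 0 0 0 0
1 1 1 1 0 0
1 0 0 0 0 0
1 1 1 1 1 1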
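The binary masking on this slide can be sketched in a few lines. The cluster labels are the ones shown on the slide; the spectrogram values are made up for illustration, and the clustering step itself (which produces the labels) is not shown.

```python
import numpy as np

# Cluster label of each time-frequency grid (rows: frequency, columns: time),
# copied from the slide: C = center, L = left, R = right.
labels = np.array([
    list("CCCRLR"),
    list("CLLLRR"),
    list("CCCCRR"),
    list("CRRLLL"),
    list("CCCCCC"),
])

# Hypothetical amplitude spectrogram of the mixture (values are made up).
rng = np.random.default_rng(0)
spectrogram = rng.uniform(0.1, 1.0, size=labels.shape)

# Binary mask that keeps only the grids clustered into the center direction.
mask = (labels == "C").astype(float)

# Entry-wise product: every masked-out grid becomes a spectral hole (zero).
separated = spectrogram * mask
```

The zeros left in `separated` are the holes that the later restoration step has to fill.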
9. NMF [Lee, et al., 2001]
• NMF can extract significant spectral patterns.
– Basis matrix F has frequently appearing spectral patterns
in the observed matrix Y.
Decomposition: Y ≈ FG
– Y: observed matrix (spectrogram), Ω × T
– F: basis matrix (spectral patterns), Ω × K
– G: activation matrix (time-varying gains), K × T
Ω: Number of frequency bins
T: Number of time frames
K: Number of bases
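As a minimal sketch of the decomposition of an observed matrix Y into a basis matrix F and an activation matrix G, the following uses the standard multiplicative updates for the Euclidean distance. The function name `nmf_euc`, the random initialization, and the toy data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def nmf_euc(Y, K, n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative matrix Y (freq x time) as F @ G with K bases."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    F = rng.uniform(0.1, 1.0, size=(n_freq, K))   # basis matrix
    G = rng.uniform(0.1, 1.0, size=(K, n_time))   # activation matrix
    costs = []
    for _ in range(n_iter):
        # Multiplicative updates keep F and G nonnegative.
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
        costs.append(np.sum((Y - F @ G) ** 2))    # squared Euclidean distance
    return F, G, costs

# Toy nonnegative "spectrogram" with rank 4 (made-up data).
rng = np.random.default_rng(1)
Y = rng.uniform(0, 1, (20, 4)) @ rng.uniform(0, 1, (4, 30))
F, G, costs = nmf_euc(Y, K=4)
```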
10. Divergence criterion in NMF
• Cost function in NMF: a distance or divergence between the observed matrix Y and the decomposed matrix FG, accumulated over their entries
– Euclidean distance (EUC-distance)
– Kullback-Leibler divergence (KL-divergence)
– Itakura-Saito divergence (IS-divergence)
11. Supervised NMF [Smaragdis, et al., 2007]
• SNMF
– Supervised spectral separation method
• Training process
– Sample sounds of the target signal are used to train the supervised basis matrix F (spectral dictionary).
• Separation process
– The mixed signal is decomposed as FG + HU with F fixed; G, H, and U are optimized.
– FG is the target signal and HU is the other signal.
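The separation process (decompose the mixture as FG + HU with the supervised basis F fixed) can be sketched as follows. This is an illustration under stated assumptions: it uses Euclidean-distance multiplicative updates, the name `snmf_separate` and the number of non-target bases `L` are hypothetical, and F is taken as already trained.

```python
import numpy as np

def snmf_separate(Y, F, L, n_iter=200, seed=0, eps=1e-12):
    """Decompose Y ~= F @ G + H @ U with F fixed; only G, H, U are updated."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    K = F.shape[1]
    G = rng.uniform(0.1, 1.0, (K, n_time))   # activations of the target bases
    H = rng.uniform(0.1, 1.0, (n_freq, L))   # bases for the other signal
    U = rng.uniform(0.1, 1.0, (L, n_time))   # activations of the other bases
    costs = []
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ Y) / (F.T @ X + eps)
        X = F @ G + H @ U
        H *= (Y @ U.T) / (X @ U.T + eps)
        X = F @ G + H @ U
        U *= (H.T @ Y) / (H.T @ X + eps)
        costs.append(np.sum((Y - F @ G - H @ U) ** 2))
    return G, H, U, costs

# Toy mixture (made-up data); F stands in for a trained spectral dictionary.
rng = np.random.default_rng(1)
F = rng.uniform(0, 1, (30, 5))
target = F @ rng.uniform(0, 1, (5, 40))
other = rng.uniform(0, 1, (30, 3)) @ rng.uniform(0, 1, (3, 40))
Y = target + other
G, H, U, costs = snmf_separate(Y, F, L=3)
target_estimate = F @ G   # target signal
other_estimate = H @ U    # other signal
```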
12. Hybrid method [Kitamura, et al., 2013]
• We have proposed a new SNMF called SNMF with
spectrogram restoration and its hybrid method.
[Figure: hybrid method: stereo input (L, R) → directional clustering (spatial separation) → SNMF with spectrogram restoration (spectral separation)]
13. SNMF with spectrogram restoration
• SNMF with spectrogram restoration can separate the
target and restore the spectrogram simultaneously.
[Figure: the spectrogram after directional clustering contains target components, non-target components, and holes; after SNMF with spectrogram restoration, the target is separated from the non-target and the holes are restored using the supervised bases (dictionary of the target).]
14. Decomposition model and cost function
• The divergence is defined at all grids except for the
holes by using the binary mask matrix I.
Decomposition model: Y ≈ FG + HU, with the supervised bases F fixed
Cost function: divergence over the grids that are not holes + regularization term + penalty term
– I: binary masking matrix obtained from directional clustering
– The cost function also uses weighting parameters, the binary complement of I, and the Frobenius norm.
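The equations on this slide did not survive extraction. One form that is consistent with the legend (a masked divergence, a regularization term on the holes, and a Frobenius-norm penalty term) is sketched below. The symbols λ and μ for the weighting parameters, the regularizer R, and the choice of F^T H inside the penalty are assumptions, not a transcription of the slide.

```latex
\begin{align*}
Y &\simeq FG + HU \\
\mathcal{J} &= \sum_{\omega,t} i_{\omega,t}\,
      D\Bigl(y_{\omega,t} \,\Big\|\, \sum_{k} f_{\omega,k}\, g_{k,t} + \sum_{l} h_{\omega,l}\, u_{l,t}\Bigr)
   + \lambda \sum_{\omega,t} \bar{i}_{\omega,t}\,
      R\Bigl(\sum_{k} f_{\omega,k}\, g_{k,t}\Bigr)
   + \mu \,\bigl\lVert F^{\mathsf{T}} H \bigr\rVert_{\mathrm{Fr}}^{2}
\end{align*}
```

Here i and its binary complement ī are entries of the mask I and of its complement. For the EUC-distance one would take D(y ‖ x) = (y − x)² and R(z) = z², so the regularization term keeps the extrapolated values FG in the holes from growing without bound.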
15. Outline
• 1. Research background
• 2. Conventional methods
– Directional clustering
– Nonnegative matrix factorization
– Supervised nonnegative matrix factorization
– Hybrid method
• 3. Analysis of restoration ability
– Generalized cost function
– Analysis based on generation model
• 4. Experiments
• 5. Conclusions
16. Generalized divergence: β-divergence
• β-divergence [Eguchi, et al., 2001]
– EUC-distance (β = 2)
– KL-divergence (β = 1)
– IS-divergence (β = 0)
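A minimal sketch of a β-divergence that reduces to the three special cases named on the slide. It uses the common parametrization in which β = 2 gives half the squared Euclidean distance, β = 1 the KL-divergence, and β = 0 the IS-divergence; the slide's own formula may differ by a constant factor. All inputs are assumed strictly positive.

```python
import numpy as np

def beta_divergence(y, x, beta):
    """Sum of entry-wise beta-divergences d_beta(y | x) for positive y and x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 0:   # Itakura-Saito divergence
        r = y / x
        return np.sum(r - np.log(r) - 1.0)
    if beta == 1:   # Kullback-Leibler divergence
        return np.sum(y * np.log(y / x) - y + x)
    # General case; beta = 2 gives 0.5 * (y - x) ** 2 (Euclidean distance).
    num = y ** beta + (beta - 1.0) * x ** beta - beta * y * x ** (beta - 1.0)
    return np.sum(num / (beta * (beta - 1.0)))

y = np.array([1.0, 2.0])
x = np.array([2.0, 1.0])
euc = beta_divergence(y, x, 2)   # 0.5 * ((1 - 2)**2 + (2 - 1)**2)
kl = beta_divergence(y, x, 1)
is_div = beta_divergence(y, x, 0)
```

The limits β → 1 and β → 0 of the general expression give the KL and IS cases, which is why they are handled as separate branches.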
17. Decomposition model and cost function
• We introduced the β-divergence to extend the cost
function to a generalized form.
Decomposition model: Y ≈ FG + HU, with the supervised bases F fixed
Cost function: generalized with the β-divergence
18. Update rules
• We can obtain the update rules for the optimization of
the variable matrices G, H, and U.
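The update rules themselves are missing from the transcript. As an illustration only, the sketch below derives multiplicative updates for the Euclidean special case of a masked cost of the form ‖I∘(Y − FG − HU)‖² + λ‖Ī∘(FG)‖² + μ‖FᵀH‖². The cost form, the weights `lam` and `mu`, and the function name are assumptions; the authors' rules cover general β and may differ.

```python
import numpy as np

def snmf_restore_euc(Y, I, F, L, lam=0.1, mu=0.1, n_iter=100, seed=0, eps=1e-12):
    """Masked SNMF sketch: fit Y ~= F @ G + H @ U only where I == 1 (F fixed)."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    K = F.shape[1]
    G = rng.uniform(0.1, 1.0, (K, n_time))
    H = rng.uniform(0.1, 1.0, (n_freq, L))
    U = rng.uniform(0.1, 1.0, (L, n_time))
    I_bar = 1.0 - I          # binary complement: 1 at the holes
    IY = I * Y

    def cost():
        X = F @ G + H @ U
        return (np.sum((I * (Y - X)) ** 2)              # masked data term
                + lam * np.sum((I_bar * (F @ G)) ** 2)  # regularization at holes
                + mu * np.sum((F.T @ H) ** 2))          # penalty on F^T H

    costs = [cost()]
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ IY) / (F.T @ (I * X) + lam * F.T @ (I_bar * (F @ G)) + eps)
        X = F @ G + H @ U
        H *= (IY @ U.T) / ((I * X) @ U.T + mu * F @ (F.T @ H) + eps)
        X = F @ G + H @ U
        U *= (H.T @ IY) / (H.T @ (I * X) + eps)
        costs.append(cost())
    return G, H, U, costs

# Toy example (made-up data): half of the grids are removed by a random mask.
rng = np.random.default_rng(1)
F = rng.uniform(0, 1, (30, 5))
Y = F @ rng.uniform(0, 1, (5, 40)) + rng.uniform(0, 1, (30, 3)) @ rng.uniform(0, 1, (3, 40))
I = (rng.random(Y.shape) >= 0.5).astype(float)
G, H, U, costs = snmf_restore_euc(I * Y, I, F, L=3)
restored_target = F @ G   # defined at every grid, including the holes
```

Because F @ G is evaluated at all grids, the target estimate fills the holes by extrapolating the fixed bases, which is the restoration effect described in the talk.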
19. SNMF with spectrogram restoration
• This SNMF has two tasks: source separation and basis extrapolation.
• The optimal divergence for source separation has
been investigated.
– KL-divergence (β = 1) is suitable for source separation.
• No one has investigated the optimal divergence for
basis extrapolation.
• We analyze the optimal divergence for basis
extrapolation based on a generation model in NMF.
20. Analysis of extrapolation ability
• The decomposition of NMF is equivalent to a
maximum likelihood estimation, which implicitly assumes a
generation model of the input data Y.
Cost function in NMF and the corresponding generation model:
– IS-divergence: exponential distribution
– KL-divergence: Poisson distribution
– EUC-distance: Gaussian distribution
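The correspondence between each divergence and its generation model can be checked entry by entry. Writing y for an observed entry and x for the matching entry of FG (notation assumed here), the negative log-likelihood of each model equals the divergence up to terms that do not depend on x; the Gaussian variance σ² is an assumed constant.

```latex
\begin{align*}
\text{Gaussian (EUC-distance):}\quad
  & -\log p(y \mid x) = \frac{(y - x)^2}{2\sigma^2} + \mathrm{const} \\
\text{Poisson (KL-divergence):}\quad
  & -\log p(y \mid x) = x - y \log x + \log y!
    = \Bigl( y \log \frac{y}{x} - y + x \Bigr) + \mathrm{const} \\
\text{Exponential (IS-divergence):}\quad
  & -\log p(y \mid x) = \frac{y}{x} + \log x
    = \Bigl( \frac{y}{x} - \log \frac{y}{x} - 1 \Bigr) + \mathrm{const}
\end{align*}
```

Minimizing the divergence over x is therefore the same as maximizing the likelihood under the corresponding model.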
21. Analysis of extrapolation ability
• To compare the net extrapolation ability, we generate
random data Y that obey each generation model.
• We also prepare the binary-masked random
data YI and attempt to restore them.
– Training: 100 bases are created.
– Restoration: the binary-masked data are restored using the trained bases.
22. Analysis of extrapolation ability
• The binary mask was randomly generated.
– We generate two types of binary masks whose densities of
holes are 75% and 98%.
• SAR [dB] indicates the accuracy of restoration.
[Figure: input random data → binary masking → binary-masked data → restoration → restored data]
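A random binary mask with a prescribed density of holes can be generated as in the sketch below. The function name and the matrix size are assumptions; each grid is made a hole independently with the given probability, so the realized density is only approximately the requested one.

```python
import numpy as np

def random_binary_mask(shape, hole_density, seed=0):
    """Return a 0/1 mask; each entry is a hole (0) with probability hole_density."""
    rng = np.random.default_rng(seed)
    return (rng.random(shape) >= hole_density).astype(float)

# The two hole densities used in the analysis.
mask_75 = random_binary_mask((200, 200), 0.75)
mask_98 = random_binary_mask((200, 200), 0.98)

hole_ratio_75 = 1.0 - mask_75.mean()
hole_ratio_98 = 1.0 - mask_98.mean()
```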
23. Results of restoration analysis
• Simulated result of the restoration ability
• The optimal divergence for the basis extrapolation
(restoration) is around β = 3!
[Figure: SAR [dB] (0 to 25) versus β_NMF (0 to 4) for β_reg = 0, 1, 2, and 3; left panel: 75%-binary-masked data, right panel: 98%-binary-masked data; higher is better. The optimal divergence for source separation (KL-divergence) is marked for comparison.]
24. Trade-off between separation and restoration
• The optimal divergence for SNMF with spectrogram
restoration and its hybrid method is based on the
trade-off between separation and restoration abilities.
[Figure: two example spectra, amplitude [dB] versus frequency [kHz] (0 to 5), one with strong sparseness and one with weak sparseness.]
[Figure: performance versus β (0 to 4): separation, restoration, and the total performance of the hybrid method.]
25. Outline
• 1. Research background
• 2. Conventional methods
– Directional clustering
– Nonnegative matrix factorization
– Supervised nonnegative matrix factorization
– Hybrid method
• 3. Analysis of restoration ability
– Generalized cost function
– Analysis based on generation model
• 4. Experiments
• 5. Conclusions
26. Experimental condition
• Mixed signal includes four melodies (sources).
• Three compositions of instruments
– We evaluated the average score of 36 patterns.
• Supervision signal: 24 notes that cover all the notes in the target melody
[Figure: layout of sources 1 to 4 over the left, center, and right directions, with the target source marked.]
Dataset  Melody 1  Melody 2  Midrange     Bass
No. 1    Oboe      Flute     Piano        Trombone
No. 2    Trumpet   Violin    Harpsichord  Fagotto
No. 3    Horn      Clarinet  Piano        Cello
27. Experimental result
• Signal-to-distortion ratio (SDR)
– total quality of the separation, which includes the degree of
separation and absence of artificial distortion.
[Figure: SDR [dB] (0 to 14) versus β_NMF (0 to 4) for the supervised methods (conventional SNMF and the proposed hybrid method), with the unsupervised methods (directional clustering and multichannel NMF [Sawada]) for comparison; KL-divergence and EUC-distance are marked on the β axis; higher is better.]
Multichannel NMF is an integrated method.
28. Experiment for real-recorded signal
• We recorded a binaural signal using a dummy head.
• Reverberation time:
– 200 ms
• The other conditions
are the same as
those in the previous
experiment with instantaneous mixture
signals.
[Figure: recording layout: sources 1 to 4 placed in the left, center, and right directions around the dummy head, with distances of 1.5 m and 2.5 m marked; the target signal is marked.]
29. Experimental result
• Result for real-recorded signals
[Figure: SDR [dB] (0 to 14) versus β_NMF (0 to 4) for the supervised methods (conventional SNMF and the proposed hybrid method), with the unsupervised methods (directional clustering and multichannel NMF [Sawada]) for comparison; KL-divergence and EUC-distance are marked on the β axis; higher is better.]
Multichannel NMF is an integrated method.
30. Conclusions
• Restoration requires an anti-sparse criterion (β = 3).
• There is a trade-off between separation and
restoration abilities.
• The optimal divergence is EUC-distance for SNMF
with spectrogram restoration,
– whereas KL-divergence is the best for conventional
SNMF.
Thank you for your attention!
Editor's Notes
This is the outline of my talk.
First, I will talk about the research background.
Recently, signal separation technologies have received much attention.
These technologies are useful for many applications, such as automatic music transcription, 3D audio systems, and so on.
Music signal separation based on nonnegative matrix factorization, NMF for short, has been a very active area of research.
In particular, supervised NMF (SNMF) achieves the highest separation performance.
However, SNMF can be used only for single-channel signals.
If we could use the multichannel information, we could improve the performance further.
To improve its performance, an SNMF-based multichannel signal separation method is required.
Our proposed hybrid method concatenates a spatial separation method called directional clustering and an SNMF-based separation method called SNMF with spectrogram restoration.
In this hybrid method, first, the target direction is separated by directional clustering.
Then, the target signal is separated by this SNMF.
In previous studies, we confirmed that the divergence criterion in SNMF strongly affects the separation performance.
For source separation, the KL-divergence criterion is often used and achieves the highest separation performance.
However, the optimal divergence for our new SNMF with spectrogram restoration is not apparent.
Therefore, in this presentation, we extend this method to a more generalized form.
In addition, we give a theoretical analysis for the optimization of the divergence to achieve the highest separation performance.
Next, I will talk about conventional methods.
Directional clustering is an unsupervised spatial separation method.
This method utilizes differences between the left and right channels as a clustering cue.
So, we can separate the sources direction by direction.
This is equivalent to binary masking in the spectrogram domain.
So, we can obtain the binary mask from the result of clustering and take its entry-wise product with the spectrogram.
Then we can obtain the separated signal.
However, this method cannot separate sources in the same direction, like this.
In addition, the separated signal has artificial distortion owing to the binary masking.
The next method is NMF.
NMF is a powerful method for extracting significant features from a spectrogram.
NMF decomposes the input spectrogram Y into a product of the basis matrix F and the activation matrix G,
where the basis matrix F has frequently appearing spectral patterns as basis vectors, like this,
and the activation matrix G has the time-varying gains of each basis vector.
In NMF decomposition, the cost function is defined as a distance or a divergence between the input matrix Y and the decomposed matrix FG.
This equation indicates the cost function in NMF, and we minimize it to find F and G.
These three criteria are often used in NMF decomposition, and KL-divergence is the best one for acoustic signal separation.
To separate the target signal using NMF, supervised NMF has been proposed.
SNMF is a supervised spectral separation method.
In SNMF, first, we train on the sample sound of the target signal, which is like a musical scale.
Then we construct the supervised basis F. This is a spectral dictionary of the target sound.
Next, we separate the mixed signal using the supervised basis F, as FG+HU.
Therefore, the target signal is obtained as FG, and the other signal is reconstructed by HU.
This method can separate the target signal well, but it can be used only for single-channel signals.
To apply the SNMF-based method to multichannel signals, we have proposed a new SNMF called “SNMF with spectrogram restoration” and its hybrid method.
In this hybrid method, first, directional clustering is applied to the input stereo signal to separate the target direction.
Then, the target signal is separated by this new SNMF.
Here, the spectrogram separated by directional clustering has many spectral holes, like this. This is due to the binary masking in directional clustering.
But our new SNMF can restore such a damaged spectrogram by using a spectral dictionary of the target sound; namely, this SNMF can extrapolate the supervised basis F.
Simultaneously, the non-target signal is separated.
This is the decomposition model of SNMF with spectrogram restoration.
And this equation is the cost function.
In this cost function, the divergence is defined at all spectrogram grids except for the spectral holes by using the binary mask I.
For the grids of the holes, we impose a regularization term to avoid extrapolation errors.
In previous studies, we used EUC-distance and KL-divergence in this cost function.
In this presentation, we introduce a generalized divergence into this cost function and extend the method.
Next, I will talk about the analysis of restoration ability.
In the extension, we introduce a generalized divergence function called the beta-divergence.
This function has a parameter beta and includes EUC-distance, KL-divergence, and IS-divergence when beta equals 2, 1, and 0, respectively.
By using the beta-divergence, we can extend the cost function to a more generalized form.
This cost function includes EUC-distance, KL-divergence, and so on.
From the minimization of the cost function, we can obtain the update rules for the optimization of the variable matrices G, H, and U.
This SNMF has two tasks, namely, separation of the target signal and basis extrapolation for the restoration of the spectrogram,
where the optimal divergence for source separation has been investigated by many researchers.
It has been clarified that the KL-divergence is suitable for source separation.
But nobody has investigated the optimal divergence for basis extrapolation.
So, we analyze the optimal one based on a generation model in NMF.
The decomposition of NMF is equivalent to a maximum likelihood estimation, which implicitly assumes a generation model of the input data Y.
If we select the parameter beta, the assumed generation model is fixed.
In other words, the parameter beta defines the generation model of the input data.
In this analysis, to compare the net extrapolation ability, we generated random input data Y that obey each generation model.
We also prepared the binary-masked random data YI and attempted to restore them.
In the training process, we construct the supervised basis F using the random data Y.
Then we attempt to restore the binary-masked data using the trained basis F.
The binary mask I was generated uniformly at random, and we generated two types of binary masks whose densities of holes are 75% and 98%.
Therefore, by calculating the similarity between the input data Y and the restored data FG, we can evaluate the extrapolation ability and the accuracy of restoration.
So SAR indicates the accuracy of restoration.
These are the results of the analysis.
The left one is the result for 75%-binary-masked data, and the right one is for 98%-binary-masked data.
Beta equal to 1, which means KL-divergence, is the optimal divergence for source separation.
But, surprisingly, the optimal divergence for the restoration is beta equal to around 3.
Therefore, the optimal divergence for the hybrid method is around EUC-distance because of the trade-off between separation and restoration abilities, as in this figure.
This is because a sparse basis is not suitable for extrapolation using only the observable data.
Next, I will talk about the experiments.
These are the experimental conditions.
The mixed signal includes four melodies.
Each sound source is located as in this figure.
The target source is always located in the center direction with another interfering source.
We prepared three compositions of instruments and evaluated the average score of 36 patterns.
In addition, the supervision signal has 24 notes like this score, which cover all the notes in the target melody.
This is the result of the experiment.
We show the average SDR score, where SDR indicates the total quality of the separation.
Directional clustering cannot separate sources in the same direction, so the result was not good.
Multichannel NMF is an integrated method proposed by Sawada.
This method utilizes an integrated cost function, which includes spatial and spectral separations simultaneously.
But this method is a quite difficult optimization problem because many variables must be optimized by using only one cost function.
So, this method strongly depends on the initial values, and the average score becomes poor.
The conventional SNMF achieves the highest score when beta equals 1, that is, KL-divergence.
But the optimal beta of our hybrid method was 2, that is, EUC-distance, because of the trade-off between separation and restoration abilities.
We also conducted an experiment using real-recorded signals.
In this experiment, the binaural mixed signal was recorded in a real environment.
The other conditions are the same as those in the previous experiment.
This is the result of the experiment using real-recorded signals.
From this result, we can confirm that the optimal divergence for the hybrid method is EUC-distance.
These are the conclusions of my talk.
Thank you for your attention.
The supervised method has an inherent problem.
That is, we cannot get a perfect supervision sound of the target signal.
Even if the supervision sounds are from the same type of instrument as the target sound, these sounds differ according to various conditions,
for example, individual styles of playing, the timbre individuality of each instrument, and so on.
When we want to separate this piano sound from the mixed signal, we may only be able to prepare a similar piano sound whose timbre is slightly different.
In that case, the supervised NMF cannot separate the target because of the spectral difference from the target sound.
To solve this problem, we have proposed a new supervised method that adapts the supervised bases to the target spectra by basis deformation.
This is the decomposition model in this method.
We introduce the deformable term, which has both positive and negative values, like this.
Then we optimize the matrices D, G, H, and U.
This figure indicates the spectral difference between the real sound and the artificial sound.
This figure shows the directional distribution of the input stereo signal.
The target source is in the center direction, and the other interfering sources are distributed like this.
After directional clustering, left and right source components leak into the center cluster, and the center sources lose some of their components.
These lost components correspond to the spectral chasms in the spectrogram domain.
After SNMF with spectrogram restoration, the target components are separated and restored using the supervised bases of the target sound trained in advance.
In other words, the resolution of the target spectrogram is recovered with superresolution by the supervised basis extrapolation.
As another means of addressing multichannel signal separation, multichannel NMF has also been proposed by Ozerov and Sawada.
This method is a natural extension of NMF and uses spectral and spatial cues.
But this unified method is a very difficult optimization problem mathematically because many variables must be optimized by one cost function.
So, this method strongly depends on the initial values.
This SNMF is for a single-channel signal. Therefore, we cannot use the information about the correlation between channels.
However, almost all music signals are in stereo format. So we should extend SNMF to multichannel signals.
In addition, when many interfering sources exist, the separation performance of SNMF markedly degrades.
This is because many spectral patterns that are similar to the target sound arise.
Nonnegative matrix factorization is a very powerful and useful method for extracting significant features from the input matrix.
NMF decomposes the input nonnegative matrix Y into two matrices F and G like this, where F and G cannot have negative entries.
Therefore, all the entries in Y, F, and G are nonnegative.
In addition, K is usually set to a smaller value than Ω and T, so this is a kind of low-rank approximation.
As a result of this nonnegative constraint and dimensional reduction, the basis matrix contains the distinctive components of the observed matrix.
The optimization of the variables F and G in NMF is based on the minimization of the cost function.
The cost function is defined as the divergence between the observed spectrogram Y and the reconstructed spectrogram FG.
This minimization is an inequality-constrained optimization problem.
This spectrum is obtained by directional clustering. There are many spectral chasms owing to the binary masking.
SNMF with spectrogram restoration treats these chasms as unseen observations, like this, and extrapolates the fittest target basis from the supervised bases F.
As a result, the lost components are restored by the supervised basis extrapolation.
SDR is the total evaluation score of the separation performance.