Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Daichi Kitamura, Hiroshi Saruwatari,
Yusuke Iwao, Kiyohiro Shikano
(Nara Institute of Science and Technology, Nara, Japan)
Kazunobu Kondo, Yu Takahashi
(Yamaha Corporation Research & Development Center, Shizuoka, Japan)

Outline
• 1. Research background
• 2. Conventional method
– Nonnegative matrix factorization
– Penalized supervised nonnegative matrix factorization
– Directional clustering
– Hybrid method
• 3. Proposed method
– Regularized superresolution-based nonnegative matrix factorization
• 4. Experiments
• 5. Conclusions

Background
• Music signal separation technologies have received much attention.
– Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) has been a very active area of research.
• The extraction performance of NMF degrades markedly when the mixture contains many sources.
• We propose a new method for multichannel signal separation with NMF that utilizes both the spectral and spatial cues included in mixtures of multiple instruments.

NMF
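• NMF is a type of sparse representation algorithm that decomposes a nonnegative matrix into the product of two nonnegative matrices. [D. D. Lee, et al., 2001]
$\boldsymbol{Y} \simeq \boldsymbol{F}\boldsymbol{G}$
[Figure: the observed matrix (spectrogram, frequency × time) is approximated by the product of the basis matrix (spectral bases) and the activation matrix (time-varying gains).]
$\boldsymbol{Y}$ ($\Omega \times T$): Observed matrix
$\boldsymbol{F}$ ($\Omega \times K$): Basis matrix
$\boldsymbol{G}$ ($K \times T$): Activation matrix
$\Omega$: Number of frequency bins
$T$: Number of frames
$K$: Number of bases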
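The decomposition above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the squared Euclidean distance and the multiplicative updates of Lee and Seung; the slides do not state which divergence is used, and the function name `nmf`, the iteration count, and the toy matrix are illustrative choices only.

```python
import numpy as np

def nmf(Y, K, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative Y (Omega x T) by F (Omega x K) @ G (K x T)."""
    rng = np.random.default_rng(seed)
    n_bins, n_frames = Y.shape
    F = rng.random((n_bins, K)) + eps   # basis matrix (spectral bases)
    G = rng.random((K, n_frames)) + eps  # activation matrix (time-varying gains)
    for _ in range(n_iter):
        # Multiplicative updates keep F and G nonnegative (assumed Euclidean cost).
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
    return F, G

# Toy "spectrogram": an exactly rank-2 nonnegative matrix.
rng = np.random.default_rng(1)
Y = rng.random((6, 2)) @ rng.random((2, 8))
F, G = nmf(Y, K=2)
rel_err = np.linalg.norm(Y - F @ G) / np.linalg.norm(Y)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Each column of `F` plays the role of one spectral basis and each row of `G` is its gain over time.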

Penalized Supervised NMF (PSNMF)
• In PSNMF, the observed spectrogram is decomposed under the condition that the spectral bases of the target sound are known in advance. [Yagi, et al., 2012]
• Training process: the supervised bases of the target sound are trained from a supervision sound.
• Separation process: the trained bases are fixed and the other matrices are updated. The bases of the other sounds are forced to become uncorrelated with the supervised bases.
• Problem of PSNMF: when the signal includes many sources, the extraction performance degrades markedly.
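The separation process can be sketched as follows. This is not the authors' implementation: it assumes a squared Euclidean cost with a penalty on the correlation between the supervised bases `F` and the other bases, and the names `H` (other bases), `U` (other activations), `mu` (penalty weight), and `psnmf_separate` are introduced here only for illustration.

```python
import numpy as np

def psnmf_cost(Y, F, G, H, U, mu):
    """Assumed cost: squared reconstruction error plus mu * ||F^T H||^2."""
    return np.sum((Y - F @ G - H @ U) ** 2) + mu * np.sum((F.T @ H) ** 2)

def psnmf_separate(Y, F, K_other, mu=0.1, n_iter=200, eps=1e-12, seed=0):
    """Fix the trained bases F and update G (target gains), H and U (other sounds)."""
    rng = np.random.default_rng(seed)
    n_bins, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_bins, K_other)) + eps
    U = rng.random((K_other, n_frames)) + eps
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ Y) / (F.T @ X + eps)
        X = F @ G + H @ U
        # The penalty gradient mu * F F^T H goes to the denominator,
        # which pushes H away from the supervised bases.
        H *= (Y @ U.T) / (X @ U.T + mu * (F @ (F.T @ H)) + eps)
        X = F @ G + H @ U
        U *= (H.T @ Y) / (H.T @ X + eps)
    return G, H, U

# Toy mixture: a target built from known bases plus two other components.
rng = np.random.default_rng(0)
F = rng.random((12, 3))  # stands in for bases trained on a supervision sound
Y = F @ rng.random((3, 20)) + rng.random((12, 2)) @ rng.random((2, 20))
G, H, U = psnmf_separate(Y, F, K_other=2)
target_estimate = F @ G  # estimated spectrogram of the target sound
```

Only `G`, `H`, and `U` change during separation; `F` stays exactly as trained.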

Directional Clustering
• Directional clustering can estimate the sources and their directions in a multichannel signal. [Araki, et al., 2007] [Miyabe, et al., 2009]
• This method separates sources using the spatial information in the observed signal.
[Figure: source components plotted on the plane of the L-ch input signal versus the R-ch input signal, with a centroid vector for each cluster (L, R).]
• Problem of directional clustering: this method cannot separate sources located in the same direction.
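A rough sketch of the idea, under simplifying assumptions: each time-frequency bin is described only by the angle of its magnitude vector (|L|, |R|) and the angles are grouped with a one-dimensional k-means. The cited methods use richer features, so the function `directional_clustering`, the three-cluster setting, and the toy panning gains below are illustrative assumptions, not the published algorithm.

```python
import numpy as np

def directional_clustering(YL, YR, n_clusters=3, n_iter=50):
    """Cluster time-frequency bins by direction and return one binary mask per cluster."""
    # Angle of the (|L|, |R|) vector: 0 = fully left, pi/2 = fully right.
    theta = np.arctan2(np.abs(YR), np.abs(YL))
    centroids = np.linspace(0.0, np.pi / 2, n_clusters)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(theta[..., None] - centroids), axis=-1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centroids[c] = theta[labels == c].mean()
    labels = np.argmin(np.abs(theta[..., None] - centroids), axis=-1)
    masks = np.stack([labels == c for c in range(n_clusters)]).astype(float)
    return masks, centroids

# Toy stereo spectrograms: every bin is dominated by exactly one of three sources.
n_bins, n_frames = 8, 9
owner = np.arange(n_bins * n_frames).reshape(n_bins, n_frames) % 3
gain_L = np.array([1.0, 1.0, 0.1])  # left, center, right panning gains (assumed)
gain_R = np.array([0.1, 1.0, 1.0])
magnitude = np.random.default_rng(0).random((n_bins, n_frames)) + 0.5
YL = magnitude * gain_L[owner]
YR = magnitude * gain_R[owner]

masks, centroids = directional_clustering(YL, YR)
center_cluster_L = masks[1] * YL  # binary masking of the L-ch spectrogram
```

Two sources with the same panning gains would produce the same angle and land in the same cluster, which is the limitation stated above.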

Hybrid method
• The conventional hybrid method applies PSNMF after directional clustering. [Iwao, et al., 2012]
• This method consists of two techniques.
– Directional clustering (spatial separation)
– PSNMF (source separation)
[Figure: conventional hybrid method. The L-ch and R-ch signals pass through directional clustering (spatial separation) and then PSNMF (source separation).]

Problem of hybrid method
• The signal extracted by the hybrid method suffers from considerable distortion caused by the binary masking in directional clustering.
• The signal in the target direction, which is obtained by directional clustering, has many spectral chasms.
• The resolution of the spectrogram is degraded.
[Figure: input spectrogram ∘ binary mask = separated cluster (axes: time × frequency), where ∘ is the Hadamard product (product of each element). Each entry of the input spectrogram belongs either to the target direction or to another direction.]
Binary mask (example):
1 0 0 0 0 0 0
0 1 1 0 0 1 1
1 0 0 0 0 0 0
0 1 0 1 1 0 1
1 0 0 0 0 0 0
1 1 1 0 1 1 0
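The masking step is an element-wise product, which the following sketch applies with the example mask from the slide. The spectrogram values are random placeholders, and reading rows as frequency bins and columns as time frames is an assumption about the figure layout.

```python
import numpy as np

# Example binary mask from the slide (assumed layout: rows = frequency, columns = time).
mask = np.array([
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
spectrogram = rng.random(mask.shape) + 0.1  # strictly positive placeholder magnitudes

# Hadamard product: bins assigned to other directions become zero (spectral chasms).
separated = mask * spectrogram

chasms_per_frame = (mask == 0).sum(axis=0)
print(chasms_per_frame.tolist())  # → [2, 3, 4, 5, 4, 4, 4]
```

In this example 26 of the 42 bins are zeroed, including bins that may have carried part of the target sound.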

Proposed hybrid method
• Employ a new supervised NMF algorithm as an alternative to the conventional PSNMF in the hybrid method.
[Figure: signal flows of the two hybrid methods.]
Conventional hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) → PSNMF on each channel → ISTFT → mixing → extracted signal.
Proposed hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) and index of the center cluster → superresolution-based SNMF on each channel → ISTFT → mixing → extracted signal.

Regularized superresolution-based NMF
• In the proposed supervised NMF, the spectral chasms are treated as unseen observations by using an index matrix.
[Figure: separated cluster (axes: time × frequency) with its chasms marked, and the corresponding index matrix.]
Index matrix (example):
1 0 0 0 0 0 0
0 1 1 0 0 1 1
1 0 0 0 0 0 0
0 1 0 1 1 0 1
1 0 0 0 0 0 0
1 1 1 0 1 1 0

• The spectrogram of the target sound is reconstructed using better-matched bases because the chasms are treated as unseen.
• The components of the target sound lost after directional clustering can be extrapolated using the supervised bases.
[Figure: the separated cluster with chasms is converted into the reconstructed spectrogram by superresolution using the supervised bases (axes: time × frequency).]
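The effect of treating chasms as unseen can be illustrated with a deliberately reduced sketch: only the target term is modeled, the bases are given, and the activations are fitted on the seen entries alone. The squared Euclidean cost, the function `estimate_activation`, and the toy data are assumptions made for this illustration; the full method also models the other sounds.

```python
import numpy as np

def estimate_activation(Y, I, F, n_iter=500, eps=1e-12, seed=0):
    """Fit G so that F @ G matches Y only where the index matrix I equals 1."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + eps
    for _ in range(n_iter):
        # Entries with I == 0 (chasms) contribute to neither numerator nor denominator.
        G *= (F.T @ (I * Y)) / (F.T @ (I * (F @ G)) + eps)
    return G

rng = np.random.default_rng(0)
F = rng.random((20, 3))                             # supervised bases (assumed trained)
target = F @ rng.random((3, 15))                    # true target spectrogram
I = (rng.random(target.shape) < 0.6).astype(float)  # index matrix: 1 = seen, 0 = chasm
separated = I * target                              # cluster after binary masking

G_super = estimate_activation(separated, I, F)                # chasms treated as unseen
G_naive = estimate_activation(separated, np.ones_like(I), F)  # chasms treated as zeros

chasm = I == 0
err_super = np.linalg.norm((F @ G_super - target)[chasm])
err_naive = np.linalg.norm((F @ G_naive - target)[chasm])
print(f"error inside chasms: unseen={err_super:.3f}, zeros={err_naive:.3f}")
```

Fitting the zeros pulls the activations down, whereas ignoring them lets `F @ G_super` extrapolate the lost components from the supervised bases.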

• Signal flow of the proposed hybrid method
[Figure: frequency of source components versus direction (left, center, right) at three stages.]
(a) Observed spectra: the target source lies in the center (target) direction.
(b) After directional clustering: the center sources lose some of their components.
(c) After superresolution-based SNMF: the target source is extrapolated.

• The basis extrapolation has an underlying problem.
• If a frame of the spectrogram is almost entirely unseen, which means that almost all of its index entries are zero, a large extrapolation error may occur.
• It is necessary to regularize the extrapolation.
[Figure: a separated cluster containing an almost unseen frame, and a spectrogram (frequency 0 to 4 kHz, time 0 to 4 s) showing the resulting extrapolation error, in which the activation is modified incorrectly.]

• We propose two types of regularization.
– Regularization of the temporal continuity (relates each activation to its value in the previous frame)
– Regularization of the norm minimization
$\boldsymbol{I}$: Index matrix
$\overline{\,\cdot\,}$: Binary complement
$i_{\omega,t}$: Entry of index matrix $\boldsymbol{I}$
$g_{k,t}$: Entry of matrix $\boldsymbol{G}$
$f_{\omega,k}$: Entry of matrix $\boldsymbol{F}$
• The intensity of these regularizations is proportional to the number of chasms in each frame.
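The exact expressions of the two regularizers are not preserved in this transcript, so the forms below are assumptions chosen only to agree with the legend: both use the binary complement of the index matrix, the first compares each activation with the previous frame, and both grow with the number of chasms in a frame. The function names are hypothetical.

```python
import numpy as np

def temporal_continuity(G, I):
    """Assumed form: sum_t (#chasms in frame t) * sum_k (g[k,t] - g[k,t-1])**2."""
    chasms = (1.0 - I).sum(axis=0)  # number of chasms in each frame, shape (T,)
    diff = G[:, 1:] - G[:, :-1]     # change from the previous frame
    return float(np.sum(chasms[1:] * np.sum(diff ** 2, axis=0)))

def norm_minimization(F, G, I):
    """Assumed form: energy of the reconstruction F @ G inside the chasms."""
    return float(np.sum(((1.0 - I) * (F @ G)) ** 2))

# Tiny example: one basis, two frequency bins, two frames; frame 1 is fully unseen.
F = np.ones((2, 1))
G = np.array([[1.0, 2.0]])
I = np.array([[1.0, 0.0],
              [1.0, 0.0]])
print(temporal_continuity(G, I))   # 2 chasms * (2 - 1)**2 → 2.0
print(norm_minimization(F, G, I))  # 2**2 + 2**2 → 8.0
```

With no chasms (`I` all ones) both quantities vanish, so the regularization acts only where extrapolation takes place.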

• The cost function in regularized superresolution-based NMF is defined using the index matrix. Besides the reconstruction error, it contains:
– a regularization term
– a penalty term that forces the supervised bases and the other bases to become uncorrelated with each other
– weighting parameters

• The update rules that minimize the cost function are then derived.
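Since the update equations are not preserved in this transcript, the following is only a sketch of what such rules can look like. It assumes a squared Euclidean error restricted to the seen entries, the norm-minimization regularizer in the assumed form "energy of the target reconstruction inside the chasms", and a correlation penalty between the bases. The symbols `H`, `U`, `lam`, and `mu` and both function names are hypothetical.

```python
import numpy as np

def cost(Y, I, F, G, H, U, lam, mu):
    """Assumed cost: masked error + lam * regularizer + mu * correlation penalty."""
    X = F @ G + H @ U
    fit = np.sum((I * (Y - X)) ** 2)
    reg = np.sum(((1.0 - I) * (F @ G)) ** 2)
    pen = np.sum((F.T @ H) ** 2)
    return fit + lam * reg + mu * pen

def regularized_sr_snmf(Y, I, F, K_other, lam=0.1, mu=0.1,
                        n_iter=200, eps=1e-12, seed=0):
    """Multiplicative updates for the assumed cost; the supervised bases F stay fixed."""
    rng = np.random.default_rng(seed)
    n_bins, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_bins, K_other)) + eps
    U = rng.random((K_other, n_frames)) + eps
    Ibar = 1.0 - I  # binary complement of the index matrix
    IY = I * Y
    for _ in range(n_iter):
        X = F @ G + H @ U
        # Positive gradient parts go to the denominator, negative parts to the numerator.
        G *= (F.T @ IY) / (F.T @ (I * X) + lam * (F.T @ (Ibar * (F @ G))) + eps)
        X = F @ G + H @ U
        H *= (IY @ U.T) / ((I * X) @ U.T + mu * (F @ (F.T @ H)) + eps)
        X = F @ G + H @ U
        U *= (H.T @ IY) / (H.T @ (I * X) + eps)
    return G, H, U

# Toy data: a target plus one other sound, observed through a random index matrix.
rng = np.random.default_rng(0)
F = rng.random((16, 3))
mixture = F @ rng.random((3, 12)) + rng.random((16, 2)) @ rng.random((2, 12))
I = (rng.random(mixture.shape) < 0.6).astype(float)
Y = I * mixture
G, H, U = regularized_sr_snmf(Y, I, F, K_other=2)
reconstructed_target = F @ G
```

Setting `lam=0` recovers an unregularized variant, and larger values shrink the extrapolated energy in frames with many chasms.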

Evaluation experiment
• We compared four methods.
– Conventional hybrid method using PSNMF (Conventional method)
– Proposed hybrid method using superresolution-based NMF without
regularization (Proposed method 1)
– Proposed hybrid method using superresolution-based NMF with
regularization of the temporal continuity (Proposed method 2)
– Proposed hybrid method using superresolution-based NMF with
regularization of the norm minimization (Proposed method 3)

Evaluation experiment
• We used stereo-panning signals and binaural-recorded signals containing four instruments, oboe (Ob.), flute (Fl.), trombone (Tb.), and piano (Pf.), generated by a MIDI synthesizer.
• The sources are mixed with equal power.
• The target source is always located in the center direction (no. 1).
• We used the same type of MIDI sounds of the target instruments as supervision for the training process.
– Supervision sound: two octaves of notes that cover all notes of the target signal.
[Figure: arrangement of the four sources (no. 1 to no. 4) between left and right; the target source is no. 1 in the center.]

Experimental results (panning signal)
• Average SDR, SIR, and SAR scores for each method, taken over 12 combinations obtained by shuffling the four instruments.
[Figure: bar charts of SDR [dB] (axis 0 to 12), SIR [dB] (axis 0 to 24), and SAR [dB] (axis 0 to 10) for each method; higher is better.]
SDR: quality of the separated target sound
SIR: degree of separation between the target and other sounds
SAR: absence of artificial distortion
Proposed method 1: no regularization
Proposed method 2: regularization of temporal continuity
Proposed method 3: regularization of norm minimization
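To make the three scores concrete, here is a simplified sketch that decomposes an estimate into a target part, an interference part, and an artifact part by orthogonal projection, allowing only a constant gain per source. The slides do not say which evaluation implementation was used, so the function `bss_scores` and the synthetic signals are assumptions for illustration and will not reproduce the reported numbers.

```python
import numpy as np

def bss_scores(estimate, sources, target_index=0):
    """Return (SDR, SIR, SAR) in dB using gain-only projections (simplified)."""
    S = np.asarray(sources, dtype=float)  # shape (n_sources, n_samples)
    s = S[target_index]
    s_target = (estimate @ s) / (s @ s) * s  # projection onto the target source
    coeffs, *_ = np.linalg.lstsq(S.T, estimate, rcond=None)
    proj_all = S.T @ coeffs                  # projection onto all sources
    e_interf = proj_all - s_target           # what leaks in from other sources
    e_artif = estimate - proj_all            # what no source can explain
    sdr = 10 * np.log10(np.sum(s_target ** 2) / np.sum((e_interf + e_artif) ** 2))
    sir = 10 * np.log10(np.sum(s_target ** 2) / np.sum(e_interf ** 2))
    sar = 10 * np.log10(np.sum((s_target + e_interf) ** 2) / np.sum(e_artif ** 2))
    return sdr, sir, sar

# Synthetic check with mutually orthogonal sinusoids.
t = np.arange(1000) / 1000
target = np.sin(2 * np.pi * 5 * t)
other = np.sin(2 * np.pi * 9 * t)
artifact = np.cos(2 * np.pi * 20 * t)
estimate = target + 0.1 * other + 0.01 * artifact

sdr, sir, sar = bss_scores(estimate, [target, other])
print(f"SDR={sdr:.2f} dB, SIR={sir:.2f} dB, SAR={sar:.2f} dB")
```

In this synthetic case the leaked source is 20 dB below the target, so the SIR is 20 dB, and the SDR is slightly lower because it also counts the artifact.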

Experimental results (binaural signal)
• Average SDR, SIR, and SAR scores for each method, taken over 12 combinations obtained by shuffling the four instruments.
[Figure: bar charts of SDR [dB] (axis 0 to 10), SIR [dB] (axis 0 to 20), and SAR [dB] (axis 0 to 6) for the conventional method and proposed methods 1 to 3; higher is better.]

Conclusions
• We propose a new supervised NMF algorithm, a superresolution-based method, for the hybrid method to separate stereo or binaural signals.
• The proposed hybrid method separates the target signal with higher performance than the conventional method.
• The regularization of norm minimization is effective for the proposed supervised NMF algorithm.
Thank you for your attention!
