Blind source separation based on independent low-rank matrix analysis and its extension to Student's t-distribution

Daichi Kitamura
Project Research Associate, The University of Tokyo, Japan
Visit to Télécom ParisTech, September 4th

Self introduction
• Name: Daichi Kitamura
• Age: 27 (born in 1990)
– Kagawa Pref. in Japan
• Background:
– NAIST, Japan
• Master's degree (received in 2014)
– SOKENDAI, Japan
• Ph.D. degree (received in 2017)
– The University of Tokyo, Japan
• Project Research Associate
• Research topics
– Acoustic signal processing, statistical signal processing, audio source separation, etc.
[Figure: map of Japan marking Kagawa and Tokyo]

Contents
• Background
– Blind source separation (BSS) for audio signals
– Motivation
• Related Methods
– Frequency-domain independent component analysis (FDICA)
– Independent vector analysis (IVA)
– Itakura–Saito nonnegative matrix factorization (ISNMF)
• Independent Low-Rank Matrix Analysis (ILRMA)
– Employ low-rank TF structures of each source in BSS
– Gaussian source model with TF-varying variance
– Relationship between ILRMA and multichannel NMF
– Student’s t source model with TF-varying scale parameters
• Conclusion

Background
• Blind source separation (BSS) for audio signals
– separates the original audio sources from a recorded mixture
– does not require prior information about the recording conditions
• locations of mics and sources, room geometry, timbres, etc.
– is applicable to many audio applications
• Consider only the “determined” situation (# of mics = # of sources)
[Figure: sources → mixing system → observed signals → demixing system → estimated signals; example: recorded mixture → BSS → separated guitar]

History of BSS for audio signals
• Basic theories and their evolution*
– 1994: Independent component analysis (ICA)
– 1998: Frequency-domain ICA (FDICA)
– 1999: Nonnegative matrix factorization (NMF)
– 2006: Independent vector analysis (IVA)
– 2009: Itakura–Saito NMF (ISNMF)
– 2011: Auxiliary-function-based IVA (AuxIVA)
– 2012: Time-varying Gaussian IVA
– 2013: Multichannel NMF
– 2016: Independent low-rank matrix analysis (ILRMA)
• Side notes in the timeline
– Many permutation solvers for FDICA
– Apply NMF to many tasks
– Generative models in NMF
– Many extensions of NMF
*Depicting only popular methods

Motivation of ILRMA
• Conventional BSS techniques based on ICA
– Minimum distortion (linear demixing)
– Relatively fast and stable optimization
• FastICA [A. Hyvarinen, 1999], natural gradient [S. Amari, 1996], and auxiliary function technique [N. Ono+, 2010], [N. Ono, 2011]
– Cannot use a “specific” assumption about the sources
• Only assumes a non-Gaussian p.d.f. for the sources
– The permutation problem is crucial and still difficult to solve
• IVA often fails, causing a “block permutation problem” [Y. Liang+, 2012]
• Better to use a “specific source model” in the TF domain
– Independent low-rank matrix analysis (ILRMA) employs a low-rank property
• Signal model
– Observed signal = frequency-wise mixing matrix × source signals
– Estimated signal = frequency-wise demixing matrix × observed signal
– Signals are indexed by frequency bin and time frame

Related methods: ICA
• Independent component analysis (ICA) [P. Comon, 1994]
– estimates the demixing matrix without knowing the mixing matrix
– Source model (scalar)
• Each source is non-Gaussian, and the sources are mutually independent
– Spatial model
• Mixing system is a time-invariant matrix
• Mixing system in audio signals
– Convolutive mixture with room reverberation
[Figure: sources (source model) → mixing matrix (spatial model) → observed signals → demixing matrix → estimated signals]

Related methods: FDICA
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
– estimates a frequency-wise demixing matrix
– Source model (scalar)
• Each source component is complex-valued and non-Gaussian, and the sources are mutually independent
– Spatial model
• Frequency-wise mixing matrix is time-invariant
– Instantaneous mixture in each frequency band
– A.k.a. rank-1 spatial model [N.Q.K. Duong, 2010]
• Permutation problem?
– Order of the estimated signals cannot be determined by ICA
– Alignment of the frequency-wise estimated signals is required
• Many permutation solvers have been proposed
[Figure: spectrograms (frequency bin × time frame) with a separate ICA applied in each frequency bin: ICA 1, ICA 2, …, ICA I]
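
The frequency-wise spatial model can be illustrated numerically. The following is a minimal sketch, assuming synthetic Laplace-distributed sources and random mixing matrices; the names `S`, `A`, `X`, `W`, and `Y` are hypothetical, and the demixing matrix is taken as the oracle inverse of the mixing matrix rather than estimated by ICA.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, N = 4, 100, 2   # frequency bins, time frames, sources (= mics, determined case)

# Hypothetical complex source spectrograms S[i, j, n] and mixing matrices A[i]
S = rng.laplace(size=(I, J, N)) + 1j * rng.laplace(size=(I, J, N))
A = rng.normal(size=(I, N, N)) + 1j * rng.normal(size=(I, N, N))

# Instantaneous mixture in each frequency bin: x_ij = A_i s_ij
X = np.einsum("imn,ijn->ijm", A, S)

# Oracle demixing matrix W_i = A_i^{-1} (a BSS method has to estimate this blindly)
W = np.linalg.inv(A)
Y = np.einsum("inm,ijm->ijn", W, X)   # y_ij = W_i x_ij

print(np.allclose(Y, S))
```

In a blind setting `A` is unknown, so `W` can only be found up to the order and scale of its rows in each frequency bin, which is where the permutation problem comes from.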

Permutation problem
• FDICA requires signal alignment across all frequency bins
– Order of the estimated signals cannot be determined by ICA*
[Figure: sources 1 and 2 → observed signals 1 and 2 → ICA on all frequency components → permutation solver → estimated signals 1 and 2 (horizontal axis: time)]
*Signal scale should also be restored by a back-projection technique
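
The order and scale ambiguity, and the effect of back-projection, can be checked in a single frequency bin. This is a sketch under stated assumptions: the sources, the mixing matrix `A`, the permutation `P`, and the scaling `D` are all hypothetical, and back-projection is implemented here as scaling each output by the corresponding entry of the inverse demixing matrix for a reference microphone (microphone 0).

```python
import numpy as np

rng = np.random.default_rng(1)
J, N = 200, 2
s = rng.laplace(size=(J, N)) + 1j * rng.laplace(size=(J, N))   # sources, one bin
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))     # mixing matrix
x = s @ A.T                                                    # x_j = A s_j

# A valid ICA solution is only defined up to order and scale: W = D P A^{-1}
P = np.array([[0, 1], [1, 0]])            # swaps the two outputs
D = np.diag([0.3 + 0.2j, -2.0 + 1.0j])    # arbitrary complex scales
W = D @ P @ np.linalg.inv(A)
y = x @ W.T                               # y_j = W x_j

# Back-projection: scale output n by entry (0, n) of W^{-1}
A_est = np.linalg.inv(W)
y_bp = y * A_est[0, :]

# Source images observed at microphone 0
img = s * A[0, :]

# Scale is restored, but the outputs are still in swapped order
print(np.allclose(y_bp, img[:, ::-1]))
```

Back-projection removes the scale ambiguity only; the swapped order remains, and in FDICA it can differ from one frequency bin to the next, which is why a permutation solver is needed.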

Related methods: IVA
• Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
– extends ICA to a multivariate probabilistic model that treats the sourcewise frequency vector as a single variable
– Source model (vector)
• Each source vector is multivariate, spherical, complex-valued, and non-Gaussian, and the source vectors are mutually independent
– Spatial model
• Mixing system is a time-invariant matrix (rank-1 spatial model)
– Frequency components within a source vector follow a multivariate non-Gaussian distribution and have higher-order correlations
– Permutation-free estimation of the demixing matrix is achieved!
[Figure: source vector → mixing matrix → observed vector → demixing matrix → estimated vector]

Higher-order correlation assumed in IVA
• Spherical multivariate distribution [T. Kim, 2007]
• Why a spherical distribution?
– Frequency bands that have similar activations are merged together as one source, which avoids the permutation problem
[Figure: two mutually independent Laplace distributions (x1 and x2 are mutually independent) versus a spherical Laplace distribution (x1 and x2 have higher-order correlation; probability depends only on the norm)]
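
The contrast between the two bivariate models can be written out explicitly. The following is a sketch assuming real-valued variables with unit scale and with normalization constants omitted; these simplifications are assumptions for illustration and are not taken from the slide.

```latex
% Two mutually independent Laplace variables: the joint density factorizes
p(x_1, x_2) = p(x_1)\,p(x_2) \propto \exp\left(-|x_1| - |x_2|\right)

% Spherical Laplace: the density depends on (x_1, x_2) only through the norm
p(x_1, x_2) \propto \exp\left(-\sqrt{x_1^2 + x_2^2}\right)
```

The second density cannot be written as a product of a function of x1 and a function of x2, so the two variables are dependent even though they are uncorrelated.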

Comparison of source models
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
– Scalar r.v.s: a non-Gaussian source distribution is assumed for each time-frequency component
– Update the separation filter (demixing matrix) so that the estimated signals obey the non-Gaussian distribution we assumed
• Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
– Vector (multivariate) r.v.s: a non-Gaussian spherical source distribution is assumed for each frequency vector
– Update the separation filter (demixing matrix) so that the estimated signals obey the non-Gaussian distribution we assumed
• In both cases
– The mixture is close to a Gaussian signal because of the central limit theorem (CLT)
– Each source obeys a non-Gaussian distribution, and the sources are mutually independent
[Figure: observed and estimated spectrograms (frequency × time) after the STFT, with the current empirical distribution compared against the assumed source distribution]

Related method: NMF
• Nonnegative matrix factorization (NMF) [D. D. Lee, 1999]
– Low-rank decomposition with a nonnegativity constraint
• Limited number of nonnegative bases and their coefficients
– In acoustic signal processing, the spectrogram is decomposed
• Frequently appearing spectral patterns and their activations
[Figure: nonnegative matrix (power spectrogram; # of freq. bins × # of time frames) ≈ basis matrix (spectral patterns; # of freq. bins × # of bases) × activation matrix (time-varying gains; # of bases × # of time frames)]

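Related method: ISNMF
• ISNMF [C. Févotte, 2009]
– The complex-valued observed signal can be decomposed using the “stable property” of the circularly symmetric complex Gaussian distribution
• If each component follows a circularly symmetric complex Gaussian distribution with a nonnegative variance, the observed signal (their sum) equivalently follows the same type of distribution with the summed variance
– The variance is also decomposed!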

Related method: ISNMF
• Power spectrogram corresponds to the variances in the TF plane
– Complex Gaussian distribution with TF-varying variance
– If we marginalize over time or frequency, the distribution becomes non-Gaussian even though each TF bin follows a Gaussian distribution
[Figure: power spectrogram (frequency bin × time frame); grayscale shows the value of the variance, from small to large power]

Extension of source model in IVA
• Source model in IVA
– has a frequency-uniform scale
• Multivariate Laplace with fixed scale
• Since the scale cannot be determined, it is not equivalent to the flat spectral basis
– Almost an NMF with only one basis
• Extend to an ISNMF-based source model
– NMF with an arbitrary number of bases
• can represent complicated TF structures
– can learn the “co-occurrence” of each source in the TF domain
• Co-occurrence is captured as the variance
– The structure can easily be estimated by NMF
[Figure: two spectrograms (frequency × time)]

Extension of source model in IVA
• Spherical Laplace distribution in IVA
– Frequency-uniform scale
– Frequency vector (I-dimensional)
• Gaussian distribution with TF-varying variance in ISNMF [C. Févotte+, 2009]
– Complex-valued Gaussian in each TF bin
– Time-frequency-varying variance, obtained by low-rank decomposition with NMF
– Time-frequency matrix (IJ-dimensional)
[Figure: spherical Laplace distribution (bivariate case)]

Cost function in ILRMA and partitioning function
• Negative log-likelihood in ILRMA
– Combines the cost function in ICA (estimates the demixing matrix) and the cost function in ISNMF (estimates the low-rank source model), both evaluated on the estimated signal
– All the variables can easily be optimized by alternating updates
• Update rules in ICA
• Update rules in ISNMF
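
The structure of this cost can be made concrete with a worked expression. The following is a sketch under assumed notation (not recovered from the slide): i, j, n, and k index frequency bins, time frames, sources, and bases; x_ij is the observed vector, W_i the demixing matrix with rows w_{i,n}^H, and t and v the NMF basis and activation variables. Constants are dropped.

```latex
\mathcal{L} = \sum_{i,j}\left[\sum_{n}\left(\frac{|y_{ij,n}|^{2}}{r_{ij,n}} + \log r_{ij,n}\right) - 2\log\left|\det \boldsymbol{W}_i\right|\right],
\qquad
y_{ij,n} = \boldsymbol{w}_{i,n}^{\mathsf{H}}\boldsymbol{x}_{ij},
\qquad
r_{ij,n} = \sum_{k} t_{ik,n}\, v_{kj,n}
```

With the variances r fixed, the terms involving W_i form an ICA-type cost; with the estimated signals y fixed, the remaining terms equal the Itakura–Saito divergence between |y|^2 and r up to constants, which is the ISNMF cost.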

Update rules of ILRMA
• ML-based iterative update rules
– Update rule for the demixing matrix (spatial model) is based on iterative projection [N. Ono, 2011]
– Update rules for the NMF variables (NMF source model) are based on the MM algorithm
– The demixing-matrix update uses a one-hot vector that has 1 at the element of the source being updated
– Pseudo code is available at
• http://d-kitamura.net/pdf/misc/AlgorithmsForIndependentLowRankMatrixAnalysis.pdf
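
The alternating updates can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the author's reference code: it covers ILRMA without the partitioning function, uses MM-type multiplicative updates for the NMF variables and iterative projection for the demixing matrix, omits the normalization steps usually added for numerical stability, and runs on synthetic data. The names `ilrma`, `ilrma_cost`, and all variable names are hypothetical.

```python
import numpy as np

def ilrma_cost(X, W, T, V):
    """Negative log-likelihood (constants dropped). X: (I, J, M), W: (I, N, M)."""
    Y = np.einsum("inm,ijm->ijn", W, X)
    R = np.einsum("nik,nkj->ijn", T, V)
    _, logabsdet = np.linalg.slogdet(W)
    return np.sum(np.abs(Y) ** 2 / R + np.log(R)) - 2 * X.shape[1] * np.sum(logabsdet)

def ilrma(X, n_bases=2, n_iter=30, seed=0, eps=1e-12):
    rng = np.random.default_rng(seed)
    I, J, N = X.shape                                   # determined case: mics = sources
    W = np.tile(np.eye(N, dtype=complex), (I, 1, 1))    # rows of W[i] are w_{i,n}^H
    T = rng.uniform(0.1, 1.0, size=(N, I, n_bases))
    V = rng.uniform(0.1, 1.0, size=(N, n_bases, J))
    P = np.abs(np.einsum("inm,ijm->ijn", W, X)) ** 2    # |y_ij,n|^2
    costs = [ilrma_cost(X, W, T, V)]
    for _ in range(n_iter):
        for n in range(N):
            # Source model: multiplicative updates of the NMF variables
            R = T[n] @ V[n] + eps
            T[n] *= np.sqrt(((P[:, :, n] / R**2) @ V[n].T) / ((1 / R) @ V[n].T))
            R = T[n] @ V[n] + eps
            V[n] *= np.sqrt((T[n].T @ (P[:, :, n] / R**2)) / (T[n].T @ (1 / R)))
            R = T[n] @ V[n] + eps
            # Spatial model: iterative projection for the n-th demixing filter
            U = np.einsum("ij,ijm,ijl->iml", 1 / R, X, X.conj()) / J
            w = np.linalg.inv(W @ U)[:, :, n]           # (W_i U_i,n)^{-1} e_n
            w /= np.sqrt(np.einsum("im,iml,il->i", w.conj(), U, w).real)[:, None]
            W[:, n, :] = w.conj()
            P[:, :, n] = np.abs(np.einsum("im,ijm->ij", W[:, n, :], X)) ** 2
        costs.append(ilrma_cost(X, W, T, V))
    return W, T, V, costs

# Synthetic determined mixture with low-rank source variances
rng = np.random.default_rng(1)
I, J, N = 8, 200, 2
var = np.einsum("nik,nkj->ijn", rng.uniform(0.1, 1, (N, I, 2)), rng.uniform(0.1, 1, (N, 2, J)))
S = np.sqrt(var / 2) * (rng.normal(size=var.shape) + 1j * rng.normal(size=var.shape))
A = rng.normal(size=(I, N, N)) + 1j * rng.normal(size=(I, N, N))
X = np.einsum("imn,ijn->ijm", A, S)
W, T, V, costs = ilrma(X)
print(costs[0], costs[-1])
```

Selecting column n of the inverse corresponds to multiplying by the one-hot vector. Both update types are designed not to increase the cost, so `costs` should be non-increasing up to numerical error.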

Cost function in ILRMA and partitioning function
• ILRMA with partitioning function
– Appropriate number of bases for each source can be determined automatically
– Useful when various types of sources are mixed
• E.g., drums are very low-rank, but vocals are not so low-rank
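
One way to write a source model with a partitioning function is sketched below. The notation is assumed rather than recovered from the slide: K bases are shared by all sources, and z_{nk} is the weight with which basis k is assigned to source n.

```latex
r_{ij,n} = \sum_{k=1}^{K} z_{nk}\, t_{ik}\, v_{kj},
\qquad
z_{nk} \in [0, 1],
\qquad
\sum_{n} z_{nk} = 1
```

Because the weights for each basis sum to one over the sources, a source that needs more spectral patterns can claim more of the shared bases, which is how the number of bases per source is determined from the data.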

Update rules of ILRMA
• ML-based iterative update rules
– Update rule for the demixing matrix (spatial model) is based on iterative projection [N. Ono, 2011]
– Update rules for the NMF variables (NMF source model) are based on the MM algorithm
– The demixing-matrix update uses a one-hot vector that has 1 at the element of the source being updated

Optimization process in ILRMA
• Demixing matrix and source model are alternately updated
– Precise modeling of the low-rank TF structures improves the estimation accuracy of the demixing matrix
[Figure: loop in which the mixture is separated by the estimated demixing matrix, NMF variables are estimated for each separated source, and the updated source model is fed back to the demixing-matrix estimation]

Comparison of source models
• FDICA source model
– Non-Gaussian scalar variable
• IVA source model
– Non-Gaussian vector variable with higher-order correlation
• ILRMA source model
– Non-Gaussian matrix variable with low-rank time-frequency structure
[Figure labels: rank of TF matrix of mixture; rank of TF matrix of each source]

Multichannel extension of NMF
• Multichannel NMF [A. Ozerov+, 2010], [H. Sawada+, 2013]
– Observed multichannel signal (multichannel vector) → instantaneous spatial covariances in each time-frequency slot
– Spatial model: spatial covariances of each source (spatial property of each source)
– Partitioning function
– Source model: basis matrix (spectral patterns) and activation matrix (gains), i.e., timbre patterns of all sources

Relationship b/w ILRMA and multichannel NMF
• Difference b/w ILRMA and multichannel NMF?
– Source distribution: complex Gaussian distribution (same)
– ILRMA assumes a rank-1 spatial covariance
– Multichannel NMF assumes a full-rank spatial covariance
• Assumption: rank-1 spatial model
– Spatial covariance of each source is a rank-1 matrix (outer product of the sourcewise steering vector)
– Equivalent to the instantaneous mixing assumption

Relationship b/w ILRMA and multichannel NMF
• Multichannel NMF with rank-1 spatial model
– Substitute the rank-1 spatial model into the cost function
– Transform the variables from the mixing system to the demixing matrix
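
The two steps can be followed in a short derivation. The notation below is assumed for illustration (x_ij: observed vector; R_{i,n}: spatial covariance of source n; r_{ij,n}: its TF-varying variance; a_{i,n}: steering vector; A_i: square, invertible mixing matrix whose columns are the steering vectors), and constants are dropped.

```latex
% Multichannel NMF cost with a zero-mean complex Gaussian model
\mathcal{L}_{\mathrm{MNMF}} = \sum_{i,j}\left[\boldsymbol{x}_{ij}^{\mathsf{H}}\boldsymbol{X}_{ij}^{-1}\boldsymbol{x}_{ij} + \log\det\boldsymbol{X}_{ij}\right],
\qquad
\boldsymbol{X}_{ij} = \sum_{n} r_{ij,n}\,\boldsymbol{R}_{i,n}

% Step 1: substitute the rank-1 spatial model
\boldsymbol{R}_{i,n} = \boldsymbol{a}_{i,n}\boldsymbol{a}_{i,n}^{\mathsf{H}}
\;\Rightarrow\;
\boldsymbol{X}_{ij} = \boldsymbol{A}_i\boldsymbol{D}_{ij}\boldsymbol{A}_i^{\mathsf{H}},
\qquad
\boldsymbol{D}_{ij} = \mathrm{diag}\left(r_{ij,1},\dots,r_{ij,N}\right)

% Step 2: transform the variables, W_i = A_i^{-1} and y_ij = W_i x_ij
\boldsymbol{x}_{ij}^{\mathsf{H}}\boldsymbol{X}_{ij}^{-1}\boldsymbol{x}_{ij}
= \boldsymbol{y}_{ij}^{\mathsf{H}}\boldsymbol{D}_{ij}^{-1}\boldsymbol{y}_{ij}
= \sum_{n}\frac{|y_{ij,n}|^{2}}{r_{ij,n}},
\qquad
\log\det\boldsymbol{X}_{ij} = \sum_{n}\log r_{ij,n} - 2\log\left|\det\boldsymbol{W}_i\right|
```

Summing the two terms over i and j gives a cost in the demixing matrices and the source variances only, which has the same form as the negative log-likelihood of ILRMA.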

Relationship b/w MNMF, IVA, and ILRMA
• From the multichannel NMF side,
– Introducing the rank-1 spatial model transforms the problem from estimating the mixing system to estimating the demixing matrix
• From the IVA side,
– Increase the number of spectral bases in the source model
[Figure: methods placed on two axes, source model (limited to flexible) and spatial model (limited to flexible); IVA, multichannel NMF, and ILRMA, with the NMF source model and the rank-1 spatial model as the links to ILRMA]

Experimental evaluation
• Conditions
– Source signals: music signals obtained from SiSEC, convolved with impulse responses (two microphones and two sources)
– Window length: 512 ms (Hamming window)
– Shift length: 128 ms (1/4 shift)
– Number of bases: 30 per source (ILRMA w/o partitioning function); 60 for all sources (ILRMA with partitioning function)
– Evaluation score: improvement of signal-to-distortion ratio (SDR)
• Recording layouts
– Impulse response E2A (reverberation time: 300 ms): figure labels 2 m, 5.66 cm, and 50 / 50 for sources 1 and 2
– Impulse response JR2 (reverberation time: 470 ms): figure labels 2 m, 5.66 cm, and 60 / 60 for sources 1 and 2

Results: fort_minor-remember_the_name
[Figure: SDR improvement [dB] (higher is better) for the sources Violin synth. and Vocals, with one panel for E2A (300 ms) and one for JR2 (470 ms). Compared methods: directional clustering, IVA, Ozerov’s MNMF, Ozerov’s MNMF with random initialization, Sawada’s MNMF, Sawada’s MNMF initialized by ILRMA, ILRMA w/o partitioning function, and ILRMA with partitioning function]

Results: ultimate_nz_tour
[Figure: SDR improvement [dB] (higher is better) for the sources Guitar and Synth., with one panel for E2A (300 ms) and one for JR2 (470 ms). Compared methods: directional clustering, IVA, Ozerov’s MNMF, Ozerov’s MNMF with random initialization, Sawada’s MNMF, Sawada’s MNMF initialized by ILRMA, ILRMA w/o partitioning function, and ILRMA with partitioning function]

Results: bearlin-roads
• Signal length: 14 s
[Figure: SDR improvement [dB] (higher is better) versus iteration steps (0 to 400) for IVA, MNMF, ILRMA without Z, and ILRMA with Z; time annotations in the plot: 11.5 s, 15.1 s, 60.7 s, and 7647.3 s]

Demonstration: music source separation
• Music source separation
– Mixture of guitar, vocal, and keyboard → source separation → separated guitar, vocal, and keyboard
– Listen carefully for the three parts in the mixture
– Another demo is available at http://d-kitamura.net/en/index_en.html

Stable and Student’s t-distributions
• Source model based on the symmetric α-stable (SαS) distribution [A. Liutkus+, 2015], [U. Şimşekli+, 2015], [S. Leglaive+, 2017], [M. Fontaine+, 2017]
– which validates treating the decomposition of complex-valued r.v.s as the decomposition of their parameters
– Heavy tail (sparse) when α approaches 0
• Student’s t-distribution is also used as a source model [C. Févotte+, 2006], [K. Yoshii+, 2016], [K. Kitamura+, 2016], [S. Leglaive+, 2017]
– that includes the Cauchy distribution (degree of freedom ν = 1) and the Gaussian distribution (ν → ∞)
[Figure: SαS (stable family) and Student’s t (partially stable), with the Cauchy and Gaussian distributions belonging to both]

Source model of Student’s t-distribution
• Degree-of-freedom parameter
– Heavy tail when the degree of freedom approaches 0
• Complex Student’s t-dist.
– Circularly symmetric
– Student’s t NMF (t-NMF) [K. Yoshii+ 2016]
– Defined in each TF slot
– Scale corresponds to the NMF model
– Phase is assumed to be uniform
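
A circularly symmetric complex Student's t density of this kind can be written as follows. This is a sketch under assumed notation (ν: degree-of-freedom parameter; σ_{ij,n}: scale in each TF slot; p: domain parameter linking the scale to the NMF model); the exact form on the slide was not recoverable.

```latex
p(y_{ij,n}) = \frac{1}{\pi\sigma_{ij,n}^{2}}
\left(1 + \frac{2}{\nu}\,\frac{|y_{ij,n}|^{2}}{\sigma_{ij,n}^{2}}\right)^{-\frac{2+\nu}{2}},
\qquad
\sigma_{ij,n}^{p} = \sum_{k} t_{ik,n}\, v_{kj,n}
```

The density depends on y only through |y|, which corresponds to a uniform phase. As ν → ∞ it tends to the complex Gaussian density exp(−|y|²/σ²)/(πσ²), and ν = 1 gives the complex Cauchy case.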

Motivation for using Student’s t-dist.
• Better separation with t-NMF was reported [K. Yoshii+, 2016]
– in a very simple experiment using only C4, E4, and G4 piano tones
• NMF with a heavy-tailed distribution
– tends to provide an excessively low-rank approximation
• Sparse components (which may increase the rank of the model data) are treated as outliers
• ILRMA based on Student’s t source model (t-ILRMA)
– may improve the separation accuracy by forcing the NMF source model to be excessively low-rank
– will be presented at MLSP2017! (preprint is available on arXiv)
• https://arxiv.org/abs/1708.04795

Source model based on Student’s t-distribution
• p-th power spectrogram corresponds to the scales in the TF plane
– Complex Student’s t-distribution with TF-varying scale
[Figure: p-th power spectrogram (frequency bin × time frame); grayscale shows the value of the scale, from small to large power]

Cost function in ILRMA based on Student’s t-dist.
• Negative log-likelihood in ILRMA
– Gaussian ILRMA: models the power spectrogram by the variance
– Student’s t ILRMA: models the p-th power spectrogram by the scale
– Student’s t ILRMA generalizes both the p.d.f. and the model domain
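
The sense in which the Student's t cost generalizes the Gaussian one can be checked numerically. The sketch below compares only the source-model part of the two negative log-likelihoods (the demixing-matrix term is common to both and is omitted). The functional form of the Student's t term, the function names, and the synthetic inputs are assumptions for illustration.

```python
import numpy as np

def gauss_term(P, R):
    """Gaussian source-model cost: sum(|y|^2 / r + log r), with P = |y|^2."""
    return np.sum(P / R + np.log(R))

def student_t_term(P, TV, nu, p=2.0):
    """Student's t source-model cost, assuming sigma^p = TV in each TF slot."""
    sigma2 = TV ** (2.0 / p)
    return np.sum((1.0 + nu / 2.0) * np.log1p(2.0 * P / (nu * sigma2)) + np.log(sigma2))

rng = np.random.default_rng(0)
TV = rng.uniform(0.5, 2.0, size=(16, 32))     # hypothetical low-rank model values
P = TV * rng.exponential(size=TV.shape)       # hypothetical |y|^2 values

g = gauss_term(P, TV)
t_large = student_t_term(P, TV, nu=1e8)       # very large nu: close to Gaussian
print(g, t_large)

# A single large outlier is penalized far less under the Cauchy case (nu = 1)
outlier, unit = np.array([1e6]), np.array([1.0])
print(gauss_term(outlier, unit), student_t_term(outlier, unit, nu=1.0))
```

With p = 2 and a very large ν the two costs agree, while for small ν an outlier contributes only logarithmically, which is consistent with sparse components being treated as outliers by a heavy-tailed model.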

Experimental results: randomized t-ILRMA
• Examples
– Improved when
– Stable when but score is not sufficient
– Root spectrogram ( ) is preferable for speech signals
• In the case of
– Source model is over-fitted to mixture
[Figure: results for music signals and speech signals]

Tempering parameter
• Random initialization (previous result)
– t-ILRMA (iteration: 200), initialized with an identity matrix and uniform random values
• Initialization based on Gaussian ILRMA
– (Tempering approach of parameter)
– Gauss ILRMA (iteration: 100), initialized with an identity matrix and uniform random values, followed by t-ILRMA (iteration: 100)
[Figure: block diagrams of the initialization schemes; remaining labels: t-NMF (iteration: 100), uniform random values, arbitrary val.]

Experimental results: initialized t-ILRMA
• Examples
– Improved for all value of
– Could avoid over-fitting problem in the case
• Best parameter?
– Completely depending on data
[Figure: results for music signals and speech signals]

Average results: music signals

Average results: speech signals

Conclusion
• Independent low-rank matrix analysis (ILRMA)
– Assumption
• Statistical independence between sources
• Low-rank time-frequency structure of each source
– Equivalent to multichannel NMF
• when the mixing assumption is valid
• Student’s t-distribution is newly introduced
– including two symmetric α-stable distributions
• Complex Cauchy distribution (degree of freedom ν = 1)
• Complex Gaussian distribution (ν → ∞)
• Further extensions
– Relaxation of rank-1 spatial model?
– Employ another distribution?
– Supervised ILRMA? User-guided ILRMA?
