Experimental analysis of optimal window length for independent low-rank matrix analysis

Daichi Kitamura (The University of Tokyo, Japan)
Nobutaka Ono (National Institute of Informatics, Japan)
Hiroshi Saruwatari (The University of Tokyo, Japan)
25th European Signal Processing Conference (EUSIPCO) 2017
SS14: Multivariate Analysis for Audio Signal Source Enhancement
August 30, 14:30-16:10

Contents
• Background
– Blind source separation (BSS) for audio signals
– Motivation: fundamental limitation in frequency-domain BSS
• Methods
– Frequency-domain independent component analysis (FDICA)
– Independent vector analysis (IVA)
– Independent low-rank matrix analysis (ILRMA)
• Experimental analysis
– Optimal window length
• Music signals and speech signals
• Ideal case and more practical case
• Conclusion

Background
• Blind source separation (BSS) for audio signals
– separates original audio sources
– does not require prior information of recording conditions
• locations of mics and sources, room geometry, timbres, etc.
– is applicable to many audio applications
• Consider only the “determined” situation (# of mics = # of sources)
[Figure: sources → mixing system → observed signals → demixing system (BSS) → estimated signals; example: recording mixture → separated guitar]

History of BSS for audio signals
• Basic theories and their evolution (depicting only popular methods)
[Figure: timeline of methods by year]
1994: Independent component analysis (ICA)
1998: Frequency-domain ICA (FDICA)
1999: Nonnegative matrix factorization (NMF)
2006: Independent vector analysis (IVA)
2009: Itakura–Saito NMF (ISNMF)
2011: Auxiliary-function-based IVA (AuxIVA)
2012: Time-varying Gaussian IVA
2013: Multichannel NMF
2016: Independent low-rank matrix analysis (ILRMA)
Annotations on the ICA side: many permutation solvers for FDICA
Annotations on the NMF side: apply NMF to many tasks; generative models in NMF; many extensions of NMF

Motivation: fundamental limitation of BSS
• Mixing assumption in frequency-domain BSS
– “Linear time-invariant mixture” or “rank-1 spatial model”: $\mathbf{x}_{ij} = \mathbf{A}_i \mathbf{s}_{ij}$
• $\mathbf{x}_{ij}$: observed multichannel signal, $\mathbf{A}_i$: frequency-wise mixing matrix, $\mathbf{s}_{ij}$: source signals
• $i$: frequency bins, $j$: time frames
– Valid only when the window length used in STFT is longer than the length of room reverberation
• Too long window also causes another problem
– Number of time frames (samples) decreases
– Statistical bias will increase and estimation becomes unstable
• Trade-off between short and long window [S. Araki+, 2003]
– FDICA suffers from the trade-off
– What about BSS methods with a structural source model?
• IVA and ILRMA
[Figure: performance versus window length, with the optimal length marked]
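
The frequency-wise mixing assumption can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' code: the symbol names (`A` for the frequency-wise mixing matrices, `s` for the sources, `x` for the observation, `W` for the demixing matrices), the array sizes, and the random data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_frames, n_src = 5, 100, 2  # hypothetical sizes; determined case: mics == sources

# Hypothetical frequency-wise mixing matrices, one (mics x sources) matrix per bin
A = (rng.standard_normal((n_freq, n_src, n_src))
     + 1j * rng.standard_normal((n_freq, n_src, n_src)))
# Hypothetical source STFT coefficients, shape (bins, frames, sources)
s = (rng.laplace(size=(n_freq, n_frames, n_src))
     + 1j * rng.laplace(size=(n_freq, n_frames, n_src)))

# Instantaneous mixture in every frequency bin: x = A s
x = np.einsum("imn,ijn->ijm", A, s)

# When the assumption holds exactly, the inverse of each mixing matrix
# is a demixing matrix that recovers the sources
W = np.linalg.inv(A)
y = np.einsum("inm,ijm->ijn", W, x)

max_error = np.max(np.abs(y - s))
```

With real reverberant recordings the mixture is only approximately instantaneous per bin, which is why the window length matters.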

BSS methods: FDICA and IVA
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
– Source model: non-Gaussian source distribution on scalar r.v.s
– Estimated signals are assumed to be mutually independent
– Update the separation filter (demixing matrix) so that the estimated signals obey the assumed non-Gaussian distribution
[Figure: observed signals → STFT → demixing matrix → estimated signals (frequency × time); current empirical dist. versus non-Gaussian source dist.]
• Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
– Source model: non-Gaussian spherical source distribution on vector (multivariate) r.v.s
– Estimated signals are assumed to be mutually independent
– Update the separation filter (demixing matrix) so that the estimated signals obey the assumed non-Gaussian distribution
[Figure: observed signals → STFT → demixing matrix → estimated signals (frequency × time); current empirical dist. versus non-Gaussian spherical source dist.]
• The mixture is close to a Gaussian signal because of the CLT, whereas each source obeys a non-Gaussian distribution
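
The remark that a mixture is closer to Gaussian than its sources (CLT) can be checked numerically. The sketch below is illustrative only: the Laplace sources, the 2x2 mixing matrix, and the use of excess kurtosis as a non-Gaussianity measure are assumptions chosen for the example, not details from the slides.

```python
import numpy as np

def excess_kurtosis(v):
    """Excess kurtosis: about 0 for Gaussian data, positive for heavier tails."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

rng = np.random.default_rng(0)
n_samples = 200_000

# Two independent non-Gaussian sources (Laplace is an assumed example)
sources = rng.laplace(size=(2, n_samples))
# Hypothetical instantaneous mixing matrix
mixing = np.array([[1.0, 0.8],
                   [0.7, 1.0]])
mixtures = mixing @ sources

k_src = [excess_kurtosis(s) for s in sources]
k_mix = [excess_kurtosis(m) for m in mixtures]
```

Each mixture should show a smaller excess kurtosis than either source, which is the property ICA-type methods exploit when they push the estimated signals back toward a non-Gaussian distribution.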

Extension of source distribution in IVA
• Spherical Laplace distribution in IVA
– Frequency vector (I-dimensional)
– Frequency-uniform scale
[Figure: spherical Laplace (bivariate)]
• Zero-mean complex Gaussian distribution with TF-varying variance (Itakura–Saito NMF) [C. Févotte+, 2009]
– Time-frequency matrix (IJ-dimensional)
– Time-frequency-varying variance
– Zero-mean complex Gaussian in each TF bin
– Low-rank decomposition with NMF
• Extended to a more flexible model

Generative source model in ISNMF
• Power spectrogram corresponds to variances in TF plane
– Complex Gaussian distribution with TF-varying variance
– If we marginalize over time or frequency, the distribution becomes non-Gaussian even though each TF bin follows a Gaussian distribution
[Figure: power spectrogram (frequency bin × time frame); grayscale shows the value of variance, from small to large power]
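
The claim that Gaussian bins with TF-varying variance look non-Gaussian once pooled can be illustrated with a small simulation. This is a sketch under assumptions: the gamma-distributed factors `T` and `V`, the sizes, and the excess-kurtosis measure are hypothetical choices, and the samples are pooled over the whole TF plane rather than over time or frequency separately.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_frames, n_bases = 64, 400, 4  # hypothetical sizes

# Hypothetical low-rank variance model: R = T V (small offset keeps it positive)
T = rng.gamma(shape=0.5, size=(n_freq, n_bases))
V = rng.gamma(shape=0.5, size=(n_bases, n_frames))
R = T @ V + 1e-6

def sample_complex_gaussian(variance, rng):
    """Draw zero-mean circular complex Gaussian samples with per-element variance."""
    std = np.sqrt(variance / 2.0)
    noise = rng.standard_normal(variance.shape) + 1j * rng.standard_normal(variance.shape)
    return std * noise

def excess_kurtosis(v):
    """Excess kurtosis: about 0 for Gaussian data, positive for heavier tails."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

s_varying = sample_complex_gaussian(R, rng)                        # TF-varying variance
s_constant = sample_complex_gaussian(np.full_like(R, R.mean()), rng)  # one shared variance

k_varying = excess_kurtosis(s_varying.real.ravel())
k_constant = excess_kurtosis(s_constant.real.ravel())
```

The pooled samples with varying variance should be clearly heavy-tailed, while the constant-variance samples stay close to Gaussian.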

BSS methods: ILRMA
• Independent low-rank matrix analysis (ILRMA) [D. Kitamura+, 2016]
– Unification of IVA and ISNMF
– Source model in ILRMA
• Low-rank decomposition of the time-frequency matrix of each source
• Number of bases can be set to arbitrary value
[Figure: time-frequency matrix (frequency × time) decomposed into bases]
– Update demixing matrix so that estimated signals have low-rank structure in time-frequency domain
[Figure: observed signals → STFT → demixing matrix → estimated signals, each with a low-rank decomposition]
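
The low-rank source model can be sketched as an NMF fit to the power spectrogram of one estimated signal. The code below covers only this spectral part and omits the demixing matrix update entirely. It uses a commonly used MM-type multiplicative update for Itakura-Saito NMF; whether it matches the authors' implementation in detail is an assumption, and the names `P`, `T`, `V` and all sizes are hypothetical.

```python
import numpy as np

def is_divergence(P, R):
    """Itakura-Saito divergence between power spectrogram P and model R."""
    ratio = P / R
    return np.sum(ratio - np.log(ratio) - 1.0)

def update_source_model(P, T, V, eps=1e-12):
    """One multiplicative update of bases T and time-varying gains V so that T @ V fits P."""
    R = T @ V
    T = T * np.sqrt(((P / R**2) @ V.T) / ((1.0 / R) @ V.T))
    T = np.maximum(T, eps)
    R = T @ V
    V = V * np.sqrt((T.T @ (P / R**2)) / (T.T @ (1.0 / R)))
    V = np.maximum(V, eps)
    return T, V

rng = np.random.default_rng(0)
n_freq, n_frames, n_bases = 32, 60, 3  # hypothetical sizes

# Hypothetical power spectrogram of one estimated source (low-rank by construction)
P = rng.random((n_freq, n_bases)) @ rng.random((n_bases, n_frames)) + 1e-3
T = rng.random((n_freq, n_bases)) + 0.1
V = rng.random((n_bases, n_frames)) + 0.1

cost_before = is_divergence(P, T @ V)
for _ in range(50):
    T, V = update_source_model(P, T, V)
cost_after = is_divergence(P, T @ V)
```

In a full separation algorithm this step would alternate with an update of the demixing matrix, so that the estimated signals and their low-rank models are refined together.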

Comparison of source models
• FDICA source model: non-Gaussian scalar variable
• IVA source model: non-Gaussian vector variable with higher-order correlation
• ILRMA source model: non-Gaussian matrix variable with low-rank time-frequency structure
– Rank of TF matrix of mixture > rank of TF matrix of each source
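
The rank comparison between a mixture and its sources can be illustrated with toy matrices. This sketch rests on simplifying assumptions: each source spectrogram is exactly low-rank, and the mixture spectrogram is taken as the plain sum of the source spectrograms, which real recordings satisfy only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_frames, n_bases = 40, 50, 2  # hypothetical sizes

def low_rank_spectrogram(rng):
    """Hypothetical nonnegative TF matrix built from n_bases components."""
    return rng.random((n_freq, n_bases)) @ rng.random((n_bases, n_frames))

source_1 = low_rank_spectrogram(rng)
source_2 = low_rank_spectrogram(rng)
mixture = source_1 + source_2  # additivity is a simplifying assumption

rank_sources = (np.linalg.matrix_rank(source_1), np.linalg.matrix_rank(source_2))
rank_mixture = np.linalg.matrix_rank(mixture)
```

Because the mixture has a higher rank than either source in this toy case, demixing toward low-rank estimated signals is a plausible separation cue.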

Experimental analysis
• Window length in STFT
– If window length is too short
• Mixing assumption does not hold anymore
– If window length is too long
• Estimation becomes unstable (# of time frames decreases)
[Figure: STFT; a window function of a given window length (= DFT length) moves along the waveform by the shift length, and the DFT of each segment forms the spectrogram (frequency × time)]
• Our expectation
– Full time-frequency modeling of sources in ILRMA may improve the robustness to a decrease in the number of time frames

• Dataset: 4 music and 4 speech signals from SiSEC [S. Araki+, 2012]

Signal  Data name                         Sources (1 / 2)              Length [s]
Music   bearlin-roads                     acoustic_guit_main / vocals  14.6
Music   another_dreamer-the_ones_we_love  guitar / vocals              25.6
Music   fort_minor-remember_the_name      violins_synth / vocals       24.6
Music   ultimate_nz_tour                  guitar / synth               18.6
Speech  dev1_female4                      src_1 / src_2                10.0
Speech  dev1_female4                      src_3 / src_4                10.0
Speech  dev1_male4                        src_1 / src_2                10.0
Speech  dev1_male4                        src_3 / src_4                10.0

• Mixing: convolution with RIR in RWCP [S. Nakamura+, 2000]
– Impulse response E2A (reverberation time: T60 = 300 ms)
– Impulse response JR2 (reverberation time: T60 = 470 ms)
[Figure: recording conditions; in both cases, sources 1 and 2 are 2 m from two microphones spaced 5.66 cm apart; source angles are 50° and 50° for E2A, 60° and 60° for JR2]

• Compared methods
– FDICA+IPS (ideal permutation solver)
• Align permutation of estimated components using the reference
(oracle) source spectrogram (upper limit performance of FDICA)
– FDICA+DOA (DOA-based permutation solver) [S. Kurita+, 2000]
• Align permutation of estimated components using DOA after FDICA
– IVA [N. Ono, 2011]
• using auxiliary function method (a.k.a. MM algorithm) in optimization
– ILRMA [D. Kitamura+, 2016]
• with several numbers of bases
• Other conditions
– Window function: Hamming window
– Window length: 32 to 2048 ms
– Shift length: always a quarter of the window length
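
To see how strongly the window length affects the number of time frames under these conditions, a rough frame count can be computed. The sketch assumes a 16 kHz sampling rate, no padding at the signal edges, and power-of-two window lengths between 32 and 2048 ms; none of these details are stated on the slides, and `num_time_frames` is a hypothetical helper.

```python
def num_time_frames(signal_sec, window_ms, shift_ratio=0.25, fs=16000):
    """Approximate number of STFT frames without padding (fs is an assumed sampling rate)."""
    n_samples = int(signal_sec * fs)
    window = int(window_ms * fs / 1000)
    shift = max(1, int(window * shift_ratio))  # quarter-window shift by default
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // shift

# Assumed grid of window lengths in milliseconds
window_lengths_ms = [32, 64, 128, 256, 512, 1024, 2048]
# Frame counts for a 10 s signal, the length of the speech data
frames = {w: num_time_frames(10.0, w) for w in window_lengths_ms}
```

Under these assumptions a 10 s signal yields 1247 frames at 32 ms but only 16 frames at 2048 ms, which is the shortage of samples the trade-off refers to.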

Comparison using ideal initialization: condition
• Set initial value of demixing matrix to the oracle
– This initial value provides the best separation performance under the mixing assumption
• Set initial value of source model to the oracle, i.e., the power spectrogram of each source (only for ILRMA)
• FDICA+DOA & IVA: spatial oracle initialization
• FDICA+IPS & ILRMA: spatial and spectral oracle initialization

Comparison using ideal initialization: results
[Figures: results for four conditions: Music (T60 = 0.30 s), Music (T60 = 0.47 s), Speech (T60 = 0.30 s), Speech (T60 = 0.47 s)]

Comparison using random initialization: condition
• Set initial value of demixing matrix to the identity matrix
• Set initial values of source model to uniform random values in [0, 1] (only for ILRMA)
• FDICA+DOA, IVA, & ILRMA: fully blind methods
• FDICA+IPS: using oracle spectrogram
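
The blind initialization described here can be written down in a few lines. This is a sketch only: the function name `blind_initialization`, the array layout, and the example sizes (513 bins, 200 frames, 4 bases) are hypothetical and not taken from the slides.

```python
import numpy as np

def blind_initialization(n_freq, n_frames, n_src, n_bases, seed=0):
    """Identity demixing matrices plus uniform random source-model parameters."""
    rng = np.random.default_rng(seed)
    # Demixing matrix: identity in every frequency bin
    W = np.tile(np.eye(n_src, dtype=complex), (n_freq, 1, 1))
    # Source model (ILRMA only): uniform random values, one model per source
    T = rng.uniform(0.0, 1.0, size=(n_src, n_freq, n_bases))
    V = rng.uniform(0.0, 1.0, size=(n_src, n_bases, n_frames))
    return W, T, V

W, T, V = blind_initialization(n_freq=513, n_frames=200, n_src=2, n_bases=4)
```

NumPy's `uniform` samples from the half-open interval [0, 1), so exact zeros are possible in principle; a practical implementation might add a small offset to keep the model variances strictly positive.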

Comparison using random initialization: results
[Figures: results for four conditions: Music (T60 = 0.30 s), Music (T60 = 0.47 s), Speech (T60 = 0.30 s), Speech (T60 = 0.47 s)]

Conclusion
• In the case of ILRMA with oracle initialization, the
robustness to long windows (fewer time frames) can
be improved
– optimal window length is longer than that in FDICA or IVA
– thanks to employing not only the independence between
sources but also a full modeling of time-frequency structure
for the estimation of the demixing matrix
• In a practical situation (fully blind case),
– optimal window length is similar to that in FDICA or IVA
– owing to the difficulty of blindly estimating a precise spectral model in ILRMA
Thank you for your attention!
