Experimental analysis of optimal window length for independent low-rank matrix analysis

Daichi Kitamura, Assistant Professor at National Institute of Technology, Kagawa College
Daichi Kitamura (The University of Tokyo, Japan)
Nobutaka Ono (National Institute of Informatics, Japan)
Hiroshi Saruwatari (The University of Tokyo, Japan)
25th European Signal Processing Conference (EUSIPCO) 2017
SS14: Multivariate Analysis for Audio Signal Source Enhancement
August 30, 14:30-16:10
Contents
• Background
– Blind source separation (BSS) for audio signals
– Motivation: fundamental limitation in frequency-domain BSS
• Methods
– Frequency-domain independent component analysis (FDICA)
– Independent vector analysis (IVA)
– Independent low-rank matrix analysis (ILRMA)
• Experimental analysis
– Optimal window length
  • Music signals and speech signals
  • Ideal case and more practical case
• Conclusion
Background
• Blind source separation (BSS) for audio signals
– separates the original audio sources from a recorded mixture
– does not require prior information on the recording conditions
  • locations of mics and sources, room geometry, timbres, etc.
– is applicable to many audio applications
• Consider only the “determined” situation: # of mics = # of sources
[Figure: sources → mixing system → observed recording mixture → BSS (demixing system) → estimated signals, e.g., the separated guitar]
History of BSS for audio signals
• Basic theories and their evolution (*depicting only popular methods)
– 1994: Independent component analysis (ICA)
– 1998: Frequency-domain ICA (FDICA)
  • Many permutation solvers for FDICA
– 1999: Nonnegative matrix factorization (NMF)
  • Apply NMF to many tasks
  • Many extensions of NMF
  • Generative models in NMF
– 2006: Independent vector analysis (IVA)
– 2009: Itakura–Saito NMF (ISNMF)
– 2011: Auxiliary-function-based IVA (AuxIVA)
– 2012: Time-varying Gaussian IVA
– 2013: Multichannel NMF
– 2016: Independent low-rank matrix analysis (ILRMA)
Motivation: fundamental limitation of BSS
• Mixing assumption in frequency-domain BSS
– “Linear time-invariant mixture” or “rank-1 spatial model”: $\boldsymbol{x}_{ij} = \boldsymbol{A}_i \boldsymbol{s}_{ij}$
  • $\boldsymbol{x}_{ij}$: observed multichannel signal, $\boldsymbol{A}_i$: frequency-wise mixing matrix, $\boldsymbol{s}_{ij}$: source signals
  • $i$: frequency bins, $j$: time frames
– Valid only when the window length used in the STFT is longer than the length of the room reverberation
• Too long a window also causes another problem
– Number of time frames (samples) decreases
– Statistical bias will increase and estimation becomes unstable
• Trade-off between short and long windows [S. Araki+, 2003]
– FDICA suffers from the trade-off
– What about BSS methods with a structural source model?
  • IVA and ILRMA
[Figure: performance versus window length, peaking at the optimal length]
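The mixing assumption on this slide can be illustrated numerically. The following NumPy sketch assumes the usual notation x_ij = A_i s_ij (one mixing matrix per frequency bin i, applied to every time frame j) and shows that the inverse matrices undo such a mixture. The array sizes, the random data, and all variable names are assumptions made for illustration; in actual BSS the mixing matrices are unknown.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, N = 5, 100, 2  # frequency bins, time frames, sources (= mics, determined case)

# Complex source spectrograms S[i, j, :] and one mixing matrix A[i] per frequency bin
S = rng.standard_normal((I, J, N)) + 1j * rng.standard_normal((I, J, N))
A = rng.standard_normal((I, N, N)) + 1j * rng.standard_normal((I, N, N))

# Frequency-wise instantaneous mixture: x_ij = A_i s_ij
X = np.einsum("imn,ijn->ijm", A, S)

# With the ideal demixing matrix W_i = A_i^{-1}, y_ij = W_i x_ij recovers s_ij
W = np.linalg.inv(A)
Y = np.einsum("inm,ijm->ijn", W, X)

recovered = np.allclose(Y, S)
print(recovered)
```

When the window is shorter than the reverberation, a single matrix per frequency bin no longer describes the mixture, so no demixing matrix of this form can recover the sources exactly.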
BSS methods: FDICA and IVA
• Assumptions shared by both methods
– Sources are mutually independent and obey a non-Gaussian distribution
– The mixture is close to a Gaussian signal because of the central limit theorem (CLT)
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
– Source model: non-Gaussian source distribution for scalar r.v.s
– Update the separation filter (demixing matrix) so that the estimated signals obey the non-Gaussian distribution we assumed
• Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
– Source model: non-Gaussian spherical source distribution for vector (multivariate) r.v.s
– Update the separation filter (demixing matrix) so that the estimated signals obey the non-Gaussian distribution we assumed
[Figure: observed spectrograms (frequency × time, after the STFT) → demixing matrix → estimated spectrograms, whose current empirical distribution is fitted to the assumed source distribution]
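The remark that a mixture is closer to Gaussian than its sources can be checked with a small experiment. The sketch below is only an illustration under assumed conditions: two Laplace-distributed sources, an arbitrary real-valued 2×2 mixing matrix, and excess kurtosis as the measure of non-Gaussianity. None of these choices come from the slides.

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis: 0 for a Gaussian, positive for heavy-tailed signals."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

rng = np.random.default_rng(0)
n_samples = 200_000

# Two mutually independent, non-Gaussian (Laplace) sources
s = rng.laplace(size=(2, n_samples))

# Instantaneous mixture with an arbitrary (hypothetical) mixing matrix
A = np.array([[1.0, 0.8],
              [0.6, 1.0]])
x = A @ s

k_sources = [excess_kurtosis(si) for si in s]
k_mixtures = [excess_kurtosis(xi) for xi in x]
print("sources :", k_sources)   # close to 3, the excess kurtosis of a Laplace r.v.
print("mixtures:", k_mixtures)  # smaller, i.e. closer to Gaussian
```

Independence-based methods exploit this gap in the opposite direction: they search for a demixing matrix whose outputs are as non-Gaussian as the assumed source model.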
Extension of source distribution in IVA
• Spherical Laplace distribution in IVA
– Defined on the frequency vector (I-dimensional); the slide illustrates the bivariate case
– Frequency-uniform scale
• Zero-mean complex Gaussian distribution with TF-varying variance (Itakura–Saito NMF) [C. Févotte+, 2009]
– Defined on the time-frequency matrix (IJ-dimensional)
– Zero-mean complex Gaussian in each TF bin
– Time-frequency-varying variance, given by a low-rank decomposition with NMF
• Extended to a more flexible model
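For reference, the two source models contrasted on this slide are commonly written as follows. This is a sketch in assumed notation, since the slide's own equations did not survive extraction: s_ij is the complex spectrogram of one source at frequency bin i and time frame j, r_ij is its variance, and t_ik and v_kj are nonnegative NMF parameters with basis index k. Scale parameters and normalization constants of the Laplace model are omitted.

```latex
% Spherical Laplace source model in IVA (one source, time frame j)
p(\boldsymbol{s}_{j}) \propto \exp\!\left( -\lVert \boldsymbol{s}_{j} \rVert_2 \right),
\qquad \boldsymbol{s}_{j} = (s_{1j}, \ldots, s_{Ij})^{\mathsf{T}}

% Complex Gaussian source model with TF-varying variance (ISNMF)
p(s_{ij}) = \frac{1}{\pi r_{ij}} \exp\!\left( -\frac{|s_{ij}|^2}{r_{ij}} \right),
\qquad r_{ij} = \sum_{k} t_{ik} v_{kj}
```

In the first model all frequency bins of a frame share one scale, whereas in the second every TF bin has its own variance r_ij.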
Generative source model in ISNMF
• Power spectrogram corresponds to variances in the TF plane
– Complex Gaussian distribution with TF-varying variance
– If we marginalize over time or frequency, the distribution becomes non-Gaussian even though each TF bin is defined by a Gaussian distribution
[Figure: power spectrogram (frequency bin × time frame); the grayscale shows the value of the variance, from small to large power]
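The marginalization argument can be verified numerically. The sketch below draws circular complex Gaussian samples once with a constant variance and once with a different variance in every TF bin, then compares the normalized fourth moment of the pooled samples, which equals 2 for a complex Gaussian. The gamma-distributed variances and the array sizes are hypothetical choices, used only to obtain a spectrogram-like spread of powers.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J = 64, 2000  # frequency bins, time frames (arbitrary sizes)

def complex_gaussian(variance, rng):
    """Draw zero-mean circular complex Gaussian samples with the given variances."""
    std = np.sqrt(variance / 2.0)
    return std * (rng.standard_normal(variance.shape)
                  + 1j * rng.standard_normal(variance.shape))

def fourth_moment_ratio(z):
    """E|z|^4 / (E|z|^2)^2, which equals 2 for a circular complex Gaussian."""
    p = np.abs(z) ** 2
    return np.mean(p**2) / np.mean(p) ** 2

# Constant variance: every TF bin shares one Gaussian
z_const = complex_gaussian(np.ones((I, J)), rng)

# TF-varying variance: each bin is Gaussian, but the variances differ
R = rng.gamma(shape=0.5, scale=2.0, size=(I, J))  # hypothetical power spectrogram
z_tfvar = complex_gaussian(R, rng)

ratio_const = fourth_moment_ratio(z_const)
ratio_tfvar = fourth_moment_ratio(z_tfvar)
print(ratio_const, ratio_tfvar)
```

The pooled samples with TF-varying variance give a ratio well above 2, that is, a heavier-tailed and therefore non-Gaussian marginal distribution.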
BSS methods: ILRMA
• Independent low-rank matrix analysis (ILRMA) [D. Kitamura+, 2016]
– Unification of IVA and ISNMF
– Source model in ILRMA: low-rank decomposition of each source's time-frequency matrix into a (frequency × basis) matrix and a (basis × time) matrix
  • Number of bases can be set to an arbitrary value
– Update the demixing matrix so that the estimated signals have a low-rank structure in the time-frequency domain
[Figure: observed spectrograms (after the STFT) → demixing matrix → estimated spectrograms, each modeled by a low-rank decomposition]
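The low-rank source model can be sketched in isolation from the demixing matrix update. The code below fits a power spectrogram P with a (frequency × basis) matrix T and a (basis × time) matrix V by minimizing the Itakura–Saito divergence with square-root multiplicative updates, a standard majorization-minimization scheme for IS-NMF. It is a minimal sketch on synthetic data, not the authors' implementation, and the function names, sizes, and initialization range are assumptions.

```python
import numpy as np

def is_divergence(P, R):
    """Itakura-Saito divergence between power spectrogram P and model R."""
    ratio = P / R
    return np.sum(ratio - np.log(ratio) - 1.0)

def isnmf(P, n_bases, n_iter=100, seed=0, eps=1e-12):
    """Fit P ~= T @ V with multiplicative (MM-type) updates for the IS divergence."""
    rng = np.random.default_rng(seed)
    I, J = P.shape
    T = rng.uniform(0.1, 1.0, size=(I, n_bases))  # basis matrix (frequency x basis)
    V = rng.uniform(0.1, 1.0, size=(n_bases, J))  # activation matrix (basis x time)
    costs = []
    for _ in range(n_iter):
        R = T @ V + eps
        T *= np.sqrt(((P / R**2) @ V.T) / ((1.0 / R) @ V.T))
        R = T @ V + eps
        V *= np.sqrt((T.T @ (P / R**2)) / (T.T @ (1.0 / R)))
        costs.append(is_divergence(P, T @ V + eps))
    return T, V, costs

# Toy power spectrogram with an exact rank-2 structure (synthetic data)
rng = np.random.default_rng(1)
P = rng.uniform(0.1, 1.0, size=(32, 2)) @ rng.uniform(0.1, 1.0, size=(2, 50))

T, V, costs = isnmf(P, n_bases=2)
print(costs[0], costs[-1])
```

In ILRMA, P would be the power spectrogram of the current separated signal for one source, and updates of this kind alternate with updates of the demixing matrices.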
Comparison of source models
• FDICA source model: non-Gaussian scalar variable
• IVA source model: non-Gaussian vector variable with higher-order correlation
• ILRMA source model: non-Gaussian matrix variable with low-rank time-frequency structure
– Rank of TF matrix of mixture > rank of TF matrix of each source
Experimental analysis
• Window length in STFT
– If window length is too short
  • Mixing assumption does not hold anymore
– If window length is too long
  • Estimation becomes unstable (# of time frames decreases)
• Our expectation
– Full time-frequency modeling of sources in ILRMA may improve the robustness to a decrease in the number of time frames
[Figure: STFT. The waveform is cut into frames by a window function of the window length (= DFT length), advanced by the shift length; the DFT of each frame gives one column of the spectrogram (frequency × time).]
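To make the decrease in the number of time frames concrete, the following sketch counts STFT frames and frequency bins for a 10 s signal with a shift of a quarter of the window. The 16 kHz sampling rate, the absence of padding, and the doubling grid of window lengths are assumptions for illustration; the slides do not state them.

```python
def stft_shape(signal_len_s, window_ms, fs=16000, shift_ratio=0.25):
    """Return (frequency bins, time frames) of an STFT without padding."""
    n_samples = int(round(signal_len_s * fs))
    window = int(round(window_ms * 1e-3 * fs))
    shift = max(1, int(round(window * shift_ratio)))
    n_bins = window // 2 + 1
    n_frames = 0 if n_samples < window else 1 + (n_samples - window) // shift
    return n_bins, n_frames

# Hypothetical grid of window lengths in milliseconds
window_lengths_ms = [32, 64, 128, 256, 512, 1024, 2048]
shapes = {w: stft_shape(10.0, w) for w in window_lengths_ms}
for w, (n_bins, n_frames) in shapes.items():
    print(f"{w:5d} ms window -> {n_bins:6d} bins x {n_frames:5d} frames")
```

Under these assumptions a 32 ms window yields 1247 frames, while a 2048 ms window yields only 16, so each frequency-wise demixing matrix must be estimated from very few samples.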
Experimental analysis
• Dataset: 4 music and 4 speech signals from SiSEC [S. Araki+, 2012]
• Mixing: convolution with RIRs in RWCP [S. Nakamura+, 2000]

Signal  Data name                         Source (1/2)               Length [s]
Music   bearlin-roads                     acoustic_guit_main/vocals  14.6
Music   another_dreamer-the_ones_we_love  guitar/vocals              25.6
Music   fort_minor-remember_the_name      violins_synth/vocals       24.6
Music   ultimate_nz_tour                  guitar/synth               18.6
Speech  dev1_female4                      src_1/src_2                10.0
Speech  dev1_female4                      src_3/src_4                10.0
Speech  dev1_male4                        src_1/src_2                10.0
Speech  dev1_male4                        src_3/src_4                10.0

[Figure: recording geometry for the two impulse responses. E2A (reverberation time: T60 = 300 ms): Source 1 and Source 2, with labels 2 m, 5.66 cm, and 50/50. JR2 (reverberation time: T60 = 470 ms): Source 1 and Source 2, with labels 2 m, 5.66 cm, and 60/60.]
Experimental analysis
• Compared methods
– FDICA+IPS (ideal permutation solver)
  • Aligns the permutation of the estimated components using the reference (oracle) source spectrogram (upper-limit performance of FDICA)
– FDICA+DOA (DOA-based permutation solver) [S. Kurita+, 2000]
  • Aligns the permutation of the estimated components using DOA after FDICA
– IVA [N. Ono, 2011]
  • uses the auxiliary function method (a.k.a. MM algorithm) for optimization
– ILRMA [D. Kitamura+, 2016]
  • with several numbers of bases
• Other conditions
– Window function: Hamming window
– Window length: 32 ~ 2048 ms
– Shift length: always a quarter of the window length
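The idea behind an ideal permutation solver can be sketched as follows: in each frequency bin, the estimated components are reordered so that their magnitudes correlate best with the oracle source spectrograms. This is a hypothetical illustration, not the solver used in the experiments; the function name, the correlation criterion, and the toy data are assumptions, and scale ambiguity is ignored.

```python
import itertools
import numpy as np

def align_permutation(Y, S_ref):
    """Reorder estimated components per frequency bin to best match references.

    Y, S_ref: complex arrays of shape (I, J, N) = (bins, frames, sources).
    """
    I, J, N = Y.shape
    Y_aligned = np.empty_like(Y)
    perms = list(itertools.permutations(range(N)))
    for i in range(I):
        est = np.abs(Y[i])       # (J, N) magnitudes of the estimates
        ref = np.abs(S_ref[i])   # (J, N) magnitudes of the oracle sources
        # corr[n, m]: correlation between reference n and estimate m
        corr = np.corrcoef(ref.T, est.T)[:N, N:]
        best = max(perms, key=lambda p: sum(corr[n, p[n]] for n in range(N)))
        Y_aligned[i] = Y[i][:, list(best)]
    return Y_aligned

rng = np.random.default_rng(0)
I, J, N = 16, 200, 2
envelope = rng.uniform(0.1, 1.0, size=(1, J, N))  # source-specific activity over time
S = envelope * (rng.standard_normal((I, J, N)) + 1j * rng.standard_normal((I, J, N)))

# Simulate the permutation problem: shuffle the source order independently per bin
Y = np.stack([S[i][:, rng.permutation(N)] for i in range(I)])

aligned_ok = np.allclose(align_permutation(Y, S), S)
print(aligned_ok)
```

Because it relies on the reference spectrograms, such a solver is not blind; it serves only to show the upper-limit performance of FDICA.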
Comparison using ideal initialization: condition
• Set the initial value of the demixing matrix to the oracle
– This initial value provides the best separation performance under the mixing assumption
• Set the initial value of the source model to the oracle (only for ILRMA): the power spectrogram of the corresponding source
• FDICA+DOA & IVA: spatial oracle initialization
• FDICA+IPS & ILRMA: spatial and spectral oracle initialization
Comparison using ideal initialization: results
[Figure: results in four panels: Music (T60 = 0.30 s), Music (T60 = 0.47 s), Speech (T60 = 0.30 s), Speech (T60 = 0.47 s)]
Comparison using random initialization: condition
• Set the initial value of the demixing matrix to the identity matrix
• Set the initial value of the source model to uniform random values in [0, 1] (only for ILRMA)
• FDICA+DOA, IVA, & ILRMA: fully blind methods
• FDICA+IPS: uses the oracle spectrogram
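The blind initialization described on this slide might look as follows in NumPy. It is a sketch under assumptions: the "source model" is taken to be the NMF matrices T (frequency × basis) and V (basis × time) of each source, and the function name, argument names, and array layout are hypothetical.

```python
import numpy as np

def init_blind(n_bins, n_frames, n_sources, n_bases, seed=0):
    """Identity demixing matrices and uniform random NMF parameters."""
    rng = np.random.default_rng(seed)
    # Demixing matrices: identity in every frequency bin
    W = np.tile(np.eye(n_sources, dtype=complex), (n_bins, 1, 1))
    # ILRMA source model: uniform random values in [0, 1) for each source
    T = rng.uniform(0.0, 1.0, size=(n_sources, n_bins, n_bases))
    V = rng.uniform(0.0, 1.0, size=(n_sources, n_bases, n_frames))
    return W, T, V

W, T, V = init_blind(n_bins=257, n_frames=100, n_sources=2, n_bases=4)
print(W.shape, T.shape, V.shape)
```

With the identity matrix, the first estimated signals are simply the observed microphone signals, so all separation has to come from the subsequent updates.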
Comparison using random initialization: results
[Figure: results in four panels: Music (T60 = 0.30 s), Music (T60 = 0.47 s), Speech (T60 = 0.30 s), Speech (T60 = 0.47 s)]
Conclusion
• In the case of ILRMA with oracle initialization, the robustness to long windows (fewer time frames) can be improved
– the optimal window length is longer than that in FDICA or IVA
– thanks to employing not only the independence between sources but also a full modeling of the time-frequency structure for the estimation of the demixing matrix
• In a practical situation (fully blind case),
– the optimal window length is similar to that in FDICA or IVA
– because it is difficult to blindly estimate a precise spectral model in ILRMA
Thank you for your attention!
1 of 22

Recommended

Audio Source Separation Based on Low-Rank Structure and Statistical Independence by
Audio Source Separation Based on Low-Rank Structure and Statistical IndependenceAudio Source Separation Based on Low-Rank Structure and Statistical Independence
Audio Source Separation Based on Low-Rank Structure and Statistical IndependenceDaichi Kitamura
2.5K views54 slides
Blind source separation based on independent low-rank matrix analysis and its... by
Blind source separation based on independent low-rank matrix analysis and its...Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...Daichi Kitamura
1.6K views47 slides
Blind source separation based on independent low-rank matrix analysis and its... by
Blind source separation based on independent low-rank matrix analysis and its...Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...Daichi Kitamura
1.4K views50 slides
DNN-based permutation solver for frequency-domain independent component analy... by
DNN-based permutation solver for frequency-domain independent component analy...DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...Kitamura Laboratory
61 views18 slides
Blind audio source separation based on time-frequency structure models by
Blind audio source separation based on time-frequency structure modelsBlind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure modelsKitamura Laboratory
311 views46 slides
Prior distribution design for music bleeding-sound reduction based on nonnega... by
Prior distribution design for music bleeding-sound reduction based on nonnega...Prior distribution design for music bleeding-sound reduction based on nonnega...
Prior distribution design for music bleeding-sound reduction based on nonnega...Kitamura Laboratory
99 views29 slides

More Related Content

What's hot

Relaxation of rank-1 spatial constraint in overdetermined blind source separa... by
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...Relaxation of rank-1 spatial constraint in overdetermined blind source separa...
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...Daichi Kitamura
1.6K views23 slides
Efficient initialization for nonnegative matrix factorization based on nonneg... by
Efficient initialization for nonnegative matrix factorization based on nonneg...Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...Daichi Kitamura
2.8K views19 slides
Koyama ASA ASJ joint meeting 2016 by
Koyama ASA ASJ joint meeting 2016Koyama ASA ASJ joint meeting 2016
Koyama ASA ASJ joint meeting 2016SaruwatariLabUTokyo
14.4K views23 slides
Robust music signal separation based on supervised nonnegative matrix factori... by
Robust music signal separation based on supervised nonnegative matrix factori...Robust music signal separation based on supervised nonnegative matrix factori...
Robust music signal separation based on supervised nonnegative matrix factori...Daichi Kitamura
1.2K views30 slides
Regularized superresolution-based binaural signal separation with nonnegative... by
Regularized superresolution-based binaural signal separation with nonnegative...Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...Daichi Kitamura
701 views30 slides
Ica2016 312 saruwatari by
Ica2016 312 saruwatariIca2016 312 saruwatari
Ica2016 312 saruwatariSaruwatariLabUTokyo
14.2K views18 slides

What's hot(20)

Relaxation of rank-1 spatial constraint in overdetermined blind source separa... by Daichi Kitamura
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...Relaxation of rank-1 spatial constraint in overdetermined blind source separa...
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...
Daichi Kitamura1.6K views
Efficient initialization for nonnegative matrix factorization based on nonneg... by Daichi Kitamura
Efficient initialization for nonnegative matrix factorization based on nonneg...Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...
Daichi Kitamura2.8K views
Robust music signal separation based on supervised nonnegative matrix factori... by Daichi Kitamura
Robust music signal separation based on supervised nonnegative matrix factori...Robust music signal separation based on supervised nonnegative matrix factori...
Robust music signal separation based on supervised nonnegative matrix factori...
Daichi Kitamura1.2K views
Regularized superresolution-based binaural signal separation with nonnegative... by Daichi Kitamura
Regularized superresolution-based binaural signal separation with nonnegative...Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...
Daichi Kitamura701 views
Divergence optimization in nonnegative matrix factorization with spectrogram ... by Daichi Kitamura
Divergence optimization in nonnegative matrix factorization with spectrogram ...Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...
Daichi Kitamura942 views
Depth estimation of sound images using directional clustering and activation-... by Daichi Kitamura
Depth estimation of sound images using directional clustering and activation-...Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...
Daichi Kitamura919 views
Online divergence switching for superresolution-based nonnegative matrix fact... by Daichi Kitamura
Online divergence switching for superresolution-based nonnegative matrix fact...Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...
Daichi Kitamura624 views
Hybrid multichannel signal separation using supervised nonnegative matrix fac... by Daichi Kitamura
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Daichi Kitamura1.1K views
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用 by Kitamura Laboratory
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
Real-time neural text-to-speech with sequence-to-sequence acoustic model and ... by Takuma_OKAMOTO
Real-time neural text-to-speech with sequence-to-sequence acoustic model and ...Real-time neural text-to-speech with sequence-to-sequence acoustic model and ...
Real-time neural text-to-speech with sequence-to-sequence acoustic model and ...
Takuma_OKAMOTO884 views
Deep Learning Based Voice Activity Detection and Speech Enhancement by NAVER Engineering
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech Enhancement
NAVER Engineering1.9K views

Viewers also liked

ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi... by
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...Daichi Kitamura
1.8K views24 slides
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法 by
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法Daichi Kitamura
3.5K views23 slides
擬似ハムバッキングピックアップの弦振動応答 (in Japanese) by
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)擬似ハムバッキングピックアップの弦振動応答 (in Japanese)
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)Daichi Kitamura
1.1K views13 slides
Music signal separation using supervised nonnegative matrix factorization wit... by
Music signal separation using supervised nonnegative matrix factorization wit...Music signal separation using supervised nonnegative matrix factorization wit...
Music signal separation using supervised nonnegative matrix factorization wit...Daichi Kitamura
985 views30 slides
音源分離における音響モデリング(Acoustic modeling in audio source separation) by
音源分離における音響モデリング(Acoustic modeling in audio source separation)音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)Daichi Kitamura
22.5K views114 slides
Study on optimal divergence for superresolution-based supervised nonnegative ... by
Study on optimal divergence for superresolution-based supervised nonnegative ...Study on optimal divergence for superresolution-based supervised nonnegative ...
Study on optimal divergence for superresolution-based supervised nonnegative ...Daichi Kitamura
1K views47 slides

Viewers also liked(14)

ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi... by Daichi Kitamura
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...
Daichi Kitamura1.8K views
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法 by Daichi Kitamura
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法
Daichi Kitamura3.5K views
擬似ハムバッキングピックアップの弦振動応答 (in Japanese) by Daichi Kitamura
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)擬似ハムバッキングピックアップの弦振動応答 (in Japanese)
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)
Daichi Kitamura1.1K views
Music signal separation using supervised nonnegative matrix factorization wit... by Daichi Kitamura
Music signal separation using supervised nonnegative matrix factorization wit...Music signal separation using supervised nonnegative matrix factorization wit...
Music signal separation using supervised nonnegative matrix factorization wit...
Daichi Kitamura985 views
音源分離における音響モデリング(Acoustic modeling in audio source separation) by Daichi Kitamura
音源分離における音響モデリング(Acoustic modeling in audio source separation)音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)
Daichi Kitamura22.5K views
Study on optimal divergence for superresolution-based supervised nonnegative ... by Daichi Kitamura
Study on optimal divergence for superresolution-based supervised nonnegative ...Study on optimal divergence for superresolution-based supervised nonnegative ...
Study on optimal divergence for superresolution-based supervised nonnegative ...
Daichi Kitamura1K views
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法 by Daichi Kitamura
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
Daichi Kitamura4.3K views
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia... by Daichi Kitamura
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
Daichi Kitamura4.9K views
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese) by Daichi Kitamura
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
Daichi Kitamura5.9K views
独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep... by Daichi Kitamura
独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...
独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...
Daichi Kitamura5.9K views
非負値行列分解の確率的生成モデルと 多チャネル音源分離への応用 (Generative model in nonnegative matrix facto... by Daichi Kitamura
非負値行列分解の確率的生成モデルと多チャネル音源分離への応用 (Generative model in nonnegative matrix facto...非負値行列分解の確率的生成モデルと多チャネル音源分離への応用 (Generative model in nonnegative matrix facto...
非負値行列分解の確率的生成モデルと 多チャネル音源分離への応用 (Generative model in nonnegative matrix facto...
Daichi Kitamura5.9K views
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou... by Daichi Kitamura
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...
Daichi Kitamura12.2K views
統計的独立性と低ランク行列分解理論に基づく ブラインド音源分離 –独立低ランク行列分析– Blind source separation based on... by Daichi Kitamura
統計的独立性と低ランク行列分解理論に基づくブラインド音源分離 –独立低ランク行列分析– Blind source separation based on...統計的独立性と低ランク行列分解理論に基づくブラインド音源分離 –独立低ランク行列分析– Blind source separation based on...
統計的独立性と低ランク行列分解理論に基づく ブラインド音源分離 –独立低ランク行列分析– Blind source separation based on...
Daichi Kitamura2.9K views
ICASSP2017読み会(関東編)・AASP_L3(北村担当分) by Daichi Kitamura
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
Daichi Kitamura4K views

Similar to Experimental analysis of optimal window length for independent low-rank matrix analysis

DNN-based frequency-domain permutation solver for multichannel audio source s... by
DNN-based frequency-domain permutation solver for multichannel audio source s...DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...Kitamura Laboratory
29 views27 slides
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and... by
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Kitamura Laboratory
43 views17 slides
Analysis for Radar and Electronic Warfare by
Analysis for Radar and Electronic WarfareAnalysis for Radar and Electronic Warfare
Analysis for Radar and Electronic WarfareReza Taryghat
7.9K views133 slides
frogcelsat by
frogcelsatfrogcelsat
frogcelsatVasvi Gupta
199 views44 slides
Digital Signal Processing-Digital Filters by
Digital Signal Processing-Digital FiltersDigital Signal Processing-Digital Filters
Digital Signal Processing-Digital FiltersNelson Anand
2.7K views66 slides
Sampling by
SamplingSampling
SamplingMuhammad Uzair Rasheed
13.2K views34 slides

Similar to Experimental analysis of optimal window length for independent low-rank matrix analysis(20)

DNN-based frequency-domain permutation solver for multichannel audio source s... by Kitamura Laboratory
DNN-based frequency-domain permutation solver for multichannel audio source s...DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and... by Kitamura Laboratory
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Analysis for Radar and Electronic Warfare by Reza Taryghat
Analysis for Radar and Electronic WarfareAnalysis for Radar and Electronic Warfare
Analysis for Radar and Electronic Warfare
Reza Taryghat7.9K views
Digital Signal Processing-Digital Filters by Nelson Anand
Digital Signal Processing-Digital FiltersDigital Signal Processing-Digital Filters
Digital Signal Processing-Digital Filters
Nelson Anand2.7K views
Speaker Dependent WaveNet Vocoder by Akira Tamamori
Speaker Dependent WaveNet VocoderSpeaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet Vocoder
Akira Tamamori2.6K views
An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ... by ijsrd.com
An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ...An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ...
An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ...
ijsrd.com256 views
COLEA : A MATLAB Tool for Speech Analysis by Rushin Shah
COLEA : A MATLAB Tool for Speech AnalysisCOLEA : A MATLAB Tool for Speech Analysis
COLEA : A MATLAB Tool for Speech Analysis
Rushin Shah2.1K views
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea... by IRJET Journal
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
IRJET Journal10 views
Digital Signal Processor evolution over the last 30 years by Francois Charlot
Digital Signal Processor evolution over the last 30 yearsDigital Signal Processor evolution over the last 30 years
Digital Signal Processor evolution over the last 30 years
Francois Charlot14K views
Lecture_1 (1).pptx by DavidHamxa
Lecture_1 (1).pptxLecture_1 (1).pptx
Lecture_1 (1).pptx
DavidHamxa2 views
Heart rate estimation of car driver using radar sensors and blind source sepa... by Kitamura Laboratory
Heart rate estimation of car driver using radar sensors and blind source sepa...Heart rate estimation of car driver using radar sensors and blind source sepa...
Heart rate estimation of car driver using radar sensors and blind source sepa...
Final presentation by Rohan Lad
Final presentationFinal presentation
Final presentation
Rohan Lad552 views
Implementation of Wide Band Frequency Synthesizer Base on DFS (Digital Frequ... by IJMER
Implementation of Wide Band Frequency Synthesizer Base on  DFS (Digital Frequ...Implementation of Wide Band Frequency Synthesizer Base on  DFS (Digital Frequ...
Implementation of Wide Band Frequency Synthesizer Base on DFS (Digital Frequ...
IJMER437 views

More from Daichi Kitamura

独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank... by
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...Daichi Kitamura
1.5K views91 slides
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価 by
スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価Daichi Kitamura
1.1K views24 slides
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも) by
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)Daichi Kitamura
2.8K views67 slides
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank... by
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...Daichi Kitamura
8.2K views67 slides
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s... by
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...Daichi Kitamura
4.1K views26 slides
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen... by
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...Daichi Kitamura
2.1K views15 slides

Experimental analysis of optimal window length for independent low-rank matrix analysis

  • 1. Experimental analysis of optimal window length for independent low-rank matrix analysis
      • Daichi Kitamura (The University of Tokyo, Japan)
      • Nobutaka Ono (National Institute of Informatics, Japan)
      • Hiroshi Saruwatari (The University of Tokyo, Japan)
      • 25th European Signal Processing Conference (EUSIPCO) 2017
      • SS14: Multivariate Analysis for Audio Signal Source Enhancement, August 30, 14:30-16:10
  • 2. Contents
      • Background
        – Blind source separation (BSS) for audio signals
        – Motivation: fundamental limitation in frequency-domain BSS
      • Methods
        – Frequency-domain independent component analysis (FDICA)
        – Independent vector analysis (IVA)
        – Independent low-rank matrix analysis (ILRMA)
      • Experimental analysis
        – Optimal window length
          • Music signals and speech signals
          • Ideal case and more practical case
      • Conclusion
  • 4. Background
      • Blind source separation (BSS) for audio signals
        – separates the original audio sources
        – does not require prior information about the recording conditions
          • locations of mics and sources, room geometry, timbres, etc.
        – is applicable to many audio applications
      • Only the “determined” situation is considered (# of mics = # of sources)
      • Figure: sources → mixing system → observed → demixing system → estimated (example: recorded mixture → BSS → separated guitar)
  • 5. History of BSS for audio signals
      • Basic theories and their evolution (timeline figure; only popular methods are depicted)
      • ICA branch
        – Independent component analysis (ICA)
        – Frequency-domain ICA (FDICA)
        – Many permutation solvers for FDICA
        – Independent vector analysis (IVA)
        – Auxiliary-function-based IVA (AuxIVA)
        – Time-varying Gaussian IVA
      • NMF branch
        – Nonnegative matrix factorization (NMF)
        – Apply NMF to many tasks
        – Generative models in NMF
        – Itakura–Saito NMF (ISNMF)
        – Many extensions of NMF
        – Multichannel NMF
      • Unification: independent low-rank matrix analysis (ILRMA)
      • Years marked on the timeline: 1994, 1998, 1999, 2006, 2009, 2011, 2012, 2013, 2016
  • 7. Motivation: fundamental limitation of BSS
      • Mixing assumption in frequency-domain BSS: $x_{ij} = A_i s_{ij}$
        – $x_{ij}$: observed multichannel signal, $A_i$: frequency-wise mixing matrix, $s_{ij}$: source signals
        – $i$: frequency bins, $j$: time frames
        – “Linear time-invariant mixture” or “rank-1 spatial model”
        – Valid only when the window length used in STFT ≫ the length of room reverberation
      • Too long a window also causes another problem
        – Number of time frames (samples) decreases
        – Statistical bias will increase and estimation becomes unstable
      • Trade-off between short and long windows [S. Araki+, 2003]
        – FDICA suffers from the trade-off
        – What about BSS methods with a structural source model?
          • IVA and ILRMA
      • Figure: performance versus window length, peaking at an optimal length
  • 9. BSS methods: FDICA and IVA
      • Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
        – Observed → STFT → demixing matrix → estimated (frequency-time plane)
        – Source model: scalar r.v.s with a non-Gaussian source distribution
        – Update the separation filter so that the estimated signals obey the non-Gaussian distribution we assumed and are mutually independent
        – The mixture is close to a Gaussian signal because of the CLT, whereas each source obeys a non-Gaussian distribution
      • Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
        – Observed → STFT → demixing matrix → estimated (frequency-time plane)
        – Source model: vector (multivariate) r.v.s with a non-Gaussian spherical source distribution
        – Update the separation filter so that the estimated signals obey the non-Gaussian distribution we assumed and are mutually independent
  • 10. Extension of source distribution in IVA
      • Spherical Laplace distribution in IVA
        – Defined on the frequency vector (I-dimensional)
        – Frequency-uniform scale
        – Figure: spherical Laplace (bivariate)
      • Zero-mean complex Gaussian distribution with TF-varying variance (Itakura–Saito NMF) [C. Févotte+, 2009]
        – Defined on the time-frequency matrix (IJ-dimensional)
        – Zero-mean complex Gaussian in each TF bin, with time-frequency-varying variance
        – Low-rank decomposition with NMF
      • The former is extended to the latter, a more flexible model
  • 11. Generative source model in ISNMF
      • The power spectrogram corresponds to the variances in the TF plane
        – Complex Gaussian distribution with TF-varying variance
        – Figure: power spectrogram (frequency bin vs. time frame); the grayscale shows the value of the variance, from small to large power
      • If we marginalize in terms of time or frequency, the distribution becomes non-Gaussian even though each TF grid is defined as a Gaussian distribution
  • 12. BSS methods: ILRMA
      • Independent low-rank matrix analysis (ILRMA) [D. Kitamura+, 2016]
        – Unification of IVA and ISNMF
        – Source model in ILRMA: low-rank decomposition of each time-frequency matrix into bases (frequency) and activations (time)
        – The number of bases can be set to an arbitrary value
      • Figure: observed → STFT → demixing matrix → estimated → low-rank decomposition
      • Update the demixing matrix so that the estimated signals have a low-rank structure in the time-frequency domain
  • 13. Comparison of source models
      • FDICA source model: non-Gaussian scalar variable
      • IVA source model: non-Gaussian vector variable with higher-order correlation
      • ILRMA source model: non-Gaussian matrix variable with low-rank time-frequency structure
      • Rank of TF matrix of mixture > rank of TF matrix of each source
  • 15. Experimental analysis
      • Window length in STFT
        – If the window length is too short
          • the mixing assumption does not hold anymore
        – If the window length is too long
          • estimation becomes unstable (# of time frames decreases)
      • Figure: waveform → window function (window length = DFT length, advanced by the shift length) → DFT of each frame → spectrogram (frequency vs. time)
      • Our expectation
        – Full time-frequency modeling of sources in ILRMA may improve the robustness to a decrease in the number of time frames
  • 16. Experimental analysis
      • Dataset: 4 music and 4 speech signals from SiSEC [S. Araki+, 2012]
      • Mixing: convolution with RIRs in RWCP [S. Nakamura+, 2000]

      | Signal | Data name                        | Source (1/2)              | Length [s] |
      | Music  | bearlin-roads                    | acoustic_guit_main/vocals | 14.6       |
      | Music  | another_dreamer-the_ones_we_love | guitar/vocals             | 25.6       |
      | Music  | fort_minor-remember_the_name     | violins_synth/vocals      | 24.6       |
      | Music  | ultimate_nz_tour                 | guitar/synth              | 18.6       |
      | Speech | dev1_female4                     | src_1/src_2               | 10.0       |
      | Speech | dev1_female4                     | src_3/src_4               | 10.0       |
      | Speech | dev1_male4                       | src_1/src_2               | 10.0       |
      | Speech | dev1_male4                       | src_3/src_4               | 10.0       |

      • Impulse response E2A (reverberation time: T60 = 300 ms): sources 1 and 2 at 2 m, microphone spacing 5.66 cm, angle labels 50 and 50
      • Impulse response JR2 (reverberation time: T60 = 470 ms): sources 1 and 2 at 2 m, microphone spacing 5.66 cm, angle labels 60 and 60
  • 17. Experimental analysis
      • Compared methods
        – FDICA+IPS (ideal permutation solver)
          • Align the permutation of estimated components using the reference (oracle) source spectrogram (upper limit performance of FDICA)
        – FDICA+DOA (DOA-based permutation solver) [S. Kurita+, 2000]
          • Align the permutation of estimated components using DOA after FDICA
        – IVA [N. Ono, 2011]
          • using the auxiliary function method (a.k.a. MM algorithm) in optimization
        – ILRMA [D. Kitamura+, 2016]
          • with several numbers of bases
      • Other conditions
        – Window function: Hamming window
        – Window length: 32 to 2048 ms
        – Shift length: always a quarter of the window length
  • 18. Comparison using ideal initialization: condition
      • Set the initial value of the demixing matrix to the oracle
        – This initial value provides the best separation performance under the mixing assumption $x_{ij} = A_i s_{ij}$
      • Set the initial value of the source model to the oracle (only for ILRMA), using the power spectrogram of the $n$th source
      • FDICA+DOA & IVA: spatial oracle initialization
      • FDICA+IPS & ILRMA: spatial and spectral oracle initialization
  • 19. Comparison using ideal initialization: results
      • Figure panels: Music, T60 = 0.30 s; Music, T60 = 0.47 s; Speech, T60 = 0.30 s; Speech, T60 = 0.47 s
  • 20. Comparison using random initialization: condition
      • Set the initial value of the demixing matrix to the identity matrix
      • Set the initial value of the source model to uniform random values in [0,1] (only for ILRMA)
      • FDICA+DOA, IVA, & ILRMA: fully blind methods
      • FDICA+IPS: uses the oracle spectrogram
  • 21. Comparison using random initialization: results
      • Figure panels: Music, T60 = 0.30 s; Music, T60 = 0.47 s; Speech, T60 = 0.30 s; Speech, T60 = 0.47 s
  • 22. Conclusion
      • In the case of ILRMA with oracle initialization, the robustness to long windows (fewer time frames) can be improved
        – the optimal window length is longer than that in FDICA or IVA
        – thanks to employing not only the independence between sources but also a full modeling of the time-frequency structure for the estimation of the demixing matrix
      • In a practical situation (fully blind case),
        – the optimal window length is similar to that in FDICA or IVA
        – because of the difficulty of blindly estimating a precise spectral model in ILRMA
      • Thank you for your attention!

Editor's Notes

  1. This talk addresses blind source separation (BSS), a technique for separating the individual sources from a recorded mixture. The word “blind” means that the method does not require any prior information about the recording conditions, such as the locations of the microphones and sources or the room geometry. This kind of technique is very useful as a front-end system for many audio applications. In this talk, we consider only the “determined” situation, in which the number of microphones is always equal to the number of sources.
  2. This slide shows the history of the basic theories in audio BSS. For acoustic signals, independent component analysis (ICA) was applied to frequency-domain signals as FDICA. After that, many permutation solvers for FDICA were proposed, and eventually an elegant solution, independent vector analysis (IVA), was proposed. IVA is still being extended to more flexible models. On the other hand, nonnegative matrix factorization (NMF) was also developed and extended to multichannel signals for source separation problems. Recently, we developed a new framework that unifies these two powerful theories, called independent low-rank matrix analysis (ILRMA). I will explain the details later, but in this talk,
  3. we focus only on these algorithms: FDICA, IVA, and ILRMA.
  4. Here I explain the motivation of this talk. Many frequency-domain BSS techniques assume the equation x=As, where x is the multichannel mixture signal in the frequency domain, i is the frequency bin, j is the time frame, A is the frequency-wise mixing matrix, and s is the original source signal. This is often called a “linear time-invariant mixture” or a “rank-1 spatial model,” and the assumption is valid only when the window length in the STFT is much longer than the length of the room reverberation. So we must use a longer window in the STFT to validate this mixing assumption. However, if the window is too long, the statistical bias increases and the estimation becomes unstable, because the number of time frames J decreases. Therefore, there is a trade-off between short and long window lengths, as in this figure. The 2003 paper revealed this trade-off only for FDICA. We do not yet know how it behaves for BSS methods that employ a structural source model, namely IVA and ILRMA. So, in this talk, we experimentally examine the optimal window length for the newer BSS techniques, including ILRMA.
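The loss of time frames described above can be made concrete with a small sketch. It counts STFT frames for the window lengths used in the experiments, with the shift fixed to a quarter of the window. The 16 kHz sampling rate, the absence of zero padding, and the function names are assumptions made here for illustration only.

```python
# Minimal sketch: how the number of STFT frames J shrinks as the window grows.
# Assumptions (not from the slides): 16 kHz sampling rate, no zero padding,
# frames counted only where the whole window fits inside the signal.

FS = 16000  # assumed sampling rate [Hz]


def num_frames(num_samples: int, window_len: int, shift_len: int) -> int:
    """Number of full windows that fit into a signal of num_samples samples."""
    if num_samples < window_len:
        return 0
    return 1 + (num_samples - window_len) // shift_len


def frames_for_window_ms(duration_s: int, window_ms: int, fs: int = FS) -> int:
    """Frame count for a window given in milliseconds and a quarter-window shift."""
    window_len = fs * window_ms // 1000
    shift_len = window_len // 4  # shift length: quarter of the window length
    return num_frames(duration_s * fs, window_len, shift_len)


if __name__ == "__main__":
    for window_ms in (32, 64, 128, 256, 512, 1024, 2048):
        print(f"{window_ms:5d} ms window -> J = {frames_for_window_ms(10, window_ms)}")
```

Under these assumptions, a 10 s signal yields 1247 frames with a 32 ms window but only 16 frames with a 2048 ms window, which is the regime where statistical estimation becomes unstable.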
  5. I briefly explain the separation mechanism in FDICA and IVA. In FDICA, ICA is applied to each frequency bin, treating the scalar time-series signal as a random variable, and we maximize its non-Gaussianity to estimate the frequency-wise demixing matrix. In IVA, we consider a vector time-series random variable that includes all frequencies, as in this figure, and we assume a multivariate non-Gaussian distribution with a spherical property. Since the spherical property ensures higher-order correlation among frequency bins, the permutation problem can be avoided in IVA.
  6. The spherical source distribution in IVA can be extended to a more flexible model. We have extended it to a local Gaussian model, which employs a zero-mean complex Gaussian distribution with time-frequency-varying variance. Namely, in each time-frequency slot (i, j), a complex Gaussian distribution is defined, and its variance r can fluctuate depending on time and frequency. This generative model is equivalent to that in Itakura–Saito NMF, and the variance r can be decomposed into a basis matrix T and an activation matrix V.
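In symbols, the local Gaussian model described above can be sketched as follows. Here $s_{ij}$ denotes the complex STFT coefficient of one source at frequency bin $i$ and time frame $j$; the basis index $k$, the number of bases $K$, and the element names $t_{ik}$ and $v_{kj}$ are notation assumed here for illustration.

```latex
p(s_{ij}) = \frac{1}{\pi r_{ij}} \exp\!\left( -\frac{|s_{ij}|^{2}}{r_{ij}} \right),
\qquad
r_{ij} = \sum_{k=1}^{K} t_{ik} v_{kj}
```

The second relation is the element-wise form of the decomposition of the variance into the basis matrix T and the activation matrix V.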
  7. This is a graphical interpretation of the source model in ISNMF. In each time-frequency slot, a zero-mean complex Gaussian distribution is defined, and these distributions are mutually independent over all times, frequencies, and sources. The variance of each Gaussian corresponds to the power spectrogram. Therefore, in a slot that has strong power, such as a spectral peak, the Gaussian becomes wider, and a large power component can easily be generated. Note that, even though each slot is Gaussian, the marginal distribution in terms of time is non-Gaussian, because the variance fluctuates. Since this matrix generative model is non-Gaussian, we can use this distribution as a source model in an ICA-based method,
  8. resulting in independent low-rank matrix analysis (ILRMA). ILRMA is therefore a unification of IVA and ISNMF, and it employs the NMF source model to capture the low-rank time-frequency structure of each source. This source model can improve the estimation accuracy of the demixing matrix.
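The remark that a Gaussian with fluctuating variance is non-Gaussian after marginalization can be checked numerically. The sketch below draws complex Gaussian samples whose variance follows a low-rank model TV and compares the normalized fourth moment E|s|^4 / (E|s|^2)^2 with the value 2 that holds for a circular complex Gaussian with constant variance. The sizes, the random bases, and the gamma-distributed activations are arbitrary choices made here, not values from the talk.

```python
import numpy as np


def complex_gaussian(rng, variance):
    """Draw zero-mean circular complex Gaussian samples with the given variances."""
    std = np.sqrt(variance / 2.0)
    noise = rng.standard_normal(variance.shape) + 1j * rng.standard_normal(variance.shape)
    return std * noise


def fourth_moment_ratio(s):
    """Per-frequency E|s|^4 / (E|s|^2)^2, estimated over time frames (axis 1)."""
    power = np.abs(s) ** 2
    return np.mean(power ** 2, axis=1) / np.mean(power, axis=1) ** 2


rng = np.random.default_rng(0)
n_freq, n_frames, n_bases = 64, 5000, 2  # illustrative sizes

# Low-rank time-frequency-varying variance r = TV (hypothetical random model).
T = rng.uniform(0.1, 1.0, (n_freq, n_bases))
V = rng.gamma(0.5, 1.0, (n_bases, n_frames))  # sparse-ish activations
R = T @ V + 1e-12

ratio_varying = fourth_moment_ratio(complex_gaussian(rng, R)).mean()
ratio_constant = fourth_moment_ratio(
    complex_gaussian(rng, np.ones((n_freq, n_frames)))
).mean()

print(f"constant variance: {ratio_constant:.2f}")
print(f"low-rank varying variance: {ratio_varying:.2f}")
```

A ratio clearly above 2 for the varying-variance case indicates a heavier-tailed, non-Gaussian marginal, which is the property an ICA-based method can exploit.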
  9. This is a comparison of the source models in FDICA, IVA, and ILRMA again. The important idea used in ILRMA is that the rank of the TF matrix of the mixture signal is always greater than the rank of the TF matrix of each source before mixing. So, if we assume not only independence between the sources but also a low-rank TF structure for each source, the separation can be done accurately.
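The rank argument can be illustrated with a toy example. The sketch below builds two nonnegative low-rank "power spectrograms" and compares their ranks with the rank of their sum. Treating the mixture spectrogram as the plain sum of the source spectrograms is an idealization assumed here, and the sizes and random factors are hypothetical.

```python
import numpy as np


def lowrank_spectrogram(rng, n_freq, n_frames, n_bases):
    """Nonnegative matrix of rank n_bases built as bases times activations."""
    bases = rng.uniform(0.1, 1.0, (n_freq, n_bases))
    activations = rng.uniform(0.1, 1.0, (n_bases, n_frames))
    return bases @ activations


rng = np.random.default_rng(0)
source1 = lowrank_spectrogram(rng, 64, 100, 3)
source2 = lowrank_spectrogram(rng, 64, 100, 3)
mixture = source1 + source2  # idealized additive mixture (assumption)

rank_source1 = np.linalg.matrix_rank(source1)
rank_source2 = np.linalg.matrix_rank(source2)
rank_mixture = np.linalg.matrix_rank(mixture)

print(rank_source1, rank_source2, rank_mixture)
```

With generic random factors the sum has rank 6 while each source has rank 3, so a demixing matrix that lowers the rank of each estimated signal moves toward the original sources.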
  10. As I already explained, the window length in the STFT affects the performance of ICA-based separation. If the window is too short, the mixing assumption x=As no longer holds, and if the window is too long, the estimation becomes unstable because the number of time frames J decreases. However, ILRMA employs a full time-frequency model of the sources. This model may improve the robustness to a decrease in J. This is our expectation. Let's check it by experiment.
  11. Here we used 4 music and 4 speech signals obtained from the SiSEC database, and we produced the observed signals by convolving the sources with the impulse responses shown at the bottom. We used two types of impulse response: one has a reverberation time of 300 ms, and the other 470 ms.
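The way the observed signals are produced can be sketched as a convolutive mixture: each microphone signal is the sum of every source convolved with the impulse response from that source to that microphone. The sketch below uses synthetic noise sources and synthetic exponentially decaying impulse responses, not the SiSEC signals or the RWCP responses; the function name and array layout are assumptions made here.

```python
import numpy as np


def mix_convolutive(sources, rirs):
    """Convolutive mixing.

    sources: array of shape (n_src, n_samples)
    rirs:    array of shape (n_mic, n_src, rir_len), rirs[m, n] is the
             impulse response from source n to microphone m
    returns: array of shape (n_mic, n_samples + rir_len - 1)
    """
    n_src, n_samples = sources.shape
    n_mic, _, rir_len = rirs.shape
    mixture = np.zeros((n_mic, n_samples + rir_len - 1))
    for m in range(n_mic):
        for n in range(n_src):
            mixture[m] += np.convolve(sources[n], rirs[m, n])
    return mixture


rng = np.random.default_rng(0)
fs = 16000                       # assumed sampling rate [Hz]
sources = rng.standard_normal((2, fs))            # two 1 s synthetic sources
decay = np.exp(-np.arange(4800) / 700.0)          # synthetic 300 ms decaying tail
rirs = rng.standard_normal((2, 2, 4800)) * decay  # hypothetical impulse responses
observed = mix_convolutive(sources, rirs)
print(observed.shape)
```

When the impulse responses are much shorter than the STFT window, this time-domain convolution is well approximated by the frequency-wise product x=As, which is exactly the assumption whose validity depends on the window length.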
  12. We compared 4 methods: FDICA with an ideal permutation solver, FDICA with a DOA-based permutation solver, IVA, and ILRMA. In FDICA+IPS, we used the reference (oracle) source spectrogram, so this is an upper limit of FDICA. FDICA+DOA is a blind method that uses DOA clustering to solve the permutation problem. Of course, IVA and ILRMA are also blind methods. We used a Hamming window with various window lengths.
  13. First, we show the results for the ideal initialization case. Namely, we first give the correct demixing matrix as the initial value, which can be calculated using the oracle source s. So the initial value provides the best separation performance here. In addition, only for ILRMA, we set the initial values of the NMF model T and V to the oracle values. Therefore, FDICA+DOA and IVA use spatial oracle initialization, and FDICA+IPS and ILRMA use spatial and spectral oracle initialization.
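The exact formula for the oracle demixing matrix is not preserved in this transcript. One plausible way to compute it from the oracle sources, assumed here purely for illustration, is a frequency-wise least-squares fit that maps the observed spectrogram onto the source spectrogram. The sketch below shows this on synthetic data; the function name and array layout are hypothetical.

```python
import numpy as np


def oracle_demixing(S, X):
    """Frequency-wise least-squares demixing matrices (an assumed choice).

    S, X: complex arrays of shape (n_freq, n_ch, n_frames) holding the
          oracle source and observed spectrograms.
    Returns W of shape (n_freq, n_ch, n_ch) minimizing ||S_i - W_i X_i||^2.
    """
    Xh = X.conj().transpose(0, 2, 1)
    return (S @ Xh) @ np.linalg.inv(X @ Xh)


rng = np.random.default_rng(0)
n_freq, n_ch, n_frames = 8, 2, 200

# Synthetic sources and well-conditioned frequency-wise mixing matrices.
S = rng.standard_normal((n_freq, n_ch, n_frames)) \
    + 1j * rng.standard_normal((n_freq, n_ch, n_frames))
A = np.eye(n_ch) + 0.3 * (rng.standard_normal((n_freq, n_ch, n_ch))
                          + 1j * rng.standard_normal((n_freq, n_ch, n_ch)))
X = A @ S  # exact instantaneous mixture x = As in every frequency bin

W = oracle_demixing(S, X)
print(np.allclose(W @ A, np.eye(n_ch)))
```

When x=As holds exactly, the least-squares solution coincides with the inverse of the mixing matrix. With real reverberant data and a finite window it would only approximate the best demixing matrix under the rank-1 model.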
  14. This is the result. The left plots are for music and the right plots are for speech, with the short reverberation time at the top and the long one at the bottom. The horizontal axis shows the window length, and the vertical axis shows the separation performance. The colored lines are the results of ILRMA with various numbers of NMF bases. In the music results, we can see that FDICA and IVA could not achieve good separation when the window becomes long. In ILRMA, the performance is maintained even with very long windows. This comes from the full modeling of the time-frequency structure of each source. However, for the speech signals, the performance of ILRMA becomes worse. We guess this is because speech does not have a low-rank time-frequency structure, so the source model could not capture the precise speech structure even when it was set to the oracle one.
  15. Next, we show the results for the fully blind situation. The initial W is set to the identity matrix, and the initial source model is randomized. Note that FDICA+IPS still uses the oracle spectrogram to solve the permutation.
  16. This is the result. We could not obtain the same results as in the previous case with ideal initialization. The performance of all the methods degrades when the window length becomes long. Therefore, we can at least say that ILRMA has good potential to separate the sources even with a long window, but in practice the blind estimation of a precise source model is a difficult problem.
  17. This figure shows the difference between the source models in IVA and ILRMA. Since IVA assumes a frequency-uniform scale, it is almost an NMF with only one flat basis. On the other hand, ILRMA has a more flexible source model with an arbitrary number of spectral bases, so it can capture a more precise TF structure of each source.
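  18. The log-likelihood function of the proposed method, ILRMA, is obtained as shown here. The spatial demixing filter W, circled in blue, and the NMF source model TV, circled in red, are the variables to be estimated. Moreover, the first half of this expression is equivalent to the cost function of conventional IVA, and the second half is equivalent to the cost function of conventional NMF. Therefore, all the variables can easily be estimated by alternately iterating the update rules of IVA and NMF. In addition, the appropriate rank for each source can be determined adaptively by using latent variables. As shown at the beginning, even among music signals, vocals are not very low-rank whereas drums are low-rank, so the appropriate rank differs from source to source. The role of the latent variables is to allocate the bases automatically in such situations on the basis of the maximum-likelihood criterion.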
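The formula itself is not preserved in these notes. As a sketch, the form commonly given for the ILRMA log-likelihood (up to an additive constant) is written below, under the assumptions that $x_{ij}$ is the observed vector, $w_{i,n}$ is the demixing filter for source $n$ (the $n$th row of $W_i$ is $w_{i,n}^{\mathsf{H}}$), and each source has its own bases $t_{ik,n}$ and activations $v_{kj,n}$. All symbol names here are notation chosen for illustration.

```latex
\mathcal{L} = \sum_{i,j} \left[ 2 \log \left| \det W_i \right|
  - \sum_{n} \left( \frac{|y_{ij,n}|^{2}}{r_{ij,n}} + \log r_{ij,n} \right) \right],
\qquad
y_{ij,n} = w_{i,n}^{\mathsf{H}} x_{ij},
\quad
r_{ij,n} = \sum_{k} t_{ik,n} v_{kj,n}
```

For a fixed source model $r_{ij,n}$, the terms that depend on $W_i$ have the form of an IVA objective with a Gaussian source model; for fixed separated signals $y_{ij,n}$, the terms that depend on $t_{ik,n}$ and $v_{kj,n}$ form the Itakura–Saito NMF objective up to constants.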
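  19. The iterative update rules of ILRMA can be derived as shown here. All the variables are optimized by alternately updating the spatial demixing filter and the source model. Since these iterations are guaranteed to increase the likelihood monotonically, convergence to a local solution near the initial value is guaranteed.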
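The update rules themselves are not preserved in these notes. The sketch below shows one common way to realize the alternation: multiplicative Itakura–Saito NMF updates for the source model and an iterative-projection update for each demixing filter. It is written from the general ILRMA and AuxIVA literature rather than from this deck, it omits the scale normalization and rank-allocation variants, and all function names, array layouts, and synthetic data are assumptions.

```python
import numpy as np


def ilrma_cost(Y, R, W):
    """Negative log-likelihood up to a constant; Y, R: (I, J, N), W: (I, N, N)."""
    n_frames = Y.shape[1]
    _, logabsdet = np.linalg.slogdet(W)
    return float(np.sum(np.abs(Y) ** 2 / R + np.log(R))
                 - 2.0 * n_frames * np.sum(logabsdet))


def ilrma_iteration(X, W, T, V, eps=1e-12):
    """One sweep over all sources; updates W, T, V in place.

    X: (I, J, M) observed STFT, W: (I, M, M) with rows w_n^H,
    T: (I, K, M) bases, V: (K, J, M) activations (determined case, N = M).
    """
    n_freq, n_frames, n_ch = X.shape
    Y = np.einsum("inm,ijm->ijn", W, X)
    for n in range(n_ch):
        P = np.abs(Y[:, :, n]) ** 2
        # Source model: multiplicative IS-NMF updates for bases and activations.
        R = np.maximum(T[:, :, n] @ V[:, :, n], eps)
        T[:, :, n] *= np.sqrt(((P / R**2) @ V[:, :, n].T) / ((1.0 / R) @ V[:, :, n].T))
        R = np.maximum(T[:, :, n] @ V[:, :, n], eps)
        V[:, :, n] *= np.sqrt((T[:, :, n].T @ (P / R**2)) / (T[:, :, n].T @ (1.0 / R)))
        R = np.maximum(T[:, :, n] @ V[:, :, n], eps)
        # Spatial model: iterative projection for the n-th demixing filter.
        U = np.einsum("ijm,ijl,ij->iml", X, X.conj(), 1.0 / R) / n_frames
        e = np.zeros((n_freq, n_ch, 1), dtype=complex)
        e[:, n, 0] = 1.0
        w = np.linalg.solve(W @ U, e)[:, :, 0]
        scale = np.sqrt(np.real(np.einsum("im,iml,il->i", w.conj(), U, w)))
        w /= scale[:, None]
        W[:, n, :] = w.conj()
        Y = np.einsum("inm,ijm->ijn", W, X)
    R_all = np.maximum(np.einsum("ikn,kjn->ijn", T, V), eps)
    return Y, R_all


def make_toy_data(rng, n_freq=16, n_frames=60, n_ch=2, n_bases=2):
    """Synthetic low-rank-variance sources mixed instantaneously per frequency."""
    var = np.einsum("ikn,kjn->ijn",
                    rng.uniform(0.1, 1.0, (n_freq, n_bases, n_ch)),
                    rng.uniform(0.1, 1.0, (n_bases, n_frames, n_ch)))
    S = np.sqrt(var / 2.0) * (rng.standard_normal(var.shape)
                              + 1j * rng.standard_normal(var.shape))
    A = (rng.standard_normal((n_freq, n_ch, n_ch))
         + 1j * rng.standard_normal((n_freq, n_ch, n_ch)))
    return np.einsum("imn,ijn->ijm", A, S)


def run_ilrma(X, n_bases=2, n_iter=15, seed=0):
    """Blind run: identity demixing matrix and random source model as initial values."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames, n_ch = X.shape
    W = np.tile(np.eye(n_ch, dtype=complex), (n_freq, 1, 1))
    T = rng.uniform(0.1, 1.0, (n_freq, n_bases, n_ch))
    V = rng.uniform(0.1, 1.0, (n_bases, n_frames, n_ch))
    costs = []
    for _ in range(n_iter):
        Y, R = ilrma_iteration(X, W, T, V)
        costs.append(ilrma_cost(Y, R, W))
    return W, costs


if __name__ == "__main__":
    _, costs = run_ilrma(make_toy_data(np.random.default_rng(0)))
    print(costs[0], costs[-1])
```

In this sketch the recorded cost is the negative log-likelihood, so the monotonic increase of the likelihood mentioned above appears as a non-increasing cost sequence.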
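  20. In other words, the proposed method first learns the spatial demixing filter, then learns the timbral structure of the separated signals with NMF, and then reuses the resulting source model to learn the spatial demixing filter, which yields even more accurate separated signals. By repeating this process many times, the distinct timbral structure of each source is captured, and the performance of the spatial demixing filter is expected to improve.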
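  21. The paper also reveals that ILRMA is closely related to multichannel NMF, an extension of NMF to multichannel signals. Briefly, ILRMA is equivalent to conventional multichannel NMF when the “spatial covariance matrix,” the model of spatial information defined in multichannel NMF, is constrained to be rank 1. However, multichannel NMF estimates the mixing system, unlike ILRMA and IVA, which estimate the demixing system. For this reason, multichannel NMF is somewhat lacking in practicality in terms of computational efficiency and stability. We show this in the comparison experiments.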
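  22. Now I explain the difference between the source model of IVA and that of the proposed method. Because IVA is based on a spherically symmetric multivariate distribution, its source model has a variance that is uniform along the frequency axis. This is very close to an NMF with a single basis. In contrast, the source model of the proposed method is an NMF with an arbitrary number of bases. Therefore, as in this figure, it can capture the specific harmonic structure of music signals. Estimating such a distinct source model is expected to improve the separation performance under the independence criterion.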